Stability of scores for the Slosson Full-Range Intelligence Test.
Williams, Thomas O; Eaves, Ronald C; Woods-Groves, Suzanne; Mariano, Gina
2007-08-01
The test-retest stability of the Slosson Full-Range Intelligence Test by Algozzine, Eaves, Mann, and Vance was investigated with test scores from a sample of 103 students. With a mean interval of 13.7 mo. and different examiners for each of the two test administrations, the test-retest reliability coefficients for the Full-Range IQ, Verbal Reasoning, Abstract Reasoning, Quantitative Reasoning, and Memory were .93, .85, .80, .80, and .83, respectively. Mean test-retest score differences were not statistically significant for any of the scales. Results suggest that Slosson scores are stable over time even when different examiners administer the test.
The Probability of Obtaining Two Statistically Different Test Scores as a Test Index
ERIC Educational Resources Information Center
Muller, Jorg M.
2006-01-01
A new test index is defined as the probability of obtaining two randomly selected test scores (PDTS) as statistically different. After giving a concept definition of the test index, two simulation studies are presented. The first analyzes the influence of the distribution of test scores, test reliability, and sample size on PDTS within classical…
Ocular dominance stability and reading skill: a controversial relationship.
Zeri, Fabrizio; De Luca, Maria; Spinelli, Donatella; Zoccolotti, Pierluigi
2011-11-01
Evidence is mixed concerning the relationship between stability of ocular dominance and reading deficits. Contrasting results may be due to the use of different tests of dominance, different samples of readers, and different scoring methods. The aim of this study was to investigate the relationship among ocular dominance, general visual abilities, and reading performance, and to evaluate the consistency and reliability of different tests of ocular dominance and the effects of different types of eye dominance scoring. In a group of young adults, we measured: (a) main optometric parameters; (b) reading time and accuracy; and (c) ocular dominance in two sighting and four motor tests. Dominance was determined using different scoring methods (relative, absolute, and binary scores). All dominance tests showed good levels of internal reliability. Sighting tests were consistent regardless of the scoring method, and all participants had stable dominance. Three of four motor tests were moderately consistent when dominance was measured with relative scores but not when it was measured with absolute or binary scores. No relationship was found between stability of dominance and reading performance, regardless of the type of test or scoring method. No systematic pattern of correlation was found between binocular vision variables and dominance measures. Choosing the type of motor test to measure ocular dominance is crucial, because the level of consistency among tests is low to moderate. Furthermore, motor tests were not correlated with reading performances. Present results suggest caution when trying to link reading difficulties with specific profiles of ocular dominance.
ERIC Educational Resources Information Center
Zou, Xiao-Ling; Chen, Yan-Min
2016-01-01
The effects of computer and paper test media on EFL test-takers with different computer familiarity in writing scores and in the cognitive writing process have been comprehensively explored from the learners' aspect as well as on the basis of related theories and practice. The results indicate significant differences in test scores among the…
Validating Test Score Meaning and Defending Test Score Use: Different Aims, Different Methods
ERIC Educational Resources Information Center
Cizek, Gregory J.
2016-01-01
Advances in validity theory and alacrity in validation practice have suffered because the term "validity" has been used to refer to two incompatible concerns: (1) the degree of support for specified interpretations of test scores (i.e. intended score meaning) and (2) the degree of support for specified applications (i.e. intended test…
Jones, Nathaniel S; Walter, Kevin D; Caplinger, Roger; Wright, Daniel; Raasch, William G; Young, Craig
2014-07-01
The purpose of the present study was to investigate the possible effects of sociocultural influences, specifically pertaining to language and education, on baseline neuropsychological concussion testing as obtained via immediate postconcussion assessment and cognitive testing (ImPACT) of players from a professional baseball team. A retrospective chart review. Baseline testing of a professional baseball organization. Four hundred five professional baseball players. Age, languages spoken, hometown country location (United States/Canada vs overseas), and years of education. The 5 ImPACT composite scores (verbal memory, visual memory, visual motor speed, reaction time, impulse control) and ImPACT total symptom score from the initial baseline testing. The result of t tests revealed significant differences (P < 0.05) when comparing native English to native Spanish speakers in many scores. Even when corrected for education, the significant differences (P < 0.05) remained in some scores. Sociocultural differences may result in differences in computer-based neuropsychological testing scores.
The gender difference on the Mental Rotations test is not due to performance factors.
Masters, M S
1998-05-01
Men score higher than women on the Mental Rotations test (MRT), and the magnitude of this gender difference is the largest of that on any spatial test. Goldstein, Haldane, and Mitchell (1990) reported finding that the gender difference on the MRT disappears when "performance factors" are controlled--specifically, when subjects are allowed sufficient time to attempt all items on the test or when a scoring procedure that controls for the number of items attempted is used. The present experiment also explored whether eliminating these performance factors results in a disappearance of the gender difference on the test. Male and female college students were allowed a short time period or unlimited time on the MRT. The tests were scored according to three different procedures. The results showed no evidence that the gender difference on the MRT was affected by the scoring method or the time limit. Regardless of the scoring procedure, men scored higher than women, and the magnitude of the gender difference persisted undiminished when subjects completed all items on the test. Thus there was no evidence that performance factors produced the gender difference on the MRT. These results are consistent with the results of other investigators who have attempted to replicate Goldstein et al.'s findings.
Sex Differences in Objective and Projective Dependency Tests: A Meta-Analytic Review.
ERIC Educational Resources Information Center
Bornstein, Robert F.
1995-01-01
A meta-analysis of 97 studies published since 1950 that assessed sex differences in scores on objective and projective dependency tests indicated that women consistently obtained higher dependency scores on objective tests, and men obtained higher scores on projective tests. Findings are discussed in terms of sex role socialization. (SLD)
The Role of Test Scores in Explaining Race and Gender Differences in Wages
ERIC Educational Resources Information Center
Blackburn, McKinley L.
2004-01-01
Previous research has suggested that skills reflected in test-score performance on tests such as the Armed Forces Qualification Test (AFQT) can account for some of the racial differences in average wages. I use a more complete set of test scores available with the National Longitudinal Survey of Youth 1979 Cohort to reconsider this evidence, and…
Interpreting Linked Psychomotor Performance Scores
ERIC Educational Resources Information Center
Looney, Marilyn A.
2013-01-01
Given that equating/linking applications are now appearing in kinesiology literature, this article provides an overview of the different types of linked test scores: equated, concordant, and predicted. It also addresses the different types of evidence required to determine whether the scores from two different field tests (measuring the same…
Caruso, J C
2001-06-01
The unreliability of difference scores is a well documented phenomenon in the social sciences and has led researchers and practitioners to interpret differences cautiously, if at all. In the case of the Kaufman Adult and Adolescent Intelligence Test (KAIT), the unreliability of the difference between the Fluid IQ and the Crystallized IQ is due to the high correlation between the two scales. The consequences of the lack of precision with which differences are identified are wide confidence intervals and unpowerful significance tests (i.e., large differences are required to be declared statistically significant). Reliable component analysis (RCA) was performed on the subtests of the KAIT in order to address these problems. RCA is a new data reduction technique that results in uncorrelated component scores with maximum proportions of reliable variance. Results indicate that the scores defined by RCA have discriminant and convergent validity (with respect to the equally weighted scores) and that differences between the scores, derived from a single testing session, were more reliable than differences derived from equal weighting for each age group (11-14 years, 15-34 years, 35-85+ years). This reliability advantage results in narrower confidence intervals around difference scores and smaller differences required for statistical significance.
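The link described above between a high correlation of two scales and an unreliable difference score follows from the classical test theory formula for the reliability of a difference between two scores with equal variances. The sketch below is illustrative only: the function name and the numeric inputs are hypothetical and are not KAIT values.

```python
def difference_score_reliability(r_xx: float, r_yy: float, r_xy: float) -> float:
    """Classical reliability of D = X - Y when X and Y have equal variances.

    r_xx, r_yy: reliabilities of the two scales; r_xy: their correlation.
    """
    if not -1.0 <= r_xy < 1.0:
        raise ValueError("r_xy must be in [-1, 1)")
    return ((r_xx + r_yy) / 2.0 - r_xy) / (1.0 - r_xy)


# Hypothetical values, chosen only to show the trend.
low_overlap = difference_score_reliability(0.90, 0.90, 0.30)
high_overlap = difference_score_reliability(0.90, 0.90, 0.80)
print(round(low_overlap, 3), round(high_overlap, 3))
```

With the scale reliabilities held fixed, the reliability of the difference falls as the correlation between the two scales rises, which is the pattern the abstract attributes to the Fluid and Crystallized IQ scales.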
Explaining the black-white gap in cognitive test scores: Toward a theory of adverse impact.
Cottrell, Jonathan M; Newman, Daniel A; Roisman, Glenn I
2015-11-01
In understanding the causes of adverse impact, a key parameter is the Black-White difference in cognitive test scores. To advance theory on why Black-White cognitive ability/knowledge test score gaps exist, and on how these gaps develop over time, the current article proposes an inductive explanatory model derived from past empirical findings. According to this theoretical model, Black-White group mean differences in cognitive test scores arise from the following racially disparate conditions: family income, maternal education, maternal verbal ability/knowledge, learning materials in the home, parenting factors (maternal sensitivity, maternal warmth and acceptance, and safe physical environment), child birth order, and child birth weight. Results from a 5-wave longitudinal growth model estimated on children in the NICHD Study of Early Child Care and Youth Development from ages 4 through 15 years show significant Black-White cognitive test score gaps throughout early development that did not grow significantly over time (i.e., significant intercept differences, but not slope differences). Importantly, the racially disparate conditions listed above can account for the relation between race and cognitive test scores. We propose a parsimonious 3-Step Model that explains how cognitive test score gaps arise, in which race relates to maternal disadvantage, which in turn relates to parenting factors, which in turn relate to cognitive test scores. This model and results offer to fill a need for theory on the etiology of the Black-White ethnic group gap in cognitive test scores, and attempt to address a missing link in the theory of adverse impact. (c) 2015 APA, all rights reserved.
Linking the Smarter Balanced Assessments to NWEA MAP Assessments
ERIC Educational Resources Information Center
Northwest Evaluation Association, 2015
2015-01-01
Concordance tables have been used for decades to relate scores on different tests measuring similar but distinct constructs. These tables, typically derived from statistical linking procedures, provide a direct link between scores on different tests and serve various purposes. Aside from describing how a score on one test relates to performance on…
Weiler, Richard; van Mechelen, Willem; Fuller, Colin; Ahmed, Osman Hassan; Verhagen, Evert
2018-01-01
To determine if baseline Sport Concussion Assessment Tool, Third Edition (SCAT3) scores differ between athletes with and without disability. Cross-sectional comparison of preseason baseline SCAT3 scores for a range of England international footballers. Team doctors and physiotherapists supporting England football teams recorded players' SCAT3 baseline tests from August 1, 2013 to July 31, 2014. A convenience sample of 249 England footballers, of whom 185 were players without disability (male: 119; female: 66) and 64 were players with disability (male learning disability: 17; male cerebral palsy: 28; male blind: 10; female deaf: 9). Between-group comparisons of median SCAT3 total and section scores were made using the nonparametric Mann-Whitney-Wilcoxon rank-sum test. All footballers with disability recorded higher symptom severity scores compared with male players without disability. Male footballers with learning disability demonstrated no significant difference in the total number of symptoms, but recorded significantly lower scores on immediate memory and delayed recall compared with male players without disability. Male blind footballers scored significantly higher for total concentration and delayed recall, and male footballers with cerebral palsy scored significantly higher on balance testing and immediate memory, when compared with male players without disability. Female footballers with deafness scored significantly higher for total concentration and balance testing than female footballers without disability. This study suggests that significant differences exist between SCAT3 baseline section scores for footballers with and without disability. Concussion consensus guidelines should recognize these differences and produce guidelines that are specific for the growing number of athletes living with disability.
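As a rough illustration of the rank-sum comparison named above, the U statistic can be computed by counting, over all cross-group pairs, how often a score in one group exceeds a score in the other, with ties counted as one half. This is a minimal sketch of the statistic only (no p-value); the function name and the scores are made up and are not SCAT3 data.

```python
def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b): pairwise win counts for each group, ties counted as 0.5."""
    u_a = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in group_a
        for b in group_b
    )
    u_b = len(group_a) * len(group_b) - u_a
    return u_a, u_b


# Hypothetical section scores for two small groups.
with_disability = [1, 2, 3]
without_disability = [2, 4]

u_a, u_b = mann_whitney_u(with_disability, without_disability)
print(u_a, u_b)
```

The two values always sum to the number of cross-group pairs; a value of U far from half that total indicates that one group tends to score higher than the other.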
Sex Differences in Cognitive Abilities Test Scores: A UK National Picture
ERIC Educational Resources Information Center
Strand, Steve; Deary, Ian J.; Smith, Pauline
2006-01-01
Background and aims: There is uncertainty about the extent or even existence of sex differences in the mean and variability of reasoning test scores (Jensen, 1998; Lynn, 1994; Mackintosh, 1996). This paper analyses the Cognitive Abilities Test (CAT) scores of a large and representative sample of UK pupils to determine the extent of any sex…
The Effects of Process Oriented Guided Inquiry Learning on Secondary Student ACT Science Scores
NASA Astrophysics Data System (ADS)
Judd, William Lindsey
The purpose of this study was to examine any significant difference on secondary school chemistry students' ACT Science Test scores between students taught by the Process Oriented Guided Inquiry Learning (POGIL) method versus students taught by traditional, teacher-centered pedagogy. This study also examined any difference between students taught by the POGIL method versus students taught by traditional, teacher-centered pedagogy in regard to the three different types of questions on the ACT Science Test: data representation, research summaries, and conflicting viewpoints. The sample consisted of sophomore-level students at two private, suburban Christian schools. A pretest-posttest design was used to compare the mean difference in scores from ACT issued sample test booklets before and after each group had received instruction via the POGIL method or more traditional methods. This study found that there was no significant difference in the mean difference of test scores between the two groups. This study also found that there was not a significant difference in the mean difference of scores in regard to the three different types of questions on the ACT Science Test. Further implications of this study are discussed.
A Practical Method for Identifying Significant Change Scores
ERIC Educational Resources Information Center
Cascio, Wayne F.; Kurtines, William M.
1977-01-01
A test of significance for identifying individuals who are most influenced by an experimental treatment as measured by pre-post test change score is presented. The technique requires true difference scores, the reliability of obtained differences, and their standard error of measurement. (Author/JKS)
ERIC Educational Resources Information Center
Cascallar, Alicia S.; Dorans, Neil J.
2005-01-01
This study compares two methods commonly used (concordance and prediction) to establish linkages between scores from tests of similar content given in different languages. Score linkages between the Verbal and Math sections of the SAT I and the corresponding sections of the Spanish-language admissions test, the Prueba de Aptitud Academica (PAA),…
Larner, A J
2016-01-01
Calculation of correlation coefficients is often undertaken as a way of comparing different cognitive screening instruments (CSIs). However, test scores may correlate but not agree, and high correlation may mask lack of agreement between scores. The aim of this study was to use the methodology of Bland and Altman to calculate limits of agreement between the scores of selected CSIs and contrast the findings with Pearson's product moment correlation coefficients between the test scores of the same instruments. Datasets from three pragmatic diagnostic accuracy studies which examined the Mini-Mental State Examination (MMSE) vs. the Montreal Cognitive Assessment (MoCA), the MMSE vs. the Mini-Addenbrooke's Cognitive Examination (M-ACE), and the M-ACE vs. the MoCA were analysed to calculate correlation coefficients and limits of agreement between test scores. Although test scores were highly correlated (all >0.8), calculated limits of agreement were broad (all >10 points), and in one case (MMSE vs. M-ACE) exceeded 15 points. Correlation is not agreement. Highly correlated test scores may conceal broad limits of agreement, consistent with the different emphases of different tests with respect to the cognitive domains examined. Routine incorporation of limits of agreement into diagnostic accuracy studies which compare different tests merits consideration, to enable clinicians to judge whether or not their agreement is close. © 2016 S. Karger AG, Basel.
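The contrast between correlation and agreement can be sketched with a small computation: Pearson's r on paired scores, alongside the Bland-Altman bias and 95% limits of agreement (mean difference plus or minus 1.96 standard deviations of the differences). The scores below are invented for two hypothetical instruments and are not data from the study; the function names are likewise illustrative.

```python
import math
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def limits_of_agreement(x, y):
    """Bland-Altman bias and 95% limits of agreement for paired scores."""
    diffs = [a - b for a, b in zip(x, y)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Made-up paired scores; each position is one hypothetical patient.
test_a = [30, 28, 25, 22, 18, 15, 12, 10]
test_b = [27, 22, 21, 14, 12, 5, 4, 1]

r = pearson_r(test_a, test_b)
bias, lower, upper = limits_of_agreement(test_a, test_b)
print(f"r = {r:.2f}, bias = {bias:.2f}, limits = ({lower:.2f}, {upper:.2f})")
```

In this toy data the two score lists rise and fall together, so r is high, yet the paired differences are large and spread out, so the limits of agreement are wide. That is the situation the abstract warns about.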
Impact of Accumulated Error on Item Response Theory Pre-Equating with Mixed Format Tests
ERIC Educational Resources Information Center
Keller, Lisa A.; Keller, Robert; Cook, Robert J.; Colvin, Kimberly F.
2016-01-01
The equating of tests is an essential process in high-stakes, large-scale testing conducted over multiple forms or administrations. By adjusting for differences in difficulty and placing scores from different administrations of a test on a common scale, equating allows scores from these different forms and administrations to be directly compared…
Sex Differences in the Tendency to Omit Items on Multiple-Choice Tests: 1980-2000
ERIC Educational Resources Information Center
von Schrader, Sarah; Ansley, Timothy
2006-01-01
Much has been written concerning the potential group differences in responding to multiple-choice achievement test items. This discussion has included references to possible disparities in tendency to omit such test items. When test scores are used for high-stakes decision making, even small differences in scores and rankings that arise from male…
Rosselli, M; Ardila, A; Bateman, J R; Guzmán, M
2001-01-01
Limited information is currently available about performance of Spanish-speaking children on different neuropsychological tests. This study was designed to (a) analyze the effects of age and sex on different neuropsychological test scores of a randomly selected sample of Spanish-speaking children, (b) analyze the value of neuropsychological test scores for predicting school performance, and (c) describe the neuropsychological profile of Spanish-speaking children with learning disabilities (LD). Two hundred ninety (141 boys, 149 girls) 6- to 11-year-old children were selected from a school in Bogotá, Colombia. Three age groups were distinguished: 6- to 7-, 8- to 9-, and 10- to 11-year-olds. Performance was measured utilizing the following neuropsychological tests: Seashore Rhythm Test, Finger Tapping Test (FTT), Grooved Pegboard Test, Children's Category Test (CCT), California Verbal Learning Test-Children's Version (CVLT-C), Benton Visual Retention Test (BVRT), and Bateria Woodcock Psicoeducativa en Español (Woodcock, 1982). Normative scores were calculated. Age effect was significant for most of the test scores. A significant sex effect was observed for 3 test scores. Intercorrelations were performed between neuropsychological test scores and academic areas (science, mathematics, Spanish, social studies, and music). In a post hoc analysis, children presenting very low scores on the reading, writing, and arithmetic achievement scales of the Woodcock battery were identified in the sample, and their neuropsychological test scores were compared with a matched normal group. Finally, a comparison was made between Colombian and American norms.
EDUCATION AND PSYCHOLOGICAL TEST SCORES
Pershad, Dwarka; Verma, S. K.
1980-01-01
Education, a long neglected variable affecting psychological test scores, is in search of reemphasis. Some evidence for this has accumulated on the psychological tests constructed and standardized here at the department of Psychiatry, P.G.I., Chandigarh. Tentative education-wise norms on the WAIS-Verbal section, PGI-Memory Scale, Proverb and Similarity Tests, Psychoticism Questionnaire, and PGI MQN 2, for adults in the age range of 16-50, are reported. The results showed marked differences in the mean scores of different educational categories and thus stressed the need for reporting norms separately for different educational levels. PMID:22064617
Ha, Seunghee; Jung, Seungeun; Koh, Kyung S
2018-06-01
The purpose of this study was to determine whether test-retest nasalance score variability differs between Korean children with and without cleft palate (CP) and whether vowel context influences variability in nasalance scores. Thirty-four 3-to-5-year-old children with and without CP participated in the study. Three 8-syllable speech stimuli devoid of nasal consonants were used for data collection. Each stimulus was loaded with high, low, or mixed vowels, respectively. All participants were asked to repeat the speech stimuli twice after the examiner, and an immediate test-retest nasalance score was assessed with no headgear change. Children with CP exhibited a significantly greater absolute difference in nasalance scores than children without CP. Variability in nasalance scores differed significantly by vowel context, and the high vowel sentence showed a significantly larger difference in nasalance scores than the low vowel sentence. The cumulative frequencies indicated that, for children with CP in the high vowel sentence, only 8 of 17 (47%) repeated nasalance scores were within 5 points. Test-retest nasalance score variability was greater for children with CP than children without CP, and there was greater variability for the high vowel sentence(s) for both groups. Copyright © 2018 Elsevier B.V. All rights reserved.
ERIC Educational Resources Information Center
Dempster, Frank N.; Cooney, John B.
1982-01-01
Individual differences in digit span, susceptibility to proactive interference, and various aptitude/achievement test scores were investigated in two experiments with college students. Results indicated that digit span was strongly correlated with aptitude/achievement scores, but did not indicate that susceptibility to proactive interference…
Kernel Equating Under the Non-Equivalent Groups With Covariates Design
Wiberg, Marie; Bränberg, Kenny
2015-07-01
When equating two tests, the traditional approach is to use common test takers and/or common items. Here, the idea is to use variables correlated with the test scores (e.g., school grades and other test scores) as a substitute for common items in a non-equivalent groups with covariates (NEC) design. This is performed in the framework of kernel equating and with an extension of the method developed for post-stratification equating in the non-equivalent groups with anchor test design. Real data from a college admissions test were used to illustrate the use of the design. The equated scores from the NEC design were compared with equated scores from the equivalent group (EG) design, that is, equating with no covariates as well as with equated scores when a constructed anchor test was used. The results indicate that the NEC design can produce lower standard errors compared with an EG design. When covariates were used together with an anchor test, the smallest standard errors were obtained over a large range of test scores. The results obtained, that an EG design equating can be improved by adjusting for differences in test score distributions caused by differences in the distribution of covariates, are useful in practice because not all standardized tests have anchor tests. PMID:29881012
Developing an Academic Ability Scale for the Kuder Occupational Interest Survey.
ERIC Educational Resources Information Center
Figel, William J.
Earlier studies had shown that differences in measured interests are related to differences in scores on tests of academic ability. Specifically, scores on the college major interest scales of the Kuder Occupational Interest Survey (KOIS) were found to be related to scores on the National Merit Scholarship Qualifying Test (NMSQT). This suggested…
ERIC Educational Resources Information Center
Truell, Allen D.; Zhao, Jensen J.; Alexander, Melody W.
2005-01-01
The purposes of this study were to determine if there is a significant difference in postsecondary business student scores and test completion time based on settable test item exposure control interface format, and to determine if there is a significant difference in student scores and test completion time based on settable test item exposure…
Willoughby, Michael T; Kuhn, Laura J; Blair, Clancy B; Samek, Anya; List, John A
2017-10-01
This study investigates the test-retest reliability of a battery of executive function (EF) tasks with a specific interest in testing whether the method that is used to create a battery-wide score would result in differences in the apparent test-retest reliability of children's performance. A total of 188 4-year-olds completed a battery of computerized EF tasks twice across a period of approximately two weeks. Two different approaches were used to create a score that indexed children's overall performance on the battery, i.e., (1) the mean score of all completed tasks and (2) a factor score estimate which used confirmatory factor analysis (CFA). Pearson and intra-class correlations were used to investigate the test-retest reliability of individual EF tasks, as well as an overall battery score. Consistent with previous studies, the test-retest reliability of individual tasks was modest (rs ≈ .60). The test-retest reliability of the overall battery scores differed depending on the scoring approach (r_mean = .72; r_factor_score = .99). It is concluded that children's performance on individual EF tasks exhibits modest levels of test-retest reliability. This underscores the importance of administering multiple tasks and aggregating performance across these tasks in order to improve precision of measurement. However, the specific strategy that is used has a large impact on the apparent test-retest reliability of the overall score. These results replicate our earlier findings and provide additional cautionary evidence against the routine use of factor analytic approaches for representing individual performance across a battery of EF tasks.
The Formalization of Fairness: Issues in Testing for Measurement Invariance Using Subtest Scores
ERIC Educational Resources Information Center
Molenaar, Dylan; Borsboom, Denny
2013-01-01
Measurement invariance is an important prerequisite for the adequate comparison of group differences in test scores. In psychology, measurement invariance is typically investigated by means of linear factor analyses of subtest scores. These subtest scores typically result from summing the item scores. In this paper, we discuss 4 possible problems…
ERIC Educational Resources Information Center
Lowe, James D.; Karnes, Frances A.
1976-01-01
It is indicated that, although the scores [obtained on both tests] are significantly correlated, the tests yield significantly different scores with the Lorge-Thorndike consistently overestimating the WISC-R full scale I.Q. (Author)
Gordon, Elisa J; Sohn, Min-Woong; Chang, Chih-Hung; McNatt, Gwen; Vera, Karina; Beauvais, Nicole; Warren, Emily; Mannon, Roslyn B; Ison, Michael G
2017-06-01
Kidney transplant candidates (KTCs) must provide informed consent to accept kidneys from increased risk donors (IRD), but poorly understand them. We conducted a multisite, randomized controlled trial to evaluate the efficacy of a mobile Web application, Inform Me, for increasing knowledge about IRDs. Kidney transplant candidates undergoing transplant evaluation at 2 transplant centers were randomized to use Inform Me after routine transplant education (intervention) or routine transplant education alone (control). Computer adaptive learning method reinforced learning by embedding educational material, and initial (test 1) and additional test questions (test 2) into each chapter. Knowledge (primary outcome) was assessed in person after education (tests 1 and 2), and 1 week later by telephone (test 3). Controls did not receive test 2. Willingness to accept an IRD kidney (secondary outcome) was assessed after tests 1 and 3. Linear regression test 1 knowledge scores were used to test the significance of Inform Me exposure after controlling for covariates. Multiple imputation was used for intention-to-treat analysis. Two hundred eighty-eight KTCs participated. Intervention participants had higher test 1 knowledge scores (mean difference, 6.61; 95% confidence interval [95% CI], 5.37-7.86) than control participants, representing a 44% higher score than control participants' scores. Intervention participants' knowledge scores increased with educational reinforcement (test 2) compared with control arm test 1 scores (mean difference, 9.50; 95% CI, 8.27-10.73). After 1 week, intervention participants' knowledge remained greater than controls' knowledge (mean difference, 3.63; 95% CI, 2.49-4.78) (test 3). Willingness to accept an IRD kidney did not differ between study arms at tests 1 and 3. Inform Me use was associated with greater KTC knowledge about IRD kidneys above routine transplant education alone.
A Nonparametric Framework for Comparing Trends and Gaps across Tests
ERIC Educational Resources Information Center
Ho, Andrew Dean
2009-01-01
Problems of scale typically arise when comparing test score trends, gaps, and gap trends across different tests. To overcome some of these difficulties, test score distributions on the same score scale can be represented by nonparametric graphs or statistics that are invariant under monotone scale transformations. This article motivates and then…
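The idea of a statistic that is invariant under monotone scale transformations can be illustrated with a minimal sketch. The statistic chosen here (the proportion of cross-group pairs in which one group outscores the other) and the score lists are assumptions made for illustration; the truncated abstract does not say which statistics the article itself uses.

```python
import math
from itertools import product

def prob_superiority(a, b):
    """Share of (x, y) pairs with x > y, counting ties as one half."""
    wins = sum(1.0 if x > y else 0.5 if x == y else 0.0
               for x, y in product(a, b))
    return wins / (len(a) * len(b))

def mean(values):
    return sum(values) / len(values)

# Hypothetical scores for two groups on the same scale (not from the article).
group_a = [12, 15, 19, 22, 30]
group_b = [10, 14, 15, 18, 21]

# A strictly increasing rescaling (a log transform) preserves every ordering.
log_a = [math.log(x) for x in group_a]
log_b = [math.log(y) for y in group_b]

ps_raw = prob_superiority(group_a, group_b)
ps_log = prob_superiority(log_a, log_b)

# The mean gap depends on the scale, the ordinal statistic does not.
gap_raw = mean(group_a) - mean(group_b)
gap_log = mean(log_a) - mean(log_b)
```

Because the statistic depends only on the ordering of scores, `ps_raw` and `ps_log` agree, while `gap_raw` and `gap_log` do not. That is the sense in which a scale-dependent gap can mislead when tests use different score scales.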
Summary of Score Changes (in other Tests).
ERIC Educational Resources Information Center
Cleary, T. Anne; McCandless, Sam A.
Scholastic Aptitude Test (SAT) scores have declined during the last 14 years. Similar score declines have been observed in many different testing programs, many groups, and tested areas. The declines, while not large in any given year, have been consistent over time, area, and group. The period around 1965 is critical for the interpretation of…
Is the NIHSS Certification Process Too Lenient?
Hills, Nancy K.; Josephson, S. Andrew; Lyden, Patrick D.; Johnston, S. Claiborne
2009-01-01
Background and Purpose: The National Institutes of Health Stroke Scale (NIHSS) is a widely used measure of neurological function in clinical trials and patient assessment; inter-rater scoring variability could impact communications and trial power. The manner in which the rater certification test is scored yields multiple correct answers that have changed over time. We examined the range of possible total NIHSS scores from answers given in certification tests by over 7,000 individual raters who were certified. Methods: We analyzed the results of all raters who completed one of two standard multiple-patient videotaped certification examinations between 1998 and 2004. The range for the correct score, calculated using NIHSS ‘correct answers’, was determined for each patient. The distribution of scores derived from those who passed the certification test then was examined. Results: A total of 6,268 raters scored 5 patients on Test 1; 1,240 scored 6 patients on Test 2. Using a National Stroke Association (NSA) answer key, we found that correct total scores ranged from 2 correct scores to as many as 12 different correct total scores. Among raters who achieved a passing score and were therefore qualified to administer the NIHSS, score distributions were even wider, with 1 certification patient receiving 18 different correct total scores. Conclusions: Allowing multiple acceptable answers for questions on the NIHSS certification test introduces scoring variability. It seems reasonable to assume that the wider the range of acceptable answers in the certification test, the greater the variability in the performance of the test in trials and clinical practice by certified examiners. Greater consistency may be achieved by deriving a set of ‘best’ answers through expert consensus on all questions where this is possible, then teaching raters how to derive these answers using a required interactive training module.
PMID:19295205
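How multiple acceptable item answers turn into multiple "correct" total scores can be sketched as follows. The answer key below is entirely hypothetical (it is not the NSA key, and the number of items is arbitrary); the function simply enumerates every total reachable by choosing one accepted score per item.

```python
def possible_totals(acceptable_per_item):
    """Return every total reachable by picking one accepted score per item."""
    totals = {0}
    for accepted in acceptable_per_item:
        # Extend each running total by each score accepted for this item.
        totals = {t + s for t in totals for s in accepted}
    return totals

# Hypothetical key for one certification patient: each set lists the scores
# accepted as correct for one scale item (illustrative values only).
answer_key = [{0}, {1, 2}, {0, 1}, {2}, {0, 1, 2}]

totals = possible_totals(answer_key)
n_correct_totals = len(totals)
```

With this made-up key, three items that each accept more than one answer already yield five distinct "correct" totals, which is the mechanism behind the widening ranges described in the abstract.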
Validity and Reliability of Baseline Testing in a Standardized Environment.
Higgins, Kathryn L; Caze, Todd; Maerlender, Arthur
2017-08-11
The Immediate Postconcussion Assessment and Cognitive Testing (ImPACT) is a computerized neuropsychological test battery commonly used to determine cognitive recovery from concussion based on comparing post-injury scores to baseline scores. This model is based on the premise that ImPACT baseline test scores are a valid and reliable measure of optimal cognitive function at baseline. Growing evidence suggests that this premise may not be accurate and a large contributor to invalid and unreliable baseline test scores may be the protocol and environment in which baseline tests are administered. This study examined the effects of a standardized environment and administration protocol on the reliability and performance validity of athletes' baseline test scores on ImPACT by comparing scores obtained in two different group-testing settings. Three hundred sixty-one Division 1 cohort-matched collegiate athletes' baseline data were assessed using a variety of indicators of potential performance invalidity; internal reliability was also examined. Thirty-one to thirty-nine percent of the baseline cases had at least one indicator of low performance validity, but there were no significant differences in validity indicators based on environment in which the testing was conducted. Internal consistency reliability scores were in the acceptable to good range, with no significant differences between administration conditions. These results suggest that athletes may be reliably performing at levels lower than their best effort would produce. © The Author 2017. Published by Oxford University Press.
ERIC Educational Resources Information Center
Sullivan, Jeremy R.; Winter, Suzanne M.; Sass, Daniel A.; Svenkerud, Nicole
2014-01-01
Many tests provide users with several different types of scores to facilitate interpretation and description of students' performance. Common examples include raw scores, age- and grade-equivalent scores, and standard scores. However, when used within the context of assessing growth among young children, these scores should not be interchangeable…
Visuospatial Aptitude Testing Differentially Predicts Simulated Surgical Skill.
Hinchcliff, Emily; Green, Isabel; Destephano, Christopher; Cox, Mary; Smink, Douglas; Kumar, Amanika; Hokenstad, Erik; Bengtson, Joan; Cohen, Sarah
2018-02-05
To determine if visuospatial perception (VSP) testing is correlated to simulated or intraoperative surgical performance as rated by the American College of Graduate Medical Education (ACGME) milestones. Classification II-2. Setting: Two academic training institutions. Participants: 41 residents, including 19 Brigham and Women's Hospital (BWH) and 22 Mayo Clinic residents from three different specialties (OBGYN, general surgery, urology). Participants underwent three different tests: visuospatial perception testing (VSP), Fundamentals of Laparoscopic Surgery (FLS®) peg transfer, and DaVinci robotic simulation peg transfer. Surgical grading from the ACGME milestones tool was obtained for each participant. Demographic and subject background information was also collected including specialty, year of training, prior experience with simulated skills, and surgical interest. Standard statistical analyses using Student's t tests were performed, and correlations were determined using adjusted linear regression models. In univariate analysis, BWH and Mayo training programs differed in both times and overall scores for both FLS® peg transfer and DaVinci robotic simulation peg transfer (p<0.05 for all). Additionally, type of residency training impacted time and overall score on robotic peg transfer. Familiarity with tasks correlated with higher score and faster task completion (p=0.05 for all except VSP score). There was no difference in VSP scores by program, specialty, or year of training. In adjusted linear regression modeling, VSP testing was correlated only to robotic peg transfer skills (average time p=0.006, overall score p=0.001). Milestones did not correlate to either VSP or surgical simulation testing. VSP score was correlated with robotic simulation skills but not with FLS skills or ACGME milestones. This suggests that the ability of VSP score to predict competence differs between tasks. Therefore, further investigation is required into aptitude testing, especially prior to its integration as an entry examination into a surgical subspecialty. Copyright © 2018. Published by Elsevier Inc.
Stegmeier, Nicole; Oak, Sameer R.; O’Rourke, Colin; Strnad, Greg; Spindler, Kurt P.; Jones, Morgan; Farrow, Lutul D.; Andrish, Jack; Saluan, Paul
2017-01-01
Background: Two versions of the International Knee Documentation Committee (IKDC) Subjective Knee Evaluation form currently exist: the original version (1999) and a recently modified pediatric-specific version (2011). Comparison of the pediatric IKDC with the adult version in the adult population may reveal that either version could be used longitudinally. Hypothesis: We hypothesize that the scores for the adult IKDC and pediatric IKDC will not be clinically different among adult patients aged 18 to 50 years. Study Design: Randomized crossover study design. Level of Evidence: Level 2. Methods: The study consisted of 100 participants, aged 18 to 50 years, who presented to orthopaedic outpatient clinics with knee problems. All participants completed both adult and pediatric versions of the IKDC in random order with a 10-minute break in between. We used a paired t test to test for a difference between the scores and a Welch’s 2-sample t test to test for equivalence. A least-squares regression model was used to model adult scores as a function of pediatric scores, and vice versa. Results: A paired t test revealed a statistically significant 1.6-point difference between the mean adult and pediatric scores. However, the 95% confidence interval (0.54-2.66) for this difference did not exceed our a priori threshold of 5 points, indicating that this difference was not clinically important. Equivalence testing with an equivalence region of 5 points further supported this finding. The adult and pediatric scores had a linear relationship and were highly correlated with an R² of 92.6%. Conclusion: There is no clinically relevant difference between the scores of the adult and pediatric IKDC forms in adults, aged 18 to 50 years, with knee conditions. Clinical Relevance: Either form, adult or pediatric, of the IKDC can be used in this population for longitudinal studies. If the pediatric version is administered in adolescence, it can be used for follow-up into adulthood.
PMID:28080306
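The distinction the IKDC abstract draws between a statistically significant difference and a clinically unimportant one can be checked with a minimal sketch. The confidence limits and the 5-point threshold are taken from the abstract; the helper name `ci_within_margin` and the interval-inclusion rule are illustrative assumptions and are not the Welch-based equivalence test the authors report using.

```python
def ci_within_margin(lower, upper, margin):
    """True when the whole interval lies strictly inside (-margin, +margin)."""
    return -margin < lower and upper < margin

# Values reported in the abstract.
ci_low, ci_high = 0.54, 2.66   # 95% CI for the adult-minus-pediatric difference
margin = 5.0                   # a priori clinical threshold, in points

# The midpoint of a symmetric interval recovers the point estimate.
midpoint = (ci_low + ci_high) / 2

# An interval that excludes zero corresponds to a significant difference.
excludes_zero = ci_low > 0 or ci_high < 0

# An interval inside the margin means the difference is too small to matter.
clinically_negligible = ci_within_margin(ci_low, ci_high, margin)
```

Both flags come out true for these numbers, and the midpoint matches the reported 1.6-point difference: the interval excludes zero, yet it sits well inside the 5-point region.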
Correcting the SAT's Ethnic and Social-Class Bias: A Method for Reestimating SAT Scores.
ERIC Educational Resources Information Center
Freedle, Roy O.
2003-01-01
A corrective scoring method, the Revised-Scholastic Achievement Test (R-SAT), addresses nonrandom ethnic test bias patterns found in the SAT. The R-SAT has been shown to reduce the mean-score difference between African-American and white test-takers by one-third, increase verbal scores by as much as 200-300 points for individuals, and benefit…
Evaluation of an Innovative Digital Assessment Tool in Dental Anatomy.
Lam, Matt T; Kwon, So Ran; Qian, Fang; Denehy, Gerald E
2015-05-01
The E4D Compare software is an innovative tool that provides immediate feedback to students' projects and competencies. It should provide consistent scores even when different scanners are used which may have inherent subtle differences in calibration. This study aimed to evaluate potential discrepancies in evaluation using the E4D Compare software based on four different NEVO scanners in dental anatomy projects. Additionally, correlation between digital and visual scores was evaluated. Thirty-five projects of maxillary left central incisors were evaluated. Among these, thirty wax-ups were performed by four operators and five consisted of standard dentoform teeth. Five scores were obtained for each project: one from an instructor who visually graded the project and four from different NEVO scanners. A faculty member involved in teaching the dental anatomy course blindly scored the 35 projects. One operator scanned all projects with four NEVO scanners (D4D Technologies, Richardson, TX, USA). The images were aligned to the gold standard, and tolerance was set at 0.3 mm to generate a score. The score reflected percentage match between the project and the gold standard. One-way ANOVA with repeated measures was used to determine whether there was a significant difference in scores among the four NEVO scanners. A paired-sample t-test was used to detect any difference between visual scores and the average scores of the four NEVO scanners. Pearson's correlation test was used to assess the relationship between visual and average scores of NEVO scanners. There was no significant difference in mean scores among four different NEVO scanners [F(3, 102) = 2.27, p = 0.0852, one-way ANOVA with repeated measures]. Moreover, the data provided strong evidence that a significant difference existed between visual and digital scores (p = 0.0217, paired-sample t-test). Mean visual scores were significantly lower than digital scores (72.4 vs 75.1). Pearson's correlation coefficient of 0.85 indicated a strong correlation between visual and digital scores (p < 0.0001). The E4D Compare software provides consistent scores even when different scanners are used and correlates well with visual scores. The use of innovative digital assessment tools in dental education is promising with the E4D Compare software correlating well with visual scores and providing consistent scores even when different scanners are used.
Cho, Sun-Joo; Preacher, Kristopher J; Bottge, Brian A
2015-11-01
Multilevel modeling (MLM) is frequently used to detect group differences, such as an intervention effect in a pre-test-post-test cluster-randomized design. Group differences on the post-test scores are detected by controlling for pre-test scores as a proxy variable for unobserved factors that predict future attributes. The pre-test and post-test scores that are most often used in MLM are summed item responses (or total scores). In prior research, there have been concerns regarding measurement error in the use of total scores in using MLM. To correct for measurement error in the covariate and outcome, a theoretical justification for the use of multilevel structural equation modeling (MSEM) has been established. However, MSEM for binary responses has not been widely applied to detect intervention effects (group differences) in intervention studies. In this article, the use of MSEM for intervention studies is demonstrated and the performance of MSEM is evaluated via a simulation study. Furthermore, the consequences of using MLM instead of MSEM are shown in detecting group differences. Results of the simulation study showed that MSEM performed adequately as the number of clusters, cluster size, and intraclass correlation increased and outperformed MLM for the detection of group differences.
Pohl, Steffi; Südkamp, Anna; Hardt, Katinka; Carstensen, Claus H.; Weinert, Sabine
2016-01-01
Assessing competencies of students with special educational needs in learning (SEN-L) poses a challenge for large-scale assessments (LSAs). For students with SEN-L, the available competence tests may fail to yield test scores of high psychometric quality, which are, at the same time, measurement invariant to test scores of general education students. We investigated whether we can identify a subgroup of students with SEN-L, for which measurement invariant competence measures of adequate psychometric quality may be obtained with tests available in LSAs. We furthermore investigated whether differences in test-taking behavior may explain dissatisfying psychometric properties and measurement non-invariance of test scores within LSAs. We relied on person fit indices and mixture distribution models to identify students with SEN-L for whom test scores with satisfactory psychometric properties and measurement invariance may be obtained. We also captured differences in test-taking behavior related to guessing and missing responses. As a result we identified a subgroup of students with SEN-L for whom competence scores of adequate psychometric quality that are measurement invariant to those of general education students were obtained. Concerning test-taking behavior, there was a small number of students who unsystematically picked response options. Removing these students from the sample slightly improved item fit. Furthermore, two different patterns of missing responses were identified that explain to some extent problems in the assessments of students with SEN-L.
PMID:26941665
Foxton, C R; Black, D; Muhlschlegel, J; Jardine, A
2014-12-01
To assess whether there is a difference in ENT knowledge amongst nurses caring for patients on a dedicated ENT ward and nurses caring for ENT patients in a similar hospital without a dedicated ENT ward. A test of theoretical knowledge of ENT nursing care was devised and administered to nurses working on a dedicated ENT ward and then to nurses working on generic non-subspecialist wards regularly caring for ENT patients in a hospital without a dedicated ENT ward. The test scores were then compared. A single specialist ENT/Maxillo-Facial/Ophthalmology ward in hospital A and 3 generic surgical wards in hospital B. Both hospitals are comparable district general hospitals in the south west of England. Nursing staff working in hospital A and hospital B on the relevant wards were approached during the working day. 11 nurses on ward 1, 10 nurses on ward 2, 11 nurses on ward 3 and 10 nurses on ward 4 (the dedicated ENT ward). Each individual test score was used to generate an average score per ward and these scores were compared to see if there was a significant difference. The average score out of 10 on ward 1 was 6.8 (+/-1.6). The average score on ward 2 was 4.8 (+/-1.6). The average score on ward 3 was 5.5 (+/-2.1). The average score on ward 4, which is the dedicated ENT ward, was 9.7 (+/-0.5). The differences in average test score between the dedicated ENT ward and all of the other wards are statistically significant. Nurses working on a dedicated ENT ward have a higher average score in a test of knowledge than nurses working on generic surgical wards. This difference is statistically significant and persists despite banding or training. © 2014 John Wiley & Sons Ltd.
Haverkate, Liz; Smit, Gerwin; Plettenburg, Dick H
2016-02-01
The functional performance of currently available body-powered prostheses is unknown. The goal of this study was to objectively assess and compare the functional performance of three commonly used body-powered upper limb terminal devices. Experimental trial. A total of 21 able-bodied subjects (n = 21, age = 22 ± 2) tested three different terminal devices: TRS voluntary closing Hook Grip 2S, Otto Bock voluntary opening hand and Hosmer Model 5XA hook, using a prosthesis simulator. All subjects used each terminal device nine times in two functional tests: the Nine-Hole Peg Test and the Box and Blocks Test. Significant differences were found between the different terminal devices and their scores on the Nine-Hole Peg Test and the Box and Blocks Test. The Hosmer hook scored best in both tests. The TRS Hook Grip 2S scored second best. The Otto Bock hand showed the lowest scores. This study is a first step in the comparison of functional performances of body-powered prostheses. The data can be used as a reference value, to assess the performance of a terminal device or an amputee. The measured scores enable the comparison of the performance of a prosthesis user and his or her terminal device relative to standard scores. © The International Society for Prosthetics and Orthotics 2014.
Habets, Petra; Jeandarme, Inge; Uzieblo, Kasia; Oei, Karel; Bogaerts, Stefan
2015-05-01
A stable assessment of cognition is of paramount importance for forensic psychiatric patients (FPP). The purpose of this study was to compare repeated measures of IQ scores in FPPs with and without intellectual disability. Repeated measurements of IQ scores in FPPs (n = 176) were collected. Differences between tests were computed, and each IQ score was categorized. Additionally, t-tests and regression analyses were performed. Differences of 10 points or more were found in 66% of the cases comparing WAIS-III with RAVEN scores. Fisher's exact test revealed differences between two WAIS-III scores and the WAIS categories. The WAIS-III did not predict other IQs (WAIS or RAVEN) in participants with intellectual disability. This study showed that stability or interchangeability of scores is lacking, especially in individuals with intellectual disability. Caution in interpreting IQ scores is therefore recommended, and the use of the unitary concept of IQ should be discouraged. © 2014 John Wiley & Sons Ltd.
Qi, Beier; Liu, Bo; Liu, Sha; Liu, Haihong; Dong, Ruijuan; Zhang, Ning; Gong, Shusheng
2011-05-01
To study the effect of cochlear electrode coverage and insertion region on speech recognition, especially tone perception, in cochlear implant (CI) users whose native language is Mandarin Chinese. Seven test conditions were set up with the fitting software. All conditions were created by switching the respective channels on or off in order to simulate different insertion positions. Mandarin-speaking CI users then took 4 speech tests: the Vowel Identification test, the Consonant Identification test, the Tone Identification test (male speaker), and the Mandarin HINT test (SRS) in quiet and noise. Across the test conditions, the average score of vowel identification differed significantly, ranging from 56% to 91% (rank sum test, P < 0.05). The average score of consonant identification differed significantly, ranging from 72% to 85% (ANOVA, P < 0.05). The average score of tone identification did not differ significantly (ANOVA, P > 0.05); however, the more channels were activated, the higher the scores obtained, from 68% to 81%. This study shows that there is a correlation between insertion depth and speech recognition. Because all parts of the basilar membrane can help CI users to improve their speech recognition ability, increasing insertion depth and actively stimulating the apical region of the cochlea is very important for enhancing the verbal communication and social interaction abilities of CI users.
ERIC Educational Resources Information Center
Malda, Maike; van de Vijver, Fons J. R.; Temane, Q. Michael
2010-01-01
In this study, cross-cultural differences in cognitive test scores are hypothesized to depend on a test's cultural complexity (Cultural Complexity Hypothesis: CCH), here conceptualized as its content familiarity, rather than on its cognitive complexity (Spearman's Hypothesis: SH). The content familiarity of tests assessing short-term memory,…
Standardized Testing Practices: Effect on Graduation and NCLEX® Pass Rates.
Randolph, Pamela K
The use of standardized testing in pre-licensure nursing programs has been accompanied by conflicting reports of effective practices. The purpose of this project was to describe standardized testing practices in one state's nursing programs and discover if the use of a cut score or oversight of remediation had any effect on (a) first-time NCLEX® pass rates, (b) on-time graduation (OTG) or (c) the combination of (a) and (b). Administrators of 38 nursing programs in one Southwest state were sent surveys; surveys were returned by 34 programs (89%). Survey responses were compared to each program's NCLEX pass rate and on-time graduation rate; t-tests were conducted for significant differences associated with a required minimum score (cut score) and oversight of remediation. There were no significant differences in NCLEX pass or on-time graduation rates related to establishment of a cut score. There was a significant difference when the NCLEX pass rate and on-time graduation rate were combined (Outcome Index "OI"), with significantly higher program outcomes (P=.02) for programs without cut scores. There were no differences associated with faculty oversight of remediation. The results of this study do not support establishment of a cut score when implementing standardized testing. Copyright © 2016. Published by Elsevier Inc.
McKeough, D Michael; Mattern-Baxter, Katrin; Barakatt, Edward
2010-01-01
The purpose of this study was to determine if a computer-aided instruction learning module improves students' knowledge of the neuroanatomy/physiology and clinical examination of the dorsal column-medial lemniscal (DCML) system. Sixty-one physical therapy students enrolled in a clinical neurology course in entry-level PT educational programs at two universities participated in the study. Students from University-1 (U1) had not had a previous neuroanatomy course, while students from University-2 (U2) had taken a neuroanatomy course in the previous semester. Before and after working with the learning module, students took a paper-and-pencil test on the neuroanatomy/physiology and clinical examination of the DCML system. Kruskal-Wallis one-way ANOVA and Mann-Whitney tests were used to determine if differences existed between neuroanatomy/physiology examination scores and clinical examination scores before and after taking the learning module, and between student groups based on university attended. For students from U1, neuroanatomy/physiology post-test scores improved significantly over pre-test scores (p < 0.001), while post-test scores of students from U2 did not (p = 0.60). Neuroanatomy/physiology pre-test scores from U2 were significantly better than those from U1 (p < 0.001); there was no significant difference in post-test scores (p = 0.062). Clinical examination pre-test and post-test scores from U2 were significantly better than those from U1 (p < 0.001). Clinical examination post-test scores improved significantly from the pre-test scores for both U1 (p < 0.001) and U2 (p < 0.001).
State Test Score Trends through 2008-09, Part 1: Rising Scores on State Tests and NAEP. Tennessee
ERIC Educational Resources Information Center
Center on Education Policy, 2010
2010-01-01
This paper profiles Tennessee's test score trends through 2008-09. Between 2005 and 2009, the percentages of students reaching the proficient level on the state test and the basic level on NAEP (National Assessment of Educational Progress) increased in grade 8 reading and math. At grade 4, trends on the state test and NAEP differed somewhat. In…
State Test Score Trends through 2008-09, Part 1: Rising Scores on State Tests and NAEP. Louisiana
ERIC Educational Resources Information Center
Center on Education Policy, 2010
2010-01-01
This paper profiles Louisiana's test score trends through 2008-09. Between 2005 and 2009, trends on state tests and NAEP (National Assessment of Educational Progress) sometimes differed. On the state test, the percentages of students reaching the proficient level increased at grades 4 and 8 in both reading and math. On NAEP, the percentage of…
Loanwords and Vocabulary Size Test Scores: A Case of Different Estimates for Different L1 Learners
ERIC Educational Resources Information Center
Laufer, Batia; McLean, Stuart
2016-01-01
The article investigated how the inclusion of loanwords in vocabulary size tests affected the test scores of two L1 groups of EFL learners: Hebrew and Japanese. New BNC- and COCA-based vocabulary size tests were constructed in three modalities: word form recall, word form recognition, and word meaning recall. Depending on the test modality, the…
ERIC Educational Resources Information Center
Cho, Sun-Joo; Preacher, Kristopher J.; Bottge, Brian A.
2015-01-01
Multilevel modeling (MLM) is frequently used to detect group differences, such as an intervention effect in a pre-test--post-test cluster-randomized design. Group differences on the post-test scores are detected by controlling for pre-test scores as a proxy variable for unobserved factors that predict future attributes. The pre-test and post-test…
Ethnic differences in the Goodenough-Harris draw-a-man and draw-a-woman tests.
Dugdale, A E; Chen, S T
1979-11-01
The draw-a-man (DAM) and draw-a-woman (DAW) tests were given to 307 schoolchildren in Petaling Jaya, Malaysia. The children were ethnically Malay, Chinese, or Indian (Tamil), and all came from lower socioeconomic groups. The standard scores of the Chinese children averaged 118 in the DAM and 112 in the DAW tests. These scores were significantly better than the American standards. Malay children scored significantly lower than Chinese, and Tamil children scored lower again. The nutritional status of the children had no influence on the scores. Chinese and Tamil children scored better in the DAM than the DAW, while in Malay boys the reverse was true. Malay children tended to emphasise clothing in the DAM, but Chinese and Tamil children scored better on items relating to facial features and body proportions. The Goodenough-Harris draw-a-person tests are obviously not culture-free, but the causes of ethnic differences have not been elucidated.
The Effect of English Language on Multiple Choice Question Scores of Thai Medical Students.
Phisalprapa, Pochamana; Muangkaew, Wayuda; Assanasen, Jintana; Kunavisarut, Tada; Thongngarm, Torpong; Ruchutrakool, Theera; Kobwanthanakun, Surapon; Dejsomritrutai, Wanchai
2016-04-01
Universities in Thailand are preparing for Thailand's integration into the ASEAN Economic Community (AEC) by increasing the number of tests in English language. English language is not the native language of Thailand Differences in English language proficiency may affect scores among test-takers, even when subject knowledge among test-takers is comparable and may falsely represent the knowledge level of the test-taker. To study the impact of English language multiple choice test questions on test scores of medical students. The final examination of fourth-year medical students completing internal medicine rotation contains 120 multiple choice questions (MCQ). The languages used on the test are Thai and English at a ratio of 3:1. Individual scores of tests taken in both languages were collected and the effect of English language on MCQ was analyzed Individual MCQ scores were then compared with individual student English language proficiency and student grade point average (GPA). Two hundred ninety five fourth-year medical students were enrolled. The mean percentage of MCQ scores in Thai and English were significantly different (65.0 ± 8.4 and 56.5 ± 12.4, respectively, p < 0.001). The correlation between MCQ scores in Thai and English was fair (Spearman's correlation coefficient = 0.41, p < 0.001). Of 295 students, only 73 (24.7%) students scored higher when being tested in English than in Thai language. Students were classified into six grade categories (A, B+, B, C+, C, and D+), which cumulatively measured total internal medicine rotation performance score plus final examination score. MCQ scores from Thai language examination were more closely correlated with total course grades than were the scores from English language examination (Spearman's correlation coefficient = 0.73 (p < 0.001) and 0.53 (p < 0.001), respectively). 
The gap between MCQ scores in the two languages was larger in borderline students than in the excellent student group (11.2 ± 11.2 and 7.1 ± 8.2, respectively, p < 0.001). Overall, average student English proficiency score was very high, at 3.71 ± 0.35 from a total of 4.00. Mean student GPA was 3.40 ± 0.33 from a possible 4.00. English language MCQ examination scores were more highly associated with GPA than with English language proficiency. The use of English language multiple choice question test may decrease scores of the fourth-year internal medicine post-rotation final examination, especially those of borderline students.
ERIC Educational Resources Information Center
Tan, Xuan; Ricker, Kathryn L.; Puhan, Gautam
2010-01-01
This study examines the differences in equating outcomes between two trend score equating designs resulting from two different scoring strategies for trend scoring when operational constructed-response (CR) items are double-scored--the single group (SG) design, where each trend CR item is double-scored, and the nonequivalent groups with anchor…
NASA Astrophysics Data System (ADS)
Nehm, Ross H.; Ha, Minsu; Mayfield, Elijah
2012-02-01
This study explored the use of machine learning to automatically evaluate the accuracy of students' written explanations of evolutionary change. Performance of the Summarization Integrated Development Environment (SIDE) program was compared to human expert scoring using a corpus of 2,260 evolutionary explanations written by 565 undergraduate students in response to two different evolution instruments (the EGALT-F and EGALT-P) that contained prompts that differed in various surface features (such as species and traits). We tested human-SIDE scoring correspondence under a series of different training and testing conditions, using Kappa inter-rater agreement values of greater than 0.80 as a performance benchmark. In addition, we examined the effects of response length on scoring success; that is, whether SIDE scoring models functioned with comparable success on short and long responses. We found that SIDE performance was most effective when scoring models were built and tested at the individual item level and that performance degraded when suites of items or entire instruments were used to build and test scoring models. Overall, SIDE was found to be a powerful and cost-effective tool for assessing student knowledge and performance in a complex science domain.
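The Kappa benchmark mentioned above can be illustrated with a minimal sketch of Cohen's kappa for two raters. The function name and the labels below are hypothetical and are not data from the study; the sketch only shows how raw agreement is corrected for chance agreement before it is compared with a threshold such as 0.80.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labeled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # degenerate case: both raters used one identical label
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels (1 = concept present, 0 = absent); not study data.
human = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
model = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
kappa = cohens_kappa(human, model)
print(f"kappa = {kappa:.2f}, exceeds 0.80 benchmark: {kappa > 0.80}")
```

In this made-up example the raters agree on 9 of 10 items, yet kappa falls below 0.80 because much of that agreement would be expected by chance.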
Agranovich, Anna V; Panter, A T; Puente, Antonio E; Touradji, Pegah
2011-07-01
Cultural differences in time attitudes and their effect on timed neuropsychological test performance were examined in matched non-clinical samples of 100 Russian and American adult volunteers using 8 tests that were previously reported to be relatively free of cultural bias: Color Trails Test (CTT); Ruff Figural Fluency Test (RFFT); Symbol Digit Modalities Test (SDMT); and Tower of London-Drexel Edition (ToL(Dx)). A measure of time attitudes, the Culture of Time Inventory (COTI-33), was used to assess time attitudes potentially affecting time-limited testing. Americans significantly outscored Russians on CTT, SDMT, and ToL(Dx) (p < .05) while differences in RFFT scores only approached statistical significance. Group differences also emerged in COTI-33 factor scores, which partially mediated differences in performance on CTT-1, SDMT, and ToL(Dx) initiation time, but did not account for the effect of culture on CTT-2. A significant effect of culture was revealed in ratings of familiarity with testing procedures that was negatively related to CTT, ToL(Dx), and SDMT scores. Current findings indicated that attitudes toward time may influence results of time-limited testing and suggested that individuals who lack familiarity with timed testing procedures tend to obtain lower scores on timed tests.
ERIC Educational Resources Information Center
Arieli-Attali, Meirav
2016-01-01
This dissertation investigated the feasibility of self-adapted testing (SAT) as a formative assessment tool with the focus on learning. Under two different orientation goals--to excel on a test (performance goal) or to learn from the test (learning goal)--I examined the effect of different scoring rules provided as interactive feedback, on test…
Is Test Security an Issue in a Multistation Clinical Assessment?--A Preliminary Study.
ERIC Educational Resources Information Center
Stillman, Paula L.; And Others
1991-01-01
A study investigated possible differences in standardized patient examination scores for three groups of undergraduate (n=176) and graduate (n=221) medical students assessed at different sites over two years. Results show no systematic change in scores over testing dates, suggesting no problems with breach of test security. (MSE)
Magyari, N; Szakács, V; Bartha, C; Szilágyi, B; Galamb, K; Magyar, M O; Hortobágyi, T; Kiss, R M; Tihanyi, J; Négyesi, J
2017-09-01
Aims The aim of this study was to examine the effects of gender on the relationship between Functional Movement Screen (FMS) and treadmill-based gait parameters. Methods Twenty elite junior athletes (10 women and 10 men) performed the FMS tests and gait analysis at a fixed speed. Between-gender differences were calculated for the relationship between FMS test scores and gait parameters, such as foot rotation, step length, and length of gait line. Results Gender did not affect the relationship between FMS and treadmill-based gait parameters. The nature of correlations between FMS test scores and gait parameters was different in women and men. Furthermore, different FMS test scores predicted different gait parameters in female and male athletes. FMS asymmetry and movement asymmetries measured by treadmill-based gait parameters did not correlate in either gender. Conclusion There were no interactions between FMS, gait parameters, and gender; however, correlation analyses support the idea that strength and conditioning coaches need to pay attention not only to how to score but also how to correctly use FMS.
Verification of learner’s differences by team-based learning in biochemistry classes
2017-01-01
Purpose We tested the effect of team-based learning (TBL) on medical education through the second-year premedical students’ TBL scores in biochemistry classes over 5 years. Methods We analyzed the results based on test scores before and after the students’ debate. The groups of students for statistical analysis were divided as follows: group 1 comprised the top-ranked students, group 3 comprised the low-ranked students, and group 2 comprised the medium-ranked students. Group T comprised all 382 students (the total number of students in groups 1, 2, and 3). To calibrate the difficulty of the test, original scores were converted into standardized scores. We determined the differences of the tests using Student t-test, and the relationship between scores before and after the TBL using linear regression tests. Results Although there was a decrease in the lowest score, groups T and 3 showed a significant increase in both original and standardized scores; there was also an increase in the standardized score of group 3. There was a positive correlation between the pre- and the post-debate scores in groups T and 2, and the beta values of the pre-debate scores and “the changes between the pre- and post-debate scores” were statistically significant in both original and standardized scores. Conclusion TBL is one of the educational methods for helping students improve their grades, particularly those of low-ranked students. PMID:29207457
Scoring Yes-No Vocabulary Tests: Reaction Time vs. Nonword Approaches
ERIC Educational Resources Information Center
Pellicer-Sanchez, Ana; Schmitt, Norbert
2012-01-01
Despite a number of research studies investigating the Yes-No vocabulary test format, one main question remains unanswered: What is the best scoring procedure to adjust for testee overestimation of vocabulary knowledge? Different scoring methodologies have been proposed based on the inclusion and selection of nonwords in the test. However, there…
Cho, Sun-Joo; Preacher, Kristopher J.; Bottge, Brian A.
2015-01-01
Multilevel modeling (MLM) is frequently used to detect group differences, such as an intervention effect in a pre-test–post-test cluster-randomized design. Group differences on the post-test scores are detected by controlling for pre-test scores as a proxy variable for unobserved factors that predict future attributes. The pre-test and post-test scores that are most often used in MLM are summed item responses (or total scores). In prior research, there have been concerns regarding measurement error in the use of total scores in using MLM. To correct for measurement error in the covariate and outcome, a theoretical justification for the use of multilevel structural equation modeling (MSEM) has been established. However, MSEM for binary responses has not been widely applied to detect intervention effects (group differences) in intervention studies. In this article, the use of MSEM for intervention studies is demonstrated and the performance of MSEM is evaluated via a simulation study. Furthermore, the consequences of using MLM instead of MSEM are shown in detecting group differences. Results of the simulation study showed that MSEM performed adequately as the number of clusters, cluster size, and intraclass correlation increased and outperformed MLM for the detection of group differences. PMID:29881032
ERIC Educational Resources Information Center
Pan, Tianshu; Yin, Yue
2012-01-01
In the discussion of mean square difference (MSD) and standard error of measurement (SEM), Barchard (2012) concluded that the MSD between 2 sets of test scores is greater than 2(SEM)^2 and SEM underestimates the score difference between 2 tests when the 2 tests are not parallel. This conclusion has limitations for 2 reasons. First,…
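The relationship between MSD and SEM referred to above can be sketched with a small simulation. This is an illustration under classical test theory assumptions that are not spelled out in the abstract: each observed score is a true score plus an independent error with standard deviation SEM, and non-parallelism is represented only by a constant shift between forms. The function name, score scale, and parameter values are hypothetical.

```python
import random


def simulate_msd(n_examinees, sem, form_shift=0.0, seed=0):
    """Mean square difference between two forms under X = T + E."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_examinees):
        true_score = rng.gauss(100.0, 15.0)  # hypothetical score scale
        x1 = true_score + rng.gauss(0.0, sem)
        x2 = true_score + form_shift + rng.gauss(0.0, sem)
        total += (x1 - x2) ** 2
    return total / n_examinees


sem = 5.0
parallel = simulate_msd(100_000, sem)                 # parallel forms
shifted = simulate_msd(100_000, sem, form_shift=3.0)  # forms differ in mean
print(f"2*SEM^2           = {2 * sem ** 2:.1f}")
print(f"MSD, parallel     = {parallel:.1f}")
print(f"MSD, shifted form = {shifted:.1f}")
```

Under these assumptions the expected MSD is 2·SEM^2 for parallel forms and 2·SEM^2 plus the squared shift otherwise, so the simulated MSD for the shifted form should land near 59 rather than 50.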
[Cancer nursing care education programs: the effectiveness of different teaching methods].
Cheng, Yun-Ju; Kao, Yu-Hsiu
2012-10-01
In-service education affects the quality of cancer care directly. Using classroom teaching to deliver in-service education is often ineffective due to participants' large workload and shift requirements. This study evaluated the learning effectiveness of different teaching methods in the dimensions of knowledge, attitude, and learning satisfaction. This study used a quasi-experimental study design. Participants were cancer ward nurses working at one medical center in northern Taiwan. Participants were divided into an experimental group and control group. The experimental group took an e-learning course and the control group took a standard classroom course using the same basic course material. Researchers evaluated the learning efficacy of each group using a questionnaire based on the quality of cancer nursing care learning effectiveness scale. All participants answered the questionnaire once before and once after completing the course. (1) Post-test "knowledge" scores for both groups were significantly higher than pre-test scores for both groups. Post-test "attitude" scores were significantly higher for the control group, while the experimental group reported no significant change. (2) After a covariance analysis of the pre-test scores for both groups, the post-test score for the experimental group was significantly lower than the control group in the knowledge dimension. Post-test scores did not differ significantly from pre-test scores for either group in the attitude dimension. (3) Post-test satisfaction scores between the two groups did not differ significantly with regard to teaching methods. The e-learning method, however, was demonstrated as more flexible than the classroom teaching method. Study results demonstrate the importance of employing a variety of teaching methods to instruct clinical nursing staff. We suggest that both classroom teaching and e-learning instruction methods be used to enhance the quality of cancer nursing care education programs.
We also encourage that interactivity between student and instructor be incorporated into e-learning course designs to enhance effectiveness.
NASA Astrophysics Data System (ADS)
Anderson, Pamela Bennett
Purpose. The purpose of the first study was to ascertain the extent to which differences were present in the STAAR Mathematics and Science test scores by Grade 5 and Grade 8 student economic status. The purpose of the second study was to examine differences in Grade 5 STAAR Mathematics and Science test performance by gender and by ethnicity/race (i.e., Asian, Black, Hispanic, and White). Finally, with respect to the third study in this journal-ready dissertation, the purpose was to investigate the STAAR Mathematics and Science test scores of Grade 8 students by gender and by ethnicity/race (i.e., Asian, Black, Hispanic, and White). Method. For this journal-ready dissertation, a non-experimental, causal-comparative research design (Creswell, 2009) was used in all three studies. Grade 5 and Grade 8 STAAR Mathematics and Science test data were analyzed for the 2011-2012 through the 2014-2015 school years. The dependent variables were the STAAR Mathematics and Science test scores for Grade 5 and Grade 8. The independent variables analyzed in these studies were student economic status, gender, and ethnicity/race. Findings. Regarding the first study, statistically significant differences were present in Grade 5 and Grade 8 STAAR Mathematics and Science test scores by student economic status for each year. Moderate effect sizes (Cohen's d) were present for each year of the study for the Grade 5 STAAR Mathematics and Science exams, Grade 8 Science exams, and the 2014-2015 Grade 8 STAAR Mathematics exam. However, a small effect size was present for the 2011-2012 through 2013-2014 Grade 8 STAAR Mathematics exam. Regarding the second and third study, statistically significant differences were revealed for Grade 5 and Grade 8 STAAR Mathematics and Science test scores based on gender, with trivial effect sizes. Furthermore, statistically significant differences were present in these test scores by ethnicity/race, with moderate effects for each year of the study. 
With regard to each year for both studies, Asian students had the highest average test scores, followed by White, Hispanic, and Black students, respectively. Thus, a stairstep achievement gap (Carpenter, Ramirez, & Severn, 2006) was present.
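The effect sizes reported above are Cohen's d values. A minimal sketch of that statistic follows, using the pooled standard deviation and Cohen's conventional benchmarks (0.2, 0.5, 0.8). The group means, standard deviations, and sample sizes are hypothetical placeholders, not values from the dissertation, and the labels are one common reading of the benchmarks.

```python
import math


def cohens_d(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / (n_1 + n_2 - 2)
    return (mean_1 - mean_2) / math.sqrt(pooled_var)


def effect_label(d):
    """Label |d| using Cohen's conventional cut points of 0.2, 0.5, and 0.8."""
    size = abs(d)
    if size < 0.2:
        return "trivial"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "moderate"
    return "large"


# Hypothetical scale scores for two student groups; not study data.
d = cohens_d(1600.0, 100.0, 500, 1550.0, 100.0, 500)
print(f"d = {d:.2f} ({effect_label(d)})")
```

With large samples, a difference can be statistically significant while d remains trivial, which is the pattern the abstract reports for gender.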
Azad, Aftab Mohammad; Al Juma, Saad; Bhatti, Junaid Ahmad; Delaney, J Scott
2016-01-01
Background Balance testing is an important part of the initial concussion assessment. There is no research on the differences in Modified Balance Error Scoring System (M-BESS) scores when tested in real world as compared to control conditions. Objective To assess the difference in M-BESS scores in athletes wearing their protective equipment and cleats on different surfaces as compared to control conditions. Methods This cross-sectional study examined university North American football and soccer athletes. Three observers independently rated athletes performing the M-BESS test in three different conditions: (1) wearing shorts and T-shirt in bare feet on firm surface (control); (2) wearing athletic equipment with cleats on FieldTurf; and (3) wearing athletic equipment with cleats on firm surface. Mean M-BESS scores were compared between conditions. Results 60 participants were recruited: 39 from football (all males) and 21 from soccer (11 males and 10 females). Average age was 21.1 years (SD=1.8). Mean M-BESS scores were significantly lower (p<0.001) for cleats on FieldTurf (mean=26.3; SD=2.0) and for cleats on firm surface (mean=26.6; SD=2.1) as compared to the control condition (mean=28.4; SD=1.5). Females had lower scores than males for cleats on FieldTurf condition (24.9 (SD=1.9) vs 27.3 (SD=1.6), p=0.005). Players who had taping or bracing on their ankles/feet had lower scores when tested with cleats on firm surface condition (24.6 (SD=1.7) vs 26.9 (SD=2.0), p=0.002). Conclusions Total M-BESS scores for athletes wearing protective equipment and cleats standing on FieldTurf or a firm surface are around two points lower than M-BESS scores performed on the same athletes under control conditions. PMID:27900181
Matsuki, Y; Ichinohe, T; Kaneko, Y
2007-01-01
To compare the amnesic effect of propofol and midazolam to electric dental pulp stimulation (invasive) and picture recall test (non-invasive) at two sedation levels with the aid of bispectral index (BIS) monitoring. The subjects were 10 male volunteers (24-34 years) classified as ASA physical status I. Propofol was administered to achieve a sedation score of three with a target-controlled infusion technique; it was then regulated to give a sedation score of two (P group). Midazolam was administered by a titration dosage to achieve a sedation score of three (M group). It then gradually decreased to give a sedation score of two. The BIS score, sedation score, plasma/serum concentration of propofol and midazolam, blood pressure, pulse rate, respiratory rate, end-tidal CO(2) tension and arterial oxygen saturation were observed at each sedation level in both groups. Amnesic effects were evaluated using a picture recall test and electric dental pulp stimulation. No difference was observed in the amnesic effect evaluated by picture recall test at the two sedation levels. Likewise, there was no difference at a sedation score of three when the amnesic effect was evaluated by electric dental pulp stimulation. In contrast, a significant difference was observed at a sedation score of two; midazolam produced amnesia in more subjects than did propofol. Propofol and midazolam did not show any significant difference in amnesic effects to non-invasive stimuli. For invasive stimuli, midazolam showed a stronger amnesic effect at the moderate sedation level, but not at the deeper sedation level.
Mallett, Susan; Halligan, Steve; Collins, Gary S.; Altman, Doug G.
2014-01-01
Background Different methods of evaluating diagnostic performance when comparing diagnostic tests may lead to different results. We compared two such approaches, sensitivity and specificity with area under the Receiver Operating Characteristic Curve (ROC AUC) for the evaluation of CT colonography for the detection of polyps, either with or without computer assisted detection. Methods In a multireader multicase study of 10 readers and 107 cases we compared sensitivity and specificity, using radiological reporting of the presence or absence of polyps, to ROC AUC calculated from confidence scores concerning the presence of polyps. Both methods were assessed against a reference standard. Here we focus on five readers, selected to illustrate issues in design and analysis. We compared diagnostic measures within readers, showing that differences in results are due to statistical methods. Results Reader performance varied widely depending on whether sensitivity and specificity or ROC AUC was used. There were problems using confidence scores; in assigning scores to all cases; in use of zero scores when no polyps were identified; the bimodal non-normal distribution of scores; fitting ROC curves due to extrapolation beyond the study data; and the undue influence of a few false positive results. Variation due to use of different ROC methods exceeded differences between test results for ROC AUC. Conclusions The confidence scores recorded in our study violated many assumptions of ROC AUC methods, rendering these methods inappropriate. The problems we identified will apply to other detection studies using confidence scores. We found sensitivity and specificity were a more reliable and clinically appropriate method to compare diagnostic tests. PMID:25353643
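The contrast between the two approaches can be sketched on a toy reader. The example below is an illustration only: the data are hypothetical, the function names are invented here, and the AUC is the simple nonparametric (Mann-Whitney) estimate rather than any of the ROC fitting methods discussed by the authors. A binary call is assumed to be positive whenever the confidence score is above zero.

```python
def sensitivity_specificity(calls, truth):
    """Sensitivity and specificity from binary calls against a reference standard."""
    tp = sum(1 for c, t in zip(calls, truth) if c and t)
    tn = sum(1 for c, t in zip(calls, truth) if not c and not t)
    positives = sum(1 for t in truth if t)
    negatives = len(truth) - positives
    return tp / positives, tn / negatives


def auc_from_scores(scores, truth):
    """Probability that a random positive case outscores a random negative case.

    Ties count as one half (the Mann-Whitney form of the AUC).
    """
    pos = [s for s, t in zip(scores, truth) if t]
    neg = [s for s, t in zip(scores, truth) if not t]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical reader: 4 cases with polyps, 6 without; not study data.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [90, 80, 0, 0, 0, 0, 70, 0, 0, 0]  # zero = no polyp identified
calls = [1 if s > 0 else 0 for s in scores]

sens, spec = sensitivity_specificity(calls, truth)
auc = auc_from_scores(scores, truth)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}, AUC = {auc:.2f}")
```

Most of the case pairs in this toy example are ties at zero, so the AUC is driven largely by how ties are handled and by the single false positive, which echoes two of the problems listed in the abstract.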
The assessment of fetal brain function in fetuses with ventrikulomegaly: the role of the KANET test.
Talic, Amira; Kurjak, Asim; Stanojevic, Milan; Honemeyer, Ulrich; Badreldeen, Ahmed; DiRenzo, Gian Carlo
2012-08-01
To assess differences in fetal behavior in both normal fetuses and fetuses with cerebral ventriculomegaly (VM). In a period of eighteen months, in a longitudinal prospective cohort study, Kurjak Antenatal Neurological Test (KANET) was applied to assess fetal behavior in both normal pregnancies and pregnancies with cerebral VM using four-dimensional ultrasound (4D US). According to the degree of enlargement of the ventricles, VM was divided into three groups: mild, moderate and severe. Moreover, fetuses with isolated VM were separated from those with additional abnormalities. According to the KANET, fetuses with scores ≥ 14 were considered normal, those with scores 6-13 borderline and abnormal if the score was ≤ 5. Differences between two groups were examined by Fisher's exact test. Differences within the subgroups were examined by Kruskal-Wallis test and contingency table test. KANET scores in normal pregnancies and pregnancies with VM showed statistically significant differences. Most of the abnormal KANET scores as well as most of the borderline scores were found among the fetuses with severe VM associated with additional abnormalities. There were no statistically significant differences between the control group and the groups with isolated and mild and/or moderate VM. Evaluation of the fetal behavior in fetuses with cerebral VM using KANET test has the potential to detect fetuses with abnormal behavior, and to add the dimension of CNS function to the morphological criteria of VM. Long-term postnatal neurodevelopmental follow-up should confirm the data from prenatal investigation of fetal behavior.
Surgical simulation tasks challenge visual working memory and visual-spatial ability differently.
Schlickum, Marcus; Hedman, Leif; Enochsson, Lars; Henningsohn, Lars; Kjellin, Ann; Felländer-Tsai, Li
2011-04-01
New strategies for selection and training of physicians are emerging. Previous studies have demonstrated a correlation between visual-spatial ability and visual working memory with surgical simulator performance. The aim of this study was to perform a detailed analysis on how these abilities are associated with metrics in simulator performance with different task content. The hypothesis is that the importance of visual-spatial ability and visual working memory varies with different task contents. Twenty-five medical students participated in the study that involved testing visual-spatial ability using the MRT-A test and visual working memory using the RoboMemo computer program. Subjects were also trained and tested for performance in three different surgical simulators. The scores from the psychometric tests and the performance metrics were then correlated using multivariate analysis. MRT-A score correlated significantly with the performance metrics Efficiency of screening (p = 0.006) and Total time (p = 0.01) in the GI Mentor II task and Total score (p = 0.02) in the MIST-VR simulator task. In the Uro Mentor task, both the MRT-A score and the visual working memory 3-D cube test score as presented in the RoboMemo program (p = 0.02) correlated with Total score (p = 0.004). In this study we have shown that some differences exist regarding the impact of visual abilities and task content on simulator performance. When designing future cognitive training programs and testing regimes, one might have to adjust the design to the specific surgical task to be trained.
Assessing the Practical Equivalence of Conversions when Measurement Conditions Change
ERIC Educational Resources Information Center
Liu, Jinghua; Dorans, Neil J.
2012-01-01
At times, the same set of test questions is administered under different measurement conditions that might affect the psychometric properties of the test scores enough to warrant different score conversions for the different conditions. We propose a procedure for assessing the practical equivalence of conversions developed for the same set of test…
Clinical competency evaluation of Brazilian chiropractic interns
Facchinato, Ana Paula A.; Benedicto, Camila C.; Mora, Aline G.; Cabral, Dayane M.C.; Fagundes, Djalma J.
2015-01-01
Objective This study compares the results of an objective structured clinical examination (OSCE) between 2 groups of students before an internship and after 6 months of clinical practice in an internship. Methods Seventy-two students participated, with 36 students in each cohort. The OSCEs were performed in the simulation laboratory before the participants' clinical practice internship and after 6 months of the internship. Students were tested in 9 stations for clinical skills and knowledge. The same procedures were repeated for both cohorts. The t test was used for unpaired parametric samples and Fisher's exact test was used for comparison of proportions. Results There was no difference in the mean final score between the 2 groups (p = .34 for test 1; p = .08 for test 2). The performance of the students in group 1 was not significantly different when performed before and after 6 months of clinical practice, but in group 2 there was a significant decrease in the average score after 6 months of clinical practice. Conclusions There was no difference in the cumulative average score for the 2 groups before and after 6 months of clinical practice in the internship. There were differences within the cohorts, however, with a significant decrease in the average score in group 2. Issues pertaining to test standardization and student motivation for test 2 may have influenced the scores. PMID:25588200
Predictive effects of teachers and schools on test scores, college attendance, and earnings.
Chamberlain, Gary E
2013-10-22
I studied predictive effects of teachers and schools on test scores in fourth through eighth grade and outcomes later in life such as college attendance and earnings. For example, predict the fraction of a classroom attending college at age 20 given the test score for a different classroom in the same school with the same teacher and given the test score for a classroom in the same school with a different teacher. I would like to have predictive effects that condition on averages over many classrooms, with and without the same teacher. I set up a factor model that, under certain assumptions, makes this feasible. Administrative school district data in combination with tax data were used to calculate estimates and do inference.
NASA Astrophysics Data System (ADS)
Burns, Dana
Over the last two decades, online education has become a popular concept in universities as well as K-12 education. This generation of students has grown up using technology and has shown interest in incorporating technology into their learning. The idea of using technology in the classroom to enhance student learning and create higher achievement has become necessary for administrators, teachers, and policymakers. Although online education is a popular topic, there has been minimal research on the effectiveness of online and blended learning strategies compared to the student learning in a traditional K-12 classroom setting. The purpose of this study was to investigate differences in standardized test scores from the Biology End of Course exam when at-risk students completed the course using three different educational models: online format, blended learning, and traditional face-to-face learning. Data was collected from over 1,000 students over a five-year period. Correlation analysis of standardized test scores of eighth grade students was used to define students as "at-risk" for failing high school courses. The results indicated a high correlation between eighth grade standardized test scores and Biology End of Course exam scores. These students were deemed "at-risk" for failing high school courses. Standardized test scores were measured for the at-risk students when those students completed Biology in the different models of learning. Results indicated significant differences existed among the learning models. Students had the highest test scores when completing Biology in the traditional face-to-face model. Further evaluation of subgroup populations indicated statistical differences in learning models for African-American populations, female students, and for male students.
The Effect of Pretest Exercise on Baseline Computerized Neurocognitive Test Scores.
Pawlukiewicz, Alec; Yengo-Kahn, Aaron M; Solomon, Gary
2017-10-01
Baseline neurocognitive assessment plays a critical role in return-to-play decision making following sport-related concussions. Prior studies have assessed the effect of a variety of modifying factors on neurocognitive baseline test scores. However, relatively little investigation has been conducted regarding the effect of pretest exercise on baseline testing. The aim of our investigation was to determine the effect of pretest exercise on baseline Immediate Post-Concussion Assessment and Cognitive Testing (ImPACT) scores in adolescent and young adult athletes. We hypothesized that athletes undergoing self-reported strenuous exercise within 3 hours of baseline testing would perform more poorly on neurocognitive metrics and would report a greater number of symptoms than those who had not completed such exercise. Cross-sectional study; Level of evidence, 3. The ImPACT records of 18,245 adolescent and young adult athletes were retrospectively analyzed. After application of inclusion and exclusion criteria, participants were dichotomized into groups based on a positive (n = 664) or negative (n = 6609) self-reported history of strenuous exercise within 3 hours of the baseline test. Participants with a positive history of exercise were then randomly matched, based on age, sex, education level, concussion history, and hours of sleep prior to testing, on a 1:2 basis with individuals who had reported no pretest exercise. The baseline ImPACT composite scores of the 2 groups were then compared. Significant differences were observed for the ImPACT composite scores of verbal memory, visual memory, reaction time, and impulse control as well as for the total symptom score. No significant between-group difference was detected for the visual motor composite score. Furthermore, pretest exercise was associated with a significant increase in the overall frequency of invalid test results. 
Our results suggest a statistically significant difference in ImPACT composite scores between individuals who report strenuous exercise prior to baseline testing compared with those who do not. Since return-to-play decision making often involves documentation of return to neurocognitive baseline, the baseline test scores must be valid and accurate. As a result, we recommend standardization of baseline testing such that no strenuous exercise takes place 3 hours prior to test administration.
Multidimensional Scoring of Abilities: The Ordered Polytomous Response Case
ERIC Educational Resources Information Center
de la Torre, Jimmy
2008-01-01
Recent work has shown that multidimensionally scoring responses from different tests can provide better ability estimates. For educational assessment data, applications of this approach have been limited to binary scores. Of the different variants, the de la Torre and Patz model is considered more general because implementing the scoring procedure…
Speech-discrimination scores modeled as a binomial variable.
Thornton, A R; Raffin, M J
1978-09-01
Many studies have reported variability data for tests of speech discrimination, and the disparate results of these studies have not been given a simple explanation. Arguments over the relative merits of 25- vs 50-word tests have ignored the basic mathematical properties inherent in the use of percentage scores. The present study models performance on clinical tests of speech discrimination as a binomial variable. A binomial model was developed, and some of its characteristics were tested against data from 4120 scores obtained on the CID Auditory Test W-22. A table for determining significant deviations between scores was generated and compared to observed differences in half-list scores for the W-22 tests. Good agreement was found between predicted and observed values. Implications of the binomial characteristics of speech-discrimination scores are discussed.
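The binomial argument in the abstract above can be illustrated with a minimal sketch. It assumes each word on a list is an independent trial with the same probability of a correct response; the function name and the 80% example value are hypothetical, and the code does not reproduce the authors' table of significant differences.

```python
import math


def binomial_sd_percent(p_correct, n_words):
    """Standard deviation, in percentage points, of a percent-correct score
    on an n_words list, assuming independent trials (an illustrative assumption)."""
    if not 0.0 <= p_correct <= 1.0:
        raise ValueError("p_correct must lie in [0, 1]")
    if n_words <= 0:
        raise ValueError("n_words must be positive")
    return 100.0 * math.sqrt(p_correct * (1.0 - p_correct) / n_words)


# Hypothetical listener with a true score of 80% correct.
sd_25 = binomial_sd_percent(0.80, 25)  # 25-word half list
sd_50 = binomial_sd_percent(0.80, 50)  # 50-word full list
print(f"25 words: SD = {sd_25:.2f} points; 50 words: SD = {sd_50:.2f} points")
```

Under this assumption the variability depends only on the true score and the list length, which is why a 25-word list is expected to be noisier than a 50-word list by a factor of the square root of 2.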
Investigating Differences between American and Indian Raters in Assessing TOEFL iBT Speaking Tasks
ERIC Educational Resources Information Center
Wei, Jing; Llosa, Lorena
2015-01-01
This article reports on an investigation of the role raters' language background plays in raters' assessment of test takers' speaking ability. Specifically, this article examines differences between American and Indian raters in their scores and scoring processes when rating Indian test takers' responses to the Test of English as a Foreign…
Ruamviboonsuk, Paisan; Sudsakorn, Napitchareeya; Somkijrungroj, Thanapong; Engkagul, Chayanee; Tiensuwan, Montip
2012-03-01
Electronic measurement of visual acuity (VA) has been proposed and adopted as a method of determining VA scores in clinical research. Characters (optotypes) are displayed on a monitor screen and the examinee selects a match and inputs his choice to another electronic device. Unfortunately, the optotypes, called Sloan letters, in the standard protocol are 10 Roman characters. This limits their practicability for measuring VA of patients who are illiterate to these characters. The authors introduced a method of displaying the Sloan letters one by one on a notebook and all 10 Sloan letters on a tablet computer screen. The former is for testing the patients whereas the latter is for them to input their responses by tapping on a letter that matches the one on the notebook screen. The objective was to assess test-retest reliability of VA scores determined with this method. Participants without ocular abnormality were recruited to have their right eyes measured with the same VA measurement method twice, one week apart. Those who were illiterate to Roman characters were enrolled for the aforementioned method for measuring their VA (Tablet group). A 15-inch display notebook computer and a 9-inch display tablet computer (iPad) communicated via a local wireless data network provided by a Wi-Fi router. Those who understood Roman characters were enrolled to have measurements with a 17-inch desktop computer and an infrared wireless keyboard (Keyboard group). Both methods used the same protocols and software for VA measurements. Reliability of VA scores obtained from each group was assessed by the confidence interval (CI) of the difference of the scores from the test and retest. The t test was used to analyze differences in mean VA scores between the test and retest in each group, with p < 0.05 considered statistically significant. There were 49 and 50 participants in the Tablet and Keyboard groups, respectively.
The 95% CI of the difference between the scores from the test and retest in each group was 2 letters. Approximately 95% of participants in each group had an absolute difference of the scores between the test and retest of 7 letters. The mean of VA scores from the first test was significantly different from that of the second test in the Keyboard group (one-letter difference, p = 0.049); there was no significant difference between these scores in the Tablet group (0.1-letter difference, p = 0.86). Tablet computers may be used to assist patients who are illiterate to Roman characters in having their VA measured with the standard electronic protocol. This preliminary study suggested that the proposed method should be useful for reliably measuring VA outcomes in multicenter international clinical trials without encountering a language barrier.
A Maturing Global Testing Regime Meets the World Economy: Test Scores and Economic Growth, 1960-2012
ERIC Educational Resources Information Center
Kamens, David H.
2015-01-01
This article considers the growth of the international testing regime. It discusses sources of growth and empirically examines two related sets of issues: (1) the stability of countries' achievement scores, and (2) the influence of those national scores on subsequent economic development over different time lags. The article suggests that…
ERIC Educational Resources Information Center
King, Molly Elizabeth
2016-01-01
The purpose of this quantitative, causal-comparative study was to compare the effect elementary music and visual arts lessons had on third through sixth grade standardized mathematics test scores. Inferential statistics were used to compare the differences between test scores of students who took in-school, elementary, music instruction during the…
Cognitive skills, student achievement tests, and schools.
Finn, Amy S; Kraft, Matthew A; West, Martin R; Leonard, Julia A; Bish, Crystal E; Martin, Rebecca E; Sheridan, Margaret A; Gabrieli, Christopher F O; Gabrieli, John D E
2014-03-01
Cognitive skills predict academic performance, so schools that improve academic performance might also improve cognitive skills. To investigate the impact schools have on both academic performance and cognitive skills, we related standardized achievement-test scores to measures of cognitive skills in a large sample (N = 1,367) of eighth-grade students attending traditional, exam, and charter public schools. Test scores and gains in test scores over time correlated with measures of cognitive skills. Despite wide variation in test scores across schools, differences in cognitive skills across schools were negligible after we controlled for fourth-grade test scores. Random offers of enrollment to oversubscribed charter schools resulted in positive impacts of such school attendance on math achievement but had no impact on cognitive skills. These findings suggest that schools that improve standardized achievement-test scores do so primarily through channels other than improving cognitive skills.
Asymptotic Standard Errors of Observed-Score Equating with Polytomous IRT Models
ERIC Educational Resources Information Center
Andersson, Björn
2016-01-01
In observed-score equipercentile equating, the goal is to make scores on two scales or tests measuring the same construct comparable by matching the percentiles of the respective score distributions. If the tests consist of different items with multiple categories for each item, a suitable model for the responses is a polytomous item response…
Exploring the gender gap in the conceptual survey of electricity and magnetism
NASA Astrophysics Data System (ADS)
Henderson, Rachel; Stewart, Gay; Stewart, John; Michaluk, Lynnette; Traxler, Adrienne
2017-12-01
The "gender gap" on various physics conceptual evaluations has been extensively studied. Men's average pretest scores on the Force Concept Inventory and Force and Motion Conceptual Evaluation are 13% higher than women's, and post-test scores are on average 12% higher than women's. This study analyzed the gender differences within the Conceptual Survey of Electricity and Magnetism (CSEM) in which the gender gap has been less well studied and is less consistent. In the current study, data collected from 1407 students (77% men, 23% women) in a calculus-based physics course over ten semesters showed that male students outperformed female students on the CSEM pretest (5%) and post-test (6%). Separate analyses were conducted for qualitative and quantitative problems on lab quizzes and course exams and showed that male students outperformed female students by 3% on qualitative quiz and exam problems. Male and female students performed equally on the quantitative course exam problems. The gender gaps within CSEM post-test scores, qualitative lab quiz scores, and qualitative exam scores were insignificant for students with a CSEM pretest score of 25% or less but grew as pretest scores increased. Structural equation modeling demonstrated that a latent variable, called Conceptual Physics Performance/Non-Quantitative (CPP/NonQnt), orthogonal to quantitative test performance was useful in explaining the differences observed in qualitative performance; this variable was most strongly related to CSEM post-test scores. The CPP/NonQnt of male students was 0.44 standard deviations higher than female students. The CSEM pretest measured CPP/NonQnt much less accurately for women (R² = 4%) than for men (R² = 17%). The failure to detect a gender gap for students scoring 25% or less on the pretest suggests that the CSEM instrument itself is not gender biased.
The failure to find a performance difference in quantitative test performance while detecting a gap in qualitative performance suggests the qualitative differences do not result from psychological factors such as science anxiety or stereotype threat.
Datta, Rakesh; Datta, Karuna; Venkatesh, M D
2015-07-01
The classical didactic lecture has been the cornerstone of theoretical undergraduate medical education. Its efficacy, however, is reduced by limited interaction and the short attention span of students. It is hypothesized that the interactive response pad obviates some of these drawbacks. The aim of this study was to evaluate the effectiveness of an interactive response system by comparing it with conventional classroom teaching. A prospective comparative longitudinal study was conducted on 192 students who were exposed to either conventional or interactive teaching over 20 classes. Pre-test, post-test, and retention test (after 8-12 weeks) scores were collated and statistically analysed. An independent observer measured the number of student interactions in each class. Pre-test scores from both groups were similar (p = 0.71). There was significant improvement in post-test scores compared with pre-test scores for both methods (p < 0.001). The interactive post-test score was better than the conventional post-test score (p < 0.001) by 8-10% (difference of means, 9.24%; 95% CI, 8.2%-10.3%). The interactive retention test score was better than the conventional retention test score (p < 0.001) by 15-18% (difference of means, 16.64%; 95% CI, 15.0%-18.2%). There were 51 participative events in the interactive group vs 25 in the conventional group. The Interactive Response Pad method was efficacious in teaching. Students taught with the interactive method were likely to score 8-10% higher (statistically significant) immediately after class and 15-18% higher (statistically significant) after 8-12 weeks. The number of student-teacher interactions increases when using the interactive response pads.
Hackethal, A; Immenroth, M; Bürger, T
2006-04-01
The Minimally Invasive Surgical Trainer-Virtual Reality (MIST-VR) simulator is validated for laparoscopy training, but benchmarks and target scores for assessing single tasks are needed. Control data for the MIST-VR traversal task scenario were collected from 61 novices who performed the task 10 times over 3 days (1 h daily). Data were collected on the time taken, error score, economy of movement, and total score. Test differences were analyzed through percentage scores and t-tests for paired samples. Improvement was greatest over tests 1 to 5 (improvement: test(1.2), 38.07%; p = 0.000; test(4.5), 10.66%; p = 0.010); between tests 5 and 10, improvement slowed and scores stabilized. Variation in participants' performance fell steadily over the 10 tests. Trainees should perform at least 10 tests of the traversal task: five to get used to the equipment and task (automation phase; target total score, 95.16) and five to stabilize and consolidate performance (test 10 target total score, 74.11).
NASA Astrophysics Data System (ADS)
Baumgarten, Kristyne A.
This study investigated the possible relationship between collaborative learning strategies and the learning of core concepts. This study examined the differences between two groups of nursing students enrolled in an introductory microbiology laboratory course. The control group consisted of students enrolled in sections taught in the traditional method. The experimental group consisted of those students enrolled in the sections using collaborative learning strategies. The groups were assessed on their degrees of learning core concepts using a pre-test/post-test method. Scores from the groups' laboratory reports were also analyzed. There was no difference in the two groups' pre-test scores. The post-test scores of the experimental group averaged 11 points higher than the scores of the control group. The lab report scores of the experimental group averaged 15 points higher than those of the control group. The data generated from this study demonstrated that collaborative learning strategies can be used to increase students' learning of core concepts in microbiology labs.
Sun, Jennifer K; Qin, Haijing; Aiello, Lloyd Paul; Melia, Michele; Beck, Roy W; Andreoli, Christopher M; Edwards, Paul A; Glassman, Adam R; Pavlica, Michael R
2012-04-01
To compare visual acuity (VA) scores after autorefraction vs manual refraction in eyes of patients with diabetes mellitus and a wide range of VAs. The letter score from the Electronic Visual Acuity (EVA) test from the electronic Early Treatment Diabetic Retinopathy Study was measured after autorefraction (AR-EVA score) and after manual refraction (MR-EVA score), which is the research protocol of the Diabetic Retinopathy Clinical Research Network. Testing order was randomized, study participants and VA examiners were masked to refraction source, and a second EVA test using an identical supplemental manual refraction (MR-EVAsuppl score) was performed to determine test-retest variability. In 878 eyes of 456 study participants, the median MR-EVA score was 74 (Snellen equivalent, approximately 20/32). The spherical equivalent was often similar for manual refraction and autorefraction (median difference, 0.00; 5th-95th percentile range, -1.75 to 1.13 diopters). However, on average, the MR-EVA scores were slightly better than the AR-EVA scores, across the entire VA range. Furthermore, the variability between the AR-EVA scores and the MR-EVA scores was substantially greater than the test-retest variability of the MR-EVA scores (P < .001). The variability of differences was highly dependent on the autorefractor model. Across a wide range of VAs at multiple sites using a variety of autorefractors, VA measurements tend to be worse with autorefraction than manual refraction. Differences between individual autorefractor models were identified. However, even among autorefractor models that compare most favorably with manual refraction, VA variability between autorefraction and manual refraction is higher than the test-retest variability of manual refraction. The results suggest that, with current instruments, autorefraction is not an acceptable substitute for manual refraction for most clinical trials with primary outcomes dependent on best-corrected VA.
Inter-Rater and Test-Retest Reliability of the Beery VMI in Schoolchildren
Harvey, Erin M.; Leonard-Green, Tina K.; Mohan, Kathleen M.; Kulp, Marjean Taylor; Davis, Amy L.; Miller, Joseph M.; Twelker, J. Daniel; Campus, Irene; Dennis, Leslie K.
2017-01-01
Purpose: To assess inter-rater and test-retest reliability of the 6th Edition Beery-Buktenica Developmental Test of Visual-Motor Integration (VMI) and test-retest reliability of the VMI Visual Perception Supplemental Test (VMIp) in school-age children. Methods: Subjects were 163 Native American 3rd – 8th grade students with no significant refractive error (astigmatism < 1.00 D, myopia < 0.75 D, hyperopia < 2.50 D, anisometropia < 1.50 D) or ocular abnormalities. The VMI and VMIp were administered twice, on separate days. All VMI tests were scored by two trained scorers and a subset of 50 tests was also scored by an experienced scorer. Scorers strictly applied objective scoring criteria. Analyses included inter-rater and test-retest assessments of bias, 95% limits of agreement (LOAs), and intraclass correlation analysis. Results: Trained scorers had no significant scoring bias compared to the experienced scorer. One of the two trained scorers tended to provide higher scores than the other (mean difference in standardized scores = 1.54). Inter-rater correlations were strong (0.75 to 0.88). VMI and VMIp test-retest comparisons indicated no significant bias (subjects did not tend to score better on retest). Test-retest correlations were moderate (0.54 to 0.58). The 95% LOAs for the VMI were −24.14 to 24.67 (scorer 1) and −26.06 to 26.58 (scorer 2) and the 95% LOAs for the VMIp were −27.11 to 27.34. Conclusions: The 95% LOAs for test-retest differences will be useful for determining if the VMI and VMIp have sufficient sensitivity for detecting change with treatment in both clinical and research settings. Further research on test-retest reliability reporting 95% LOAs for children across different age ranges is recommended, particularly if the test is to be used to detect changes due to intervention or treatment. PMID:28422801
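For readers unfamiliar with 95% limits of agreement, the following is a minimal sketch of the usual Bland-Altman style calculation: the mean test-retest difference (bias) plus or minus 1.96 standard deviations of the differences. The function name and the five score pairs are hypothetical illustrations, not data from the study, and the study's exact procedure may differ.

```python
import statistics


def limits_of_agreement(test, retest):
    """Return (bias, lower, upper) for paired scores, where bias is the mean
    retest-minus-test difference and the limits are bias +/- 1.96 * SD."""
    if len(test) != len(retest):
        raise ValueError("test and retest must have the same length")
    if len(test) < 2:
        raise ValueError("at least two pairs are needed")
    diffs = [second - first for first, second in zip(test, retest)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical standardized scores for five children (illustration only).
first_visit = [100, 95, 110, 88, 102]
second_visit = [104, 90, 108, 95, 99]
bias, lower, upper = limits_of_agreement(first_visit, second_visit)
print(f"bias = {bias:.2f}, 95% LOA = {lower:.2f} to {upper:.2f}")
```

A change observed after treatment would need to fall outside such limits before it could be distinguished from ordinary test-retest variation.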
Long-term stability of the Wechsler Intelligence Scale for Children--Fourth Edition.
Watkins, Marley W; Smith, Lourdes G
2013-06-01
Long-term stability of the Wechsler Intelligence Scale for Children-Fourth Edition (WISC-IV; Wechsler, 2003) was investigated with a sample of 344 students from 2 school districts twice evaluated for special education eligibility at an average interval of 2.84 years. Test-retest reliability coefficients for the Verbal Comprehension Index (VCI), Perceptual Reasoning Index (PRI), Working Memory Index (WMI), Processing Speed Index (PSI), and the Full Scale IQ (FSIQ) were .72, .76, .66, .65, and .82, respectively. As predicted, the test-retest reliability coefficients for the subtests (Mdn = .56) were generally lower than the index scores (Mdn = .69) and the FSIQ (.82). On average, subtest scores did not differ by more than 1 point, and index scores did not differ by more than 2 points across the test-retest interval. However, 25% of the students earned FSIQ scores that differed by 10 or more points, and 29%, 39%, 37%, and 44% of the students earned VCI, PRI, WMI, and PSI scores, respectively, that varied by 10 or more points. Given this variability, it cannot be assumed that WISC-IV scores will be consistent across long test-retest intervals for individual students. PsycINFO Database Record (c) 2013 APA, all rights reserved.
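The two kinds of stability summarized above, a group-level test-retest coefficient and the share of individuals whose scores shift by 10 or more points, can be computed side by side. The sketch below assumes the coefficient is a Pearson correlation; the function names and the six score pairs are hypothetical and are not taken from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def share_changed(x, y, threshold=10):
    """Fraction of individuals whose score changed by at least `threshold` points."""
    return sum(abs(b - a) >= threshold for a, b in zip(x, y)) / len(x)


# Hypothetical composite scores at first and second evaluation (illustration only).
first_eval = [85, 92, 100, 108, 115, 123]
second_eval = [95, 90, 101, 97, 118, 125]
r = pearson_r(first_eval, second_eval)
changed = share_changed(first_eval, second_eval)
print(f"r = {r:.2f}; share changing by 10+ points = {changed:.0%}")
```

A sizable correlation and a notable share of large individual changes can coexist, which is the caution the abstract raises for individual students.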
How Do Students Experience Testing on the University Computer?
ERIC Educational Resources Information Center
Whittington, Dale; And Others
1995-01-01
Reports a study of the administration mode, scores, and testing experiences of students taking the PreProfessional Skills Test (PPST) under differing conditions (computer based and paper and pencil). PPST scores and surveys of the students revealed varied test-taking strategies and computer-related alterations in test difficulty, construct,…
34 CFR 668.144 - Application for test approval.
Code of Federal Regulations, 2010 CFR
2010-07-01
... the comparability of scores on the current test to scores on the previous test, and data from validity... explanation of the methodology and procedures for measuring the reliability of the test; (ii) Evidence that different forms of the test, including, if applicable, short forms, are comparable in reliability; (iii...
The development of a test of biodiversity knowledge of high school students
NASA Astrophysics Data System (ADS)
Ajayi, Olabisi Modupe
2002-09-01
The primary purpose of this study was to develop a valid and reliable test of the knowledge of biodiversity of high school students. The test differentiated students' knowledge on three levels of biodiversity: species, ecosystem and genetics. A secondary purpose was to examine how biodiversity scores were affected by gender, grade point average, and families' socioeconomic status. The initial phase of the instrument development involved the construction of 60 dichotomous items (true/false). To establish content validity, a panel of biodiversity experts reviewed the items for appropriateness and clarity. The items were checked for readability using Flesch-Kincaid Readability Index and the readability was at the fifth grade level. The instrument was subjected to factor analysis. As a result, the final instrument was compiled and named the Ajayi Biodiversity Instrument (ABI). The reliability of ABI was .87. The mean score on the 25-item test was 79%. No significant difference at >0.05 was found in the score of students on each of the three subtests for genetics, species, and ecosystem. No significant difference was found in the score of students relative to their family's socioeconomic status. There was a significant correlation between grade point average and participation in extracurricular activities that related to biodiversity concepts and scores on ABI. Gender differences emerged at the ecosystem level, females scoring higher than males. Differences among ethnic groups also emerged. Anglo-Americans scored significantly higher on the test of knowledge of biodiversity for high school students than the rest of the ethnic groups combined.
ERIC Educational Resources Information Center
Kon, Jane Heckley; Martin-Kniep, Giselle O.
1992-01-01
Describes a case study to determine whether performance tests are a feasible alternative to multiple-choice tests. Examines the difficulties of administering and scoring performance assessments. Explains that the study employed three performance tests and one multiple-choice test. Concludes that performance test administration and scoring was no…
The video-based test of communication skills: description, development, and preliminary findings.
Mazor, Kathleen M; Haley, Heather-Lyn; Sullivan, Kate; Quirk, Mark E
2007-01-01
The importance of assessing physician-patient communication skills is widely recognized, but assessment methods are limited. Objective structured clinical examinations are time-consuming and resource intensive. For practicing physicians, patient surveys may be useful, but these also require substantial resources. Clearly, it would be advantageous to develop alternative or supplemental methods for assessing communication skills of medical students, residents, and physicians. The Video-based Test of Communication Skills (VTCS) is an innovative, computer-administered test, consisting of 20 very short video vignettes. In each vignette, a patient makes a statement or asks a question. The examinee responds verbally, as if it was a real encounter and he or she were the physician. Responses are recorded for later scoring. Test administration takes approximately 1 h. Generalizability studies were conducted, and scores for two groups of physicians predicted to differ in their communication skills were compared. Preliminary results are encouraging; the estimated g coefficient for the communication score for 20-vignette test (scored by five raters) is 0.79; g for the personal/affective score under the same conditions is 0.62. Differences between physicians were in the predicted direction, with physicians considered "at risk" for communication difficulties scoring lower than those not so identified. The VTCS is a short, portable test of communication skills. Results reported here suggest that scores reflect differences in skill levels and are generalizable. However, these findings are based on very small sample sizes and must be considered preliminary. Additional work is required before it will be possible to argue confidently that this test in particular, and this approach to testing communication skills in general, is valuable and likely to make a substantial contribution to assessment in medical education.
How Changes in Families and Schools Are Related to Trends in Black-White Test Scores
ERIC Educational Resources Information Center
Berends, Mark; Lucas, Samuel R.; Penaloza, Roberto V.
2008-01-01
Through several decades of research, a great deal has been written about trends in black-white test scores and the factors that may explain the gaps in different subject areas. Only a few studies have examined the changing relationships between gaps in students' test scores and family and school measures in nationally representative data over…
Marchick, Michael R; Setteducato, Michael L; Revenis, Jesse J; Robinson, Matthew A; Weeks, Emily C; Payton, Thomas F; Winchester, David E; Allen, Brandon R
2017-09-01
The History, Electrocardiography, Age, Risk factors, Troponin (HEART) score enables rapid risk stratification of emergency department patients presenting with chest pain. However, the subjectivity in scoring introduced by the history component has been criticized by some clinicians. We examined the association of 3 objective scoring models with the results of noninvasive cardiac testing. Medical records for all patients evaluated in the chest pain center of an academic medical center during a 1-year period were reviewed retrospectively. Each patient's history component score was calculated using 3 models developed by the authors. Differences in the distribution of HEART scores for each model, their degree of agreement with one another, and their association with the results of cardiac testing were analyzed. Seven hundred forty-nine patients were studied, 58 of whom had an abnormal stress test or computed tomography coronary angiography. The mean HEART scores for models 1, 2, and 3 were 2.97 (SD 1.17), 2.57 (SD 1.25), and 3.30 (SD 1.35), respectively, and were significantly different (P < 0.001). However, for each model, the likelihood of an abnormal cardiovascular test did not correlate with higher scores on the symptom component of the HEART score (P = 0.09, 0.41, and 0.86, respectively). While the objective scoring models produced different distributions of HEART scores, no model performed well with regard to identifying patients with abnormal advanced cardiac studies in this relatively low-risk cohort. Further studies in a broader cohort of patients, as well as comparison with the performance of subjective history scoring, are warranted before adoption of any of these objective models.
Paap, Kenneth R; Sawi, Oliver
2016-12-01
Studies testing for individual or group differences in executive functioning can be compromised by unknown test-retest reliability. Test-retest reliabilities across an interval of about one week were obtained from performance in the antisaccade, flanker, Simon, and color-shape switching tasks. There is a general trade-off between the greater reliability of single mean RT measures, and the greater process purity of measures based on contrasts between mean RTs in two conditions. The individual differences in RT model recently developed by Miller and Ulrich was used to evaluate the trade-off. Test-retest reliability was statistically significant for 11 of the 12 measures, but was of moderate size, at best, for the difference scores. The test-retest reliabilities for the Simon and flanker interference scores were lower than those for switching costs. Standard practice evaluates the reliability of executive-functioning measures using split-half methods based on data obtained in a single day. Our test-retest measures of reliability are lower, especially for difference scores. These reliability measures must also take into account possible day effects that classical test theory assumes do not occur. Measures based on single mean RTs tend to have acceptable levels of reliability and convergent validity, but are "impure" measures of specific executive functions. The individual differences in RT model shows that the impurity problem is worse than typically assumed. However, the "purer" measures based on difference scores have low convergent validity that is partly caused by deficiencies in test-retest reliability. Copyright © 2016 Elsevier B.V. All rights reserved.
Predictive effects of teachers and schools on test scores, college attendance, and earnings
Chamberlain, Gary E.
2013-01-01
I studied predictive effects of teachers and schools on test scores in fourth through eighth grade and outcomes later in life such as college attendance and earnings. For example, predict the fraction of a classroom attending college at age 20 given the test score for a different classroom in the same school with the same teacher and given the test score for a classroom in the same school with a different teacher. I would like to have predictive effects that condition on averages over many classrooms, with and without the same teacher. I set up a factor model that, under certain assumptions, makes this feasible. Administrative school district data in combination with tax data were used to calculate estimates and do inference. PMID:24101492
Dividing the Force Concept Inventory into two equivalent half-length tests
NASA Astrophysics Data System (ADS)
Han, Jing; Bao, Lei; Chen, Li; Cai, Tianfang; Pi, Yuan; Zhou, Shaona; Tu, Yan; Koenig, Kathleen
2015-06-01
The Force Concept Inventory (FCI) is a 30-question multiple-choice assessment that has been a building block for much of the physics education research done today. In practice, there are often concerns regarding the length of the test and possible test-retest effects. Since many studies in the literature use the mean score of the FCI as the primary variable, it would be useful to have different, shorter tests that can produce FCI-equivalent scores while providing the benefits of being quicker to administer and overcoming the test-retest effects. In this study, we divide the 1995 version of the FCI into two half-length tests; each contains a different subset of the original FCI questions. The two new tests are shorter, still cover the same set of concepts, and produce mean scores equivalent to those of the FCI. Using a large quantitative data set collected at a large midwestern university, we statistically compare the assessment features of the two half-length tests and the full-length FCI. The results show that the mean error of equivalent scores between any two of the three tests is within 3%. Scores from all tests are well correlated. Based on the analysis, it appears that the two half-length tests can be a viable option for score-based assessments that need to be administered quickly or that need to measure short-term gains where using identical pre- and post-test questions is a concern.
Evidence-based practice knowledge, attitudes, and practice of online graduate nursing students.
Rojjanasrirat, Wilaiporn; Rice, Jan
2017-06-01
This study aimed to evaluate changes in evidence-based practice (EBP) knowledge, attitudes, and practice of nursing students before and after completing an online, graduate level, introductory research/EBP course. A prospective one-group pretest-posttest design. A private university in the Midwestern USA. Sixty-three online nurse practitioner students in a Master's program. A convenience sample of online graduate nursing students who enrolled in the research/EBP course was invited to participate in the study. Study outcomes were measured using the Evidence-Based Practice Questionnaire (EBPQ) before and after completing the course. Descriptive statistics and paired-samples t-tests were used to assess the mean differences between pre- and post-test scores. Overall, students' post-test EBP scores were significantly improved over pre-test scores (t(63)=-9.034, p<0.001). Statistically significant differences were found for practice of EBP mean scores (t(63)=-12.78, p=0.001). No significant differences were found between pre- and post-tests on knowledge and attitudes toward EBP scores. Most frequently cited barriers to EBP were lack of understanding of statistics, interpretation of findings, lack of time, and lack of library resources. Copyright © 2017 Elsevier Ltd. All rights reserved.
Winegarden, Babbi; Glaser, Dale; Schwartz, Alan; Kelly, Carolyn
2012-09-01
Medical College Admission Test (MCAT) scores are widely used as part of the decision-making process for selecting candidates for admission to medical school. Applicants who learned English as a second language may be at a disadvantage when taking tests in their non-native language. Preliminary research found significant differences between English language learners (ELLs), applicants who learned English after the age of 11 years, and non-ELL examinees on the Verbal Reasoning (VR) sub-test of the MCAT. The purpose of this study was to determine if relationships between VR sub-test scores and measures of medical school performance differed between ELL and non-ELL students. Scores on the MCAT VR sub-test and student performance outcomes (grades, examination scores, and markers of distinction and difficulty) were extracted from University of California San Diego School of Medicine admissions files and the Association of American Medical Colleges database for 924 students who matriculated in 1998-2005 (graduation years 2002-2009). Regression models were fitted to determine whether MCAT VR sub-test scores predicted medical school performance similarly for ELLs and non-ELLs. For several outcomes, including pre-clerkship grades, academic distinction, US Medical Licensing Examination Step 2 Clinical Knowledge scores and two clerkship shelf examinations, ELL status significantly affects the ability of the VR score to predict performance. Higher correlations between VR score and medical school performance emerged for non-ELL students than for ELL students for each of these outcomes. The MCAT VR score should be used with discretion when assessing ELL applicants for admission to medical school. © Blackwell Publishing Ltd 2012.
Mungkhetklang, Chantanee; Bavin, Edith L.; Crewther, Sheila G.; Goharpey, Nahal; Parsons, Carl
2016-01-01
It is usually assumed that performance on non-verbal intelligence tests reflects visual cognitive processing and that aspects of working memory (WM) will be involved. However, the unique contribution of memory to non-verbal scores is not clear, nor is the unique contribution of vocabulary. Thus, we aimed to investigate these contributions. Non-verbal test scores for 17 individuals with intellectual disability (ID) and 39 children with typical development (TD) of similar mental age were compared to determine the unique contribution of visual and verbal short-term memory (STM) and WM and the additional variance contributed by vocabulary scores. No significant group differences were found in the non-verbal test scores or receptive vocabulary scores, but there was a significant difference in expressive vocabulary. Regression analyses indicate that for the TD group STM and WM (both visual and verbal) contributed similar variance to the non-verbal scores. For the ID group, visual STM and verbal WM contributed most of the variance to the non-verbal test scores. The addition of vocabulary scores to the model contributed greater variance for both groups. More unique variance was contributed by vocabulary than memory for the TD group, whereas for the ID group memory contributed more than vocabulary. Visual and auditory memory and vocabulary contributed significantly to solving visual non-verbal problems for both the TD group and the ID group. However, for each group, there were different weightings of these variables. Our findings indicate that for individuals with TD, vocabulary is the major factor in solving non-verbal problems, not memory, whereas for adolescents with ID, visual STM, and verbal WM are more influential than vocabulary, suggesting different pathways to achieve solutions to non-verbal problems. PMID:28082922
Binetruy, M; Mauny, F; Lavaux, M; Meyer, A; Sylvestre, G; Puyraveau, M; Berger, E; Magnin, E; Vandel, P; Galmiche, J; Chopard, G
Cognitive evaluation of young subjects is now widely carried out for non-traumatic diseases such as multiple sclerosis, HIV, or sleep disorders. This evaluation requires normative data based on healthy adult samples. However, most clinicians use a set of tests that were normed in an isolated manner from different samples using different cutoff criteria. Thus, the score of an individual may be considered either normal or impaired according to the norms used. It is well established that healthy adults obtain some low test scores when a battery of tests is administered. Thus, knowledge of the base rates of low scores is required to minimize false diagnoses of cognitive impairment. The aim of this study was twofold: (1) to provide normative data for the RAPID-II battery in healthy adults, and (2) to estimate the proportion of healthy adults having low scores across this battery. Norms for the 44 test scores of the RAPID-II test battery were developed using the overall sample of 335 individuals based on three categories of age (20 to 29, 30 to 39, and 40 to 49 years) and two educational levels: baccalaureate or higher educational degree (high educational level) and lower than baccalaureate (low educational level). The 5th, 25th, 50th, and 75th percentiles were calculated from the six age and education subsamples and used to define norms. The frequency of low scores on the RAPID-II battery was calculated by simultaneously examining the performance of 33 primary scores. A low score was defined as less than or equal to the 5th percentile drawn from the six age and education normative subsamples. In addition, the percentages of low scores were also determined when all possible combinations of two-test scores across the RAPID-II were considered in the overall normative sample. Our data showed that 59.4% of subjects in the normative sample obtained at least one low score. With more than 9 test scores, this percentage was equal to 0% in the normative sample. Among all combinations of two-test scores, 96% had a false positive rate < 2%. Low scores are very common in young healthy subjects and are more obvious when simultaneously analyzing test scores across a battery of tests and are thus not necessarily indicative of cognitive impairment. The combinations of two-test scores can be a useful tool to improve the interpretation of low scores. Copyright © 2017 Elsevier Masson SAS. All rights reserved.
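To see why low scores become common as more scores are examined at once, the sketch below computes the chance of at least one score at or below a 5th-percentile cutoff. It assumes the scores are statistically independent, which is a simplification: real test scores are correlated, so this is an illustration and not the method used in the study. The function name is hypothetical.

```python
def prob_at_least_one_low(n_scores, cutoff=0.05):
    """Probability that a healthy person has >= 1 low score among n_scores.

    Assumes independent scores, each falling at or below the cutoff
    with probability `cutoff` (an illustrative simplification).
    """
    return 1.0 - (1.0 - cutoff) ** n_scores


for n in (1, 5, 10, 33):
    print(n, round(prob_at_least_one_low(n), 3))
```

Under independence, the value for 33 scores comes out above the 59.4% reported for the normative sample, which would be consistent with positively correlated test scores.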
Online pre-race education improves test scores for volunteers at a marathon.
Maxwell, Shane; Renier, Colleen; Sikka, Robby; Widstrom, Luke; Paulson, William; Christensen, Trent; Olson, David; Nelson, Benjamin
2017-09-01
This study examined whether an online course would lead to increased knowledge about the medical issues volunteers encounter during a marathon. Health care professionals who volunteered to provide medical coverage for an annual marathon were eligible for the study. Demographic information about medical volunteers, including profession, specialty, education level, and number of marathons they had volunteered for, was collected. A 15-question test about the most commonly encountered medical issues was created by the authors, administered before and after the volunteers took the online educational course, and compared with a pilot study from the previous year. Seventy-four subjects completed the pre-test. Those who had participated in the pilot study the previous year (N = 15) had pre-test scores that were an average of 2.4 points higher than those who had not (mean ranks: pilot study = 51.6 vs. non-pilot = 33.9, p = 0.004). Of the 74 subjects who completed the pre-test, 54 also completed the post-test. The overall post-pre mean score difference was 3.8 ± 2.7 (t = 10.5, df = 53, p < 0.001). While subjects with all levels of volunteer experience demonstrated improvement, only the change among first-time marathon volunteers was significantly different from the others. Subjects reporting all degree/certification levels demonstrated improvement, but no difference in improvement was found between degree/certification levels. In this follow-up to the previous year's pilot study, online education demonstrated a long-term (one-year) increase in test scores. Testing also continued to show short-term improvement in post-course test scores, compared to pre-course test scores. In general, marathon medical volunteers who had no volunteer experience demonstrated greater improvement than those who had prior volunteer experience.
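The paired comparison reported above can be approximately reproduced from the published summary statistics alone. The sketch below assumes that "3.8 ± 2.7" denotes the mean and standard deviation of the 54 post-minus-pre difference scores; that reading is an assumption, and the function name is hypothetical.

```python
import math


def paired_t_from_summary(mean_diff, sd_diff, n):
    """Paired t statistic and degrees of freedom from difference-score summaries."""
    standard_error = sd_diff / math.sqrt(n)
    return mean_diff / standard_error, n - 1


# Summary values as reported in the abstract (rounded to one decimal place).
t, df = paired_t_from_summary(3.8, 2.7, 54)
print(f"t = {t:.2f}, df = {df}")
```

The result lands close to, but not exactly on, the reported t = 10.5; a small gap is expected because the published mean and standard deviation are rounded.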
Childhood overweight and academic performance: national study of kindergartners and first-graders.
Datar, Ashlesha; Sturm, Roland; Magnabosco, Jennifer L
2004-01-01
To examine the association between children's overweight status in kindergarten and their academic achievement in kindergarten and first grade. The data analyzed consisted of 11,192 first time kindergartners from the Early Childhood Longitudinal Study, a nationally representative sample of kindergartners in the U.S. in 1998. Multivariate regression techniques were used to estimate the independent association of overweight status with children's math and reading standardized test scores in kindergarten and grade 1. We controlled for socioeconomic status, parent-child interaction, birth weight, physical activity, and television watching. Overweight children had significantly lower math and reading test scores compared with non-overweight children in kindergarten. Both groups were gaining similarly on math and reading test scores, resulting in significantly lower test scores among overweight children at the end of grade 1. However, these differences, except for boys' math scores at baseline (difference = 1.22 points, p = 0.001), became insignificant after including socioeconomic and behavioral variables, indicating that overweight is a marker but not a causal factor. Race/ethnicity and mother's education were stronger predictors of test score gains or levels than overweight status. Significant differences in test scores by overweight status at the beginning of kindergarten and the end of grade 1 can be explained by other individual characteristics, including parental education and the home environment. However, overweight is more easily observable by other students compared with socioeconomic characteristics, and its significant (unadjusted) association with worse academic performance can contribute to the stigma of overweight as early as the first years of elementary school.
Thrall, Grace C; Coverdale, John H; Benjamin, Sophiya; Wiggins, Anna; Lane, Christianne Joy; Pato, Michele T
2016-10-01
The goal of this study was to evaluate the efficacy of team-based learning (TBL) on knowledge retention compared to traditional lectures with small break-out group discussion (teaching as usual (TAU)) using a randomized controlled trial. This randomized controlled trial was conducted during a daylong conference for psychiatric educators on attention-deficit hyperactivity disorder and the research literacy topic of efficacy versus effectiveness trials. Learners (n = 115) were randomized with concealed allocation to either TBL or TAU. Knowledge was measured prior to the intervention, immediately afterward, and 2 months later via multiple-choice tests. Participants were necessarily unblinded. Data enterers, data analysts, and investigators were blinded to group assignment in data analysis. Per-protocol analyses of test scores were performed using change in knowledge from baseline. The primary endpoint was test scores at 2 months. At baseline, there were no statistically significant differences between groups in pre-test knowledge. At immediate post-test, both TBL and TAU groups showed improved knowledge scores compared with their baseline scores. The TBL group performed statistically better on the immediate post-test than the TAU group (Cohen's d = 0.73; p < 0.001), although the differences in knowledge scores were not educationally meaningful, averaging just one additional test question correct (out of 15). On the 2-month remote post-test, there were no group differences in knowledge retention among the 42% of participants who returned the 2-month test. Both TBL and TAU learners acquired new knowledge at the end of the intervention and retained knowledge over 2 months. At the end of the intervention day and after 2 months, knowledge test scores were not meaningfully different between TBL and TAU completers. In conclusion, this study failed to demonstrate the superiority of TBL over TAU on the primary outcome of knowledge retention at 2 months post-intervention.
ERIC Educational Resources Information Center
Zimmerman, Donald W.
2012-01-01
In order to circumvent the influence of correlation in paired-samples and repeated measures experimental designs, researchers typically perform a one-sample Student "t" test on difference scores. That procedure entails some loss of power, because it employs N - 1 degrees of freedom instead of the 2N - 2 degrees of freedom of the…
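The procedure described above, a one-sample t test on difference scores, can be sketched as follows. The data are made up for illustration and the function name is hypothetical; the point is only to show where the N - 1 degrees of freedom come from, compared with the 2N - 2 of an independent-samples test.

```python
import math
import statistics


def one_sample_t_on_differences(before, after):
    """One-sample t test of the paired differences against zero.

    Returns the t statistic and its degrees of freedom (N - 1).
    """
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


# Made-up paired scores for six participants.
before = [10, 12, 9, 14, 11, 13]
after = [12, 15, 10, 15, 14, 14]

t, df = one_sample_t_on_differences(before, after)
print(f"t = {t:.3f}, df (paired) = {df}, df (independent samples) = {2 * len(before) - 2}")
```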
An Empirical Comparison of Two-Stage and Pyramidal Adaptive Ability Testing.
ERIC Educational Resources Information Center
Larkin, Kevin C.; Weiss, David J.
A 15-stage pyramidal test and a 40-item two-stage test were constructed and administered by computer to 111 college undergraduates. The two-stage test was found to utilize a smaller proportion of its potential score range than the pyramidal test. Score distributions for both tests were positively skewed but not significantly different from the…
Developmental Eye Movement (DEM) Test Norms for Mandarin Chinese-Speaking Chinese Children.
Xie, Yachun; Shi, Chunmei; Tong, Meiling; Zhang, Min; Li, Tingting; Xu, Yaqin; Guo, Xirong; Hong, Qin; Chi, Xia
2016-01-01
The Developmental Eye Movement (DEM) test is commonly used as a clinical visual-verbal ocular motor assessment tool to screen and diagnose reading problems at the onset. No established norm exists for using the DEM test with Mandarin Chinese-speaking Chinese children. This study aims to establish the normative values of the DEM test for the Mandarin Chinese-speaking population in China; it also aims to compare the values with three other published norms for English-, Spanish-, and Cantonese-speaking Chinese children. A random stratified sampling method was used to recruit children from eight kindergartens and eight primary schools in the main urban and suburban areas of Nanjing. A total of 1,425 Mandarin Chinese-speaking children aged 5 to 12 years took the DEM test in Mandarin Chinese. A digital recorder was used to record the process. All of the subjects completed a symptomatology survey, and their DEM scores were determined by a trained tester. The scores were computed using the formula in the DEM manual, except that the "vertical scores" were adjusted by taking the vertical errors into consideration. The results were compared with the three other published norms. In our subjects, a general decrease with age was observed for the four eye movement indexes: vertical score, adjusted horizontal score, ratio, and total error. For both the vertical and adjusted horizontal scores, the Mandarin Chinese-speaking children completed the tests much more quickly than the norms for English- and Spanish-speaking children. However, the same group completed the test slightly more slowly than the norms for Cantonese-speaking children. The differences in the means were significant (P<0.001) in all age groups. For several ages, the scores obtained in this study were significantly different from the reported scores of Cantonese-speaking Chinese children (P<0.005). 
Compared with English-speaking children, only the vertical score of the 6-year-old group, the vertical-horizontal time ratio of the 8-year-old group and the errors of 9-year-old group had no significant difference (P>0.05); compared with Spanish-speaking children, the scores were statistically significant (P<0.001) for the total error scores of the age groups, except the 6-, 9-, 10-, and 11-year-old age groups (P>0.05). DEM norms may be affected by differences in language, cultural, and educational systems among various ethnicities. The norms of the DEM test are proposed for use with Mandarin Chinese-speaking children in Nanjing and will be proposed for children throughout China.
Leight, Hayley; Saunders, Cheston; Calkins, Robin; Withers, Michelle
2012-01-01
Collaborative testing has been shown to improve performance but not always content retention. In this study, we investigated whether collaborative testing could improve both performance and content retention in a large, introductory biology course. Students were semirandomly divided into two groups based on their performances on exam 1. Each group contained equal numbers of students scoring in each grade category (“A”–“F”) on exam 1. All students completed each of the four exams of the semester as individuals. For exam 2, one group took the exam a second time in small groups immediately following the individually administered test. The other group followed this same format for exam 3. Individual and group exam scores were compared to determine differences in performance. All but exam 1 contained a subset of cumulative questions from the previous exam. Performances on the cumulative questions for exams 3 and 4 were compared for the two groups to determine whether there were significant differences in content retention. Even though group test scores were significantly higher than individual test scores, students who participated in collaborative testing performed no differently on cumulative questions than students who took the previous exam as individuals. PMID:23222835
Training improves laparoscopic tasks performance and decreases operator workload.
Hu, Jesse S L; Lu, Jirong; Tan, Wee Boon; Lomanto, Davide
2016-05-01
It has been postulated that increased operator workload during task performance may increase fatigue and surgical errors. The National Aeronautics and Space Administration-Task Load Index (NASA-TLX) is a validated tool for self-assessment for workload. Our study aims to assess the relationship of workload and performance of novices in simulated laparoscopic tasks of different complexity levels before and after training. Forty-seven novices without prior laparoscopic experience were recruited in a trial to investigate whether training improves task performance as well as mental workload. The participants were tested on three standard tasks (ring transfer, precision cutting and intracorporeal suturing) in increasing complexity based on the Fundamentals of Laparoscopic Surgery (FLS) curriculum. Following a period of training and rest, participants were tested again. Test scores were computed from time taken and time penalties for precision errors. Test scores and NASA-TLX scores were recorded pre- and post-training and analysed using paired t tests. One-way repeated measures ANOVA was used to analyse differences in NASA-TLX scores between the three tasks. NASA-TLX score was lowest with ring transfer and highest with intracorporeal suturing. This was statistically significant in both pre-training (p < 0.001) and post-training (p < 0.001). NASA-TLX scores mirror the changes in test scores for the three tasks. Workload scores decreased significantly after training for all three tasks (ring transfer = 2.93, p < 0.001, precision cutting = 3.74, p < 0.001, intracorporeal suturing = 2.98, p < 0.001). NASA-TLX score is an accurate reflection of the complexity of simulated laparoscopic tasks in the FLS curriculum. This also correlates with the relationship of test scores between the three tasks. Simulation training improves both performance score and workload score across the tasks.
Diabetes and Cognitive Decline in Older Adults: The Ginkgo Evaluation of Memory Study.
Palta, Priya; Carlson, Michelle C; Crum, Rosa M; Colantuoni, Elizabeth; Sharrett, A Richey; Yasar, Sevil; Nahin, Richard L; DeKosky, Steven T; Snitz, Beth; Lopez, Oscar; Williamson, Jeff D; Furberg, Curt D; Rapp, Stephen R; Golden, Sherita Hill
2017-12-12
Previous studies have shown that individuals with diabetes exhibit accelerated cognitive decline. However, methodological limitations have limited the quality of this evidence. Heterogeneity in study design, cognitive test administration, and methods of analysis of cognitive data have made it difficult to synthesize and translate findings to practice. We analyzed longitudinal data from the Ginkgo Evaluation of Memory Study to test our hypothesis that older adults with diabetes have greater test-specific and domain-specific cognitive declines compared to older adults without diabetes. Tests of memory, visuo-spatial construction, language, psychomotor speed, and executive function were administered. Test scores were standardized to z-scores and averaged to yield domain scores. Linear random effects models were used to compare baseline differences and changes over time in test and domain scores among individuals with and without diabetes. Among the 3,069 adults, aged 72-96 years, 9.3% reported diabetes. Over a median follow-up of 6.1 years, participants with diabetes exhibited greater baseline differences in a test of executive function (trail making test, Part B) and greater declines in a test of language (phonemic verbal fluency). For the composite cognitive domain scores, participants with diabetes exhibited lower baseline executive function and global cognition domain scores, but no significant differences in the rate of decline. Identifying cognitive domains most affected by diabetes can lead to targeted risk modification, possibly in the form of lifestyle interventions such as diet and physical activity, which we know to be beneficial for improving vascular risk factors, such as diabetes, and therefore may reduce the risk of executive dysfunction and possible dementia. © The Author 2017. Published by Oxford University Press on behalf of The Gerontological Society of America. All rights reserved.
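The scoring step described above, standardizing each test to z-scores and averaging them into a domain score, can be sketched as follows. This is a minimal illustration with made-up data and hypothetical test names; it standardizes against the sample's own mean and standard deviation, and it assumes higher raw scores always mean better performance, neither of which is confirmed by the abstract.

```python
import statistics


def z_scores(values):
    """Standardize raw scores using the sample mean and standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def domain_scores(tests):
    """Average each participant's z-scores across the tests in one domain.

    tests: dict mapping a test name to a list of raw scores,
           one per participant, in the same participant order.
    """
    standardized = [z_scores(scores) for scores in tests.values()]
    return [statistics.mean(person) for person in zip(*standardized)]


# Made-up raw scores for three participants on two hypothetical memory tests.
memory_domain = {
    "hypothetical_recall": [10, 12, 14],
    "hypothetical_recognition": [20, 25, 30],
}
print(domain_scores(memory_domain))
```

A timed test in which lower values are better would need its sign flipped before averaging; that step is omitted here.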
Temple, V; Drummond, C; Valiquette, S; Jozsvai, E
2010-06-01
Video conferencing (VC) technology has great potential to increase accessibility to healthcare services for those living in rural or underserved communities. Previous studies have had some success in validating a small number of psychological tests for VC administration; however, VC has not been investigated for use with persons with intellectual disabilities (ID). A comparison of test results for two well known and widely used assessment instruments was undertaken to establish if scores for VC administration would differ significantly from in-person assessments. Nineteen individuals with ID aged 23-63 were assessed once in-person and once over VC using the Wechsler Abbreviated Scale of Intelligence (WASI) and the Beery-Buktenica Test of Visual-Motor Integration (VMI). Highly similar results were found for test scores. Full-scale IQ on the WASI and standard scores for the VMI were found to be very stable across the two administration conditions, with a mean difference of less than one IQ point/standard score. Video conferencing administration does not appear to alter test results significantly for overall score on a brief intelligence test or a test of visual-motor integration.
Evaluating Pekin duck walking ability using a treadmill performance test.
Byrd, C J; Main, R P; Makagon, M M
2016-10-01
Gait scoring is the most popular method for assessing the walking ability of poultry species. Although inexpensive and easy to implement, gait scoring systems are often criticized for being subjective. Using a treadmill performance test we assessed whether observable differences in Pekin duck walking ability identified using a gait scoring system translated to differences in walking performance. One hundred and eighty ducks were selected using a three-category gait scoring system (GS0 = smooth gait, n = 55; GS0.5 = labored walk without easily identifiable impediment, n = 56; GS1 = obvious impediment, n = 59) and the amount of time each duck was able to sustain walking on a treadmill at a speed of 0.31 m/s was evaluated. The walking test ended when each duck met one of three elimination criteria: (1) The duck walked for a maximum time of ten minutes, (2) the duck required support from the observer's hand for more than three seconds in order to continue walking on the treadmill, or (3) the duck sat down on the treadmill and made no attempt to stand despite receiving assistance from the observer. Data were analyzed in SAS 9.4 using PROC GLM. Tukey's multiple comparison test was used to compare differences in time spent walking between gait scores. Significant differences were found between all gait scores (P < 0.05). Behavioral correlates of walking performance were investigated. Video recorded during the treadmill test was analyzed for counts of sitting, standing, and leaning behaviors. Data were analyzed in SAS 9.4 using a negative binomial model for count data. No differences were found between gait scores for counts of sitting, standing, and leaning behaviors (P > 0.05). In conclusion, the amount of time spent walking on the treadmill corresponded to gait score and was an effective measurement for quantifying Pekin duck walking ability. 
The test could be a valuable tool for assessing the development of walking issues or the effectiveness of treatments aimed at promoting leg health. © 2016 Poultry Science Association Inc.
Basagni, Benedetta; Luzzatti, Claudio; Navarrete, Eduardo; Caputo, Marina; Scrocco, Gessica; Damora, Alessio; Giunchi, Laura; Gemignani, Paola; Caiazzo, Annarita; Gambini, Maria Grazia; Avesani, Renato; Mancuso, Mauro; Trojano, Luigi; De Tanti, Antonio
2017-04-01
Verbal reasoning is a complex, multicomponent function, which involves activation of functional processes and neural circuits distributed in both brain hemispheres. Thus, this ability is often impaired after brain injury. The aim of the present study is to describe the construction of a new verbal reasoning test (VRT) for patients with brain injury and to provide normative values in a sample of healthy Italian participants. Three hundred and eighty healthy Italian subjects (193 women and 187 men) of different ages (range 16-75 years) and educational level (primary school to postgraduate degree) underwent the VRT. VRT is composed of seven subtests, investigating seven different domains. Multiple linear regression analysis revealed a significant effect of age and education on the participants' performance in terms of both VRT total score and all seven subtest scores. No gender effect was found. A correction grid for raw scores was built from the linear equation derived from the scores. Inferential cut-off scores were estimated using a non-parametric technique, and equivalent scores were computed. We also provided a grid for the correction of results by z scores.
Do racial and ethnic group differences in performance on the MCAT exam reflect test bias?
Davis, Dwight; Dorsey, J Kevin; Franks, Ronald D; Sackett, Paul R; Searcy, Cynthia A; Zhao, Xiaohui
2013-05-01
The Medical College Admission Test (MCAT) is a standardized examination that assesses fundamental knowledge of scientific concepts, critical reasoning ability, and written communication skills. Medical school admission officers use MCAT scores, along with other measures of academic preparation and personal attributes, to select the applicants they consider the most likely to succeed in medical school. In 2008-2011, the committee charged with conducting a comprehensive review of the MCAT exam examined four issues: (1) whether racial and ethnic groups differ in mean MCAT scores, (2) whether any score differences are due to test bias, (3) how group differences may be explained, and (4) whether the MCAT exam is a barrier to medical school admission for black or Latino applicants. This analysis showed that black and Latino examinees' mean MCAT scores are lower than white examinees', mirroring differences on other standardized admission tests and in the average undergraduate grades of medical school applicants. However, there was no evidence that the MCAT exam is biased against black and Latino applicants as determined by their subsequent performance on selected medical school performance indicators. Among other factors which could contribute to mean differences in MCAT performance, whites, blacks, and Latinos interested in medicine differ with respect to parents' education and income. Admission data indicate that admission committees accept majority and minority applicants at similar rates, which suggests that medical students are selected on the basis of a combination of attributes and competencies rather than on MCAT scores alone.
A Comparison of Student Understanding of Seasons Using Inquiry and Didactic Teaching Methods
NASA Astrophysics Data System (ADS)
Ashcraft, Paul G.
2006-02-01
Student performance on open-ended questions concerning seasons in a university physical science content course was examined to note differences between classes that experienced inquiry using a 5-E lesson planning model and those that experienced the same content with a traditional, didactic lesson. The class examined is a required content course for elementary education majors and understanding the seasons is part of the university's state's elementary science standards. The two self-selected groups of students showed no statistically significant differences in pre-test scores, while there were statistically significant differences between the groups' post-test scores with those who participated in inquiry-based activities scoring higher. There were no statistically significant differences between the pre-test and the post-test for the students who experienced didactic teaching, while there were statistically significant improvements for the students who experienced the 5-E lesson.
ERIC Educational Resources Information Center
Liu, Jinghua; Sinharay, Sandip; Holland, Paul W.; Feigenbaum, Miriam; Curley, Edward
2009-01-01
This study explores the use of a different type of anchor, a "midi anchor", that has a smaller spread of item difficulties than the tests to be equated, and then contrasts its use with the use of a "mini anchor". The impact of different anchors on observed score equating were evaluated and compared with respect to systematic…
Woods, David L.; Wyma, John M.; Herron, Timothy J.; Yund, E. William
2017-01-01
Verbal learning tests (VLTs) are widely used to evaluate memory deficits in neuropsychiatric and developmental disorders. However, their validity has been called into question by studies showing significant differences in VLT scores obtained by different examiners. Here we describe the computerized Bay Area Verbal Learning Test (BAVLT), which minimizes inter-examiner differences by incorporating digital list presentation and automated scoring. In the 10-min BAVLT, a 12-word list is presented on three acquisition trials, followed by a distractor list, immediate recall of the first list, and, after a 30-min delay, delayed recall and recognition. In Experiment 1, we analyzed the performance of 195 participants ranging in age from 18 to 82 years. Acquisition trials showed strong primacy and recency effects, with scores improving over repetitions, particularly for mid-list words. Inter-word intervals (IWIs) increased with successive words recalled. Omnibus scores (summed over all trials except recognition) were influenced by age, education, and sex (women outperformed men). In Experiment 2, we examined BAVLT test-retest reliability in 29 participants tested with different word lists at weekly intervals. High intraclass correlation coefficients were seen for omnibus and acquisition scores, IWIs, and a categorization index reflecting semantic reorganization. Experiment 3 examined the performance of Experiment 2 participants when feigning symptoms of traumatic brain injury. Although 37% of simulated malingerers showed abnormal (p < 0.05) omnibus z-scores, z-score cutoffs were ineffective in discriminating abnormal malingerers from control participants with abnormal scores. In contrast, four malingering indices (recognition scores, primacy/recency effects, learning rate across acquisition trials, and IWIs) discriminated the two groups with 80% sensitivity and 80% specificity. Experiment 4 examined the performance of a small group of patients with mild or severe TBI. 
Overall, both patient groups performed within the normal range, although significant performance deficits were seen in some patients. The BAVLT improves the speed and replicability of verbal learning assessments while providing comprehensive measures of retrieval timing, semantic organization, and primacy/recency effects that clarify the nature of performance. PMID:28127280
Kariuki, Symon M; Abubakar, Amina; Newton, Charles R J C; Kihara, Michael
2014-09-16
Persistent neurocognitive impairments occur in a fifth of children hospitalized with severe falciparum malaria. There is little data on the association between different neurological phenotypes of severe malaria (seizures, impaired consciousness and prostration) and impairments in executive function. Executive functioning of children exposed to severe malaria with different neurological phenotypes (N = 58) and in those unexposed (N = 56) was examined using neuropsychological tests such as the vigilance test, the Test of Everyday Attention for Children (TEA-Ch), the contingency naming test (CNT), and the self-ordered pointing test (SOPT). Linear regression was used to determine the association between neurological phenotypes of severe malaria and executive function performance scores, accounting for potential confounders. Children with complex seizures in severe malaria performed more poorly than unexposed controls in the vigilance (median efficiency scores (interquartile range) = 4.84 (1.28-5.68) vs. 5.84 (4.71-6.42), P = 0.030) and SOPT (mean errors (standard deviation) = 29.50 (8.82) vs. 24.80 (6.50), P = 0.029) tests, but no differences were observed in TEA-Ch and CNT tests. Performance scores for other neurological phenotypes of severe malaria were similar to those of unexposed controls. After accounting for potential confounders, such as child's age, sex, schooling; maternal age, schooling and economic activity; perinatal factors and history of seizures, complex seizures remained associated with efficiency scores in the vigilance test (beta coefficient (β) (95% confidence interval (CI)) = -0.40 (-0.67, -0.13), P = 0.006) and everyday attention scores of the TEA-Ch test (β (95% CI) = -0.57 (-1.04, -0.10), P = 0.019); the association with SOPT error scores was weak (β (95% CI) = 4.57 (-0.73, 9.89), P = 0.089). Combined neurological phenotypes were not significantly associated with executive function performance scores. 
Executive function impairment in children with severe malaria is associated with specific neurological phenotypes, particularly complex seizures. Effective prophylaxis and management of malaria-associated acute seizures may improve executive functioning performance scores of children.
Bernabeu-Mora, Roberto; Medina-Mirapeix, Françesc; Llamazares-Herrán, Eduardo; García-Guillamón, Gloria; Giménez-Giménez, Luz María; Sánchez-Nieto, Juan Miguel
2015-01-01
Background Limited mobility is a risk factor for developing chronic obstructive pulmonary disease (COPD)-related disabilities. Little is known about the validity of the Short Physical Performance Battery (SPPB) for identifying mobility limitations in patients with COPD. Objective To determine the clinical validity of the SPPB summary score and its three components (standing balance, 4-meter gait speed, and five-repetition sit-to-stand) for identifying mobility limitations in patients with COPD. Methods This cross-sectional study included 137 patients with COPD, recruited from a hospital in Spain. Muscle strength tests and SPPB were measured; then, patients were surveyed for self-reported mobility limitations. The validity of SPPB scores was analyzed by developing receiver operating characteristic curves to analyze the sensitivity and specificity for identifying patients with mobility limitations; by examining group differences in SPPB scores across categories of mobility activities; and by correlating SPPB scores to strength tests. Results Only the SPPB summary score and the five-repetition sit-to-stand components showed good discriminative capabilities; both showed areas under the receiver operating characteristic curves greater than 0.7. Patients with limitations had significantly lower SPPB scores than patients without limitations in nine different mobility activities. SPPB scores were moderately correlated with the quadriceps test (r>0.40), and less correlated with the handgrip test (r<0.30), which reinforced convergent and divergent validities. A SPPB summary score cutoff of 10 provided the best accuracy for identifying mobility limitations. Conclusion This study provided evidence for the validity of the SPPB summary score and the five-repetition sit-to-stand test for assessing mobility in patients with COPD. These tests also showed potential as a screening test for identifying patients with COPD that have mobility limitations. PMID:26664110
Goos, Matthias; Schubach, Fabian; Seifert, Gabriel; Boeker, Martin
2016-08-17
Health professionals often manage medical problems in critical situations under time pressure and on the basis of vague information. In recent years, dual process theory has provided a framework of cognitive processes to assist students in developing clinical reasoning skills critical especially in surgery due to the high workload and the elevated stress levels. However, clinical reasoning skills can be observed only indirectly and the corresponding constructs are difficult to measure in order to assess student performance. The script concordance test has been established in this field. A number of studies suggest that the test delivers a valid assessment of clinical reasoning. However, different scoring methods have been suggested. They reflect different interpretations of the underlying construct. In this work we want to shed light on the theoretical framework of script theory and give an idea of script concordance testing. We constructed a script concordance test in the clinical context of "acute abdomen" and compared previously proposed scores with regard to their validity. A test comprising 52 items in 18 clinical scenarios was developed, revised along the guidelines and administered to 56 4th- and 5th-year medical students at the end of a blended-learning seminar. We scored the answers using five different scoring methods (distance (2×), aggregate (2×), single best answer) and compared the scoring keys, the resulting final scores and Cronbach's α after normalization of the raw scores. All scores except the single best answers calculation achieved acceptable reliability scores (>= 0.75), as measured by Cronbach's α. Students were clearly distinguishable from the experts, whose results were set to a mean of 80 and SD of 5 by the normalization process. 
With the two aggregate scoring methods, the students' mean values were between 62.5 (AGGPEN) and 63.9 (AGG), equivalent to about three expert SD below the experts' mean value (Cronbach's α: 0.76 (AGGPEN) and 0.75 (AGG)). With the two distance scoring methods, the students' mean was between 62.8 (DMODE) and 66.8 (DMEAN), equivalent to about two expert SD below the experts' mean value (Cronbach's α: 0.77 (DMODE) and 0.79 (DMEAN)). In this study the single best answer (SBA) scoring key yielded the worst psychometric results (Cronbach's α: 0.68). Assuming the psychometric properties of the script concordance test scores are valid, then clinical reasoning skills can be measured reliably with different scoring keys in the SCT presented here. Psychometrically, the distance methods seem to be superior, wherein inherent statistical properties of the scales might play a significant role. For methodological reasons, the aggregate methods can also be used. Despite the limitations and complexity of the underlying scoring process and the calculation of reliability, we advocate for SCT because it allows a new perspective on the measurement and teaching of cognitive skills.
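The normalization described in this abstract (expert results set to a mean of 80 and an SD of 5) can be illustrated with a simple linear rescaling. The following is only a minimal sketch under stated assumptions: the function name, the use of the sample standard deviation, and all raw scores are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev


def normalize_to_expert_scale(raw_scores, expert_raw_scores,
                              target_mean=80.0, target_sd=5.0):
    """Linearly rescale raw scores so that the expert panel ends up with
    the target mean and SD (80 and 5, as reported in the abstract)."""
    expert_mean = mean(expert_raw_scores)
    expert_sd = stdev(expert_raw_scores)  # assumption: sample SD (n - 1)
    return [target_mean + target_sd * (x - expert_mean) / expert_sd
            for x in raw_scores]


# Hypothetical raw scores, for illustration only
experts = [40.0, 42.0, 44.0, 46.0, 48.0]
students = [35.0, 38.0, 41.0]

normalized_experts = normalize_to_expert_scale(experts, experts)
normalized_students = normalize_to_expert_scale(students, experts)
```

On this scale a student mean near 65 sits about three expert SDs below the expert mean, which is how the abstract expresses the student results.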
Ginn, Sheryl R; Pickens, Stefanie J
2005-06-01
Previous results suggested that female college students' scores on the Mental Rotations Test might be related to their prior experience with spatial tasks. For example, women who played video games scored better on the test than their non-game-playing peers, whereas playing video games was not related to men's scores. The present study examined whether participation in different types of spatial activities would be related to women's performance on the Mental Rotations Test. 31 men and 59 women enrolled at a small, private church-affiliated university and majoring in art or music as well as students who participated in intercollegiate athletics completed the Mental Rotations Test. Women's scores on the Mental Rotations Test benefitted from experience with spatial activities; the more types of experience the women had, the better their scores. Thus women who were athletes, musicians, or artists scored better than those women who had no experience with these activities. The opposite results were found for the men. Efforts are currently underway to assess how length of experience and which types of experience are related to scores.
Measures of Partial Knowledge and Unexpected Responses in Multiple-Choice Tests
ERIC Educational Resources Information Center
Chang, Shao-Hua; Lin, Pei-Chun; Lin, Zih-Chuan
2007-01-01
This study investigates differences in the partial scoring performance of examinees in elimination testing and conventional dichotomous scoring of multiple-choice tests implemented on a computer-based system. Elimination testing that uses the same set of multiple-choice items rewards examinees with partial knowledge over those who are simply…
Multiple-Choice Test Bias Due to Answering Strategy Variation.
ERIC Educational Resources Information Center
Frary, Robert B.; Giles, Mary B.
This paper describes the development and investigation of a new approach to determining the existence of bias in multiple-choice test scores. Previous work in this area has concentrated almost exclusively on bias attributable to specific test items or to differences in test score distributions across racial or ethnic groups. In contrast, the…
Racial/Ethnic Differences in the Predictive Validity of MCAT Scores.
ERIC Educational Resources Information Center
Jones, Robert F.; Mitchell, Karen
Medical College Admission Test (MCAT) score differences were examined for Black and White examinees who entered American medical schools in 1978 and 1979. The incidence of academic difficulty resulting in delayed graduation, withdrawal, or dismissal was also examined. The MCAT provides six scores: biology, chemistry, physics, science problems,…
Developmental Eye Movement (DEM) Test Norms for Mandarin Chinese-Speaking Chinese Children
Tong, Meiling; Zhang, Min; Li, Tingting; Xu, Yaqin; Guo, Xirong; Hong, Qin; Chi, Xia
2016-01-01
The Developmental Eye Movement (DEM) test is commonly used as a clinical visual-verbal ocular motor assessment tool to screen and diagnose reading problems at the onset. No established norm exists for using the DEM test with Mandarin Chinese-speaking Chinese children. This study aims to establish the normative values of the DEM test for the Mandarin Chinese-speaking population in China; it also aims to compare the values with three other published norms for English-, Spanish-, and Cantonese-speaking Chinese children. A random stratified sampling method was used to recruit children from eight kindergartens and eight primary schools in the main urban and suburban areas of Nanjing. A total of 1,425 Mandarin Chinese-speaking children aged 5 to 12 years took the DEM test in Mandarin Chinese. A digital recorder was used to record the process. All of the subjects completed a symptomatology survey, and their DEM scores were determined by a trained tester. The scores were computed using the formula in the DEM manual, except that the “vertical scores” were adjusted by taking the vertical errors into consideration. The results were compared with the three other published norms. In our subjects, a general decrease with age was observed for the four eye movement indexes: vertical score, adjusted horizontal score, ratio, and total error. For both the vertical and adjusted horizontal scores, the Mandarin Chinese-speaking children completed the tests much more quickly than the norms for English- and Spanish-speaking children. However, the same group completed the test slightly more slowly than the norms for Cantonese-speaking children. The differences in the means were significant (P<0.001) in all age groups. For several ages, the scores obtained in this study were significantly different from the reported scores of Cantonese-speaking Chinese children (P<0.005). 
Compared with English-speaking children, only the vertical score of the 6-year-old group, the vertical-horizontal time ratio of the 8-year-old group and the errors of 9-year-old group had no significant difference (P>0.05); compared with Spanish-speaking children, the scores were statistically significant (P<0.001) for the total error scores of the age groups, except the 6-, 9-, 10-, and 11-year-old age groups (P>0.05). DEM norms may be affected by differences in language, cultural, and educational systems among various ethnicities. The norms of the DEM test are proposed for use with Mandarin Chinese-speaking children in Nanjing and will be proposed for children throughout China. PMID:26881754
Cognitive Skills, Student Achievement Tests, and Schools
Finn, Amy S.; Kraft, Matthew A.; West, Martin R.; Leonard, Julia A.; Bish, Crystal E.; Martin, Rebecca E.; Sheridan, Margaret A.; Gabrieli, Christopher F. O.; Gabrieli, John D. E.
2014-01-01
Cognitive skills predict academic performance, so schools that improve academic performance might also improve cognitive skills. To investigate the impact schools have on both academic performance and cognitive skills, we related standardized achievement test scores to measures of cognitive skills in a large sample (N=1,367) of 8th-grade students attending traditional, exam, and charter public schools. Test scores and gains in test scores over time correlated with measures of cognitive skills. Despite wide variation in test scores across schools, differences in cognitive skills across schools were negligible after controlling for 4th-grade test scores. Random offers of enrollment to over-subscribed charter schools resulted in positive impacts of such school attendance on math achievement, but had no impact on cognitive skills. These findings suggest that schools that improve standardized achievement tests do so primarily through channels other than cognitive skills. PMID:24434238
Li, Leah
2012-01-01
Summary Studies of cognitive development in children are often based on tests designed for specific ages. Examination of the changes of these scores over time may not be meaningful. This paper investigates the influence of early life factors on cognitive development using maths and reading test scores at ages 7, 11, and 16 years in a British birth cohort born in 1958. The distributions of these test scores differ between ages, for example, 20% participants scored the top mark in the reading test at 7 and the distribution of reading score at 16 is heavily skewed. In this paper, we group participants into 5 ordered categories, approximately 20% in each category according to their test scores at each age. Multilevel models for a repeated ordinal outcome are applied to relate the ordinal scale of maths and reading ability to early life factors. PMID:22661923
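The grouping step described above (five ordered categories with roughly 20% of participants in each) can be sketched as follows. This is a hedged illustration only: the midrank rule used to keep tied scores in the same category is an assumption, since the abstract does not say how ties such as the 20% top-mark group were handled, and the function name and data are hypothetical.

```python
from bisect import bisect_left, bisect_right


def quintile_categories(scores):
    """Map each score to an ordered category 1..5 with roughly 20% of
    cases per category. Tied scores always share one category, chosen
    from the midpoint of the tie's rank range (an assumed rule)."""
    ordered = sorted(scores)
    n = len(ordered)
    categories = []
    for x in scores:
        lower = bisect_left(ordered, x)           # scores strictly below x
        equal = bisect_right(ordered, x) - lower  # scores tied with x
        # Midrank fraction (lower + equal / 2) / n, in integer arithmetic
        categories.append(5 * (2 * lower + equal) // (2 * n) + 1)
    return categories


# Hypothetical scores, for illustration only
evenly_spread = quintile_categories(list(range(1, 11)))
with_ties = quintile_categories([5, 5, 5, 5, 1])
```

With heavy ties the categories can no longer hold exactly 20% each, which is why the abstract says "approximately".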
ERIC Educational Resources Information Center
DeMars, Christine E.
2009-01-01
The Mantel-Haenszel (MH) and logistic regression (LR) differential item functioning (DIF) procedures have inflated Type I error rates when there are large mean group differences, short tests, and large sample sizes. When there are large group differences in mean score, groups matched on the observed number-correct score differ on true score,…
Adjorlolo, Samuel
2018-06-01
The sociocultural differences between Western and sub-Saharan African countries make it imperative to standardize neuropsychological tests in the latter. However, Western-normed tests are frequently administered in sub-Saharan Africa because of challenges hampering standardization efforts. Yet a salient topical issue in the cross-cultural neuropsychology literature relates to the utility of Western-normed neuropsychological tests in minority groups, non-Caucasians, and by extension Ghanaians. Consequently, this study investigates the diagnostic accuracy, sensitivity, and specificity of executive function (EF) tests (The Stroop Test, Trail Making Test, and Controlled Oral Word Association Test), and a Revised Quick Cognitive Screening Test (RQCST) in a sample of 50 patients diagnosed with moderate traumatic brain injury and 50 healthy controls in Ghana. The EF test scores showed good diagnostic accuracy, with area under the curve (AUC) values of the Trail Making Test scores ranging from .746 to .902. With respect to the Stroop Test scores, the AUC values ranged from .793 to .898, while Controlled Oral Word Association Test had AUC value of .787. The RQCST scores discriminated between the groups, with AUC values ranging from .674 to .912. The AUC values of composite EF score and a neuropsychological score created from EF and RQCST scores were .936 and .942, respectively. Additionally, the Stroop Test, Trail Making Test, EF composite score, and RQCST scores showed good to excellent sensitivities and specificities. In general, this study has shown that commonly used EF tests in Western countries have diagnostic accuracy, sensitivity, and specificity when administered in Ghanaian samples. The findings and implications of the study are discussed.
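For readers unfamiliar with the AUC values reported here, the quantity can be computed directly as the probability that a randomly chosen patient scores higher than a randomly chosen control, with ties counted as one half. The sketch below assumes that higher scores indicate impairment (as with completion times); the function name and data are hypothetical and not drawn from the study.

```python
def area_under_curve(patient_scores, control_scores):
    """AUC as the share of patient/control pairs in which the patient
    scores higher; tied pairs count as one half."""
    wins = 0.0
    for p in patient_scores:
        for c in control_scores:
            if p > c:
                wins += 1.0
            elif p == c:
                wins += 0.5
    return wins / (len(patient_scores) * len(control_scores))


# Hypothetical scores, for illustration only
auc_overlap = area_under_curve([3, 4, 5], [1, 2, 3])
auc_perfect = area_under_curve([10, 11], [1, 2])
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.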
Do collaborative practical tests encourage student-centered active learning of gross anatomy?
Green, Rodney A; Cates, Tanya; White, Lloyd; Farchione, Davide
2016-05-06
Benefits of collaborative testing have been identified in many disciplines. This study sought to determine whether collaborative practical tests encouraged active learning of anatomy. A gross anatomy course included a collaborative component in four practical tests. Two hundred and seven students initially completed the test as individuals and then worked as a team to complete the same test again immediately afterwards. The relationship between mean individual, team, and difference (between team and individual) test scores to overall performance on the final examination (representing overall learning in the course) was examined using regression analysis. The overall mark in the course increased by 9% with a decreased failure rate. There was a strong relationship between individual score and final examination mark (P < 0.001) but no relationship for team score (P = 0.095). A longitudinal analysis showed that the test difference scores increased after Test 1 which may be indicative of social loafing and this was confirmed by a significant negative relationship between difference score on Test 4 (indicating a weaker student) and final examination mark (P < 0.001). It appeared that for this cohort, there was little peer-to-peer learning occurring during the collaborative testing and that weaker students gained the benefit from team marks without significant active learning taking place. This negative outcome may be due to insufficient encouragement of the active learning strategies that were expected to occur during the collaborative testing process. An improved understanding of the efficacy of collaborative assessment could be achieved through the inclusion of questionnaire based data to allow a better interpretation of learning outcomes. Anat Sci Educ 9: 231-237. © 2015 American Association of Anatomists.
Shenker, Bennett S
2014-02-01
To validate a scoring system that evaluates the ability of Internet search engines to correctly predict diagnoses when symptoms are used as search terms. We developed a five point scoring system to evaluate the diagnostic accuracy of Internet search engines. We identified twenty diagnoses common to a primary care setting to validate the scoring system. One investigator entered the symptoms for each diagnosis into three Internet search engines (Google, Bing, and Ask) and saved the first five webpages from each search. Other investigators reviewed the webpages and assigned a diagnostic accuracy score. They rescored a random sample of webpages two weeks later. To validate the five point scoring system, we calculated convergent validity and test-retest reliability using Kendall's W and Spearman's rho, respectively. We used the Kruskal-Wallis test to look for differences in accuracy scores for the three Internet search engines. A total of 600 webpages were reviewed. Kendall's W for the raters was 0.71 (p<0.0001). Spearman's rho for test-retest reliability was 0.72 (p<0.0001). There was no difference in scores based on Internet search engine. We found a significant difference in scores based on the webpage's order on the Internet search engine webpage (p=0.007). Pairwise comparisons revealed higher scores in the first webpages vs. the fourth (corr p=0.009) and fifth (corr p=0.017). However, this significance was lost when creating composite scores. The five point scoring system to assess diagnostic accuracy of Internet search engines is a valid and reliable instrument. The scoring system may be used in future Internet research. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
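Kendall's W, used above to summarize agreement among raters, can be computed from each rater's ranking of the same items. The sketch below uses the standard formula without a tie correction, which is a simplifying assumption: a five-point scale produces many ties, and the abstract does not say how they were handled. The function name and ranks are hypothetical.

```python
def kendalls_w(rank_matrix):
    """Kendall's coefficient of concordance for m raters ranking n items.

    rank_matrix[r][i] is the rank (1..n, no ties) that rater r gave
    item i. Returns 1.0 for perfect agreement; no tie correction."""
    m = len(rank_matrix)
    n = len(rank_matrix[0])
    rank_sums = [sum(rater[i] for rater in rank_matrix) for i in range(n)]
    mean_rank_sum = m * (n + 1) / 2
    s = sum((total - mean_rank_sum) ** 2 for total in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical rankings, for illustration only
w_agree = kendalls_w([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]])
w_oppose = kendalls_w([[1, 2, 3, 4], [4, 3, 2, 1]])
```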
Haran, F Jay; Dretsch, Michael N; Slaboda, Jill C; Johnson, Dagny E; Adam, Octavian R; Tsao, Jack W
2016-01-01
To examine differences between the baseline-referenced and norm-referenced approaches for determining decrements in Automated Neuropsychological Assessment Metrics Version 4 TBI-MIL (ANAM) performance following mild traumatic brain injury (mTBI). ANAM data were reviewed for 616 US Service members, with 528 of this sample having experienced an mTBI and 88 were controls. Post-injury change scores were calculated for each sub-test: (1) normative change score = in-theater score - normative mean and (2) baseline change score = in-theater score - pre-deployment baseline. Reliable change cut-scores were applied to the change and the resulting frequency distributions were compared using McNemar tests. Receiver operator curves (ROC) using both samples (i.e. mTBI and control) were calculated for the change scores for each approach to determine the discriminate ability of the ANAM. There were no statistical differences, p < 0.05 (Bonferroni-Holm corrected), between the approaches. When the area under the curve for the ROCs were averaged across sub-tests, there were no significant differences between either the norm-referenced (0.65) or baseline-referenced (0.66) approaches, p > 0.05. Overall, the findings suggest there is no clear advantage of using the baseline-referenced approach over norm-referenced approach.
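The two change scores defined in this abstract are simple differences, and a short sketch makes the contrast concrete. The function name and the numbers below are hypothetical; only the two formulas come from the abstract.

```python
def change_scores(in_theater, baseline, normative_mean):
    """Return both post-injury change scores for one sub-test:
    normative = in-theater score - normative mean
    baseline  = in-theater score - pre-deployment baseline"""
    return {
        "normative": in_theater - normative_mean,
        "baseline": in_theater - baseline,
    }


# Hypothetical sub-test values, for illustration only
example = change_scores(in_theater=90, baseline=100, normative_mean=95)
```

For a Service member whose pre-deployment baseline is above the normative mean, as in this made-up case, the baseline-referenced decrement is larger than the norm-referenced one.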
Task-based learning versus problem-oriented lecture in neurology continuing medical education.
Vakani, Farhan; Jafri, Wasim; Ahmad, Amina; Sonawalla, Aziz; Sheerani, Mughis
2014-01-01
To determine whether general practitioners learned better with task-based learning or problem-oriented lecture in a Continuing Medical Education (CME) set-up. Quasi-experimental study. The Aga Khan University, Karachi campus, from April to June 2012. Fifty-nine physicians were given a choice to opt for either Task-based Learning (TBL) or Problem Oriented Lecture (PBL) in a continuing medical education set-up about headaches. The TBL group had 30 participants divided into 10 small groups, and were assigned case-based tasks. The lecture group had 29 participants. Both groups were given a pre- and a post-test. Pre/post assessment was done using one-best MCQs. The reliability coefficient of scores for both the groups was estimated through Cronbach's alpha. An item analysis for difficulty and discriminatory indices was calculated for both the groups. Paired t-test was used to determine the difference between pre- and post-test scores of both groups. Independent t-test was used to compare the impact of the two teaching methods in terms of learning through scores produced by MCQ test. Cronbach's alpha was 0.672 for the lecture group and 0.881 for TBL group. Item analysis for difficulty (p) and discriminatory indexes (d) was obtained for both groups. The results for the lecture group showed pre-test (p) = 42% vs. post-test (p) = 43%; pre-test (d) = 0.60 vs. post-test (d) = 0.40. The TBL group showed pre-test (p) = 48% vs. post-test (p) = 70%; pre-test (d) = 0.69 vs. post-test (d) = 0.73. Lecture group pre-/post-test mean scores were (8.52 ± 2.95 vs. 12.41 ± 2.65; p < 0.001), where TBL group showed (9.70 ± 3.65 vs. 14 ± 3.99; p < 0.001). Independent t-test exhibited an insignificant difference at baseline (lecture 8.52 ± 2.95 vs. TBL 9.70 ± 3.65; p = 0.177). The post-scores were not statistically different (lecture 12.41 ± 2.65 vs. TBL 14 ± 3.99; p = 0.07). Both delivery methods were found to be equally effective, showing statistically insignificant differences. 
However, TBL groups' post-test higher mean scores and radical increase in the post-test difficulty index demonstrated improved learning through TBL delivery and calls for further exploration of longitudinal studies in the context of CME.
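Two of the statistics reported in this abstract, Cronbach's alpha and the item difficulty index (p), can be computed from a matrix of scored responses. The following is a minimal sketch assuming dichotomous (0/1) item scores and sample variances; the function names and the response matrix are hypothetical, and the discrimination index (d) is left out because the abstract does not say how it was calculated.

```python
from statistics import variance


def cronbach_alpha(responses):
    """responses[e][i] is examinee e's score on item i.
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(totals))"""
    k = len(responses[0])
    item_variances = [variance([row[i] for row in responses])
                      for i in range(k)]
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def item_difficulty(responses):
    """Proportion of examinees answering each item correctly."""
    n = len(responses)
    return [sum(row[i] for row in responses) / n
            for i in range(len(responses[0]))]


# Hypothetical 0/1 responses (4 examinees x 3 items), illustration only
responses = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
alpha = cronbach_alpha(responses)
difficulty = item_difficulty(responses)
```

Note that a higher p means an easier item, so the TBL group's rise from 48% to 70% indicates more items answered correctly after the session.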
ERIC Educational Resources Information Center
Ahmadi, Alireza; Sadeghi, Elham
2016-01-01
In the present study we investigated the effect of test format on oral performance in terms of test scores and discourse features (accuracy, fluency, and complexity). Moreover, we explored how the scores obtained on different test formats relate to such features. To this end, 23 Iranian EFL learners participated in three test formats of monologue,…
NASA Astrophysics Data System (ADS)
da Silva, Roberto; Lamb, Luis C.; Barbosa, Marcia C.
2016-09-01
We analyze the scores obtained by students who have taken the ENEM examination, the Brazilian High School National Examination, which is used in the admission process at Brazilian universities. The average high schools' scores from different disciplines are compared through the Pearson correlation coefficient. The results show a very large correlation between the performance in the different school subjects. Even though the students' scores in the ENEM form a Gaussian due to the standardization, we show that the high schools' scores form a bimodal distribution that cannot be used to evaluate and compare students' performance over time. We also show that this high schools' distribution reflects the correlation between school performance and the economic level (based on the average family income) of the students. The ENEM scores are compared with a Brazilian non-standardized exam, the entrance examination from the Universidade Federal do Rio Grande do Sul. The analysis of the performance of the same individuals in both tests shows that the two tests not only select different abilities, but also lead to the admission of different sets of individuals. Our results indicate that standardized tests might be an interesting tool to compare performance of individuals over the years, but not of institutions.
ERIC Educational Resources Information Center
Lin, Peng; Dorans, Neil; Weeks, Jonathan
2016-01-01
The nonequivalent groups with anchor test (NEAT) design is frequently used in test score equating or linking. One important assumption of the NEAT design is that the anchor test is a miniversion of the 2 tests to be equated/linked. When the content of the 2 tests is different, it is not possible for the anchor test to be adequately representative…
Impact of Measurement Error on Statistical Power: Review of an Old Paradox.
ERIC Educational Resources Information Center
Williams, Richard H.; And Others
1995-01-01
The paradox that a Student t-test based on pretest-posttest differences can attain its greatest power when the difference score reliability is zero was explained by demonstrating that power is not a mathematical function of reliability unless either true score variance or error score variance is constant. (SLD)
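The argument summarized above can be sketched in classical test theory terms. The notation below is an assumption introduced here for illustration (it is not taken from the abstract): $\sigma_T^2$ and $\sigma_E^2$ are the true and error variances of the difference score $D$, $\rho_D$ is its reliability, $\mu_D$ its mean, and $\delta$ the noncentrality parameter of the paired t-test with $n$ pairs.

```latex
\begin{align}
  \rho_D &= \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2},
  &
  \delta &= \frac{\mu_D \sqrt{n}}{\sqrt{\sigma_T^2 + \sigma_E^2}} \\[4pt]
  \text{fixed } \sigma_E^2:\quad
  \sigma_T^2 + \sigma_E^2 &= \frac{\sigma_E^2}{1 - \rho_D}
  &\Rightarrow\quad
  \delta &= \frac{\mu_D \sqrt{n}}{\sigma_E}\,\sqrt{1 - \rho_D} \\[4pt]
  \text{fixed } \sigma_T^2:\quad
  \sigma_T^2 + \sigma_E^2 &= \frac{\sigma_T^2}{\rho_D}
  &\Rightarrow\quad
  \delta &= \frac{\mu_D \sqrt{n}}{\sigma_T}\,\sqrt{\rho_D}
\end{align}
```

Power depends on $\delta$, and therefore on the total variance $\sigma_T^2 + \sigma_E^2$. With error variance held constant, $\delta$ is largest at $\rho_D = 0$; with true variance held constant, it grows with $\rho_D$. Without one of these constraints, $\rho_D$ alone does not determine power.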
Categorical Differences in Statewide Standardized Testing Scores of Students with Disabilities
ERIC Educational Resources Information Center
Trexler, Ellen L.
2013-01-01
The No Child Left Behind Act requires all students be proficient in reading and mathematics by 2014, and students in subgroups to make Adequate Yearly Progress. One of these groups is students with disabilities, who continue to score well below their general education peers. This quantitative study identified scoring differences between disability…
Relationship between the Wide Range Achievement Test 3 and the Wechsler Individual Achievement Test.
Smith, T D; Smith, B L
1998-12-01
The present study examined the relationship between the Wide Range Achievement Test 3 and the Wechsler Individual Achievement Test for a sample of children with learning disabilities in two rural school districts. Data were collected for 87 school children who had been classified as learning disabled and placed in special education resource services. Pearson product-moment correlations between scores on the two measures were significant and moderate to high; however, mean scores were not significantly different on Reading, Spelling, and Arithmetic subtests of the Wide Range Achievement Test 3 compared to those for the basic Reading, Spelling, and Mathematics Reasoning subtests of the Wechsler Individual Achievement Test. Although there were significant mean differences between scores on Reading and Reading Comprehension and on Arithmetic and Numerical Operations, magnitudes were small. It appears that the two tests provide similar results when screening for reading, spelling, and arithmetic.
Little, Paul; Hobbs, F D Richard; Moore, Michael; Mant, David; Williamson, Ian; McNulty, Cliodna; Cheng, Ying Edith; Leydon, Geraldine; McManus, Richard; Kelly, Joanne; Barnett, Jane; Glasziou, Paul; Mullee, Mark
2013-10-10
To determine the effect of clinical scores that predict streptococcal infection or rapid streptococcal antigen detection tests compared with delayed antibiotic prescribing. Open adaptive pragmatic parallel group randomised controlled trial. Primary care in United Kingdom. Patients aged ≥ 3 with acute sore throat. An internet programme randomised patients to targeted antibiotic use according to: delayed antibiotics (the comparator group for analyses), clinical score, or antigen test used according to clinical score. During the trial a preliminary streptococcal score (score 1, n=1129) was replaced by a more consistent score (score 2, n=631; features: fever during previous 24 hours; purulence; attends rapidly (within three days after onset of symptoms); inflamed tonsils; no cough/coryza (acronym FeverPAIN)). Symptom severity reported by patients on a 7 point Likert scale (mean severity of sore throat/difficulty swallowing for days two to four after the consultation (primary outcome)), duration of symptoms, use of antibiotics. For score 1 there were no significant differences between groups. For score 2, symptom severity was documented in 80% (168/207 (81%) in delayed antibiotics group; 168/211 (80%) in clinical score group; 166/213 (78%) in antigen test group). Reported severity of symptoms was lower in the clinical score group (-0.33, 95% confidence interval -0.64 to -0.02; P=0.04), equivalent to one in three rating sore throat a slight versus moderate problem, with a similar reduction for the antigen test group (-0.30, -0.61 to -0.00; P=0.05). Symptoms rated moderately bad or worse resolved significantly faster in the clinical score group (hazard ratio 1.30, 95% confidence interval 1.03 to 1.63) but not the antigen test group (1.11, 0.88 to 1.40). In the delayed antibiotics group, 75/164 (46%) used antibiotics.
Use of antibiotics in the clinical score group (60/161) was 29% lower (adjusted risk ratio 0.71, 95% confidence interval 0.50 to 0.95; P=0.02) and in the antigen test group (58/164) was 27% lower (0.73, 0.52 to 0.98; P=0.03). There were no significant differences in complications or reconsultations. Targeted use of antibiotics for acute sore throat with a clinical score improves reported symptoms and reduces antibiotic use. Antigen tests used according to a clinical score provide similar benefits but with no clear advantages over a clinical score alone. ISRCTN32027234.
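As a rough check on the antibiotic-use figures, the crude (unadjusted) risk ratios can be recomputed from the counts quoted in the abstract. This is only a sketch: the function name `risk_ratio` is introduced here for illustration, and the crude ratios will not match the reported adjusted ratios (0.71 and 0.73), which come from a model whose covariates are not given in the abstract.

```python
def risk_ratio(events_a, n_a, events_b, n_b):
    """Crude risk ratio: risk in group A divided by risk in group B."""
    return (events_a / n_a) / (events_b / n_b)

# Counts quoted in the abstract: (patients using antibiotics, group size).
delayed = (75, 164)   # comparator group
clinical = (60, 161)  # clinical score group
antigen = (58, 164)   # antigen test group

crude_clinical = risk_ratio(*clinical, *delayed)
crude_antigen = risk_ratio(*antigen, *delayed)

print(f"clinical score vs delayed: {crude_clinical:.2f}")
print(f"antigen test vs delayed:   {crude_antigen:.2f}")
```

The crude ratios are closer to 1 than the adjusted ones, which is consistent with the abstract reporting adjusted estimates rather than raw proportions.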
ERIC Educational Resources Information Center
Valant, Jon; Newark, Daniel A.
2016-01-01
For decades, researchers have documented large differences in average test scores between minority and White students and between poor and wealthy students. These gaps are a focal point of reformers' and policymakers' efforts to address educational inequities. However, the U.S. public's views on achievement gaps have received little attention from…
ERIC Educational Resources Information Center
Pisano, Mark C.
The differences in California Achievement Test (CAT) scores from 1990 to 1991 in seventh graders, currently enrolled in Albritton Junior High School in the Fort Bragg Schools, of deployed and nondeployed fathers were analyzed. CAT percentile scores from 1990 and 1991 (1991 being the year of "Desert Storm") were obtained in reading, math…
Evaluation of "e-rater"® for the "Praxis I"® Writing Test. Research Report. ETS RR-15-03
ERIC Educational Resources Information Center
Ramineni, Chaitanya; Trapani, Catherine S.; Williamson, David M.
2015-01-01
Automated scoring models were trained and evaluated for the essay task in the "Praxis I"® writing test. Prompt-specific and generic "e-rater"® scoring models were built, and evaluation statistics, such as quadratic weighted kappa, Pearson correlation, and standardized differences in mean scores, were examined to evaluate the…
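Quadratic weighted kappa, one of the evaluation statistics named above, can be sketched as follows. This is a minimal illustration of the standard definition, not the report's implementation; the function name, the integer score scale, and the sample ratings are all hypothetical.

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_score, max_score):
    """Agreement between two integer score lists, penalizing squared distance."""
    k = max_score - min_score + 1  # number of score categories (must be > 1)
    n = len(rater_a)

    # Observed confusion matrix between the two raters.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_score][b - min_score] += 1

    # Marginal score counts for each rater.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2
            expected = hist_a[i] * hist_b[j] / n  # chance agreement
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    return 1.0 - weighted_observed / weighted_expected

# Hypothetical human and machine scores on an assumed 1-4 scale.
human = [1, 2, 3, 4]
machine = [1, 2, 3, 4]
print(quadratic_weighted_kappa(human, machine, 1, 4))
```

The statistic is undefined when both raters assign a single identical score to every essay, because the chance-agreement term is then zero.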
Jensen, Christian Gaden; Niclasen, Janni; Vangkilde, Signe Allerup; Petersen, Anders; Hasselbalch, Steen Gregers
2016-05-01
The Mindful Attention Awareness Scale (MAAS) measures perceived degree of inattentiveness in different contexts and is often used as a reversed indicator of mindfulness. MAAS is hypothesized to reflect a psychological trait or disposition when used outside attentional training contexts, but the long-term test-retest reliability of MAAS scores is virtually untested. It is unknown whether MAAS predicts psychological health after controlling for standardized socioeconomic status classifications. First, MAAS translated to Danish was validated psychometrically within a randomly invited healthy adult community sample (N = 490). Factor analysis confirmed that MAAS scores quantified a unifactorial construct of excellent composite reliability and consistent convergent validity. Structural equation modeling revealed that MAAS scores contributed independently to predicting psychological distress and mental health, after controlling for age, gender, income, socioeconomic occupational class, stressful life events, and social desirability (β = 0.32-.42, ps < .001). Second, MAAS scores showed satisfactory short-term test-retest reliability in 100 retested healthy university students. Finally, MAAS sample mean scores as well as individuals' scores demonstrated satisfactory test-retest reliability across a 6 months interval in the adult community (retested N = 407), intraclass correlations ≥ .74. MAAS scores displayed significantly stronger long-term test-retest reliability than scores measuring psychological distress (z = 2.78, p = .005). Test-retest reliability estimates did not differ within demographic and socioeconomic strata. Scores on the Danish MAAS were psychometrically validated in healthy adults. MAAS's inattentiveness scores reflected a unidimensional construct, long-term reliable disposition, and a factor of independent significance for predicting psychological health. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Zeng, Rui; Xiang, Lian-rui; Zeng, Jing; Zuo, Chuan
2017-01-01
Background We aimed to introduce team-based learning (TBL) as one of the teaching methods for diagnostics and to compare its teaching effectiveness with that of the traditional teaching methods. Methods We conducted a randomized controlled trial on diagnostics teaching involving 111 third-year medical undergraduates, using TBL as the experimental intervention, compared with lecture-based learning as the control, for teaching the two topics of symptomatology. Individual Readiness Assurance Test (IRAT)-baseline and Group Readiness Assurance Test (GRAT) were performed in members of each TBL subgroup. The scores in Individual Terminal Test 1 (ITT1) immediately after class and Individual Terminal Test 2 (ITT2) 1 week later were compared between the two groups. The questionnaire and interview were also implemented to survey the attitude of students and teachers toward TBL. Results There was no significant difference between the two groups in ITT1 (19.85±4.20 vs 19.70±4.61), while the score of the TBL group was significantly higher than that of the control group in ITT2 (19.15±3.93 vs 17.46±4.65). In the TBL group, the scores of the two terminal tests after the teaching intervention were significantly higher than the baseline test score of individuals. IRAT-baseline, ITT1, and ITT2 scores of students at different academic levels in the TBL teaching exhibited significant differences, but the ITT1-IRAT-baseline and ITT2-IRAT-baseline indicated no significant differences among the three subgroups. Conclusion Our TBL in symptomatology approach was highly accepted by students in the improvement of interest and self-directed learning and resulted in an increase in knowledge acquirements, which significantly improved short-term test scores compared with lecture-based learning. TBL is regarded as an effective teaching method worthy of promoting. PMID:28331383
Multi-group measurement invariance of the multiple sclerosis walking scale-12?
Motl, Robert W; Mullen, Sean; McAuley, Edward
2012-03-01
One primary assumption underlying the interpretation of composite multiple sclerosis walking scale-12 (MSWS-12) scores across levels of disability status is multi-group measurement invariance. This assumption was tested in the present study between samples that differed in self-reported disability status. Participants (n = 867) completed a battery of questionnaires that included the MSWS-12 and patient-determined disease step (PDDS) scale. The multi-group invariance was tested between samples that had PDDS scores of ≤2 (i.e. no mobility limitation; n = 470) and PDDS scores ≥3 (onset of mobility limitation; n = 397) using Mplus 6·0. The omnibus test of equal covariance matrices indicated that the MSWS-12 was not invariant between the two samples that differed in disability status. The source of non-invariance occurred with the initial equivalence test of the factor structure itself. We provide evidence that questions the unambiguous interpretation of scores from the MSWS-12 as a measure of walking impairment between samples of persons with multiple sclerosis who differ in disability status.
Willmes, K
1985-08-01
Methods for the analysis of a single subject's test profile(s) proposed by Huber (1973) are applied to the Aachen Aphasia Test (AAT). The procedures are based on the classical test theory model (Lord & Novick, 1968) and are suited for any (achievement) test with standard norms from a large standardization sample and satisfactory reliability estimates. Two test profiles of a Wernicke's aphasic, obtained before and after a 3-month period of speech therapy, are analyzed using inferential comparisons between (groups of) subtest scores on one test application and between two test administrations for single (groups of) subtests. For each of these comparisons, the two aspects of (i) significant (reliable) differences in performance beyond measurement error and (ii) the diagnostic validity of that difference in the reference population of aphasic patients are assessed. Significant differences between standardized subtest scores and a remarkably better preserved reading and writing ability could be found for both test administrations using the multiple test procedure of Holm (1979). Comparison of both profiles revealed an overall increase in performance for each subtest as well as changes in level of performance relations between pairs of subtests.
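The multiple test procedure of Holm (1979) mentioned above is a step-down adjustment: p-values are sorted in ascending order and compared against progressively less strict thresholds. A minimal sketch follows; the function name and the example p-values are hypothetical and are not taken from the study.

```python
def holm_reject(p_values, alpha=0.05):
    """Return a list of booleans: True where Holm's procedure rejects H0."""
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The smallest p-value is tested at alpha/m, the next at alpha/(m-1), ...
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values are retained
    return reject

# Hypothetical p-values for four subtest comparisons.
decisions = holm_reject([0.01, 0.04, 0.03, 0.005])
print(decisions)
```

In this example the two smallest p-values are rejected and the procedure stops at 0.03, which exceeds its threshold of 0.05/2.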
Scoring systems for the Clock Drawing Test: A historical review
Spenciere, Bárbara; Alves, Heloisa; Charchat-Fichman, Helenice
2017-01-01
The Clock Drawing Test (CDT) is a simple neuropsychological screening instrument that is well accepted by patients and has solid psychometric properties. Several different CDT scoring methods have been developed, but no consensus has been reached regarding which scoring method is the most accurate. This article reviews the literature on these scoring systems and the changes they have undergone over the years. Historically, different types of scoring systems emerged. Initially, the focus was on screening for dementia, and the methods were both quantitative and semi-quantitative. Later, the need for an early diagnosis called for a scoring system that can detect subtle errors, especially those related to executive function. Therefore, qualitative analyses began to be used for both differential and early diagnoses of dementia. A widely used qualitative method was proposed by Rouleau et al. (1992). Tracing the historical path of these scoring methods is important for developing additional scoring systems and furthering dementia prevention research. PMID:29213488
Thinking Maps: An innovative way to increase sixth-grade student achievement in social studies
NASA Astrophysics Data System (ADS)
Reed, Tamita
The purpose of this quantitative study was to determine the effect of Thinking Maps on the achievement of 6th-grade social studies students in order to determine its effectiveness. The population of this study came from a suburban middle school in the state of Georgia. The quantitative data included a pretest and posttest. The study was designed to find (a) whether there is a significant difference between the mean posttest scores on the benchmark test of 6th-grade students who are taught with either Thinking Maps or traditional social studies methods, (b) whether there is a significant difference between the mean posttest scores on the benchmark test of 6th-grade male versus female social studies students, and (c) whether there is a significant interaction between 6th-grade students' type of social studies class and gender as to differentially affect their mean posttest scores on the benchmark test. To answer these questions, students' pretest and posttest were compared to determine if there was a statistically significant difference after Thinking Maps were implemented with the treatment group for 9 weeks. The results indicate that there was no significant difference in the test scores between the students who were taught with Thinking Maps and the students who were taught without Thinking Maps. However, the students taught with Thinking Maps had the higher adjusted posttest scores.
Validity of the Dictionary of Occupational Titles for Assessing Upper Extremity Work Demands
Opsteegh, Lonneke; Soer, Remko; Reinders-Messelink, Heleen A.; Reneman, Michiel F.; van der Sluis, Corry K.
2010-01-01
Objectives The Dictionary of Occupational Titles (DOT) is used in vocational rehabilitation to guide decisions about the ability of a person with activity limitations to perform activities at work. The DOT has categorized physical work demands in five categories. The validity of this categorization is unknown. Aim of this study was to investigate whether the DOT could be used validly to guide decisions for patients with injuries to the upper extremities. Four hypotheses were tested. Methods A database including 701 healthy workers was used. All subjects filled out the Dutch Musculoskeletal Questionnaire, from which an Upper Extremity Work Demands score (UEWD) was derived. First, relation between the DOT-categories and UEWD-score was analysed using Spearman correlations. Second, variance of the UEWD-score in occupational groups was tested by visually inspecting boxplots and assessing kurtosis of the distribution. Third, it was investigated whether occupations classified in one DOT-category, could significantly differ on UEWD-scores. Fourth, it was investigated whether occupations in different DOT-categories could have similar UEWD-scores using Mann Whitney U-tests (MWU). Results Relation between the DOT-categories and the UEWD-score was weak (rsp = 0.40; p<.01). Overlap between categories was found. Kurtosis exceeded ±1.0 in 3 occupational groups, indicating large variance. UEWD-scores were significantly different within one DOT-category (MWU = 1.500; p<.001). UEWD scores between DOT-categories were not significantly different (MWU = 203.000; p = .49). Conclusion All four hypotheses could not be rejected. The DOT appears to be invalid for assessing upper extremity work demands. PMID:21151934
Ebrahimi-Madiseh, Azadeh; Eikelboom, Robert H; Jayakody, Dona Mp; Atlas, Marcus D
2016-01-01
To evaluate the clinical utility of the City University of New York (CUNY) sentence test in a cohort of post-lingually deafened cochlear implant (CI) recipients over time. 117 post-lingually deafened, Australian English-speaking CI recipients aged between 23 and 98 years (M = 66 years; SD = 15.09) were recruited. CUNY sentence test scores in quiet were collated and analysed at two cut-offs, 95% and 100%, as ceiling scores. CUNY sentence scores ranged from 4% to 100% (M = 86.75; SD = 20.65), with 38.8% of participants scoring 95% and 16.5% of participants reaching the 100% scores. The percentage of participants reaching the 95% and 100% ceiling scores increased over time (6 and 12 months post-implantation). The distribution of all post-operative CUNY test scores skewed to the right with 82% of test scores reaching above 90%. This study demonstrates that the CUNY test cannot be used as a valid tool to measure the speech perception skills of post-lingually deafened CI recipients over time. This may be overcome by using adaptive test protocols or linguistically, cognitively or contextually demanding test materials. The high percentage of CI recipients achieving ceiling scores for the CUNY sentence test in quiet at 3 months post-implantation questions the validity of using CUNY in the CI assessment test battery and limits its application for use in longitudinal studies evaluating CI outcomes. Further studies are required to examine different methods to overcome this problem.
A job-related fitness test for the Dutch police.
Strating, M; Bakker, R H; Dijkstra, G J; Lemmink, K A P M; Groothoff, J W
2010-06-01
The variety of tasks that characterize police work highlights the importance of being in good physical condition. To take a first step at standardizing the administration of a job-related test to assess a person's ability to perform the physical demands of the core tasks of police work. The principal research questions were: are test scores related to gender, age and function and are test scores related to body mass index (BMI) and the number of hours of physical exercise? Data of 6999 police officers, geographically spread over all parts of The Netherlands, who completed a physical competence test over a 1-year period were analysed. Women performed the test significantly more slowly than men. The mean test score was also related to age; the older a person the longer it took to complete the test. A higher BMI was associated with fewer hours of physical exercise a week and a slower test performance, both in women and men. The differences in individual test scores, based on gender and age, have implications for future strategy within the police force. From a viewpoint of 'same job, same standard' one has to accept that test-score differences may lead to the exclusion of certain staff. However, from a viewpoint of 'diversity as a business issue', one may have to accept that on average, both female and older police officers are physically less tailored to their jobs than their male and younger colleagues.
Digital education and dynamic assessment of tongue diagnosis based on Mashup technique.
Tsai, Chin-Chuan; Lo, Yen-Cheng; Chiang, John Y; Sainbuyan, Natsagdorj
2017-01-24
To assess the digital education and dynamic assessment of tongue diagnosis based on Mashup technique (DEDATD) according to a specific user's answering pattern, and provide pertinent information tailored to the user's specific needs, supplemented by teaching materials constantly updated through the Mashup technique. Fifty-four undergraduate students were tested with the developed DEDATD. The efficacy of the DEDATD was evaluated based on pre- and post-test performance, with interleaving training sessions targeting the weaknesses of the student under test. The t-test demonstrated a significant difference between pre- and post-test scores and a positive correlation between score gains and the length of time spent on learning, whereas no significant relationship was found between gender and post-test score or between students' year in school and score gains. DEDATD, coupled with the Mashup technique, could provide updated materials filtered through diverse sources located across the network. The dynamic assessment could tailor custom-made learning materials to each individual learner's needs. DEDATD represents a great improvement over traditional teaching methods.
The developmental eye movement (DEM) test and Cantonese-speaking children in Hong Kong SAR, China.
Pang, Peter C; Lam, Carly S; Woo, George C
2010-07-01
There is no published norm for the Developmental Eye Movement (DEM) Test for Cantonese-speaking Chinese children. This study aimed to determine the normative values of this test for Cantonese-speaking Chinese children in Hong Kong SAR and to compare the results with the published norms of English-speaking and Spanish-speaking children. Cantonese-speaking students aged from 6 to 11 years were tested by the DEM test in Cantonese and a digital recorder was used to record the process. The DEM scores for the 305 students were determined by listening again to the audio records after the test and computed by using the formula from the DEM manual, except that the 'vertical scores' were adjusted by taking the vertical errors into consideration. The results were compared with other norms that have been published. Our subjects made more vertical errors than in other normative studies and adjusted vertical scores were proposed. In both adjusted vertical and horizontal scores, the Cantonese-speaking children completed the tests much faster than the norms for English- and Spanish-speaking children, the differences of the means being significant (p < 0.0001) in all age groups. The DEM norms may be affected by differences in languages, cultures and education systems among different ethnicities. The norms of the DEM test are proposed for Cantonese-speaking children in Hong Kong SAR, China.
A Guide for Setting the Cut-Scores to Minimize Weighted Classification Errors in Test Batteries
ERIC Educational Resources Information Center
Grabovsky, Irina; Wainer, Howard
2017-01-01
In this article, we extend the methodology of the Cut-Score Operating Function that we introduced previously and apply it to a testing scenario with multiple independent components and different testing policies. We derive analytically the overall classification error rate for a test battery under the policy when several retakes are allowed for…
Sex differences in physics learning and evaluations in an introductory course
NASA Astrophysics Data System (ADS)
Blue, Jennifer Marie
On a national level, boys and men score higher than girls and women on science and math tests. There have been several investigations into the reasons for these differences, with some believing that they are caused by innate biological sex differences and some that they are caused by social and cultural gender differences. In addition, women who plan to major in science and engineering drop out of those majors at higher rates than men do. This study was designed to contribute to the ongoing discussion about why these differences between women and men exist. This study compared post-test physics scores of a matched sample of men and women to see whether there were differences in how much physics had been learned at the end of a course when there were few differences at the beginning of the course. The study also looked at the ratings that men and women gave to the problem solving method and the sections of the course that used cooperative grouping. It was found that, although the population of students taking Physics 1251 showed differences in performance on physics tests both at the beginning and at the end of the course, when students were matched according to their high school background and their physics pretest scores there was no difference in their post-test scores. It was also found that women liked the relevant aspects of the course more than men did. Implications of these results are discussed.
Low aerobic fitness and obesity are associated with lower standardized test scores in children.
Roberts, Christian K; Freed, Benjamin; McCarthy, William J
2010-05-01
To investigate whether aerobic fitness and obesity in school children are associated with standardized test performance. Ethnically diverse (n = 1989) 5th, 7th, and 9th graders attending California schools comprised the sample. Aerobic fitness was determined by a 1-mile run/walk test; body mass index (BMI) was obtained from state-mandated measurements. California standardized test scores were obtained from the school district. Students whose mile run/walk times exceeded California Fitnessgram standards or whose BMI exceeded Centers for Disease Control sex- and age-specific body weight standards scored lower on California standardized math, reading, and language tests than students with desirable BMI status or fitness level, even after controlling for parent education among other covariates. Ethnic differences in standardized test scores were consistent with ethnic differences in obesity status and aerobic fitness. BMI-for-age was no longer a significant multivariate predictor when covariates included fitness level. Low aerobic fitness is common among youth and varies among ethnic groups, and aerobic fitness level predicts performance on standardized tests across ethnic groups. More research is needed to uncover the physiological mechanisms by which aerobic fitness may contribute to performance on standardized academic tests.
Oak, Sameer R; O'Rourke, Colin; Strnad, Greg; Andrish, Jack T; Parker, Richard D; Saluan, Paul; Jones, Morgan H; Stegmeier, Nicole A; Spindler, Kurt P
2015-09-01
The International Knee Documentation Committee (IKDC) Subjective Knee Evaluation Form is a patient-reported outcome with adult (1998) and pediatric (2011) versions validated at different ages. Prior longitudinal studies of patients aged 13 to 17 years who tore their anterior cruciate ligament (ACL) have used the only available adult IKDC, whereas currently the pediatric IKDC is the accepted form of choice. This study compared the adult and pediatric IKDC forms and tested whether the differences were clinically significant. The hypothesis was that the pediatric and adult IKDC questionnaires would show no clinically significant differences in score when completed by patients aged 13 to 17 years. Cohort study (diagnosis); Level of evidence, 2. A total of 100 participants aged 13 to 17 years with knee injuries were split into 2 groups by use of simple randomization. One group answered the adult IKDC form first and then the pediatric form. The second group answered the pediatric IKDC form first and then the adult form. A 10-minute break was given between form administrations to prevent rote repetition of answers. Study design was based on established methods to compare 2 forms of patient-reported outcomes. A 5-point threshold for clinical significance was set below previously published minimum clinically important differences for the adult IKDC. Paired t tests were used to test both differences and equivalence between scores. By ordinary least-squares models, scores were modeled to predict adult scores given certain pediatric scores and vice versa. Comparison between adult and pediatric IKDC scores showed a statistically significant difference of 1.5 points; however, the 95% CI (0.3-2.6) fell below the threshold of 5 points set for clinical significance. Further equivalence testing showed the 95% CI (0.5-2.4) between adult and pediatric scores being within the defined 5-point equivalence region. The scores were highly correlated, with a linear relationship (R² = 92%).
There was no clinically significant difference between the pediatric and adult IKDC form scores in adolescents aged 13 to 17 years. This result allows use of whichever form is most practical for long-term tracking of patients. A simple linear equation can convert one form into the other. If the adult questionnaire is used at this age, it can be consistently used during follow-up. © 2015 The Author(s).
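The distinction the abstract draws between statistical and clinical significance can be illustrated with the two confidence intervals it reports. The sketch below assumes a symmetric equivalence region of ±5 points around zero; the helper names are hypothetical, and the check works only from the published interval endpoints, not from the study data.

```python
def excludes_zero(ci_low, ci_high):
    """A difference is statistically significant if its CI does not contain 0."""
    return ci_low > 0 or ci_high < 0

def within_equivalence_region(ci_low, ci_high, margin):
    """Equivalence is supported if the whole CI lies inside (-margin, +margin)."""
    return -margin < ci_low and ci_high < margin

MARGIN = 5.0  # clinical significance threshold quoted in the abstract

# 95% CI for the adult-minus-pediatric difference (paired t test).
significant = excludes_zero(0.3, 2.6)
# 95% CI reported for the equivalence test.
equivalent = within_equivalence_region(0.5, 2.4, MARGIN)

print(f"statistically significant: {significant}")
print(f"within 5-point equivalence region: {equivalent}")
```

Both checks return true, which matches the abstract's reading: the 1.5-point difference is detectable but too small to matter clinically.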
Berndl, K; von Cranach, M; Grüsser, O J
1986-01-01
The perception and recognition of faces, mimic expression and gestures were investigated in normal subjects and schizophrenic patients by means of a movie test described in a previous report (Berndl et al. 1986). The error scores were compared with results from a semi-quantitative evaluation of psychopathological symptoms and with some data from the case histories. The overall error scores found in the three groups of schizophrenic patients (paranoic, hebephrenic, schizo-affective) were significantly increased (7-fold) over those of normals. No significant difference in the distribution of the error scores in the three different patient groups was found. In 10 different sub-tests following the movie the deficiencies found in the schizophrenic patients were analysed in detail. The error score for the averbal test was on average higher in paranoic patients than in the two other groups of patients, while the opposite was true for the error scores found in the verbal tests. Age and sex had some impact on the test results. In normals, female subjects were somewhat better than male. In schizophrenic patients the reverse was true. Thus female patients were more affected by the disease than male patients with respect to the task performance. The correlation between duration of the disease and error score was small; less than 10% of the error scores could be attributed to factors related to the duration of illness. Evaluation of psychopathological symptoms indicated that the stronger the schizophrenic defect, the higher the error score, but again this relationship was responsible for not more than 10% of the errors. The estimated degree of acute psychosis and overall sum of psychopathological abnormalities as scored in a semi-quantitative exploration did not correlate with the error score, but with each other. Similarly, treatment with psychopharmaceuticals, previous misuse of drugs or of alcohol had practically no effect on the outcome of the test data. 
The analysis of performance and test data of schizophrenic patients indicated that our findings are most likely not due to a "non-specific" impairment of cognitive function in schizophrenia, but point to a fairly selective defect in elementary cognitive visual functions necessary for averbal social communication. Some possible explanations of the data are discussed in relation to neuropsychological and neurophysiological findings on "face-specific" cortical areas located in the primate temporal lobe.
McLean, James M; Brumby-Rendell, Oscar; Lisle, Ryan; Brazier, Jacob; Dunn, Kieran; Gill, Tiffany; Hill, Catherine L; Mandziak, Daniel; Leith, Jordan
2018-05-01
The aim was to assess whether the Knee Society Score, Oxford Knee Score (OKS) and Knee Injury and Osteoarthritis Outcome Score (KOOS) were comparable in asymptomatic, healthy individuals of different age, gender and ethnicity, across two remote continents. The purpose of this study was to establish normal population values for these scores using an electronic data collection system. There is no difference in clinical knee scores in an asymptomatic population when comparing age, gender and ethnicity, across two remote continents. 312 Australian and 314 Canadian citizens, aged 18-94 years, with no active knee pain, injury or pathology in the ipsilateral knee corresponding to their dominant arm, were evaluated. A knee examination was performed and participants completed an electronically administered questionnaire covering the subjective components of the knee scores. The cohorts were age- and gender-matched. Chi-square tests, Fisher's exact test and Poisson regression models were used where appropriate, to investigate the association between knee scores, age, gender, ethnicity and nationality. There was a significant inverse relationship between age and all assessment tools. OKS recorded a significant difference between genders, with females scoring on average 1% lower. There was no significant difference between international cohorts when comparing all assessment tools. An electronic, multi-centre data collection system can be effectively utilized to assess remote international cohorts. Differences in gender, age, ethnicity and nationality should be taken into consideration when using knee scores to compare to pathological patient scores. This study has established an electronic, normal control group for future studies using the Knee Society, Oxford, and KOOS knee scores. Diagnostic Level II.
Does familiarity with computers affect computerized neuropsychological test performance?
Iverson, Grant L; Brooks, Brian L; Ashton, V Lynn; Johnson, Lynda G; Gualtieri, C Thomas
2009-07-01
The purpose of this study was to determine whether self-reported computer familiarity is related to performance on computerized neurocognitive testing. Participants were 130 healthy adults who self-reported whether their computer use was "some" (n = 65) or "frequent" (n = 65). The two groups were individually matched on age, education, sex, and race. All completed the CNS Vital Signs (Gualtieri & Johnson, 2006b) computerized neurocognitive battery. There were significant differences on 6 of the 23 scores, including scores derived from the Symbol-Digit Coding Test, Stroop Test, and the Shifting Attention Test. The two groups were also significantly different on the Psychomotor Speed (Cohen's d = 0.37), Reaction Time (d = 0.68), Complex Attention (d = 0.40), and Cognitive Flexibility (d = 0.64) domain scores. People with "frequent" computer use performed better than people with "some" computer use on some tests requiring rapid visual scanning and keyboard work.
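The domain-score differences above are reported as Cohen's d. As a rough illustration of how such an effect size is obtained, the sketch below uses the common pooled-standard-deviation form for two independent groups. The abstract does not say which variant the authors used, so the formula choice and the function name `cohens_d` are assumptions.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d with a pooled standard deviation (assumed variant)."""
    n_a, n_b = len(group_a), len(group_b)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    # Standardized mean difference: positive when group_a scores higher.
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Made-up scores, for illustration only (not data from the study).
frequent_users = [2, 4, 6]
some_users = [1, 3, 5]
effect = cohens_d(frequent_users, some_users)
```

By the usual rule of thumb, values near 0.4 (as for Psychomotor Speed and Complex Attention) are small to medium effects, and values near 0.65 (Reaction Time, Cognitive Flexibility) are medium effects.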
[Development and validation of the Visual Analogue Scale (VAS) Spine Score].
Knop, C; Oeser, M; Bastian, L; Lange, U; Zdichavsky, M; Blauth, M
2001-06-01
The aim of the study was the development and validation of a new subjective rating scale for assessment of outcome in patients with thoracolumbar fractures and fracture dislocations. The VAS spine score consists of 19 score items, using 100-mm visual analogue scales. The items are answered by the patients independently of rater assessment. To measure the analogue scales and calculate the score, a computer-aided system was evolved consisting of self-developed software and digitizer board. The overall score is the mean of all items answered with values between 0 and 100. The individual score loss is calculated as the difference between the preinjury score and at follow-up with values between 0 and 100. The VAS spine score was tested for reliability with a group of 136 healthy volunteers. We performed a test-retest study with an interval of 24 h. For statistical analysis of the validity, we prospectively followed a group of 53 patients with the new outcome score. We chose patients with injuries of the thoracolumbar spine, all having been operatively treated by combined posterior-anterior stabilization and fusion between 1994 and 1996. In the reference group, the average test score was 91.95 (58-100) and 92.10 (58-100) at retest. The mean individual difference between test and retest scored 1.037 (0-8). A high reliability was proved by a strong correlation with a coefficient of 0.976 (p < 0.001). A high internal consistency of the VAS spine score was shown by a Cronbach-alpha of 0.9117. The mean score for the preinjury status of the patients was comparable to the reference group, amounting to 89.60 (21-100). The mean score at the time of implant removal was significantly (p < 0.001) decreased to 58.25 (13-97). Until the time of follow-up a significant (p < 0.001) increase was noted, and the group scored 66.08 (15-100) at follow-up. This was a significant (p < 0.001) difference compared with the preinjury status. The individual score loss averaged 24.1 (0-80). 
In the patient group we also noted a Cronbach-alpha > 0.95, indicating a high internal consistency. With the VAS spine score the authors have inaugurated a new tool for outcome measurement in the treatment of patients with thoracolumbar injuries. The study has proved the score to be both reliable and valid. The application of the score is helpful in analyzing the subjective outcome, and the results can be correlated with objective measures. The score is a useful tool for comparative clinical studies, addressing the outcome after different methods of treatment.
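The scoring rule described in this abstract (overall score as the mean of all answered items, score loss as the preinjury score minus the follow-up score) can be sketched as follows. This is an illustration only: the function names are hypothetical, representing unanswered items as `None` is an assumption, and the abstract does not say how a negative difference would be handled.

```python
from statistics import mean
from typing import Optional, Sequence


def vas_spine_score(items: Sequence[Optional[float]]) -> float:
    """Mean of all answered items; each item is a 0-100 mm VAS reading.

    Unanswered items are passed as None (an assumed convention) and skipped.
    """
    answered = [value for value in items if value is not None]
    if not answered:
        raise ValueError("no items answered")
    if any(not 0 <= value <= 100 for value in answered):
        raise ValueError("item values must lie between 0 and 100")
    return mean(answered)


def score_loss(preinjury: float, follow_up: float) -> float:
    """Individual score loss: preinjury score minus follow-up score."""
    return preinjury - follow_up


# Hypothetical patient with one unanswered item (illustrative values only).
preinjury = vas_spine_score([90, None, 100, 80])
follow_up = vas_spine_score([60, 70, None, 68])
loss = score_loss(preinjury, follow_up)
```

The reported mean individual loss (24.1) differs slightly from the difference of the group means (89.60 - 66.08 = 23.52); the abstract gives no explanation for this.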
Valciukas, J A; Lilis, R; Wolff, M S; Anderson, H A
1978-01-01
An analysis of findings regarding the prevalence and time course of symptoms and the results of neurobehavioral testing among Michigan and Wisconsin dairy farmers, is reported. Reviewed are: (1) differences in the prevalence of neurological symptoms at the time of examination; (2) differences in the incidence and time course of symptoms for the period 1972--1976; (3) differences among populations and subgroups (sex and age) regarding performance test scores; (4) correlations between performance test scores and neurological symptoms; and (5) correlations between serum PBB levels as indicators of exposure and performance tests and neurological symptoms. PMID:209977
Gibson, Todd A; Oller, D Kimbrough; Jarmulowicz, Linda
2018-03-01
Receptive standardized vocabulary scores have been found to be much higher than expressive standardized vocabulary scores in children with Spanish as L1, learning L2 (English) in school (Gibson et al., 2012). Here we present evidence suggesting the receptive-expressive gap may be harder to evaluate than previously thought because widely-used standardized tests may not offer comparable normed scores. Furthermore monolingual Spanish-speaking children tested in Mexico and monolingual English-speaking children in the US showed other, yet different statistically significant discrepancies between receptive and expressive scores. Results suggest comparisons across widely used standardized tests in attempts to assess a receptive-expressive gap are precarious.
Correlations of diffusion tensor imaging values and symptom scores in patients with schizophrenia.
Michael, Andrew M; Calhoun, Vince D; Pearlson, Godfrey D; Baum, Stefi A; Caprihan, Arvind
2008-01-01
Abnormalities in white matter (WM) brain regions are attributed as a possible biomarker for schizophrenia (SZ). Diffusion tensor imaging (DTI) is used to capture WM tracts. Psychometric tests that evaluate the severity of symptoms of SZ are clinically used in the diagnosis process. In this study we investigate the correlates of scalar DTI measures, such as fractional anisotropy, mean diffusivity, axial diffusivity, and radial diffusivity with behavioral test scores. The correlations were found by different schemes: mean correlation with WM atlas regions and multiple regression of DTI values with test scores. The corpus callosum, superior longitudinal fasciculus right and inferior longitudinal fasciculus left were found to be having high correlations with test scores.
Thibodeau, Michel A; Leonard, Rachel C; Abramowitz, Jonathan S; Riemann, Bradley C
2015-12-01
The Dimensional Obsessive-Compulsive Scale (DOCS) is a promising measure of obsessive-compulsive disorder (OCD) symptoms but has received minimal psychometric attention. We evaluated the utility and reliability of DOCS scores. The study included 832 students and 300 patients with OCD. Confirmatory factor analysis supported the originally proposed four-factor structure. DOCS total and subscale scores exhibited good to excellent internal consistency in both samples (α = .82 to α = .96). Patient DOCS total scores reduced substantially during treatment (t = 16.01, d = 1.02). DOCS total scores discriminated between students and patients (sensitivity = 0.76, 1 - specificity = 0.23). The measure did not exhibit gender-based differential item functioning as tested by Mantel-Haenszel chi-square tests. Expected response options for each item were plotted as a function of item response theory and demonstrated that DOCS scores incrementally discriminate OCD symptoms ranging from low to extremely high severity. Incremental differences in DOCS scores appear to represent unbiased and reliable differences in true OCD symptom severity. © The Author(s) 2014.
Sex Differences in Vestibular/Ocular and Neurocognitive Outcomes After Sport-Related Concussion.
Sufrinko, Alicia M; Mucha, Anne; Covassin, Tracey; Marchetti, Greg; Elbin, R J; Collins, Michael W; Kontos, Anthony P
2017-03-01
To examine sex differences in vestibular and oculomotor symptoms and impairment in athletes with sport-related concussion (SRC). The secondary purpose was to replicate previously reported sex differences in total concussion symptoms, and performance on neurocognitive and balance testing. Prospective cross-sectional study of consecutively enrolled clinic patients within 21 days of a SRC. Specialty Concussion Clinic. Included male (n = 36) and female (n = 28) athletes ages 9 to 18 years. Vestibular symptoms and impairment was measured with the Vestibular/Ocular Motor Screening (VOMS). Participants completed the Immediate Post-concussion Assessment and Cognitive Test (ImPACT), Post-concussion Symptom Scale (PCSS), and Balance Error Scoring System (BESS). Sex differences on clinical measures. Females had higher PCSS scores (P = 0.01) and greater VOMS vestibular ocular reflex (VOR) score (P = 0.01) compared with males. There were no sex differences on BESS or ImPACT. Total PCSS scores together with female sex accounted for 45% of the variance in VOR scores. Findings suggest higher VOR scores after SRC in female compared with male athletes. Findings did not extend to other components of the VOMS tool suggesting that sex differences may be specific to certain types of vestibular impairment after SRC. Additional research on the clinical significance of the current findings is needed.
Pradhan, Malati; Dash, Bijayalakshmi
2015-05-01
Infectious disease is a major public health issue for both developed and developing countries. Among infectious diseases, tuberculosis (TB) is most prevalent in the developing countries. India is the highest TB burden country in the world and accounts for nearly one fifth (20%) of the global burden of tuberculosis. A pre-experimental, one-group pre-test/post-test design without a control group was undertaken in Kuchinda block of Sambalpur district (Odisha) with the objective of assessing the effectiveness of a Video-assisted Teaching Module (VATM) on the knowledge of Accredited Social Health Activists (ASHAs) regarding the Revised National Tuberculosis Control Programme (RNTCP). Data were collected from 52 ASHAs, selected by a systematic random sampling technique, through a structured questionnaire. The overall mean score in the pre-test was 23.31±3.07, which is 58.27 percent of the maximum score and indicates good knowledge, whereas in the post-test it was 34.35±3.56, which is 85.87 percent of the maximum score, showing a difference of 27.6 percent effectiveness. A highly significant (p<0.01) difference was found between pre- and post-test knowledge scores, and no significant (p>0.05) association was found between post-test knowledge score and any of the demographic variables of the ASHAs.
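The percentages in this abstract can be checked with simple arithmetic. The sketch below assumes a maximum score of 40. That value is not stated in the abstract; it is inferred from 23.31 being reported as 58.27 percent of the maximum.

```python
# Assumption: the maximum questionnaire score is 40, inferred from the
# abstract's own figures (23.31 / 0.5827 is approximately 40).
MAX_SCORE = 40


def percent_of_max(mean_score: float, max_score: float = MAX_SCORE) -> float:
    """Express a mean raw score as a percentage of the maximum score."""
    return 100.0 * mean_score / max_score


pre = percent_of_max(23.31)   # about 58.3 percent (reported as 58.27)
post = percent_of_max(34.35)  # about 85.9 percent (reported as 85.87)
gain = post - pre             # about 27.6 percentage points
```

Under this assumption the reported "27.6 percent effectiveness" is the difference between the two percentages, so it is a gain in percentage points rather than a relative improvement.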
The Role of Family Socioeconomic Resources in the Black-White Test Score Gap among Young Children
ERIC Educational Resources Information Center
Magnuson, Katherine A.; Duncan, Greg J.
2006-01-01
This paper reviews evidence on the family origins of racial differences in young children's test scores and considers how much of the gap is due to differences in the economic and demographic conditions in which black and white children grow up. Our review of the literature finds that the estimated size of the gaps varies considerably across…
1985-04-01
AFHRL-TR-84-64. Air Force. Equipercentile Test Equating: The Effects of Presmoothing and...combined or compound presmoother and a presmoothing method based on a particular model of test scores. Of the seven methods of presmoothing the score...unsmoothed distributions, the smoothing of that sequence of differences by the same compound method, and, finally, adding the smoothed differences back
Improved auscultation skills in paramedic students using a modified stethoscope.
Simon, Erin L; Lecat, Paul J; Haller, Nairmeen A; Williams, Carolyn J; Martin, Scott W; Carney, John A; Pakiela, John A
2012-12-01
The Ventriloscope® (Lecat's SimplySim, Tallmadge, OH) is a modified stethoscope used as a simulation training device for auscultation. To test the effectiveness of the Ventriloscope as a training device in teaching heart and lung auscultatory findings to paramedic students. A prospective, single-hospital study conducted in a paramedic-teaching program. The standard teaching group learned heart and lung sounds via audiocassette recordings and lecture, whereas the intervention group utilized the modified stethoscope in conjunction with patient volunteers. Study subjects took a pre-test, post-test, and a follow-up test to measure recognition of heart and lung sounds. The intervention group included 22 paramedic students and the standard group included 18 paramedic students. Pre-test scores did not differ using two-sample t-tests (standard group: t [16]=-1.63, p=0.12) and (intervention group: t [20]=-1.17, p=0.26). Improvement in pre-test to post-test scores was noted within each group (standard: t [17]=2.43, p=0.03; intervention: t [21]=4.81, p<0.0001). Follow-up scores for the standard group were not different from pre-test scores of 16.06 (t [17]=0.94, p=0.36). However, follow-up scores for the intervention group significantly improved from their respective pre-test score of 16.05 (t [21]=2.63, p=0.02). Simulation training using a modified stethoscope in conjunction with standardized patients allows for realistic learning of heart and lung sounds. This technique of simulation training achieved proficiency and better retention of heart and lung sounds in a safe teaching environment. Copyright © 2012 Elsevier Inc. All rights reserved.
Content and retention evaluation of an audiovisual patient-education program on bronchodilators.
Darr, M S; Self, T H; Ryan, M R; Vanderbush, R E; Boswell, R L
1981-05-01
A study was conducted to: (1) evaluate the effect of a slide-tape program on patients' short-term and long-term knowledge about their bronchodilator medications; and (2) determine if any differences exist in learning or retention patterns for different content areas of drug information. The knowledge of 30 patients was measured using a randomized sequence of three comparable 15-question tests. The first test was given before the slide-tape program was presented, the second test within 24 hours, and the last test one to six months (mean = 2.8 months) later. Scores attained on the first posttest were significantly higher (p less than 0.001) than pretest scores. Learning differences among drug-information-content areas were not evidenced on the first posttest. No significant difference was demonstrated between scores on pretest and last posttest (p = 0.100). However, retention patterns among content areas were found to differ significantly (p less than 0.05). Carefully designed audiovisual programs can impart drug information to patients. Medication counseling should be repeated at appropriate opportunities because patients lose drug knowledge over time.
Medical students perception of test anxiety triggered by different assessment modalities.
Guraya, Salman Y; Guraya, Shaista S; Habib, Fawzia; AlQuiliti, Khalid W; Khoshhal, Khalid I
2018-05-06
Test anxiety is well known among medical students. However, little is known about the test anxiety produced by individual components of an exam. This study aimed to stratify varying levels of test anxiety provoked by each exam modality and to explore the students' perceptions about confounding factors. A self-administered questionnaire was administered to medical students. The instrument contained four main themes: lifestyle, psychological and specific factors of information needs, learning styles, and perceived difficulty level of each assessment tool. A highest test anxiety score of 5 was ranked for "not scheduling available time" and "insufficient exercise" by 28.8 and 28.3% students, respectively. For "irrational thoughts about exam" and "fear to fail", a highest test anxiety score of 5 was scored by 28.8 and 25.7% students, respectively. The highest total anxiety score of 1255 was recorded for long case exam, followed by 975 for examiner-based objective structured clinical examination. Excessive course load and course not well covered by faculty were thought to be the main confounding factors. The examiner-based assessment modalities induced high test anxiety. Faculty is urged to cover core contents within stipulated time and to rigorously reform and update existing curricula to prepare relevant course material.
Sälzer, S; Rosema, N A M; Martin, E C J; Slot, D E; Timmer, C J; Dörfer, C E; van der Weijden, G A
2016-04-01
The aim of this study was to compare the efficacy of a dentifrice without sodium lauryl sulfate (SLS) to a dentifrice with SLS in young adults aged 18-34 years on gingivitis. One hundred twenty participants (non-dental students) with a moderate gingival inflammation (bleeding on probing at 40-70 % of test sites) were included in this randomized controlled double blind clinical trial. According to randomization, participants had to brush their teeth either with dentifrice without SLS or with SLS for 8 weeks. The primary outcome was bleeding on marginal probing (BOMP). The secondary outcomes were plaque scores and gingival abrasion scores (GA) as well as a visual analogue scale (VAS) score at exit survey. Baseline and end differences were analysed by univariate analysis of covariance (ANCOVA) test, between group differences by independent t test and within groups by paired sample t test. BOMP improved within groups from on average 0.80 at baseline to 0.60 in the group without SLS and to 0.56 in the group with SLS. No statistical difference for BOMP, plaque and gingival abrasion was found between both groups. VAS scores for taste, freshness and foaming effect were significantly in favour of the SLS-containing dentifrice. The test dentifrice without SLS was as effective as a regular SLS dentifrice on gingival bleeding scores and plaque scores. There was no significant difference in the incidence of gingival abrasion. In patients diagnosed with gingivitis, a dentifrice without SLS seems to be equally effective compared to a dentifrice with SLS and did not demonstrate any significant difference in gingival abrasion. In patients with recurrent aphthous ulcers, the absence of SLS may even be beneficial. However, participants indicated that they appreciate the foaming effect of a dentifrice with SLS more.
Age-related invariance of abilities measured with the Wechsler Adult Intelligence Scale-IV.
Sudarshan, Navaneetham J; Bowden, Stephen C; Saklofske, Donald H; Weiss, Lawrence G
2016-11-01
Assessment of measurement invariance across populations is essential for meaningful comparison of test scores, and is especially relevant where repeated measurements are required for educational assessment or clinical diagnosis. Establishing measurement invariance legitimizes the assumption that test scores reflect the same psychological trait in different populations or across different occasions. Examination of Wechsler Adult Intelligence Scale-Fourth Edition (WAIS-IV) U.S. standardization samples revealed that a first-order 5-factor measurement model was best fitting across 9 age groups from 16 years to 69 years. Strong metric invariance was found for 3 of 5 factors and partial intercept invariance for the remaining 2. Pairwise comparisons of adjacent age groups supported the inference that cognitive-trait group differences are manifested by group differences in the test scores. In educational and clinical settings these findings provide theoretical and empirical support to interpret changes in the index or subtest scores as reflecting changes in the corresponding cognitive abilities. Further, where clinically relevant, the subtest score composites can be used to compare changes in respective cognitive abilities. The model was supported in the Canadian standardization data with pooled age groups but the sample sizes were not adequate for detailed examination of separate age groups in the Canadian sample. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Terashima, Taiko; Yoshimura, Sadako
2018-03-01
To determine whether nurses can accurately assess the skin colour of replanted fingers displayed as digital images on a computer screen. Colour measurement and clinical diagnostic methods for medical digital images have been studied, but reproducing skin colour on a computer screen remains difficult. The inter-rater reliability of skin colour assessment scores was evaluated. In May 2014, 21 nurses who worked on a trauma ward in Japan participated in testing. Six digital images with different skin colours were used. Colours were scored from both digital images and direct observation of the patient. The score from a digital image was defined as the test score, and its difference from the direct assessment score as the difference score. Intraclass correlation coefficients were calculated. Nurses' opinions were classified and summarised. The intraclass correlation coefficients for the test scores were fair. Although the intraclass correlation coefficients for the difference scores were poor, they improved to good when three images that might have contributed to poor reliability were excluded. Most nurses stated that it is difficult to assess skin colour in digital images; they did not think it could be a substitute for direct visual assessment. However, most nurses were in favour of including images in nursing progress notes. Although the inter-rater reliability was fairly high, the reliability of colour reproduction in digital images as indicated by the difference scores was poor. Nevertheless, nurses expect the incorporation of digital images in nursing progress notes to be useful. This gap between the reliability of digital colour reproduction and nurses' expectations towards it must be addressed. High inter-rater reliability for digital images in nursing progress notes was not observed. Assessments of future improvements in colour reproduction technologies are required. Further digitisation and visualisation of nursing records might pose challenges.
© 2017 John Wiley & Sons Ltd.
Bereby-Meyer, Yoella; Meyer, Joachim; Budescu, David V
2003-02-01
This paper assesses framing effects on decision making with internal uncertainty, i.e., partial knowledge, by focusing on examinees' behavior in multiple-choice (MC) tests with different scoring rules. In two experiments participants answered a general-knowledge MC test that consisted of 34 solvable and 6 unsolvable items. Experiment 1 studied two scoring rules involving Positive (only gains) and Negative (only losses) scores. Although answering all items was the dominating strategy for both rules, the results revealed a greater tendency to answer under the Negative scoring rule. These results are in line with the predictions derived from Prospect Theory (PT) [Econometrica 47 (1979) 263]. The second experiment studied two scoring rules, which allowed respondents to exhibit partial knowledge. Under the Inclusion-scoring rule the respondents mark all answers that could be correct, and under the Exclusion-scoring rule they exclude all answers that might be incorrect. As predicted by PT, respondents took more risks under the Inclusion rule than under the Exclusion rule. The results illustrate that the basic process that underlies choice behavior under internal uncertainty and especially the effect of framing is similar to the process of choice under external uncertainty and can be described quite accurately by PT. Copyright 2002 Elsevier Science B.V.
NASA Astrophysics Data System (ADS)
Jacek, Laura Lee
This dissertation details an experiment designed to identify gender differences in learning using three experimental treatments: animation, static graphics, and verbal instruction alone. Three learning presentations were used in testing of 332 university students. Statistical analysis was performed using ANOVA, binomial tests for differences of proportion, and descriptive statistics. Results showed that animation significantly improved women's long-term learning over static graphics (p = 0.067), but didn't significantly improve men's long-term learning over static graphics. In all cases, women's scores improved with animation over both other forms of instruction for long-term testing, indicating that future research should not abandon the study of animation as a tool that may promote gender equity in science. Short-term test differences were smaller, and not statistically significant. Variation present in short-term scores was related more to presentation topic than treatment. This research also details characteristics of each of the three presentations, to identify variables (e.g. level of abstraction in presentation) affecting score differences within treatments. Differences between men's and women's scores were non-standard between presentations, but these differences were not statistically significant (long-term p = 0.2961, short-term p = 0.2893). In future research, experiments might be better designed to test these presentational variables in isolation, possibly yielding more distinctive differences between presentational scores. Differences in confidence interval overlaps between presentations suggested that treatment superiority may be somewhat dependent on the design or topic of the learning presentation. Confidence intervals greatly overlap in all situations. This undercut, to some degree, the surety of conclusions indicating superiority of one treatment type over the others. 
However, confidence intervals for animation were smaller, overlapped nearly completely for men and women (there was less overlap between the genders for the other two treatments), and centered around slightly higher means, lending further support to the conclusion that animation helped equalize men's and women's learning. The most important conclusion identified in this research is that gender is an important variable in experimental populations testing animation as a learning device. Averages indicated that both men and women prefer to work with animation over either static graphics or verbal instruction alone.
Standardized Testing of Special Education Students: A Comparison of Service Type and Test Scores
ERIC Educational Resources Information Center
Hogan-Young, Christine
2013-01-01
The purpose of this study was to determine if there was a difference in Tennessee Comprehensive Assessment Program Modified Academic Achievement Standards (TCAP MAAS) achievement test scores for special education students who receive their instruction in the resource classroom or in an inclusion classroom. The study involved third, fourth, and…
ERIC Educational Resources Information Center
Needham, Martha Elaine
2010-01-01
This research compares differences between standardized test scores in problem-based learning (PBL) classrooms and a traditional classroom for 6th grade students using a mixed-method, quasi-experimental and qualitative design. The research shows that problem-based learning is as effective as traditional teaching methods on standardized tests. The…
The Disaggregation of Value-Added Test Scores to Assess Learning Outcomes in Economics Courses
ERIC Educational Resources Information Center
Walstad, William B.; Wagner, Jamie
2016-01-01
This study disaggregates posttest, pretest, and value-added or difference scores in economics into four types of economic learning: positive, retained, negative, and zero. The types are derived from patterns of student responses to individual items on a multiple-choice test. The micro and macro data from the "Test of Understanding in College…
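The four learning types can be illustrated by classifying each item according to whether it was answered correctly on the pretest and on the posttest. The mapping used below is an assumption, since the truncated abstract does not spell it out: positive means wrong then right, retained means right both times, negative means right then wrong, and zero means wrong both times. The function names are hypothetical.

```python
from collections import Counter


def learning_type(pre_correct: bool, post_correct: bool) -> str:
    """Classify one item by its pretest/posttest response pattern (assumed mapping)."""
    if post_correct:
        return "retained" if pre_correct else "positive"
    return "negative" if pre_correct else "zero"


def disaggregate(pre, post):
    """Count the four learning types across paired item responses."""
    counts = Counter(learning_type(bool(p), bool(q)) for p, q in zip(pre, post))
    # Make sure every category is present, even when its count is zero.
    return {kind: counts.get(kind, 0)
            for kind in ("positive", "retained", "negative", "zero")}


# Made-up responses for one student on five items (1 = correct, 0 = incorrect).
pre_items = [0, 1, 1, 0, 0]
post_items = [1, 1, 0, 0, 1]
types = disaggregate(pre_items, post_items)

posttest_score = types["positive"] + types["retained"]
pretest_score = types["retained"] + types["negative"]
difference_score = types["positive"] - types["negative"]
```

Under this mapping the posttest score is positive plus retained, the pretest score is retained plus negative, and the difference score reduces to positive minus negative. The same difference score can therefore hide quite different mixes of learning and forgetting.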
Effects of Didactic Instruction and Test-Enhanced Learning in a Nursing Review Course.
Tu, Yu-Ching; Lin, Yi-Jung; Lee, Jonathan W; Fan, Lir-Wan
2017-11-01
Determining the most effective approach for students' successful academic performance and achievement on the national licensure examination for RNs is important to nursing education and practice. A quasi-experimental design was used to compare didactic instruction and test-enhanced learning among nursing students divided into two fundamental nursing review courses in their final semester. Students in each course were subdivided into low-, intermediate-, and high-score groups based on their first examination scores. Mixed model of repeated measure and two-way analysis of variance were applied to evaluate students' academic results and both teaching approaches. Intermediate-scoring students' performances improved more through didactic instruction, whereas low-scoring students' performances improved more through test-enhanced learning. Each method had differing effects on individual subgroups within the different performance level groups of their classes, which points to the importance of considering both the didactic and test-enhanced learning approaches. [J Nurs Educ. 2017;56(11):683-687.]. Copyright 2017, SLACK Incorporated.
Liu, Jianghong; Lynn, Richard
2011-08-01
This study presents data on the factor structure of the Wechsler Preschool and Primary Scale of Intelligence (WPPSI) and sex and cultural differences in WPPSI test scores among 5- and 6-year-olds from China, Japan, and the United States. Results show the presence of a verbal and nonverbal factor structure across all three countries. Sex differences on the 10 subtests were generally consistent, with a male advantage on a subtest of spatial abilities (Mazes). Males in the Chinese sample obtained significantly higher Full Scale IQ scores than females and had lower variability in their test scores. These observations were not present in the Japan and United States samples. Mean Full Scale IQ score in the Chinese sample was 104.1, representing a 4-point increase from 1988 to 2004.
ERIC Educational Resources Information Center
Wright, Robert E.; Bachrach, Daniel G.
2003-01-01
Graduate Management Admission Test (GMAT) scores and grade point average in graduate core courses were compared for 190 male and 144 female business administration students. No significant differences in course performance were found, but males had been admitted with significantly higher GMAT scores, suggesting a bias against women. (Contains 27…
Effect of practice and training in spatial skills on embedded figures scores of males and females.
Johnson, S; Flinn, J M; Tyer, Z E
1979-06-01
The effect of practice and training in spatial skills on scores obtained by male and female students on the Embedded Figures Test was examined. Forms A and B were administered 6 wk. apart to three groups of subjects (ns = 28, 27, 27) enrolled in drafting, mathematics, and liberal arts courses. During the pretest-posttest period the drafting students received training while the other two groups served as controls. Analysis indicated (1) no initial sex difference in test scores; (2) liberal arts students differed significantly from drafting and mathematics students, but there was no significant difference between the last two groups; (3) all groups improved with practice; (4) women receiving training improved more than women who did not; (5) there was a trend toward women receiving spatial training scoring more poorly than males receiving training on the pretest, but there was no significant difference on the posttest. These results suggest that sex differences in embedded-figures scores found by many previous experimenters may have been associated with differences in prior experience in spatial skills and by a confounding of sex with area of academic study.
André, Helô-Isa; Carnide, Filomena; Moço, Andreia; Valamatos, Maria-João; Ramalho, Fátima; Santos-Rocha, Rita; Veloso, António
2018-06-05
The assessment of plantar-flexor (PF) muscle strength in older adults (OA) is of the utmost importance since it is strongly associated with the performance of fundamental tasks of daily life. The objective was to strengthen the validity of the Calf-Raise-Senior (CRS) test by assessing the biomechanical movement pattern of calf muscles in OA with different levels of functional fitness (FF) and physical activity (PA). Twenty-six OA were assessed with CRS, a FF battery, accelerometry, strength tests, kinematics and electromyography (EMG). OA with the best and worst CRS scores were compared. The association between the scores and EMG pattern of ankle muscles was determined. OA with the best CRS scores presented higher levels of FF, PA, strength, power, speed and range of movement, and a more efficient movement pattern during the test. Subjects who scored higher on the CRS test demonstrated the possibility of using a stretch-shortening cycle type of action in the PF muscles to increase power during the movements. OA with different levels of FF can be stratified by the muscular activation pattern of the calf muscles and the scores in the CRS test. This study reinforced the validity of CRS for evaluating ankle strength and power in OA. Copyright © 2018 Elsevier Ltd. All rights reserved.
Glover, Mark L; Sussmane, Jeffrey B
2002-10-01
To evaluate residents' skills in performing basic mathematical calculations used for prescribing medications to pediatric patients. In 2001, a test of ten questions on basic calculations was given to first-, second-, and third-year residents at Miami Children's Hospital in Florida. Four additional questions were included to obtain the residents' levels of training, specific pediatrics intensive care unit (PICU) experience, and whether or not they routinely double-checked doses and adjusted them for each patient's weight. The test was anonymous and calculators were permitted. The overall score and the score for each resident class were calculated. Twenty-one residents participated. The overall average test score and the mean test score of each resident class was less than 70%. Second-year residents had the highest mean test scores, although there was no significant difference between the classes of residents (p =.745) or relationship between the residents' PICU experiences and their exam scores (p =.766). There was no significant difference between residents' levels of training and whether they double-checked their calculations (p =.633) or considered each patient's weight relative to the dose prescribed (p =.869). Seven residents committed tenfold dosing errors, and one resident committed a 1,000-fold dosing error. Pediatrics residents need to receive additional education in performing the calculations needed to prescribe medications. In addition, residents should be required to demonstrate these necessary mathematical skills before they are allowed to prescribe medications.
Esmaeili, Alireza; Stewart, Andrew M; Hopkins, William G; Elias, George P; Lazarus, Brendan H; Rowell, Amber E; Aughey, Robert J
2018-01-01
Aim: The sit and reach test (S&R), dorsiflexion lunge test (DLT), and adductor squeeze test (AST) are commonly used in weekly musculoskeletal screening for athlete monitoring and injury prevention purposes. The aim of this study was to determine the normal week to week variability of the test scores, individual differences in variability, and the effects of training load on the scores. Methods: Forty-four elite Australian rules footballers from one club completed the weekly screening tests on day 2 or 3 post-main training (pre-season) or post-match (in-season) over a 10 month season. Ratings of perceived exertion and session duration for all training sessions were used to derive various measures of training load via both simple summations and exponentially weighted moving averages. Data were analyzed via linear and quadratic mixed modeling and interpreted using magnitude-based inference. Results: Substantial small to moderate variability was found for the tests at both season phases; for example over the in-season, the normal variability ±90% confidence limits were as follows: S&R ±1.01 cm, ±0.12; DLT ±0.48 cm, ±0.06; AST ±7.4%, ±0.6%. Small individual differences in variability existed for the S&R and AST (factor standard deviations between 1.31 and 1.66). All measures of training load had trivial effects on the screening scores. Conclusion: A change in a test score larger than the normal variability is required to be considered a true change. Athlete monitoring and flagging systems need to account for the individual differences in variability. The tests are not sensitive to internal training load when conducted 2 or 3 days post-training or post-match, and the scores should be interpreted cautiously when used as measures of recovery.
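The training-load derivation described in the abstract above (session ratings of perceived exertion and duration, combined by simple summation or by exponentially weighted moving averages) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the 7-day window, the decay constant 2 / (N + 1), and seeding the average with the first day's load are all assumptions.

```python
def session_load(rpe, duration_min):
    """Session-RPE load: rating of perceived exertion times duration in minutes."""
    return rpe * duration_min


def simple_sum(daily_loads, window_days=7):
    """Rolling sum of the most recent `window_days` daily loads (assumed window)."""
    return [
        sum(daily_loads[max(0, i - window_days + 1): i + 1])
        for i in range(len(daily_loads))
    ]


def ewma_loads(daily_loads, time_constant_days=7):
    """Exponentially weighted moving average of daily loads.

    Assumes decay lam = 2 / (N + 1) and seeds the average with the
    first day's load; both are illustrative choices.
    """
    if not daily_loads:
        return []
    lam = 2.0 / (time_constant_days + 1)
    averages = []
    previous = daily_loads[0]
    for load in daily_loads:
        previous = lam * load + (1 - lam) * previous
        averages.append(previous)
    return averages


# Hypothetical week: one 90-minute session rated 7, then lighter days.
week = [session_load(7, 90), 0, session_load(4, 60), 0, 0, session_load(6, 75), 0]
rolling = simple_sum(week)
smoothed = ewma_loads(week)
```

The moving average weights recent sessions more heavily than older ones, whereas the rolling sum treats every day in the window equally.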
Gender differences in illness behavior after cardiac surgery.
Modica, Maddalena; Ferratini, Maurizio; Spezzaferri, Rosa; De Maria, Renata; Previtali, Emanuele; Castiglioni, Paolo
2014-01-01
Differences in the ways male and female patients confront their illness after cardiac surgery may contribute to previously observed gender differences in the outcomes of cardiac rehabilitation. The aim of this cross-sectional study was to verify whether there are gender-related differences in illness behavior (IB) soon after cardiac surgery and before entering cardiac rehabilitation. Patients (N = 1323) completed the IB Questionnaire and Hospital Anxiety and Depression Scale (HADS) 9 ± 5 (mean ± SD) days after cardiac surgery. The scores were tested for gender differences in score distributions (Mann-Whitney U test) and in prevalence of clinically relevant scores (the Pearson χ² test). Multivariate regression analyses were made with IB Questionnaire and HADS scores as dependent variables, and gender, age, education, marital status, and type of surgery as predictors. Denial was significantly (P < .01) more prevalent among men (3.6 ± 1.4) than women (3.2 ± 1.6), whereas disease conviction (men = 2.1 ± 1.5, women = 2.5 ± 1.6), dysphoria (men = 1.5 ± 1.5, women = 2.0 ± 1.6), anxiety (men = 6.0 ± 3.6, women = 6.9 ± 3.9), and depression (men = 5.3 ± 3.8, women = 6.5 ± 4.0) were significantly more prevalent among women. The prevalences of clinically relevant scores for disease conviction, anxiety, and depression were also significantly higher in women. Multivariate analysis showed that gender predicted these scores even after the removal of confounders. Gender differences exist in denial, disease conviction, and dysphoria, probably depending on the culturally assigned roles of men and women. As these aspects of IB may compromise treatment compliance and the quality of life, the efficacy of cardiac rehabilitation programs might be improved by taking into account the different prevalences in men and women.
Relationship of the functional movement screen in-line lunge to power, speed, and balance measures.
Hartigan, Erin H; Lawrence, Michael; Bisson, Brian M; Torgerson, Erik; Knight, Ryan C
2014-05-01
The in-line lunge of the Functional Movement Screen (FMS) evaluates lateral stability, balance, and movement asymmetries. Athletes who score poorly on the in-line lunge should avoid activities requiring power or speed until scores are improved, yet relationships between the in-line lunge scores and other measures of balance, power, and speed are unknown. Hypotheses: (1) Lunge scores will correlate with center of pressure (COP), maximum jump height (MJH), and 36.6-meter sprint time and (2) there will be no differences between limbs on lunge scores, MJH, or COP. Descriptive laboratory study. Level 3. Thirty-seven healthy, active participants completed the first 3 tasks of the FMS (ie, deep squat, hurdle step, in-line lunge), unilateral drop jumps, and 36.6-meter sprints. A 3-dimensional motion analysis system captured MJH. Force platforms measured COP excursion. A laser timing system measured 36.6-m sprint time. Statistical analyses were used to determine whether a relationship existed between lunge scores and COP, MJH, and 36.6-m speed (Spearman rho tests) and whether differences existed between limbs in lunge scores (Wilcoxon signed-rank test), MJH, and COP (paired t tests). Lunge scores were not significantly correlated with COP, MJH, or 36.6-m sprint time. Lunge scores, COP excursion, and MJH were not statistically different between limbs. Performance on the FMS in-line lunge was not related to balance, power, or speed. Healthy participants were symmetrical in lunging measures and MJH. Scores on the FMS in-line lunge should not be attributed to power, speed, or balance performance without further examination. However, assessing limb symmetry appears to be clinically relevant.
Butler, Bennet A; Lawton, Cort D; Burgess, Jamie; Balderama, Earvin S; Barsness, Katherine A; Sarwark, John F
2017-12-06
Simulation-based education has been integrated into many orthopaedic residency programs to augment traditional teaching models. Here we describe the development and implementation of a combined didactic and simulation-based course for teaching medical students and interns how to properly perform a closed reduction and percutaneous pinning of a pediatric supracondylar humeral fracture. Subjects included in the study were either orthopaedic surgery interns or subinterns at our institution. Subjects all completed a combined didactic and simulation-based course on pediatric supracondylar humeral fractures. The first part of this course was an electronic (e)-learning module that the subjects could complete at home in approximately 40 minutes. The second part of the course was a 20-minute simulation-based skills learning session completed in the simulation center. Subject knowledge of closed reduction and percutaneous pinning of supracondylar humeral fractures was tested using a 30-question, multiple-choice, written test. Surgical skills were tested in the operating room or in a simulated operating room. Subject pre-intervention and post-intervention scores were compared to determine if and how much they had improved. A total of 21 subjects were tested. These subjects significantly improved their scores on both the written, multiple-choice test and skills test after completing the combined didactic and simulation module. Prior to the module, intern and subintern multiple-choice test scores were significantly worse than postgraduate year (PGY)-2 to PGY-5 resident scores (p < 0.01); after completion of the module, there was no significant difference in the multiple-choice test scores. After completing the module, there was no significant difference in skills test scores between interns and PGY-2 to PGY-5 residents. Both tests were validated using the scores obtained from PGY-2 to PGY-5 residents. 
Our combined didactic and simulation course significantly improved intern and subintern understanding of supracondylar humeral fractures and their ability to perform a closed reduction and percutaneous pinning of these fractures.
An interventional program for nursing staff on selected mass gathering infectious diseases at Hajj.
El-Bahnasawy, Mamdouh M; Elmeniawy, Nagwa Zein El Abdeen A; Morsy, Tosson A
2014-08-01
This work improved military nursing staff knowledge of selected mass gathering infectious diseases at Hajj. The results showed that only 20% of the participating nurses had attended a training program about health hazards during the pilgrimage, and only 40.0% of them found the training programs specific to nurses. The majority (70.0%) found the program useful, and the average duration of the training program was 3.5 ± 1.1 weeks. There was significant improvement (P < 0.001) in correct knowledge about meningitis regarding causes, organisms, mode of spread, people at risk, transmission, prevention, and treatment; the highest improvement was in causes of meningitis and the lowest was in adult vaccination. Adequate knowledge (> 60% of the total score) was found in 25% of participants at pre-test, 93% at post-test, and 72% after 3 months, with a significant difference among tests. There was significant improvement (P < 0.001) in correct knowledge about seasonal influenza and respiratory diseases during the pilgrimage; the highest improvement was in influenza vaccine strains and the lowest was in antiviral drugs. Adequate knowledge (> 60% of the total score) was found in 23% of nurses at pre-test, 94% at post-test, and 66% after 3 months, with a significant difference among tests. There was significant improvement (P < 0.001) in correct knowledge about gastrointestinal diseases and food poisoning during the pilgrimage among nurses at the military hospital; the highest improvement was in risk factors for food poisoning and the lowest was in what a GE patient should do. Adequate knowledge (> 60% of the total score) was found in 22% of participants at pre-test, 91% at post-test, and 58% after 3 months, with a significant difference among tests. There was significant improvement (P < 0.001) in correct knowledge about heat exhaustion during the pilgrimage among nurses at the military hospital; the highest improvement was in non-communicable diseases and the lowest was in sun stroke prevention. 
Adequate knowledge (> 60% of the total score) was found in 27% of participants at pre-test, 94% at post-test, and 74% after 3 months, with a significant difference among pre-test, post-test, and follow-up (FU). There was also significant improvement (P < 0.001) in correct knowledge about hypertension, dengue fever, skin scalding, and other diseases during the pilgrimage among nurses at the military hospital; the highest improvement was in skin scalding prevention and the lowest was in the first aid bag. Adequate knowledge (> 60% of the total score) was found in 28% of participants at pre-test, 92% at post-test, and 61% after 3 months, with a significant difference among pre-test, post-test, and FU. Total knowledge score was reported to differ significantly according to education and work experience (P > 0.05), according to age at pre-test, post-test, and after 3 months, and according to department at all intervention times, with the highest scores in the ICU, then the ward, then the operating room.
Socioeconomic Status and MMPI-2 Interpretation.
ERIC Educational Resources Information Center
Long, Kathleen A.; And Others
1994-01-01
Examined differences in Minnesota Multiphasic Personality Inventory-2 (MMPI-2) scores between persons of differing educational levels and family income in the MMPI-2 normative sample to determine if MMPI-2 scores are differentially accurate in predicting relevant extra-test characteristics of persons of differing socioeconomic levels. MMPI-2…
Emotional Intelligence Abilities and Traits in Different Career Paths
ERIC Educational Resources Information Center
Kafetsios, Konstantinos; Maridaki-Kassotaki, Aikaterini; Zammuner, Vanda L.; Zampetakis, Leonidas A.; Vouzas, Fotios
2009-01-01
Two studies tested hypotheses about differences in emotional intelligence (EI) abilities and traits between followers of different career paths. Compared to their social science peers, science students had higher scores in adaptability and general mood traits measured with the Emotion Quotient Inventory, but lower scores in strategic EI abilities…
Are overreferrals on developmental screening tests really a problem?
Glascoe, F P
2001-01-01
Developmental screening tests, even those meeting standards for screening test accuracy, produce numerous false-positive results for 15% to 30% of children. This is thought to produce unnecessary referrals for diagnostic testing or special services and increase the cost of screening programs. To explore whether children who pass screening tests differ in important ways from those who do not and to determine whether children overreferred for testing benefit from the scrutiny of diagnostic testing and treatment planning. Subjects were a national sample of 512 parents and their children (age range of the children, 7 months to 8 years) who participated in validation studies of various screening tests. Psychological examiners adhering to standardized directions obtained informed consent and administered at least 2 developmental screening measures (the Brigance Screens, the Battelle Developmental Inventory Screening Test, the Denver-II, and the Parents' Evaluations of Developmental Status) and a concurrent battery of diagnostic measures, including tests of intelligence, language, and academic achievement (for children aged 2(1/2) years and older). The performance on diagnostic measures of children who failed screening but were not found to have a disability (false positives) was compared with that of children who passed screening and did not have a disability on diagnostic testing (true negatives). Children with false-positive scores performed significantly (P<.001) lower on diagnostic measures than did children with true-negative scores. The false-positive group had scores in adaptive behavior, language, intelligence, and academic achievement that were 9 to 14 points lower than the scores of those in the true-negative group. 
When viewing the likelihood of scoring below the 25th percentile on diagnostic measures, children with false-positive scores had a relative risk of 2.6 in adaptive behavior (95% confidence interval [CI], 1.67-4.21), 3.1 in language skills (95% CI, 1.90-5.20), 6.7 on intelligence tests (95% CI, 3.28-13.50), and 4.9 on academic measures (95% CI, 2.61-9.28). Overall, 151 (70%) of the children with false-positive results scored below the 25th percentile on 1 or more diagnostic measures (the point at which most children have difficulty benefiting from typical classroom instruction) in contrast with 64 (29%) of the children with true-negative scores (odds ratio, 5.6; 95% CI, 3.73-8.49). Children with false-positive scores were also more likely to be nonwhite and to have parents who had not graduated from high school. Performance differences between children with true-negative scores and children with false-positive scores continued to be significant (P<.001) even after adjusting for sociodemographic differences between groups. Children overreferred for diagnostic testing by developmental screens perform substantially lower than children with true-negative scores on measures of intelligence, language, and academic achievement-the 3 best predictors of school success. These children also carry more psychosocial risk factors, such as limited parental education and minority status. Thus, children with false-positive screening results are an at-risk group for whom diagnostic testing may not be an unnecessary expense but rather a beneficial and needed service that can help focus intervention efforts. Although such testing will not indicate a need for special education placement, it can be useful in identifying children's needs for other programs known to improve language, cognitive, and academic skills, such as Head Start, Title I services, tutoring, private speech-language therapy, and quality day care.
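The odds ratio reported in the abstract above can be approximately reproduced from the two rounded percentages (70% of false positives versus 29% of true negatives scoring below the 25th percentile). The sketch below is only an arithmetic check with illustrative function names; the group sizes are not stated, so the result (about 5.7) is close to, but not identical with, the published 5.6.

```python
def odds(proportion):
    """Convert a proportion into odds: p / (1 - p)."""
    return proportion / (1.0 - proportion)


def odds_ratio(p_group_a, p_group_b):
    """Odds of the outcome in group A divided by the odds in group B."""
    return odds(p_group_a) / odds(p_group_b)


def risk_ratio(p_group_a, p_group_b):
    """Relative risk: ratio of the two proportions."""
    return p_group_a / p_group_b


# Rounded proportions taken from the abstract.
false_positive_below_25th = 0.70
true_negative_below_25th = 0.29

approx_or = odds_ratio(false_positive_below_25th, true_negative_below_25th)
approx_rr = risk_ratio(false_positive_below_25th, true_negative_below_25th)
print(round(approx_or, 2), round(approx_rr, 2))
```

The odds ratio is larger than the risk ratio here because the outcome is common in both groups, which is why the two measures should not be read interchangeably.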
NASA Astrophysics Data System (ADS)
Young, Jerry Wayne
The purpose of this study was to determine the effects of four instructional methods (direct instruction, computer-aided instruction, video observation, and microcomputer-based lab activities), gender, and time of testing (pretest, immediate posttest for determining the immediate effect of instruction, and a delayed posttest two weeks later to determine the retained effect of the instruction) on the achievement of sixth graders who were learning to interpret graphs of displacement and velocity. The dependent variable of achievement was reflected in the scores earned by students on a testing instrument of established validity and reliability. The 107 students participating in the study were divided by gender and were then randomly assigned to the four treatment groups, each taught by a different teacher. Each group had approximately equal numbers of males and females. The students were pretested and then involved in two class periods of the instructional method which was unique to their group. Immediately following treatment they were posttested and two weeks later they were posttested again. The data in the form of test scores were analyzed with a two-way split-plot analysis of variance to determine if there was significant interaction among technique, gender, and time of testing. When significant interaction was indicated, the Tukey HSD test was used to determine specific mean differences. The results of the analysis indicated no gender effect. Only students in the direct instruction group and the microcomputer-based laboratory group had significantly higher posttest-1 scores than pretest scores. They also had significantly higher posttest-2 scores than pretest scores. This suggests that the learning was retained. The other groups experienced no significant differences among pretest, posttest-1, and posttest-2 scores. 
Recommendations are that direct instruction and microcomputer-based laboratory activities should be considered as effective stand-alone methods for teaching sixth grade students to interpret graphs of displacement and velocity. However, video and computer instruction may serve as supplemental activities.
Roth, Alexandra K; Denney, Douglas R; Lynch, Sharon G
2015-01-01
The Attention Network Test (ANT) assesses attention in terms of discrepancies between response times to items that differ in the burden they place on some facet of attention. However, simple arithmetic difference scores commonly used to capture these discrepancies fail to provide adequate control for information processing speed, leading to distorted findings when patient and control groups differ markedly in the speed with which they process and respond to stimulus information. This study examined attention networks in patients with multiple sclerosis (MS) using simple difference scores, proportional scores, and residualized scores that control for processing speed through statistical regression. Patients with relapsing-remitting (N = 20) or secondary progressive (N = 20) MS and healthy controls (N = 40) of similar age, education, and gender completed the ANT. Substantial differences between patients and controls were found on all measures of processing speed. Patients exhibited difficulties in the executive control network, but only when difference scores were considered. When deficits in information processing speed were adequately controlled using proportional or residualized scores, deficits in the alerting network emerged. The effect sizes for these deficits were notably smaller than those for overall information processing speed and were also limited to patients with secondary progressive MS. Deficits in processing speed are more prominent in MS than those involving attention, and when the former are properly accounted for, differences in the latter are confined to the alerting network.
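The three scoring approaches named in the abstract above can be illustrated with a small sketch. The exact definitions used in the study are not given, so the choices here are assumptions: the proportional score divides the difference by the easier-condition response time, and the residualized score is the residual from an ordinary least-squares regression of the harder condition on the easier one. All names and numbers are hypothetical.

```python
def difference_score(rt_hard, rt_easy):
    """Simple arithmetic difference between two condition response times (ms)."""
    return rt_hard - rt_easy


def proportional_score(rt_hard, rt_easy):
    """Difference expressed relative to the easier condition (assumed baseline)."""
    return (rt_hard - rt_easy) / rt_easy


def residualized_scores(rt_hard_list, rt_easy_list):
    """Residuals from regressing the harder condition on the easier one."""
    n = len(rt_easy_list)
    mean_x = sum(rt_easy_list) / n
    mean_y = sum(rt_hard_list) / n
    sxx = sum((x - mean_x) ** 2 for x in rt_easy_list)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(rt_easy_list, rt_hard_list))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x)
            for x, y in zip(rt_easy_list, rt_hard_list)]


# Hypothetical fast and slow responders with the same 20% relative cost.
fast_diff = difference_score(600.0, 500.0)    # 100 ms
slow_diff = difference_score(1200.0, 1000.0)  # 200 ms
fast_prop = proportional_score(600.0, 500.0)
slow_prop = proportional_score(1200.0, 1000.0)
```

In this made-up case the slower responder shows twice the raw difference despite an identical relative cost, which is the kind of distortion the abstract attributes to simple difference scores.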
Video as an Effective Method to Deliver Pre-Test Information for Rapid HIV Testing
Clark, Melissa A.; Mayer, Kenneth H.; Seage, George R.; DeGruttola, Victor G.; Becker, Bruce M.
2008-01-01
Objectives: Video-based delivery of HIV pre-test information might assist in streamlining HIV screening and testing efforts in the emergency department (ED). The objectives of this study were to determine if the video “Do you know about rapid HIV testing?” is an acceptable alternative to an in-person information session on rapid HIV pre-test information, with regard to comprehension of rapid HIV pre-test fundamentals; and to identify patients who might have difficulties in comprehending pre-test information. Methods: This was a non-inferiority trial of 574 participants in an ED opt-in rapid HIV screening program who were randomly assigned to receive identical pre-test information from either an animated and live-action 9.5-minute video, or an in-person information session. Pre-test information comprehension was assessed using a questionnaire. The video would be accepted as not inferior to the in-person information session if the 95% confidence interval (CI) of the difference (Δ) in mean scores on the questionnaire between the two information groups was less than a 10% decrease in the in-person information session arm's mean score. Linear regression models were constructed to identify patients with lower mean scores based upon study arm assignment, demographic characteristics, and history of prior HIV testing. Results: The questionnaire mean scores were 20.1 (95% CI = 19.7 to 20.5) for the video arm and 20.8 (95% CI = 20.4 to 21.2) for the in-person information session arm. The difference in mean scores compared to the mean score for the in-person information session met the non-inferiority criterion for this investigation (Δ = 0.68; 95% CI = 0.18 to 1.26). In a multivariable linear regression model, Blacks/African Americans, Hispanics, and those with Medicare and Medicaid insurance exhibited slightly lower mean scores, regardless of the pre-test information delivery format. 
There was a strong relationship between fewer years of formal education and lower mean scores on the questionnaire. Age, gender, type of insurance, partner/marital status, and history of prior HIV testing were not predictive of scores on the questionnaire. Conclusions: In terms of patient comprehension of rapid HIV pre-test information fundamentals, the video was an acceptable substitute for pre-test information delivered by an HIV test counselor. Both the video and in-person information session were less effective in providing pre-test information for patients with fewer years of formal education. PMID:19120050
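The non-inferiority criterion in the abstract above reduces to a short arithmetic check: the upper limit of the 95% CI for the difference in mean scores must stay below 10% of the in-person arm's mean. The sketch below uses only figures quoted in the abstract; the function names are illustrative, and reading the criterion as a comparison against the upper CI limit is an interpretation rather than the trial's stated procedure.

```python
def non_inferiority_margin(reference_mean, fraction=0.10):
    """Largest tolerated decrease: a fraction of the reference arm's mean score."""
    return reference_mean * fraction


def is_non_inferior(ci_upper_of_difference, margin):
    """True when the whole CI of the decrease lies below the margin."""
    return ci_upper_of_difference < margin


in_person_mean = 20.8          # in-person arm mean score (from the abstract)
difference_ci = (0.18, 1.26)   # 95% CI of the difference (from the abstract)

margin = non_inferiority_margin(in_person_mean)   # about 2.08 points
video_ok = is_non_inferior(difference_ci[1], margin)
print(round(margin, 2), video_ok)
```

With these figures the upper limit of 1.26 points sits well inside the margin of about 2.08 points, which agrees with the abstract's conclusion that the criterion was met.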
NASA Astrophysics Data System (ADS)
Jeffery, Samuel Shird
There is a correlation between the socioeconomic status of secondary schools and scores on the State of Ohio's mandated secondary science proficiency tests. In low scoring schools many reasons effectively explain the low test scores as a result of the low socioeconomic status. For example, one reason may be that many students are working late hours after school to help with family finances; parents may simply be too busy providing family income to realize the consequences of the testing program. There are many other personal issues students face that may cause them to score poorly on the test. The perceptions of their teachers regarding the science proficiency test program may be one significant factor. These teacher perceptions are the topic of this study. Two sample groups were established for this study. One group was science teachers from secondary schools scoring 85% or higher on the 12th grade proficiency test in the academic year 1998-1999. The other group consisted of science teachers from secondary schools scoring 35% or less in the same academic year. Each group of teachers responded to a survey instrument that listed several items used to determine teachers' perceptions of the secondary science proficiency test. A significant difference in the teachers' perceptions existed between the two groups. Some of the ranked items on the form include teachers' opinions of: (1) Teaching to the tests; (2) School administrators' priority placed on improving average test scores; (3) Teacher incentive for improving average test scores; (4) Teacher teaching style change as a result of the testing mandate; (5) Teacher knowledge of State curriculum model; (6) Student stress as a result of the high-stakes test; (7) Test cultural bias; (8) The tests in general.
Specific and diversive curiosity in gifted elementary students.
Johnson, L; Beer, J
1992-10-01
Twenty-nine gifted students in Grades 2 to 6 from the small school districts in north central Kansas completed the Maze test and the Which-to-Discuss test. Background information such as age, sex, grade, and marital status of parents was also collected. There were no significant differences between boys and girls or for students from divorced and nondivorced parents on either the Which-to-Discuss test (specific curiosity) or the Maze test scores (diversive curiosity). The students scored significantly higher on the former test than chance guessing which suggests the students were displaying specific curiosity. Scores of these gifted students on these two tests of curiosity were significantly and positively correlated.
Reedman, Sarah Elizabeth; Beagley, Simon; Sakzewski, Leanne; Boyd, Roslyn N
2016-08-01
The aim of this pilot study was to evaluate reproducibility of the Jebsen Taylor Test of Hand Function (JTTHF) in children. Eighty-seven typically developing children 5 to 10 years old were included from five Outside School Hours Care centers in the Greater Brisbane Region, Australia. Hand function was assessed on two occasions with a modified JTTHF, then reproducibility was assessed using Intraclass Correlation Coefficient (ICC [3,1]) and the Standard Error of Measurement (SEM). Total scores for male and female children were not significantly different. Five-year-old children were significantly different to all other age groups and were excluded from further analysis. Results for 71 children, 6 to 10 years old were analyzed (mean age 8.31 years (SD 1.32); 33 males). Test-retest reliability for total scores on the dominant and nondominant hands were ICC 0.74 (95% CI 0.61, 0.83) and ICC 0.72 (95% CI 0.59, 0.82), respectively. 'Writing' and 'Simulated Feeding' subtests demonstrated poor reproducibility. The Smallest Real Difference was 5.09 seconds for total score on the dominant hand. Findings indicate good test-retest reliability for the JTTHF total score to measure hand function in typically developing children aged 6 to 10 years.
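The reproducibility statistics named in the abstract above are linked by two widely used formulas: SEM = SD × sqrt(1 − ICC) and Smallest Real Difference = 1.96 × sqrt(2) × SEM. The sketch below assumes those formulas were the ones applied, which the abstract does not confirm; the standard deviation in the forward example is a hypothetical value, not a result from the study.

```python
import math


def standard_error_of_measurement(sd, icc):
    """SEM from the between-subject SD and a reliability coefficient."""
    return sd * math.sqrt(1.0 - icc)


def smallest_real_difference(sem, z=1.96):
    """Smallest change exceeding measurement error at the 95% level."""
    return z * math.sqrt(2.0) * sem


def sem_from_srd(srd, z=1.96):
    """Invert the SRD formula to recover the implied SEM."""
    return srd / (z * math.sqrt(2.0))


# Back-calculation from the reported dominant-hand SRD of 5.09 seconds.
implied_sem = sem_from_srd(5.09)   # roughly 1.84 seconds under these formulas

# Forward example with a hypothetical SD of 3.6 s and the reported ICC of 0.74.
example_sem = standard_error_of_measurement(3.6, 0.74)
example_srd = smallest_real_difference(example_sem)
```

Under these assumptions a child's total time would need to change by more than the SRD before the change could be distinguished from measurement noise.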
Glaister, Mark; Stone, Michael H; Stewart, Andrew M; Hughes, Michael; Moir, Gavin L
2004-08-01
The purpose of the present study was to assess the reliability and validity of fatigue measures, as derived from 4 separate formulae, during tests of repeat sprint ability. On separate days over a 3-week period, 2 groups of 7 recreationally active men completed 6 trials of 1 of 2 maximal (20 x 5 seconds) intermittent cycling tests with contrasting recovery periods (10 or 30 seconds). All trials were conducted on a friction-braked cycle ergometer, and fatigue scores were derived from measures of mean power output for each sprint. Apart from formula 1, which calculated fatigue from the percentage difference in mean power output between the first and last sprint, all remaining formulae produced fatigue scores that showed a reasonably good level of test-retest reliability in both intermittent test protocols (intraclass correlation range: 0.78-0.86; 95% likely range of true values: 0.54-0.97). Although between-protocol differences in the magnitude of the fatigue scores suggested good construct validity, within-protocol differences highlighted limitations with each formula. Overall, the results support the use of the percentage decrement score as the most valid and reliable measure of fatigue during brief maximal intermittent work.
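Two of the fatigue calculations mentioned in the abstract above can be sketched from per-sprint mean power values. The first-to-last percentage difference follows the abstract's description of formula 1. The percentage decrement score is written in a commonly cited form (total output compared with an ideal of every sprint matching the best one), which is an assumption about the study's exact formula; the power values are hypothetical.

```python
def first_to_last_difference(mean_powers):
    """Percentage drop in mean power from the first sprint to the last."""
    first, last = mean_powers[0], mean_powers[-1]
    return 100.0 * (first - last) / first


def percentage_decrement(mean_powers):
    """Percentage shortfall of total output relative to repeating the best sprint."""
    ideal_total = max(mean_powers) * len(mean_powers)
    return 100.0 * (1.0 - sum(mean_powers) / ideal_total)


# Hypothetical mean power outputs (watts) over five sprints.
powers = [1000.0, 950.0, 900.0, 870.0, 880.0]

drop = first_to_last_difference(powers)
decrement = percentage_decrement(powers)
print(round(drop, 1), round(decrement, 1))
```

The first-to-last measure depends on only two sprints, so a single unusually good or poor effort can swing it, whereas the decrement score uses every sprint; that difference may help explain the weaker reliability the abstract reports for formula 1.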
Making the Cut in Gifted Selection: Score Combination Rules and Their Impact on Program Diversity
ERIC Educational Resources Information Center
Lakin, Joni M.
2018-01-01
The recommendation of using "multiple measures" is common in policy guidelines for gifted and talented assessment systems. However, the integration of multiple test scores in a system that uses cut-scores requires choosing between different methods of combining quantitative scores. Past research has indicated that OR combination rules…
Chevalier, Thérèse M.; Stewart, Garth; Nelson, Monty; McInerney, Robert J.; Brodie, Norman
2016-01-01
It has been well documented that IQ scores calculated using Canadian norms are generally 2–5 points lower than those calculated using American norms on the Wechsler IQ scales. However, recent findings have demonstrated that the difference may be significantly larger for individuals with certain demographic characteristics, and this has prompted discussion about the appropriateness of using the Canadian normative system with a clinical population in Canada. This study compared the interpretive effects of applying the American and Canadian normative systems in a clinical sample. We used a multivariate analysis of variance (MANOVA) to calculate differences between IQ and Index scores in a clinical sample, and mixed model ANOVAs to assess the pattern of differences across age and ability level. As expected, Full Scale IQ scores calculated using Canadian norms were systematically lower than those calculated using American norms, but differences were significantly larger for individuals classified as having extremely low or borderline intellectual functioning when compared with those who scored in the average range. Implications of clinically different conclusions for up to 52.8% of patients based on these discrepancies highlight a unique dilemma facing Canadian clinicians, and underscore the need for caution when choosing a normative system with which to interpret WAIS-IV results in the context of a neuropsychological test battery in Canada. Based on these findings, we offer guidelines for best practice for Canadian clinicians when interpreting data from neuropsychological test batteries that include different normative systems, and suggestions to assist with future test development. PMID:27246955
Umphress, Thomas B
2008-06-01
Twenty people with suspected intellectual disability took the Reynolds Intellectual Assessment Scales (RIAS; C. R. Reynolds & R. W. Kamphaus, 1998) and the Wechsler Adult Intelligence Scale-3rd Edition (WAIS-III; D. Wechsler, 1997) to see if the 2 IQ tests produced comparable results. A t test showed that the RIAS Composite Intelligence Index scores were significantly higher than WAIS-III Full Scale IQ scores at the alpha level of .01. There was a significant difference between the RIAS Nonverbal Intelligence and WAIS-III Performance Scale, but there was no significant difference between the RIAS Verbal Intelligence Index and the WAIS-III Verbal Scale IQ. The results raise questions concerning test selection for diagnosing intellectual disability and the use of the correlation statistic for comparing intelligence tests.
Sargénius, Hanna L; Bylsma, Frederick W; Lydersen, Stian; Hestad, Knut
2017-01-01
The aims of this study were to investigate visual-construction and organizational strategy among individuals with severe obesity, as measured by the Rey Complex Figure Test (RCFT), and to examine the validity of the Q-score as a measure for the quality of performance on the RCFT. Ninety-six non-demented morbidly obese (MO) patients and 100 healthy controls (HC) completed the RCFT. Their performance was calculated by applying the standard scoring criteria. The quality of the copying process was evaluated per the directions of the Q-score scoring system. Results revealed that the MO did not perform significantly lower than the HC on Copy accuracy (mean difference -0.302, CI -1.374 to 0.769, p = 0.579). In contrast, the groups differed significantly, with MO performing more poorly than the HC on the Q-score (mean -1.784, CI -3.237 to -0.331, p = 0.016) and the Unit points (mean -1.409, CI -2.291 to -0.528, p = 0.002), but not on the Order points score (mean -0.351, CI -0.994 to 0.293, p = 0.284). Differences on the Unit score and the Q-score were slightly reduced when adjusting for gender, age, and education. This study presents evidence supporting the presence of inefficiency in visuospatial constructional ability among MO patients. We believe we have found an indication that the Q-score captures a wider range of cognitive processes that are not described by traditional scoring methods. Rather than considering accuracy and placement of the different elements only, the Q-score focuses more on how the subject has approached the task.
ERIC Educational Resources Information Center
Guo, Hongwen; Zu, Jiyun; Kyllonen, Patrick; Schmitt, Neal
2016-01-01
In this report, systematic applications of statistical and psychometric methods are used to develop and evaluate scoring rules in terms of test reliability. Data collected from a situational judgment test are used to facilitate the comparison. For a well-developed item with appropriate keys (i.e., the correct answers), agreement among various…
Real Time Cockpit Resource Management (CRM) Training
2010-10-01
to post-test. Table 4 Learning Scores for the Five Spiral 1 Classes Spiral 1 Class Pilots Sensors Pretest Posttest Difference Pretest Posttest ...results from the five Spiral 1 classes. Table 6 Pretest / Posttest Gain Scores Associated with Each Learning Test Item Test Item Class Item...SMALL BUSINESS INNOVATION RESEARCH (SBIR) PHASE II REPORT. Distribution A: Approved for public release; distribution unlimited. (Approval given
The Recognition Memory Test: Examination of ethnic differences and norm validity.
O'Bryant, Sid E; Hilsabeck, Robin C; McCaffrey, Robert J; Drew Gouvier, Wm
2003-03-01
The possibility of racial bias in neuropsychological test materials has received increasing attention in recent years. The purpose of the present study was to investigate whether an own-race recognition bias would provide an advantage for Caucasian participants over African American participants on the Faces subtest of the Recognition Memory Test (RMT). Thirty Caucasian and 30 African American undergraduates completed the RMT, Shipley Institute of Living Scale (SILS), and Symbol Digit Modalities Test (SDMT). No significant group difference was found on RMT Faces. However, mean RMT Faces scores for both groups were below the 10th percentile in spite of average scores on the SDMT and SILS. A second study was conducted to further examine the validity of the RMT norms for this age range (i.e., 18-24) and to provide 2-week test-retest reliabilities. The mean RMT Faces subtest score was 39.78 (10th percentile), and 28% of the sample scored at or below the fifth percentile. Test-retest reliabilities were .63 and .64 for RMT Words and Faces, respectively. Results of these studies suggest that re-examination of the current norms for RMT Faces is warranted for adults aged.
Sharma, Geeta; Noohu, Majumi Mohamad
2014-09-01
Cryotherapy, in the form of ice massage, is used to reduce inflammation after acute musculoskeletal injury or trauma. The potential negative effects of ice massage on proprioception are unknown, despite equivocal evidence supporting its effectiveness. The purpose of the study was to test the influence of cooling on weight discrimination ability and hence the performance in footballers. The study used a same-subject experimental (pretest-posttest) design. Thirty male collegiate football players, whose mean age was 21.07 years, participated in the study. The participants were assessed on two functional performance tests (single leg hop test and crossed over hop test) and on weight discrimination ability before and after ice massage for 5 minutes on the hamstring muscle tendon. Pre-cooling Single Leg Hop Test scores of the dominant leg were 166.65 (± 10.16) cm and post-cooling scores were 167.25 (± 11.77) cm. Pre-cooling Crossed Over Hop Test scores of the dominant leg were 174.14 (± 8.60) cm and post-cooling scores were 174.45 (± 9.28) cm. The pre-cooling Weight Discrimination Differential Threshold of the dominant leg was 1.625 (± 1.179) kg compared with 1.85 (± 1.91) kg post-cooling. Neither hop test showed a significant difference between pre-cooling and post-cooling scores of the dominant leg, and weight discrimination ability (weight discrimination differential threshold) also showed no significant difference. All values are reported as mean ± SD. This study provides additional evidence that proprioceptive acuity in the hamstring muscles (biceps femoris) remains largely unaffected after ice application to the hamstring tendon (biceps femoris).
Hong, Hye Jeong; Kim, Jin Sung; Seo, Wan Seok; Koo, Bon Hoon; Bai, Dai Seg; Jeong, Jin Young
2010-01-01
Objective: We investigated executive functions (EFs), as evaluated by the Wisconsin Card Sorting Test (WCST), and other EF between lower grades (LG) and higher grades (HG) in elementary-school-age attention deficit hyperactivity disorder (ADHD) children. Methods: We classified a sample of 112 ADHD children into 4 groups (composed of 28 each) based on age (LG vs. HG) and WCST performance [lower vs. higher performance on WCST, defined by the number of completed categories (CC)]. Participants in each group were matched according to age, gender, ADHD subtype, and intelligence. We used the Wechsler Intelligence Scale for Children, 3rd edition, to test intelligence and the Computerized Neurocognitive Function Test-IV, which included the WCST, to test EF. Results: Comparisons of EF scores in LG ADHD children showed statistically significant differences in performing digit spans backward, some verbal learning scores, including all memory scores, and Stroop test scores. However, comparisons of EF scores in HG ADHD children did not show any statistically significant differences. Correlation analyses of the CC and EF variables and stepwise multiple regression analysis in LG ADHD children showed that a combination of the backward form of the Digit span test and Visual span test in lower-performance ADHD participants significantly predicted the number of CC (R2=0.273, p<0.001). Conclusion: This study suggests that the design of any battery of neuropsychological tests for measuring EF in ADHD children should first consider age before interpreting developmental variations and neuropsychological test results. Researchers should consider the dynamics of relationships within EF, as measured by neuropsychological tests. PMID:20927306
The Black-White Test Score Gap.
ERIC Educational Resources Information Center
Jencks, Christopher, Ed.; Phillips, Meredith, Ed.
The 15 chapters of this book address issues related to the continuing test score gap between black and white students. The editors argue against traditional explanations which emphasize differences in economic resources and demographic factors, and they urge that more emphasis be put on psychological and cultural factors. The book suggests studies…
Academic performance in adolescents born after ART-a nationwide registry-based cohort study.
Spangmose, A L; Malchau, S S; Schmidt, L; Vassard, D; Rasmussen, S; Loft, A; Forman, J; Pinborg, A
2017-02-01
Is academic performance in adolescents aged 15-16 years and conceived after ART, measured as test scores in ninth grade, comparable to that for spontaneously conceived (SC) adolescents? ART singletons had a significantly lower mean test score in the adjusted analysis when compared with SC singletons, yet the differences were small and probably not of clinical relevance. Previous studies have shown similar intelligence quotient (IQ) levels in ART and SC children, but only a few have been on adolescents. Academic performance measured with standardized national tests has not previously been explored in a complete national cohort of adolescents conceived after ART. A Danish national registry-based cohort including all 4766 ART adolescents (n = 2836 singletons and n = 1930 twins) born in 1995-1998 were compared with two SC control cohorts: a randomly selected singleton population (n = 5660) and all twins (n = 7064) born from 1995 to 1998 in Denmark. Nine children who died during the follow-up period were excluded from the study. Mean test scores on a 7-point-marking scale from -3 to 12 were compared, and adjustments were made for relevant reproductive and socio-demographic covariates including occupational and educational level of the parents. The crude mean test score was higher in both ART singletons and ART twins compared with SC adolescents. The crude mean differences were +0.41 (95% CI 0.30-0.53) and +0.45 (95% CI 0.28-0.62) between ART and SC singletons and between ART and SC twins, respectively. However, the adjusted mean overall test score was significantly lower for ART singletons compared with SC singletons (adjusted mean difference -0.15 (95% CI -0.29-(-0.02))). For comparison, the adjusted mean difference was +2.05 (95% CI 1.82-2.28) between the highest and the lowest parental educational level, suggesting that the effect of ART is weak compared with the conventional predictors. 
The adjusted analyses showed significantly lower mean test scores in mathematics and physics/chemistry for ART singletons compared with SC singletons. Comparing ART twins with SC twins yielded no difference in academic performance in the adjusted analyses. Similar crude and adjusted overall mean test scores were found when comparing ART singletons and ART twins. Missing data on educational test scores occurred in 6.6% of adolescents aged 15-16 years for the birth cohorts 1995-1997, where all of the children according to their age should have passed the ninth grade exam at the time of data retrieval. As sensitivity analyses yielded no significant difference in the adjusted risk of having missing test scores between any of the groups, it is unlikely that this should bias our results. Adjustment for body mass index and smoking during pregnancy was not possible. As our results are based on national data, our findings can be applied to other populations. The findings of this paper suggest that a possible small negative effect of parental subfertility or ART treatment is counterbalanced by the higher educational level in the ART parents. The Danish Medical Association in Copenhagen (KMS) funded this study with a scholarship grant. None of the authors had any competing interests. 704676. © The Author 2017. Published by Oxford University Press on behalf of the European Society of Human Reproduction and Embryology. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Gu, Yuqi; Witter, Tobias; Livingston, Patty; Rao, Purnima; Varshney, Terry; Kuca, Tom; Dylan Bould, M
2017-12-01
As simulator fidelity (i.e., realism) increases from low to high, the simulator more closely resembles the real environment, but it also becomes more expensive. It is generally assumed that the use of high-fidelity simulators results in better learning; however, the effect of fidelity on learning non-technical skills (NTS) is unknown. This was a non-inferiority trial comparing the efficacy of high- vs low-fidelity simulators on learning NTS. Thirty-six postgraduate medical trainees were recruited for the trial. During the pre-test phase, the trainees were randomly assigned to manage a scenario using either a high-fidelity simulator (HFS) or a low-fidelity simulator (LFS), followed by expert debriefing. All trainees then underwent a video recorded post-test scenario on a HFS, and the NTS were assessed between the two groups. The primary outcome was the overall post-test Ottawa Global Rating Scale (OGRS), while controlling for overall pre-test OGRS scores. Non-inferiority between the LFS and HFS was based on a non-inferiority margin of greater than 1. For our primary outcome, the mean (SD) post-test overall OGRS score was not significantly different between the HFS and LFS groups after controlling for pre-test overall OGRS scores [3.8 (0.9) vs 4.0 (0.9), respectively; mean difference, 0.2; 95% confidence interval, -0.4 to 0.8; P = 0.48]. For our secondary outcomes, the post-test total OGRS score was not significantly different between the HFS and LFS groups after controlling for pre-test total OGRS scores (P = 0.33). There were significant improvements in mean overall (P = 0.01) and total (P = 0.003) OGRS scores from pre-test to post-test. There were no significant associations between postgraduate year (P = 0.82) and specialty (P = 0.67) on overall OGRS performance. This study suggests that low-fidelity simulators are non-inferior to the more costly high-fidelity simulators for teaching NTS to postgraduate medical trainees.
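The non-inferiority decision described above can be sketched as a check on the confidence interval. This is an illustration only: it assumes the reported mean difference of 0.2 is LFS minus HFS, that higher OGRS scores are better, and that the margin of 1 is expressed on the same scale. The function name is hypothetical and the abstract does not describe the authors' exact procedure.

```python
def is_non_inferior(ci_lower: float, margin: float) -> bool:
    """Non-inferiority of the new method (new minus reference, higher is better):
    the lower confidence bound must stay above -margin."""
    return ci_lower > -margin


# Values taken from the abstract: 95% CI -0.4 to 0.8, margin 1.
# Only the lower bound matters for a one-sided non-inferiority claim.
print(is_non_inferior(ci_lower=-0.4, margin=1.0))
```

Note that a non-significant difference alone (P = 0.48) would not establish non-inferiority; the claim rests on the confidence interval excluding a deficit as large as the margin.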
NASA Astrophysics Data System (ADS)
Ingram, Sandra W.
This quantitative comparative descriptive study involved analyzing archival data from end-of-course (EOC) test scores in biology of English language learners (ELLs) taught or not taught using the sheltered instruction observation protocol (SIOP) model. The study includes descriptions and explanations of the benefits of the SIOP model to ELLs, especially in content area subjects such as biology. Researchers have shown that ELLs in high school lag behind their peers in academic achievement in content area subjects. Much of the research on the SIOP model took place in elementary and middle school, and more research was necessary at the high school level. This study involved analyzing student records from archival data to describe and explain if the SIOP model had an effect on the EOC test scores of ELLs taught or not taught using it. The sample consisted of 527 Hispanic students (283 females and 244 males) from Grades 9-12. An independent sample t-test determined if a significant difference existed in the mean EOC test scores of ELLs taught using the SIOP model as opposed to ELLs not taught using the SIOP model. The results indicated that a significant difference existed between EOC test scores of ELLs taught using the SIOP model and ELLs not taught using the SIOP model (p = .02). A regression analysis indicated a significant difference existed in the academic performance of ELLs taught using the SIOP model in high school science, controlling for free and reduced-price lunch (p = .001) in predicting passing scores on the EOC test in biology at the school level. The data analyzed for free and reduced-price lunch together with SIOP data indicated that both together were not significant (p = .175) for predicting passing scores on the EOC test in high school biology. Future researchers should repeat the study with student-level data as opposed to school-level data, and data should span at least three years.
Bernard, Larry C; Walsh, R Patricia
2002-10-01
The present study replicated and extended earlier research on temporal sampling effects in university subject pools. Data were obtained from 236 participants, 79 men and 157 women, in a university subject pool during a 15-wk. semester. Without knowing the purpose of the study, participants self-selected to participate earlier (Weeks 4 and 5; n = 105) or later (Weeks 14 and 15; n = 131). Three hypotheses were investigated: (1) that the personality patterns of earlier and later participants on the NEO Personality Inventory-Revised and the Personality Research Form differ significantly, with earlier participants scoring higher on the latter's scales reflecting social responsibility and higher on the former's Conscientiousness and Neuroticism scales; (2) that there are similar significant differences between participants in the earlier and later groups compared to the male and female college normative samples for the two tests; and (3) that earlier participants will have higher actual Scholastic Assessment Test scores and Grade Point Averages. Also investigated was whether participants' foreknowledge that their actual Scholastic Assessment Test scores and Grade Point Averages would be obtained would affect their accuracy of self-report. In contrast to prior research, neither the first nor second hypothesis was supported by the current study; there do not appear to be consistent differences on personality variables. However, the third hypothesis was supported. Earlier participants had higher actual high school Grade Point Average, college Grade Point Average, and Scholastic Assessment Test Verbal scores. Foreknowledge that actual Scholastic Assessment Test scores and Grade Point Averages would be obtained did not affect the accuracy of self-report. In addition, later participants significantly over-reported their scores, and significantly more women than men and more first-year than senior-year subjects participated in the early group.
Endarti, Dwi; Riewpaiboon, Arthorn; Thavorncharoensap, Montarat; Praditsitthikorn, Naiyana; Hutubessy, Raymond; Kristina, Susi Ari
2018-05-01
To gain insight into the most suitable foreign value set among Malaysian, Singaporean, Thai, and UK value sets for calculating the EuroQol five-dimensional questionnaire index score (utility) among patients with cervical cancer in Indonesia. Data from 87 patients with cervical cancer recruited from a referral hospital in Yogyakarta province, Indonesia, from an earlier study of health-related quality of life were used in this study. The differences among the utility scores derived from the four value sets were determined using the Friedman test. Performance of the psychometric properties of the four value sets versus visual analogue scale (VAS) was assessed. Intraclass correlation coefficients and Bland-Altman plots were used to test the agreement among the utility scores. Spearman ρ correlation coefficients were used to assess convergent validity between utility scores and patients' sociodemographic and clinical characteristics. With respect to known-group validity, the Kruskal-Wallis test was used to examine the differences in utility according to the stages of cancer. There was a significant difference among utility scores derived from the four value sets, among which the Malaysian value set yielded higher utility than the other three value sets. Utility obtained from the Malaysian value set agreed more closely with VAS than did utility from the other value sets (based on intraclass correlation coefficients and Bland-Altman plots). As for validity, the four value sets showed equivalent psychometric properties in the convergent and known-group validity tests. In the absence of an Indonesian value set, the Malaysian value set was preferable to the other value sets. Further studies on the development of an Indonesian value set need to be conducted. Copyright © 2018. Published by Elsevier Inc.
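The Bland-Altman agreement analysis mentioned above reduces to a bias (mean difference) and 95% limits of agreement (bias ± 1.96 SD of the differences). The sketch below is a generic illustration using the Python standard library; it assumes both measures are already on a common scale (for example, VAS rescaled to 0-1), and the function name and example numbers are hypothetical rather than taken from the study.

```python
import statistics
from typing import Sequence, Tuple


def bland_altman_limits(
    a: Sequence[float], b: Sequence[float], z: float = 1.96
) -> Tuple[float, float, float]:
    """Return (bias, lower limit, upper limit) for paired measurements a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the paired differences
    return bias, bias - z * sd, bias + z * sd


# Hypothetical paired scores on a common scale; not data from the study.
utility = [0.82, 0.64, 0.71, 0.90, 0.55]
vas = [0.80, 0.60, 0.75, 0.85, 0.50]
print(bland_altman_limits(utility, vas))
```

A value set whose bias is closer to zero and whose limits of agreement are narrower would be read as agreeing better with VAS.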
ERIC Educational Resources Information Center
Bynum, K. Megan
2017-01-01
This study examined the relationship between personality differences between preceptor and athletic training student to evaluation scores. The personality differences of seven preceptors and their paired ATS were measured using the Myers-Briggs Type Indicator test. From the quantitative findings, we cannot conclude at this time a relationship…
Evaluating Equity at the Local Level Using Bootstrap Tests. Research Report 2016-4
ERIC Educational Resources Information Center
Kim, YoungKoung; DeCarlo, Lawrence T.
2016-01-01
Because of concerns about test security, different test forms are typically used across different testing occasions. As a result, equating is necessary in order to get scores from the different test forms that can be used interchangeably. In order to assure the quality of equating, multiple equating methods are often examined. Various equity…
Oren, Carmel; Kennet-Cohen, Tamar; Turvall, Elliot; Allalouf, Avi
2014-01-01
The Psychometric Entrance Test (PET), used for admission to higher education in Israel together with the Matriculation (Bagrut), had in the past one general (total) score in which the weights for its domains: Verbal, Quantitative and English, were 2:2:1, respectively. In 2011, two additional total scores were introduced, with different weights for the Verbal and the Quantitative domains. This study compares the predictive validity of the three general scores of PET, and demonstrates validity in terms of utility. 100,863 freshmen students of all Israeli universities over the classes of 2005-2009. Regression weights and correlations of the predictors with FYGPA were computed. Simulations based on these results supplied the utility estimates. On average, PET is slightly more predictive than the Bagrut; using them both yields a better tool than either of them alone. Assigning differential weights to the components in the respective schools further improves the validity. The introduction of the new general scores of PET is validated by gathering and analyzing evidence based on relations of test scores to other variables. The utility of using the test can be demonstrated in ways different from correlations.
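The 2:2:1 weighting of the Verbal, Quantitative, and English domains can be illustrated as a weighted mean. This is a sketch only: the abstract does not describe the actual PET scaling or the weights of the two newer total scores, so the function name, the domain scores, and the alternative weights below are hypothetical.

```python
from typing import Tuple


def weighted_total(
    verbal: float,
    quantitative: float,
    english: float,
    weights: Tuple[float, float, float] = (2.0, 2.0, 1.0),
) -> float:
    """Weighted mean of three domain scores (default weights 2:2:1)."""
    w_v, w_q, w_e = weights
    total_weight = w_v + w_q + w_e
    return (w_v * verbal + w_q * quantitative + w_e * english) / total_weight


# Hypothetical domain scores; not values from the study.
print(weighted_total(100.0, 120.0, 150.0))
# Hypothetical verbal-emphasis weighting, for illustration only.
print(weighted_total(100.0, 120.0, 150.0, weights=(3.0, 1.0, 1.0)))
```

Changing the weights shifts which applicants rank highest, which is why the study evaluates each general score's predictive validity separately.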
[Validity criteria of a short test to assess speech and language competence in 4-year-olds].
Euler, H A; Holler-Zittlau, I; Minnen, S; Sick, U; Dux, W; Zaretsky, Y; Neumann, K
2010-11-01
A psychometrically constructed short test as a prerequisite for screening was developed on the basis of a revision of the Marburger Speech Screening to assess speech/language competence among children in Hessen (Germany). A total of 257 children (age 4.0 to 4.5 years) performed the test battery for speech/language competence; 214 children repeated the test 1 year later. Test scores correlated highly with scores of two competing language screenings (SSV, HASE) and with a combined score from four diagnostic tests of individual speech/language competences (Reynell III, patholinguistic diagnostics in impaired language development, PLAKSS, AWST-R). Validity was demonstrated by three comparisons: (1) Children with German family language had higher scores than children with another language. (2) The 3-month-older children achieved higher scores than younger children. (3) The difference between the children with German family language and those with another language was higher for the 3-month-older than for the younger children. The short test assesses the speech/language competence of 4-year-olds quickly, validly, and comprehensively.
The Kernel Levine Equipercentile Observed-Score Equating Function. Research Report. ETS RR-13-38
ERIC Educational Resources Information Center
von Davier, Alina A.; Chen, Haiwen
2013-01-01
In the framework of the observed-score equating methods for the nonequivalent groups with anchor test design, there are 3 fundamentally different ways of using the information provided by the anchor scores to equate the scores of a new form to those of an old form. One method uses the anchor scores as a conditioning variable, such as the Tucker…
Walters, Steven O; Weaver, Kenneth A
2003-06-01
The Kaufman Brief Intelligence Test detects learning problems of young students and is a screen for whether a more comprehensive test of intelligence is needed. A study to assess whether this test was valid as an adult intelligence test was conducted with 20 undergraduate psychology majors. The correlations between the Kaufman Brief Intelligence Test's Composite, Vocabulary, and Matrices test scores and their corresponding Wechsler Adult Intelligence Scale-Third Edition test scores, the Full Scale (r=.88), Verbal (r=.77), and Performance scores (r=.87), indicated very strong relationships. In addition, no significant differences were obtained between the Composite, Vocabulary, and Matrices means of the Kaufman Brief Intelligence Test and the Full Scale, Verbal, and Performance means of the WAIS-III. The Kaufman Brief Intelligence Test appears to be a valid test of intelligence for adults.
ERIC Educational Resources Information Center
Bracey, Gerald W.
1997-01-01
Singapore students scored highest on the Third International Mathematics and Science Study. Any nation that "outsources" its poverty (Malaysian street sweepers) and its low-achievers (who study in Malaysia) can get high test scores. U.S./Japan score differences stem from Japan's effective teaching practices. Among 13 occupations in the…
Moore, Tyler M.; Reise, Steven P.; Roalf, David R.; Satterthwaite, Theodore D.; Davatzikos, Christos; Bilker, Warren B.; Port, Allison M.; Jackson, Chad T.; Ruparel, Kosha; Savitt, Adam P.; Baron, Robert B.; Gur, Raquel E.; Gur, Ruben C.
2016-01-01
Traditional “paper-and-pencil” testing is imprecise in measuring speed and hence limited in assessing performance efficiency, but computerized testing permits precision in measuring itemwise response time. We present a method of scoring performance efficiency (combining information from accuracy and speed) at the item level. Using a community sample of 9,498 youths age 8-21, we calculated item-level efficiency scores on four neurocognitive tests, and compared the concurrent, convergent, discriminant, and predictive validity of these scores to simple averaging of standardized speed and accuracy-summed scores. Concurrent validity was measured by the scores' abilities to distinguish men from women and their correlations with age; convergent and discriminant validity were measured by correlations with other scores inside and outside of their neurocognitive domains; predictive validity was measured by correlations with brain volume in regions associated with the specific neurocognitive abilities. Results provide support for the ability of itemwise efficiency scoring to detect signals as strong as those detected by standard efficiency scoring methods. We find no evidence of superior validity of the itemwise scores over traditional scores, but point out several advantages of the former. The itemwise efficiency scoring method shows promise as an alternative to standard efficiency scoring methods, with overall moderate support from tests of four different types of validity. This method allows the use of existing item analysis methods and provides the convenient ability to adjust the overall emphasis of accuracy versus speed in the efficiency score, thus adjusting the scoring to the real-world demands the test is aiming to fulfill. PMID:26866796
Statistical Assessment of Estimated Transformations in Observed-Score Equating
ERIC Educational Resources Information Center
Wiberg, Marie; González, Jorge
2016-01-01
Equating methods make use of an appropriate transformation function to map the scores of one test form into the scale of another so that scores are comparable and can be used interchangeably. The equating literature shows that the ways of judging the success of an equating (i.e., the score transformation) might differ depending on the adopted…
ERIC Educational Resources Information Center
Chen, Haiwen
2012-01-01
In this article, linear item response theory (IRT) observed-score equating is compared under a generalized kernel equating framework with Levine observed-score equating for nonequivalent groups with anchor test design. Interestingly, these two equating methods are closely related despite being based on different methodologies. Specifically, when…
Vista, Alvin; Care, Esther
2011-06-01
Research on gender differences in intelligence has focused mostly on samples from Western countries and empirical evidence on gender differences from Southeast Asia is relatively sparse. This article presents results on gender differences in variance and means on a non-verbal intelligence test using a national sample of public school students from the Philippines. More than 2,700 sixth graders from public schools across the country were tested with the Naglieri Non-verbal Ability Test (NNAT). Variance ratios (VRs) and log-transformed VRs were computed. Proportion ratios for each of the ability levels were also calculated and a chi-square goodness-of-fit test was performed. An analysis of variance was performed to determine the overall gender difference in mean scores as well as within each of three age subgroups. Our data show non-existent or trivial gender difference in mean scores. However, the tails of the distributions show differences between the males and females, with greater variability among males in the upper half of the distribution and greater variability among females in the lower half of the distribution. Descriptions of the results and their implications are discussed. Results on mean score differences support the hypothesis that there are no significant gender differences in cognitive ability. The unusual results regarding differences in variance and the male-female proportion in the tails require more complex investigations. ©2010 The British Psychological Society.
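The variance ratio and its log transform, as used above, can be computed directly from two groups of scores. This sketch assumes VR is defined as male variance divided by female variance (a common convention that the abstract does not spell out) and uses sample variances; the function names and scores are hypothetical.

```python
import math
import statistics
from typing import Sequence


def variance_ratio(male: Sequence[float], female: Sequence[float]) -> float:
    """Male-to-female ratio of sample variances; VR > 1 means males vary more."""
    return statistics.variance(male) / statistics.variance(female)


def log_variance_ratio(male: Sequence[float], female: Sequence[float]) -> float:
    """Natural log of the VR, which is symmetric around 0 (equal variances)."""
    return math.log(variance_ratio(male, female))


# Hypothetical test scores; not data from the study.
male_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
female_scores = [2.0, 3.0, 4.0]
print(variance_ratio(male_scores, female_scores))
print(log_variance_ratio(male_scores, female_scores))
```

The log transform is convenient because a VR of 2 and a VR of 0.5 become equally distant from zero, with opposite signs.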
Spofford, Christina M; Bayman, Emine O; Szeluga, Debra J; From, Robert P
2012-01-01
Novel methods for teaching are needed to enhance the efficiency of academic anesthesia departments as well as provide approaches to learning that are aligned with current trends and advances in technology. A video was produced that taught the key elements of anesthesia machine checkout and room set up. Novice learners were randomly assigned to receive either the new video format or traditional lecture-based format for this topic during their regularly scheduled lecture series. Primary outcome was the difference in written examination score before and after teaching between the two groups. Secondary outcome was the satisfaction score of the trainees in the two groups. Forty-two students assigned to the video group and 36 students assigned to the lecture group completed the study. Students in each group had similar interest in anesthesia, pre-test scores, post-test scores, and final exam scores. The median posttest to pretest difference was greater in the video group (3.5 (3.0-5.0) vs 2.5 (2.0-3.0), for video and lecture groups respectively, p 0.002). Despite improved test scores, students reported higher satisfaction with the traditional, lecture-based format (22.0 (18.0-24.0) vs 24.0 (20.0-28.0), for video and lecture groups respectively, p <0.004). Higher pre-test to post-test improvements were observed among students in the video-based teaching group; however, students rated traditional, live lectures higher than newer video-based teaching.
Foroudi, Farshad; Pham, Daniel; Bressel, Mathias; Tongs, David; Rolfo, Aldo; Styles, Colin; Gill, Suki; Kron, Tomas
2013-10-01
An e-Learning programme appeared useful for providing training and information regarding a multi-centre image guided radiotherapy trial. The aim of this study is to demonstrate the utility of this e-Learning programme. Modules were created on relevant pelvic anatomy, Cone Beam CT soft tissue recognition and trial details. Radiation therapist participants' knowledge and confidence were evaluated before, at the end of, and after at least 6 weeks of e-Learning (long term). One hundred and eighty-five participants were recruited from 12 centres, with 118 in the first, and 67 in the second cohort. One hundred and forty-six participants had two tests (pre and post e-Learning) and 39 of these had three tests (pre, post, and long term). There was an increase in confidence after completion of modules (p<0.001). In the first cohort, scores increased from 67 ± 11 pre-test to 79 ± 8 post-test (p<0.001). The long term same question score was 73 ± 14 (p=0.025, comparing to pre-test), and different questions' score was 77 ± 13 (p=0.014). In the second cohort, pre-test scores were 64 ± 10, post-test same question score 78 ± 9 (p<0.001) and different questions' score 81 ± 11 (p<0.001). e-Learning for a multi-centre clinical trial was feasible and improved confidence and knowledge. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
ERIC Educational Resources Information Center
Austin Independent School District, TX.
Designed for parents of kindergarten and elementary school children in Austin, Texas, this brochure explains the structure and function of the Iowa Tests of Basic Skills. A question and answer format is used to provide information on the scope and purposes of the tests, grade level differences in testing, meaning and accuracy of the scores, and…
Test Bias and the Culturally Different Early Adolescent.
ERIC Educational Resources Information Center
Roberts, Eileen; DeBlassie, Richard R.
1983-01-01
Defines test bias as a phenomenon in which test scores result in negative outcomes for certain groups, often lower socioeconomic groups and minorities. Discusses three manifestations of test bias including content, atmosphere, and use bias and presents recommendations for remedying bias problems in testing the culturally different. (JAC)
Nursing workload in public and private intensive care units
Nogueira, Lilia de Souza; Koike, Karina Mitie; Sardinha, Débora Souza; Padilha, Katia Grillo; de Sousa, Regina Marcia Cardoso
2013-01-01
Objective: This study sought to compare patients at public and private intensive care units according to the nursing workload and interventions provided. Methods: This retrospective, comparative cohort study included 600 patients admitted to 4 intensive care units in São Paulo. The nursing workload and interventions were assessed using the Nursing Activities Score during the first and last 24 hours of the patient's stay at the intensive care unit. Pearson's chi-square test, Fisher's exact test, the Mann-Whitney test, and Student's t test were used to compare the patient groups. Results: The average Nursing Activities Score upon admission to the intensive care unit was 61.9, with a score of 52.8 upon discharge. Significant differences were found among the patients at public and private intensive care units relative to the average Nursing Activities Score upon admission, as well as for 12 out of 23 nursing interventions performed during the first 24 hours of stay at the intensive care units. The patients at the public intensive care units exhibited a higher average score and overall more frequent nursing interventions, with the exception of those involved in the "care of drains", "mobilization and positioning", and "intravenous hyperalimentation". The groups also differed with regard to the evolution of the Nursing Activities Score among the total case series as well as the groups of survivors from the time of admission to discharge from the intensive care unit. Conclusion: Patients admitted to public and private intensive care units exhibit differences in their nursing care demands, which may help managers with nursing manpower planning. PMID:24213086
de Vreede, Paul L; Samson, Monique M; van Meeteren, Nico L; Duursma, Sijmen A; Verhaar, Harald J
2006-08-01
The Assessment of Daily Activity Performance (ADAP) test was developed, and modeled after the Continuous-scale Physical Functional Performance (CS-PFP) test, to provide a quantitative assessment of older adults' physical functional performance. The aim of this study was to determine the intra-examiner reliability and construct validity of the ADAP in a community-living older population, and to identify the importance of tester experience. Forty-three community-dwelling, older women (mean age 75 yr +/-4.3) were randomized to the test-retest reliability study (n=19) or validation study (n=24). The intra-examiner reliability of an experienced (tester 1) and an inexperienced tester (tester 2) was assessed by comparing test and retest scores of 19 participants. Construct validity was assessed by comparing the ADAP scores of 24 participants with self-perceived function by the SF-36 Health Survey, muscle function tests, and the Timed Up and Go test (TUG). Tester 1 had good consistency and reliability scores (mean difference between test and retest scores (DIF), -1.05+/-1.99; 95% confidence interval (CI), -2.58 to 0.48; Cronbach's alpha (alpha) range, 0.83 to 0.98; intraclass correlation (ICC) range, 0.75 to 0.96; Limits of Agreement (LoA), -2.58 to 4.95). Tester 2 had lower reliability scores (DIF, -2.45+/-4.36; 95% CI, -5.56 to 0.67; alpha range, 0.53 to 0.94; ICC range, 0.36 to 0.90; LoA, -6.09 to 10.99), with a systematic difference between test and retest scores for the ADAP domain lower-body strength (-3.81; 95% CI, -6.09 to -1.54). ADAP correlated with the SF-36 Physical Functioning scale (r=0.67), the TUG test (r=-0.91), and isometric knee extensor strength (r=0.80). The ADAP test is a reliable and valid instrument. Our results suggest that testers should practise using the test, to improve reliability, before applying it to clinical settings.
Test Score Equating Using a Mini-Version Anchor and a Midi Anchor: A Case Study Using SAT[R] Data
ERIC Educational Resources Information Center
Liu, Jinghua; Sinharay, Sandip; Holland, Paul W.; Curley, Edward; Feigenbaum, Miriam
2011-01-01
This study explores an anchor that is different from the traditional miniature anchor in test score equating. In contrast to a traditional "mini" anchor that has the same spread of item difficulties as the tests to be equated, the studied anchor, referred to as a "midi" anchor (Sinharay & Holland), has a smaller spread of…
ERIC Educational Resources Information Center
Öztürk-Gübes, Nese; Kelecioglu, Hülya
2016-01-01
The purpose of this study was to examine the impact of dimensionality, common-item set format, and different scale linking methods on preserving equity property with mixed-format test equating. Item response theory (IRT) true-score equating (TSE) and IRT observed-score equating (OSE) methods were used under common-item nonequivalent groups design.…
ERIC Educational Resources Information Center
Kim, Sooyeon; Robin, Frederic
2017-01-01
In this study, we examined the potential impact of item misfit on the reported scores of an admission test from the subpopulation invariance perspective. The target population of the test consisted of 3 major subgroups with different geographic regions. We used the logistic regression function to estimate item parameters of the operational items…
A weighted generalized score statistic for comparison of predictive values of diagnostic tests
Kosinski, Andrzej S.
2013-01-01
Positive and negative predictive values are important measures of a medical diagnostic test performance. We consider testing equality of two positive or two negative predictive values within a paired design in which all patients receive two diagnostic tests. The existing statistical tests for testing equality of predictive values are either Wald tests based on the multinomial distribution or the empirical Wald and generalized score tests within the generalized estimating equations (GEE) framework. As presented in the literature, these test statistics have considerably complex formulas without clear intuitive insight. We propose their re-formulations which are mathematically equivalent but algebraically simple and intuitive. As is clearly seen with a new re-formulation we present, the generalized score statistic does not always reduce to the commonly used score statistic in the independent samples case. To alleviate this, we introduce a weighted generalized score (WGS) test statistic which incorporates empirical covariance matrix with newly proposed weights. This statistic is simple to compute, it always reduces to the score statistic in the independent samples situation, and it preserves type I error better than the other statistics as demonstrated by simulations. Thus, we believe the proposed WGS statistic is the preferred statistic for testing equality of two predictive values and for corresponding sample size computations. The new formulas of the Wald statistics may be useful for easy computation of confidence intervals for difference of predictive values. The introduced concepts have potential to lead to development of the weighted generalized score test statistic in a general GEE setting. PMID:22912343
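For orientation, the two quantities whose equality is being tested can be computed from a 2x2 table of test result against true disease status. This is a minimal sketch of the predictive values only; it does not implement the Wald, generalized score, or WGS statistics, whose formulas are not given in the abstract. The counts are hypothetical.

```python
def predictive_values(tp, fp, fn, tn):
    """Return (PPV, NPV) from the cells of a 2x2 diagnostic table.

    PPV = P(disease | test positive) = TP / (TP + FP)
    NPV = P(no disease | test negative) = TN / (TN + FN)
    """
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv


# Hypothetical counts for one diagnostic test, for illustration only.
ppv, npv = predictive_values(tp=40, fp=10, fn=5, tn=45)
```

In a paired design both tests are applied to the same patients, so the two PPVs (or NPVs) are correlated, which is why a simple independent-samples comparison of these proportions is not appropriate.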
Gottlieb, Daniel H.; Capitanio, John P.
2012-01-01
The human intruder test is a testing paradigm designed to measure rhesus macaques’ behavioral responses to a stressful and threatening situation. In the test, an unfamiliar human positions him/herself in various threatening positions relative to a caged macaque. This paradigm has been utilized for over twenty years to measure a variety of behavioral constructs, including fear and anxiety, behavioral inhibition, emotionality, and aggression. To date there have been no attempts to evaluate comprehensively the structure of the behavioral responses to the test. Our first goal was to identify the underlying latent factors affecting the different responses among subjects, and our second goal was to determine if rhesus reared in different environments respond differently in this testing paradigm. To accomplish this, we first performed exploratory and confirmatory factor analyses on the behavioral responses of 3–4 month-old rhesus macaques, utilizing data from over 2,000 separate tests conducted between 2001–2007. Using the resulting model, we then tested to see whether early rearing experience affected responses in the test. Our first analyses suggested that most of the variation in infant behavioral responses to the human intruder test could be explained by four latent factors: “Activity,” “Emotionality,” “Aggression,” and “Displacement.” Our second analyses revealed a significant effect of rearing condition for each factor score (P < 0.001); most notably, socially-reared animals had the lowest Activity score (P < 0.001), indoor mother-reared animals had the highest Displacement score (P < 0.001), and nursery-reared animals had the highest Emotionality (P < 0.001) and lowest Aggression scores (P < 0.001). These results demonstrate that this standardized testing paradigm reveals multiple patterns of response, which are influenced by an animal’s rearing history. PMID:23229557
Gottlieb, Daniel H; Capitanio, John P
2013-04-01
The human intruder test is a testing paradigm designed to measure rhesus macaques' behavioral responses to a stressful and threatening situation. In the test, an unfamiliar human positions him/herself in various threatening positions relative to a caged macaque. This paradigm has been utilized for over 20 years to measure a variety of behavioral constructs, including fear and anxiety, behavioral inhibition, emotionality, and aggression. To date, there have been no attempts to evaluate comprehensively the structure of the behavioral responses to the test. Our first goal was to identify the underlying latent factors affecting the different responses among subjects, and our second goal was to determine if rhesus reared in different environments respond differently in this testing paradigm. To accomplish this, we first performed exploratory and confirmatory factor analyses on the behavioral responses of 3- to 4-month-old rhesus macaques, utilizing data from over 2,000 separate tests conducted between 2001-2007. Using the resulting model, we then tested to see whether early rearing experience affected responses in the test. Our first analyses suggested that most of the variation in infant behavioral responses to the human intruder test could be explained by four latent factors: "activity," "emotionality," "aggression," and "displacement." Our second analyses revealed a significant effect of rearing condition for each factor score (P < 0.001); most notably, socially reared animals had the lowest activity score (P < 0.001), indoor mother-reared animals had the highest displacement score (P < 0.001), and nursery-reared animals had the highest emotionality (P < 0.001) and lowest aggression scores (P < 0.001). These results demonstrate that this standardized testing paradigm reveals multiple patterns of response, which are influenced by an animal's rearing history. © 2012 Wiley Periodicals, Inc.
Miller, Justin B; Axelrod, Bradley N; Rapport, Lisa J; Hanks, Robin A; Bashem, Jesse R; Schutte, Christian
2012-01-01
Two common measures used to evaluate verbal learning and memory are the Verbal Paired Associates (VPA) subtest from the Wechsler Memory Scales (WMS) and the second edition of the California Verbal Learning Test (CVLT-II). For the fourth edition of the WMS, scores from the CVLT-II can be substituted for VPA; the present study sought to examine the validity of the substitution. For each substitution, paired-samples t tests were conducted between original VPA scaled scores and scaled scores obtained from the CVLT-II substitution to evaluate comparability. Similar comparisons were made at the index score level. At the index score level, substitution resulted in significantly lower scores for the AMI (p = .03; r = .13) but not for the IMI (p = .29) or DMI (p = .09). For the subtest scores, substituted scaled scores for VPA were not significantly different from original scores for the immediate recall condition (p = .20) but were significantly lower at delayed recall (p = .01). These findings offer partial support for the substitution. For both the immediate and delayed conditions, the substitution produced generally lower subtest scores compared to original VPA subtest scores.
Medical ethical standards in dermatology: an analytical study of knowledge, attitudes and practices.
Mostafa, W Z; Abdel Hay, R M; El Lawindi, M I
2015-01-01
Dermatology practice has not been ethically justified at all times. The objective of the study was to find out dermatologists' knowledge about medical ethics, their attitudes towards regulatory measures and their practices, and to study the different factors influencing the knowledge, the attitude and the practices of dermatologists. This is a cross-sectional comparative study conducted among 214 dermatologists, from five Academic Universities and from participants in two conferences. A 54-item structured anonymous questionnaire was designed to describe the demographical characteristics of the study group as well as their knowledge, attitude and practices regarding the medical ethics standards in clinical and research settings. Five scoring indices were estimated regarding knowledge, attitude and practice. Inferential statistics were used to test differences between groups as indicated. The Student's t-test and analysis of variance were carried out for quantitative variables. The chi-squared test was conducted for qualitative variables. The results were considered statistically significant at P < 0.05. Analysis of the possible factors having impact on the overall scores revealed that the highest knowledge scores were among dermatologists who practice in an academic setting plus an additional place; however, this difference was statistically non-significant (P = 0.060). Female dermatologists showed a higher attitude score compared to males (P = 0.028). The highest significant attitude score (P = 0.019) regarding clinical practice was recorded among those practicing cosmetic dermatology. The different studied groups of dermatologists revealed a significant impact on the attitude score (P = 0.049), and the evidence-practice score (P < 0.001). Ethical practices will improve the quality and integrity of dermatology research. © 2014 European Academy of Dermatology and Venereology.
Differences in Cognitive Function between Women and Men with HIV.
Maki, Pauline M; Rubin, Leah H; Springer, Gayle; Seaberg, Eric C; Sacktor, Ned; Miller, Eric N; Valcour, Victor; Young, Mary A; Becker, James T; Martin, Eileen M
2018-05-25
Women may be more vulnerable than men to HIV-related cognitive dysfunction due to sociodemographic, lifestyle, mental health, and biological factors. However, studies to date have yielded inconsistent findings on the existence, magnitude and pattern of sex differences. We examined these issues using longitudinal data from two large, prospective, multisite, observational studies of U.S. women and men with and without HIV: the Women's Interagency HIV Study (WIHS) and the Multicenter AIDS Cohort Study (MACS). HIV-infected (HIV+) and uninfected (HIV-) WIHS and MACS participants completed tests of psychomotor speed, executive function, and fine motor skills. Groups were matched on HIV status, sex, age, education, and black race. Generalized linear mixed models were used to examine group differences on continuous and categorical demographically-corrected T-scores. Results were adjusted for other confounding factors. The sample (n=1420) included 710 women (429 HIV+) and 710 men (429 HIV+) (67% NonHispanic-Black; 53% high school or less). For continuous T-scores, Sex by HIV Serostatus interactions were observed on the Trail Making Test (TMT) Parts A&B, Grooved Pegboard, and Symbol Digit Modalities Test. For these tests, HIV+ women scored lower than HIV+ men, with no sex differences in HIV- individuals. In analyses of categorical scores, particularly TMT Part A and Grooved Pegboard Non-Dominant, HIV+ women also had higher odds of impairment compared to HIV+ men. Sex differences were constant over time. Although sex differences are generally under-studied, HIV+ women versus men show cognitive disadvantages. Elucidating the mechanisms underlying these differences is critical for tailoring cognitive interventions.
Racial Differences in Mathematics Test Scores for Advanced Mathematics Students
ERIC Educational Resources Information Center
Minor, Elizabeth Covay
2016-01-01
Research on achievement gaps has found that achievement gaps are larger for students who take advanced mathematics courses compared to students who do not. Focusing on the advanced mathematics student achievement gap, this study found that African American advanced mathematics students have significantly lower test scores and are less likely to be…
Teachers' Perceptions and Expectations and the Black-White Test Score Gap.
ERIC Educational Resources Information Center
Ferguson, Ronald F.
2003-01-01
Evaluates how schools can positively affect the test score gap between black and white students by examining two potential sources for this difference: teachers and students. Offers evidence for the proposition that teachers' perceptions, expectations, and behaviors interact with students' beliefs, behaviors, and work habits in ways that help to…
Comparing Standard Deviation Effects across Contexts
ERIC Educational Resources Information Center
Ost, Ben; Gangopadhyaya, Anuj; Schiman, Jeffrey C.
2017-01-01
Studies using tests scores as the dependent variable often report point estimates in student standard deviation units. We note that a standard deviation is not a standard unit of measurement since the distribution of test scores can vary across contexts. As such, researchers should be cautious when interpreting differences in the numerical size of…
Linking School Goals and Learning Standards to Teacher Evaluation and Compensation.
ERIC Educational Resources Information Center
Mathis, William J.
It is possible to tie teacher compensation to professional growth, without reference to standardized test scores. Tying pay to students' achievement scores does not account for the different levels of students, and teacher testing does not separate good teachers from bad. In Rutland Northeast, Vermont, each school has its own locally elected…
An explorative study of school performance and antipsychotic medication.
van der Schans, J; Vardar, S; Çiçek, R; Bos, H J; Hoekstra, P J; de Vries, T W; Hak, E
2016-09-21
Antipsychotic therapy can reduce severe symptoms of psychiatric disorders; however, data on school performance among children on such treatment are lacking. The objective was to explore school performance among children using antipsychotic drugs at the end of primary education. A cross-sectional study was conducted using the University Groningen pharmacy database linked to academic achievement scores at the end of primary school (Dutch Cito-test) obtained from Statistics Netherlands. Mean Cito-test scores and standard deviations were obtained for children on antipsychotic therapy and reference children, and statistically compared using analyses of covariance. In addition, differences between subgroups were tested: boys versus girls, ethnicity, household income, and late starters (start date within 12 months of the Cito-test) versus early starters (start date > 12 months before the Cito-test). In all, data from 7994 children could be linked to Cito-test scores. At the time of the Cito-test, 45 (0.6 %) were on treatment with antipsychotics. Children using antipsychotics scored on average 3.6 points lower than the reference peer group (534.5 ± 9.5). Scores were different across gender and levels of household income (p < 0.05). Scores of early starters were significantly higher than those of late starters (533.7 ± 1.7 vs. 524.1 ± 2.6). This first exploration showed that children on antipsychotic treatment have lower school performance compared to the reference peer group at the end of primary school. This was most noticeable for girls, but early starters were less affected than late starters. Due to the observational cross-sectional nature of this study, no causality can be inferred, but the results indicate that school performance should be closely monitored and that causes of underperformance despite treatment warrant more research.
Guilloux, Jean-Philippe; Seney, Marianne; Edgar, Nicole; Sibille, Etienne
2011-01-01
Defining anxiety- and depressive-like states in mice (“emotionality”) is best characterized by the use of complementary tests, leading sometimes to puzzling discrepancies and lack of correlation between similar paradigms. To address this issue, we hypothesized that integrating measures along the same behavioral dimensions in different tests would reduce the intrinsic variability of single tests and provide a robust characterization of the underlying “emotionality” of individual mice, much as mood and related syndromes are defined in humans through various related symptoms over time. We describe the use of simple mathematical and integrative tools to help phenotype animals across related behavioral tests (syndrome diagnosis) and experiments (meta-analysis). We applied z-normalization across complementary measures of emotionality in different behavioral tests after unpredictable chronic mild stress (UCMS) or prolonged corticosterone exposure - two approaches to induce anxious-/depressive-like states in mice. Combining z-normalized test values lowered the variance of emotionality measurement, enhanced the reliability of behavioral phenotyping, and increased analytical opportunities. Comparing integrated emotionality scores across studies revealed a robust sexual dimorphism in the vulnerability to develop high emotionality, manifested as higher UCMS-induced emotionality z-scores, but lower corticosterone-induced scores in females compared to males. Interestingly, the distribution of individual z-scores revealed a pattern of increased baseline emotionality in female mice, reminiscent of what is observed in humans. Together, we show that the z-scoring method yields robust measures of emotionality across complementary tests for individual mice and experimental groups, hence facilitating the comparison across studies and refining the translational applicability of these models. PMID:21277897
Guilloux, Jean-Philippe; Seney, Marianne; Edgar, Nicole; Sibille, Etienne
2011-04-15
Defining anxiety- and depressive-like states in mice (emotionality) is best characterized by the use of complementary tests, leading sometimes to puzzling discrepancies and lack of correlation between similar paradigms. To address this issue, we hypothesized that integrating measures along the same behavioral dimensions in different tests would reduce the intrinsic variability of single tests and provide a robust characterization of the underlying "emotionality" of individual mice, much as mood and related syndromes are defined in humans through various related symptoms over time. We describe the use of simple mathematical and integrative tools to help phenotype animals across related behavioral tests (syndrome diagnosis) and experiments (meta-analysis). We applied z-normalization across complementary measures of emotionality in different behavioral tests after unpredictable chronic mild stress (UCMS) or prolonged corticosterone exposure - two approaches to induce anxious-/depressive-like states in mice. Combining z-normalized test values lowered the variance of emotionality measurement, enhanced the reliability of behavioral phenotyping, and increased analytical opportunities. Comparing integrated emotionality scores across studies revealed a robust sexual dimorphism in the vulnerability to develop high emotionality, manifested as higher UCMS-induced emotionality z-scores, but lower corticosterone-induced scores in females compared to males. Interestingly, the distribution of individual z-scores revealed a pattern of increased baseline emotionality in female mice, reminiscent of what is observed in humans. Together, we show that the z-scoring method yields robust measures of emotionality across complementary tests for individual mice and experimental groups, hence facilitating the comparison across studies and refining the translational applicability of these models. Copyright © 2011 Elsevier B.V. All rights reserved.
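The z-scoring idea described above can be illustrated with a short sketch. Two details here are assumptions not stated in the abstract: each measure is normalized against the mean and standard deviation of a control group, and a `direction` sign flips measures where a lower raw value means higher emotionality. The function names and all numbers are hypothetical.

```python
from statistics import mean, stdev


def z_scores(values, control_values, direction=1):
    """Normalize raw values against the control group's mean and sample SD.

    direction=-1 flips the sign for measures where a lower raw value
    is read as higher emotionality (an assumed convention).
    """
    mu = mean(control_values)
    sd = stdev(control_values)
    return [direction * (v - mu) / sd for v in values]


def emotionality_scores(per_test_z):
    """Average each animal's z-scores across tests.

    per_test_z holds one list per behavioral test, with animals in the same order.
    """
    return [mean(animal) for animal in zip(*per_test_z)]


# Hypothetical data: two stressed animals measured in two tests.
center_time_z = z_scores([8, 6], control_values=[10, 12, 14], direction=-1)
latency_z = z_scores([160, 140], control_values=[100, 120, 140], direction=1)
integrated = emotionality_scores([center_time_z, latency_z])
```

Because every measure ends up on the same unitless scale, averaging across tests is meaningful even when the raw units (seconds, counts, distances) differ.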
Cid, Jaime A; von Davier, Alina A
2015-05-01
Test equating is a method of making the test scores from different test forms of the same assessment comparable. In the equating process, an important step involves continuizing the discrete score distributions. In traditional observed-score equating, this step is achieved using linear interpolation (or an unscaled uniform kernel). In the kernel equating (KE) process, this continuization process involves Gaussian kernel smoothing. It has been suggested that the choice of bandwidth in kernel smoothing controls the trade-off between variance and bias. In the literature on estimating density functions using kernels, it has also been suggested that the weight of the kernel depends on the sample size, and therefore, the resulting continuous distribution exhibits bias at the endpoints, where the samples are usually smaller. The purpose of this article is (a) to explore the potential effects of atypical scores (spikes) at the extreme ends (high and low) on the KE method in distributions with different degrees of asymmetry using the randomly equivalent groups equating design (Study I), and (b) to introduce the Epanechnikov and adaptive kernels as potential alternative approaches to reducing boundary bias in smoothing (Study II). The beta-binomial model is used to simulate observed scores reflecting a range of different skewed shapes.
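The Gaussian-kernel continuization step can be sketched as follows. The abstract gives no formula, so this uses the form commonly presented in the kernel equating literature, in which the continuized distribution keeps the mean and variance of the discrete one; treat the exact expression, the function names, and the example score distribution as assumptions.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def continuized_cdf(x, scores, probs, h):
    """Gaussian-kernel continuization of a discrete score distribution.

    F_h(x) = sum_j r_j * Phi((x - a*x_j - (1 - a)*mu) / (a*h)),
    with a = sqrt(var / (var + h^2)), so that mean and variance are preserved.
    h is the bandwidth; scores and probs describe the discrete distribution.
    """
    mu = sum(p * s for s, p in zip(scores, probs))
    var = sum(p * (s - mu) ** 2 for s, p in zip(scores, probs))
    a = math.sqrt(var / (var + h * h))
    return sum(
        p * normal_cdf((x - a * s - (1.0 - a) * mu) / (a * h))
        for s, p in zip(scores, probs)
    )


# Hypothetical symmetric 5-point score distribution.
scores = [0, 1, 2, 3, 4]
probs = [0.1, 0.2, 0.4, 0.2, 0.1]
median_value = continuized_cdf(2.0, scores, probs, h=0.6)
```

A small bandwidth `h` tracks the discrete spikes closely (low bias, high variance), while a large `h` pushes the result toward a normal curve, which is the trade-off the abstract refers to.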
ERIC Educational Resources Information Center
Tong, Ye; Kolen, Michael J.
2010-01-01
"Scaling" is the process of constructing a score scale that associates numbers or other ordered indicators with the performance of examinees. Scaling typically is conducted to aid users in interpreting test results. This module describes different types of raw scores and scale scores, illustrates how to incorporate various sources of…
Romero, Roberto; Kadar, Nicholas; Miranda, Jezid; Korzeniewski, Steven J.; Schwartz, Alyse G.; Chaemsaithong, Piya; Rogers, Wade; Soto, Eleazar; Gotsch, Francesca; Yeo, Lami; Hassan, Sonia S.; Chaiworapongsa, Tinnakorn
2018-01-01
Objective: Intra-amniotic infection/inflammation are major causes of spontaneous preterm labor and delivery. However, diagnosis of intra-amniotic infection is challenging because most are subclinical and amniotic fluid (AF) cultures take several days before results are available. Several tests have been proposed for the rapid diagnosis of microbial invasion of the amniotic cavity (MIAC) or intra-amniotic inflammation. The aim of this study was to examine the diagnostic performance of the AF Mass Restricted (MR) score in comparison with interleukin-6 (IL-6) and matrix metalloproteinase-8 (MMP-8) for the identification of MIAC or inflammation. Methods: AF samples were collected from patients with singleton gestations and symptoms of preterm labor (n = 100). Intra-amniotic inflammation was defined as >100 white blood cells/mm3 (WBCs) in AF; MIAC was defined as a positive AF culture. AF IL-6 and MMP-8 were determined using ELISA. The MR score was obtained using the Surface-Enhanced Laser Desorption Ionization Time of Flight (SELDI-TOF) mass spectrometry. Sensitivity and specificity were calculated and logistic regression models were fit to construct receiver-operating characteristic (ROC) curves for the identification of each outcome. The McNemar’s test and paired sample non-parametric statistical techniques were used to test for differences in diagnostic performance metrics.
Results: (1) The prevalence of MIAC and intra-amniotic inflammation was 34% (34/100) and 40% (40/100), respectively; (2) there were no significant differences in sensitivity of the three tests under study (MR score, IL-6 or MMP-8) in the identification of either MIAC or intra-amniotic inflammation (using the following cutoffs: MR score >2, IL-6 >11.4 ng/mL, and MMP-8 >23 ng/mL); (3) there was no significant difference in the sensitivity among the three tests for the same outcomes when the false positive rate was fixed at 15%; (4) the specificity for IL-6 was not significantly different from that of the MR score in identifying either MIAC or intra-amniotic inflammation when using previously reported thresholds; and (5) there were no significant differences in the area under the ROC curve when comparing the MR score, IL-6 or MMP-8 in the identification of these outcomes. Conclusions: IL-6 and the MR score have equivalent diagnostic performance in the identification of MIAC or intra-amniotic inflammation. Selection from among these three tests (MR score, IL-6 and MMP-8) for diagnostic purposes should be based on factors such as availability, reproducibility, and cost. The MR score requires a protein chip and a SELDI-TOF instrument which are not widely available or considered “state of the art”. In contrast, immunoassays for IL-6 can be performed in the majority of clinical laboratories. PMID:24028673
Bronchiectasis: correlation of high-resolution CT findings with health-related quality of life.
Eshed, I; Minski, I; Katz, R; Jones, P W; Priel, I E
2007-02-01
To evaluate the relationship between the severity of bronchiectatic diseases, as evident on high-resolution computed tomography (HRCT) and the patient's quality of life measured using the St George's Respiratory Questionnaire (SGRQ). Forty-six patients (25 women, 21 men, mean age: 63 years) with bronchiectatic disease as evident on recent HRCT examinations were recruited. Each patient completed the SGRQ and underwent respiratory function tests. HRCT findings were blindly and independently scored by two radiologists, using the modified Bhalla scoring system. The relationships between HRCT scores, SGRQ scores and pulmonary function tests were evaluated. The patients' total CT score did not correlate with the SGRQ scores. However, patients with more advanced disease on HRCT, significantly differed in their SGRQ scores from patients with milder bronchiectatic disease. A significant correlation was found between the CT scores for the middle and distal lung zones and the activity, impacts and total SGRQ scores. No correlation was found between CT scores and respiratory function test indices. However, a significant correlation was found between the SGRQ scores and most of the respiratory function test indices. A correlation between the severity of bronchiectatic disease as expressed in HRCT and the health-related quality of life exists in patients with a more severe bronchiectatic disease but not in patients with mild disease. Such correlation depends on the location of the bronchiectasis in the pulmonary tree.
Race, Socioeconomic Status, and Implicit Bias: Implications for Closing the Achievement Gap
NASA Astrophysics Data System (ADS)
Schlosser, Elizabeth Auretta Cox
This study assessed the relationship between race, socioeconomic status, age and the race implicit bias held by middle and high school science teachers in Mobile and Baldwin County Public School Systems. Seventy-nine participants were administered the race Implicit Association Test (race IAT), created by Greenwald, A. G., Nosek, B. A., & Banaji, M. R. (2003), and a demographic survey. Quantitative analyses using analysis of variance (ANOVA) and t-tests were used in this study. An ANOVA was performed comparing the race IAT scores of African American science teachers and their Caucasian counterparts. A statistically significant difference was found (F = 4.56, p = .01). An ANOVA was also performed using the race IAT scores comparing the age of the participants; the analysis yielded no statistical difference based on age. A t-test was performed comparing the race IAT scores of African American teachers who taught at either Title I or non-Title I schools; no statistical difference was found between groups (t = -17.985, p < .001). A t-test was also performed comparing the race IAT scores of Caucasian teachers who taught at either Title I or non-Title I schools; a statistically significant difference was found between groups (t = 2.44, p > .001). This research examines the implications of the achievement gap among African American and Caucasian students in science.
Sachan, D; Gupta, N; Agarwal, P; Chaudhary, R
2011-08-01
Heparin-induced thrombocytopenia (HIT) should be diagnosed clinically as well as by laboratory assays for timely recognition, prevention and management of complications. To evaluate the clinical utility of pre-test clinical scoring system in combination with two immunoassays for the diagnosis of HIT in cardiac surgery patients. A total of 100 consecutive patients undergoing cardiac surgery were studied. Pre-test clinical scoring was carried out in patients with thrombocytopenia and further tested by two immunoassays, i.e., Heparin platelet factor 4 (H-PF4) enzyme-linked immunosorbent assay (ELISA) and particle gel immunoassay (PaGIA). Of the 100 patients studied, 42 patients developed thrombocytopenia post-operatively. On pre-test clinical scoring, low T-score was observed in 6 patients, intermediate in 28 and high score in 8 patients, whereas 19 patients (45.2%) were positive by H-PF4 ELISA and 10 (23.8%) by PaGIA for H-PF4 antibody. The difference in the incidence of clinically significant HIT antibodies in the three categories was statistically significant. A good correlation was also observed with ELISA optical density, T-scoring and PaGIA. Pre-test clinical scoring correlates well with the development of H-PF4 antibodies which are incriminated in the causation of thrombotic complications in patients with HIT. We also propose a protocol for diagnosing patients with clinical suspicion of HIT using pre-test clinical scoring and immunoassay. © 2011 The Authors. Transfusion Medicine © 2011 British Blood Transfusion Society.
Strom, Suzanne L; Anderson, Craig L; Yang, Luanna; Canales, Cecilia; Amin, Alpesh; Lotfipour, Shahram; McCoy, C Eric; Osborn, Megan Boysen; Langdorf, Mark I
2015-11-01
Traditional Advanced Cardiac Life Support (ACLS) courses are evaluated using written multiple-choice tests. High-fidelity simulation is a widely used adjunct to didactic content, and has been used in many specialties as a training resource as well as an evaluative tool. There are no data to our knowledge that compare simulation examination scores with written test scores for ACLS courses. To compare and correlate a novel high-fidelity simulation-based evaluation with traditional written testing for senior medical students in an ACLS course. We performed a prospective cohort study to determine the correlation between simulation-based evaluation and traditional written testing in a medical school simulation center. Students were tested on a standard acute coronary syndrome/ventricular fibrillation cardiac arrest scenario. Our primary outcome measure was correlation of exam results for 19 volunteer fourth-year medical students after a 32-hour ACLS-based Resuscitation Boot Camp course. Our secondary outcome was comparison of simulation-based vs. written outcome scores. The composite average score on the written evaluation was substantially higher (93.6%) than the simulation performance score (81.3%, absolute difference 12.3%, 95% CI [10.6-14.0%], p<0.00005). We found a statistically significant moderate correlation between simulation scenario test performance and traditional written testing (Pearson r=0.48, p=0.04), validating the new evaluation method. Simulation-based ACLS evaluation methods correlate with traditional written testing and demonstrate resuscitation knowledge and skills. Simulation may be a more discriminating and challenging testing method, as students scored higher on written evaluation methods compared to simulation.
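The reported correlation (Pearson r = 0.48, n = 19, p = 0.04) can be checked by hand. The sketch below assumes the standard t-test of a Pearson correlation against zero, which the abstract does not explicitly name; the function name and the constant are illustrative, and the critical value is the usual tabulated two-sided 5% value of Student's t for 17 degrees of freedom.

```python
from math import sqrt

def t_from_r(r, n):
    """t statistic for H0: population correlation = 0, with n - 2 df."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)

r, n = 0.48, 19                      # values reported in the abstract
t = t_from_r(r, n)
T_CRIT_DF17 = 2.110                  # two-sided 5% critical value, 17 df (table value)

print(f"t = {t:.2f} on {n - 2} df")
print("significant at the 5% level:", abs(t) > T_CRIT_DF17)
```

The statistic lands only slightly above the critical value, which agrees with a p-value just under 0.05 and shows how modest the evidence is with 19 students.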
NASA Astrophysics Data System (ADS)
Harris, Michael W.
This study examined the effectiveness of a specific instructional strategy employed to improve performance on the end-of-the-year Criterion-Referenced Competency Test (CRCT) as mandated by the No Child Left Behind (NCLB) Act of 2001. A growing body of evidence suggests that the perceived pressure to produce adequate aggregated scores on the CRCT causes teachers to neglect other relevant aspects of teaching and attend less to individualized instruction. Rooted in constructivist theory, inquiry-based programs provide a developmental plan of instruction that affords the opportunity for each student to understand their academic needs and strengths. However, the utility of inquiry-based instruction is largely unknown due to the lack of evaluation studies. To address this problem, this quantitative evaluation measured the impact of the Audet and Jordan inquiry-based instructional model on CRCT test scores of 102 students in a sixth-grade science classroom in one north Georgia school. A series of binomial tests of proportions tested differences between CRCT scores of the program participants and those of a matched control sample selected from other district schools that did not adopt the program. The study found no significant differences on CRCT test scores between the treatment and control groups. The study also found no significant performance differences among genders in the sample using inquiry instruction. This implies that the utility of inquiry education might exist outside the domain of test scores. This study can contribute to social change by informing a reevaluation of the instructional strategies that ideally will serve NCLB high-stakes assessment mandates, while also affording students the individual-level skills needed to become productive members of society.
A new discussion of the cutaneous vascular reactivity in sensitive skin: A sub-group of SS?
Chen, S Y; Yin, J; Wang, X M; Liu, Y Q; Gao, Y R; Liu, X P
2018-02-02
Sensitive skin (SS) seems not to be a one-dimensional condition and many scholars concentrate on skin barrier disruption or sensorineural change, but few focus on its increased vascular reactivity. This study explored the possibility of using different selection and measurement methods to verify a high vascular reactivity in SS without an impaired cutaneous barrier function. Sixty "self-perceived sensitive skin" volunteers were enlisted and each one completed three kinds of screening tests: cutaneous sensation was assessed using a questionnaire survey and the Lactic Acid Sting Test (LAST); barrier function was assessed using the sodium lauryl sulphate (SLS) skin irritation test; and cutaneous vascular reactivity was assessed using the 98% DMSO test and non-invasive measurement. Volunteers were divided into different groups based on response to SLS. The DMSO clinical score and the biophysical parameters obtained by non-invasive measurement were subsequently analysed. (1) Positive correlations were seen between the sum LAST score and the sum DMSO score regardless of the observation time; (2) the biological parameters (CBF, a* values and L* values) were all in keeping with the DMSO score; (3) when the participants were divided into SLS reactors and non-reactors, the composition ratio of DMSO scores differed significantly between the two groups, and among the SLS non-reactors there were still seven participants who showed a high reaction to DMSO. There is a sub-group of SS characterised by high vascular reactivity without an impaired cutaneous barrier function. The DMSO test and novel non-invasive measurements, which are conducive to assessing cutaneous vascular reactivity, combined with the SLS skin irritation test could help to screen for this kind of SS. © 2018 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Gispen, Fiona E; Magid, Donna
2016-05-01
Correct selection of imaging tests is essential for clinicians but until recently has been largely neglected in medical education. How and when students acquire such non-interpretive skills are unknown. This study assessed student knowledge of imaging test selection before and after a general radiology elective. Between 2008 and 2015, an unannounced 13-item test was administered to second, third, and fourth-year students on the first and last days of the Johns Hopkins School of Medicine radiology elective. Scores (0–13) were based on the American College of Radiology Appropriateness Criteria. Pre- and posttest means were compared using paired samples t tests. Whether performance on the pretest and posttest differed by class year was assessed using analysis of variance and Kruskal-Wallis, respectively, and whether year was associated with posttest score after controlling for pretest score was assessed using analysis of covariance. Posttest means were significantly higher than pretest means for students in all years (P values <.0001). Pretest scores differed by year (F(2, 360) = 66.85, P <.0001): fourth-year students scored highest (mean = 9.96 of 13) and second-year students scored lowest (mean = 7.01 of 13). Posttest scores did not differ (χ²(2, 270) = 0.348, P = .841). Year in school had no independent effect on posttest score (F(2, 239) = 0.45, P = .637). Knowledge of modality selection increases with clinical training, but room for improvement remains. A general radiology elective increases this knowledge. Second-year students improve most, suggesting that taking radiology early is efficient, but further research to evaluate retention of this knowledge is needed. Medical student education in radiology must increasingly recognize and address non-interpretive skills and intelligent imaging utilization.
Cook, David A; Gelula, Mark H; Dupras, Denise M; Schwartz, Alan
2007-09-01
Adapting web-based (WB) instruction to learners' individual differences may enhance learning. This study aimed to investigate aptitude-treatment interactions between learning and cognitive styles and WB instructional methods. We carried out a factorial, randomised, controlled, crossover, post-test-only trial involving 89 internal medicine residents, family practice residents and medical students at 2 US medical schools. Parallel versions of a WB course in complementary medicine used either active or reflective questions and different end-of-module review activities ('create and study a summary table' or 'study an instructor-created table'). Participants were matched or mismatched to question type based on active or reflective learning style. Participants used each review activity for 1 course module (crossover design). Outcome measurements included the Index of Learning Styles, the Cognitive Styles Analysis test, knowledge post-test, course rating and preference. Post-test scores were similar for matched (mean ± standard error of the mean 77.4 ± 1.7) and mismatched (76.9 ± 1.7) learners (95% confidence interval [CI] for difference -4.3 to 5.2, P = 0.84), as were course ratings (P = 0.16). Post-test scores did not differ between active-type questions (77.1 ± 2.1) and reflective-type questions (77.2 ± 1.4; P = 0.97). Post-test scores correlated with course ratings (r = 0.45). There was no difference in post-test subscores for modules completed using the 'construct table' format (78.1 ± 1.4) or the 'table provided' format (76.1 ± 1.4; CI -1.1 to 5.0, P = 0.21), and wholist and analytic styles had no interaction (P = 0.75) or main effect (P = 0.18). There was no association between activity preference and wholist or analytic scores (P = 0.37). Cognitive and learning styles had no apparent influence on learning outcomes. There were no differences in outcome between these instructional methods.
NASA Astrophysics Data System (ADS)
Adams, Kenneth Mark
The purpose of this research was to investigate the relationship between the learning style perceptual preferences of fourth grade urban students and the attainment of selected physical science concepts for three simple machines as taught using learning cycle methodology. The sample included all fourth grade children from one urban elementary school (N = 91). The research design followed a quasi-experimental format with a single group, equivalent teacher demonstration and student investigation materials, and identical learning cycle instructional treatment. All subjects completed the Understanding Simple Machines Test (USMT) prior to instructional treatment, and at the conclusion of treatment to measure student concept attainment related to the pendulum, the lever and fulcrum, and the inclined plane. USMT pre- and post-test scores, California Achievement Test (CAT-5) percentile scores, and Learning Style Inventory (LSI) standard scores for four perceptual elements for each subject were held in a double blind until completion of the USMT post-test. The hypothesis tested in this study was: Learning style perceptual preferences of fourth grade students as measured by the Dunn, Dunn, and Price Learning Style Inventory (LSI) are significant predictors of success in the acquisition of physical science concepts taught through use of the learning cycle. Analysis of pre and post USMT scores, 18.18 and 30.20 respectively, yielded a significant mean gain of +12.02. A controlled stepwise regression was employed to identify significant predictors of success on the USMT post-test from among USMT pre-test, four CAT-5 percentile scores, and four LSI perceptual standard scores. The CAT-5 Total Math and Total Reading accounted for 64.06% of the variance in the USMT post-test score. The only perceptual element to act as a significant predictor was the Kinesthetic standard score, accounting for 1.72% of the variance. 
The study revealed that learning cycle instruction does not appear to be sensitive to different perceptual preferences. Students with different preferences for auditory, visual, and tactile modalities, when learning, seem to benefit equally from learning cycle exposure. Increased use of a double blind for future learning styles research was recommended.
Jenkinson, Toni-Marie; Muncer, Steven; Wheeler, Miranda; Brechin, Don; Evans, Stephen
2018-06-01
Neuropsychological assessment requires accurate estimation of an individual's premorbid cognitive abilities. Oral word reading tests, such as the test of premorbid functioning (TOPF), and demographic variables, such as age, sex, and level of education, provide a reasonable indication of premorbid intelligence, but their ability to predict other related cognitive abilities is less well understood. This study aimed to develop regression equations, based on the TOPF and demographic variables, to predict scores on tests of verbal fluency and naming ability. A sample of 119 healthy adults provided demographic information and were tested using the TOPF, FAS, animal naming test (ANT), and graded naming test (GNT). Multiple regression analyses, using the TOPF and demographics as predictor variables, were used to estimate verbal fluency and naming ability test scores. Change scores and cases of significant impairment were calculated for two clinical samples with diagnosed neurological conditions (TBI and meningioma) using the method in Knight, McMahon, Green, and Skeaff (). Demographic variables provided a significant contribution to the prediction of all verbal fluency and naming ability test scores; however, adding TOPF score to the equation considerably improved prediction beyond that afforded by demographic variables alone. The percentage of variance accounted for by demographic variables and/or TOPF score was 19 per cent for the FAS, 28 per cent for the ANT, and 41 per cent for the GNT. Change scores revealed significant differences in performance in the clinical groups, particularly the TBI group. Demographic variables, particularly education level, and scores on the TOPF should be taken into consideration when interpreting performance on tests of verbal fluency and naming ability. © 2017 The British Psychological Society.
The Effect of Prior Knowledge and Gender on Physics Achievement
NASA Astrophysics Data System (ADS)
Stewart, John; Henderson, Rachel
2017-01-01
Gender differences on the Conceptual Survey in Electricity and Magnetism (CSEM) have been extensively studied. Ten semesters (N = 1621) of CSEM data are presented, showing that male students outperform female students on the CSEM posttest by 5% (p < .001). Male students also outperform female students on qualitative in-semester test questions by 3% (p = .004), but no significant difference between male and female students was found on quantitative test questions. Male students enter the class with superior prior preparation in the subject and score 4% higher on the CSEM pretest (p < .001). If the sample is restricted to students with little prior knowledge who answer no more than 8 of the 32 questions correctly (N = 822), male and female differences on the CSEM and qualitative test questions cease to be significant. This suggests no intrinsic gender bias exists in the CSEM itself and that gender differences are the result of prior preparation measured by CSEM pretest score. Gender differences between male and female students increase with pretest score. Regression analyses are presented to further explore interactions between preparation, gender, and achievement.
Clock Drawing as a Screen for Impaired Driving in Aging and Dementia: Is It Worth the Time?
Manning, Kevin J.; Davis, Jennifer D.; Papandonatos, George D.; Ott, Brian R.
2014-01-01
Clock drawing is recommended by medical and transportation authorities as a screening test for unsafe drivers. The objective of the present study was to assess the usefulness of different clock drawing systems as screening measures of driving performance in 122 healthy and cognitively impaired older drivers. Clock drawing was measured using four different scoring systems. Driving outcomes included global ratings of safety and the error rate on a standardized on-road test. Findings revealed that clock drawing was significantly correlated with the driving score on the road test for each of the scoring systems. However, receiver operator curve analyses showed limited clinical utility for clock drawing as a screening instrument for impaired on-road driving performance with the area under the curve ranging from 0.53 to 0.61. Results from this study indicate that clock drawing has limited utility as a solitary screening measure of on-road driving, even when considering a variety of scoring approaches. PMID:24296110
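An area under the curve of 0.53 to 0.61 is close to the 0.5 expected by chance. One way to read the figure is as the probability that a randomly chosen unsafe driver has a worse clock-drawing score than a randomly chosen safe driver. The sketch below computes the area in that pairwise form, with ties counted as one half; the scores are hypothetical error counts, not data from the study, and the function name is illustrative.

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison.

    Higher scores are assumed to point toward the positive class
    (here: unsafe driving). Ties count as half a win.
    """
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical clock-drawing error counts (higher = worse drawing).
unsafe_drivers = [3, 5, 2, 4]
safe_drivers = [2, 3, 1, 4]

print(f"AUC = {auc(unsafe_drivers, safe_drivers):.3f}")
```

A value of 0.5 means the score separates the two groups no better than a coin flip, so a significant correlation with road-test scores can coexist with limited value as a screening tool.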
Comparing usability testing outcomes and functions of six electronic nursing record systems.
Cho, Insook; Kim, Eunman; Choi, Woan Heui; Staggers, Nancy
2016-04-01
This study examined the usability of six differing electronic nursing record (ENR) systems on the efficiency, proficiency and available functions for documenting nursing care and subsequently compared the results to nurses' perceived satisfaction from a previous study. The six hospitals had different ENR systems, all with narrative nursing notes in use for more than three years. Stratified by type of nursing unit, 54 staff nurses were digitally recorded during on-site usability testing by employing validated patient care scenarios and think-aloud protocols. The time to complete specific tasks was also measured. Qualitative performance data were converted into scores on efficiency (relevancy), proficiency (accuracy), and a competency index using scoring schemes described by McGuire and Babbott. Six nurse managers and the researchers completed assessments of available ENR functions and examined computerized nursing process components including the linkages among them. For the usability test, participants' mean efficiency score was 94.2% (95% CI, 91.4-96.9%). The mean proficiency was 60.6% (95% CI, 54.3-66.8%), and the mean competency index was 59.5% (95% CI, 52.9-66.0%). Efficiency scores were significantly different across ENRs, as was the time to complete tasks, which ranged from 226.3 to 457.2 s (χ² = 12.3, P = 0.031; χ² = 11.2, P = 0.048). No significant differences were seen for proficiency scores. The coverage of the various ENRs' nursing process ranged from 67% to 100%, but only two systems had complete integration of nursing components. Two systems with high efficiency and proficiency scores had much lower usability test scores and perceived user satisfaction along with more complex navigation patterns. In terms of system usability and functions, different levels of sophistication of and interaction performance with ENR systems exist in practice. This suggests that ENRs may have variable impacts on clinical outcomes and care quality. 
Future studies are needed to explore ENR impact on nursing care quality, efficiency, and safety. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Repeater Analysis for Combining Information from Different Assessments
ERIC Educational Resources Information Center
Haberman, Shelby; Yao, Lili
2015-01-01
Admission decisions frequently rely on multiple assessments. As a consequence, it is important to explore rational approaches to combine the information from different educational tests. For example, U.S. graduate schools usually receive both TOEFL iBT® scores and GRE® General scores of foreign applicants for admission; however, little guidance…
Popovic-Maneski, Lana; Aleksic, Antonina; Metani, Amine; Bergeron, Vance; Cobeljic, Radoje; Popovic, Dejan B
2018-01-01
Increased muscle tone and exaggerated tendon reflexes characterize most individuals after a spinal cord injury (SCI). We estimated seven parameters from the pendulum test and compared them with the Ashworth modified scale of spasticity grades in three populations (retrospective study) to assess their spasticity. Three ASIA B SCI patients who exercised on a stationary FES bicycle formed group F, six ASIA B SCI patients who received only conventional therapy were in group C, and six healthy individuals constituted group H. The parameters from the pendulum test were used to form a single measure, termed the PT score, for each subject. The pendulum test parameters show differences between the F and C groups, but not between the F and H groups; however, statistical significance was limited due to the small study size. Results show a small deviation from the mean for all parameters in the F group and substantial deviations from the mean for the parameters in the C group. PT scores show significant differences between the F and C groups and the C and H groups and no differences between the F and H groups. The correlation between the PT score and Ashworth score was 0.88.
Observed Score Linear Equating with Covariates
ERIC Educational Resources Information Center
Branberg, Kenny; Wiberg, Marie
2011-01-01
This paper examined observed score linear equating in two different data collection designs, the equivalent groups design and the nonequivalent groups design, when information from covariates (i.e., background variables correlated with the test scores) was included. The main purpose of the study was to examine the effect (i.e., bias, variance, and…
ERIC Educational Resources Information Center
Arendasy, Martin E.; Sommer, Markus
2013-01-01
Allowing respondents to retake a cognitive ability test has shown to increase their test scores. Several theoretical models have been proposed to explain this effect, which make distinct assumptions regarding the measurement invariance of psychometric tests across test administration sessions with regard to narrower cognitive abilities and general…
ERIC Educational Resources Information Center
Kesan, Cenk; Ozkalkan, Zuhal; Iric, Hamdullah; Kaya, Deniz
2012-01-01
This study sought to determine whether students' scores on exams covering limits and derivatives differed according to the type of music listened to, compared with an environment without music. For this purpose, the achievement test including limits and derivatives and whose reliability coefficient of Cronbach Alpha is…
Gunner, Jessica H; Miele, Andrea S; Lynch, Julie K; McCaffrey, Robert J
2012-06-01
There is currently no standard criterion for determining abnormal test scores in neuropsychology; thus, a number of different criteria are commonly used. We investigated base rates of abnormal scores in healthy older adults using raw and T-scores from indices of the Wisconsin Card Sorting Test and Stroop Color-Word Test. Abnormal scores were examined cumulatively at seven cutoffs including >1.0, >1.5, >2.0, >2.5, and >3.0 standard deviations (SD) from the mean as well as those below the 10th and 5th percentiles. In addition, the number of abnormal scores at each of the seven cutoffs was also examined. Results showed that, when considering raw scores, ∼15% of individuals obtained scores >1.0 SD from the mean, around 10% were less than the 10th percentile, and 5% fell >1.5 SD from the mean or below the 5th percentile. Using T-scores, approximately 15%-20% and 5%-10% of scores were >1.0 and >1.5 SD from the mean, respectively. Roughly 15% and 5% fell at the <10th and <5th percentiles, respectively. Both raw and T-scores >2.0 SD from the mean were infrequent. Although the presence of a single abnormal score at 1.0 and 1.5 SD from the mean or at the 10th and 5th percentiles was not unusual, the presence of ≥2 abnormal scores using any criteria was uncommon. Consideration of base rate data regarding the percentage of healthy individuals scoring in the abnormal range should help avoid classifying normal variability as neuropsychological impairment.
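The base rates reported above can be set against what normal-distribution theory alone would predict. The sketch below is an illustration under two assumptions that the abstract does not make: that scores are normally distributed and that the indices are statistically independent (real indices from the same test are usually correlated). The function names and the number of indices are hypothetical.

```python
from statistics import NormalDist

def tail_beyond(z_cut):
    """Share of a standard normal population scoring more than z_cut SDs
    from the mean in the impaired direction (one tail)."""
    return NormalDist().cdf(-z_cut)

def p_at_least_two(n_scores, p):
    """Probability of two or more abnormal scores among n_scores,
    ASSUMING independent scores, each abnormal with probability p."""
    p_none = (1 - p) ** n_scores
    p_one = n_scores * p * (1 - p) ** (n_scores - 1)
    return 1 - p_none - p_one

N_SCORES = 6  # hypothetical number of test indices per person

for cut in (1.0, 1.5, 2.0):
    p = tail_beyond(cut)
    both = p_at_least_two(N_SCORES, p)
    print(f">{cut} SD: single score {p:.1%}, two or more of {N_SCORES}: {both:.1%}")
```

The one-tailed normal proportions at 1.0 and 1.5 SD (about 16% and 7%) are of the same order as the observed rates of roughly 15% and 5%, which is the sense in which a single abnormal score is unremarkable in a healthy person.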
Wolf, Timothy J; Dahl, Abigail; Auen, Colleen; Doherty, Meghan
2017-07-01
The objective of this study was to evaluate the inter-rater reliability, test-retest reliability, concurrent validity, and discriminant validity of the Complex Task Performance Assessment (CTPA): an ecologically valid performance-based assessment of executive function. Community control participants (n = 20) and individuals with mild stroke (n = 14) participated in this study. All participants completed the CTPA and a battery of cognitive assessments at initial testing. The control participants completed the CTPA at two different times one week apart. The intra-class correlation coefficient (ICC) for inter-rater reliability for the total score on the CTPA was .991. The ICCs for all of the sub-scores of the CTPA were also high (.889-.977). The CTPA total score was significantly correlated with Condition 4 of the DKEFS Color-Word Interference Test (ρ = -.425) and the Wechsler Test of Adult Reading (ρ = -.493). Finally, there were significant differences between control subjects and individuals with mild stroke on the total score of the CTPA (p = .007) and all sub-scores except interpretation failures and total items incorrect. These results are also consistent with other current executive function performance-based assessments and indicate that the CTPA is a reliable and valid performance-based measure of executive function.
The effect of short-term workshop on improving clinical reasoning skill of medical students
Yousefichaijan, Parsa; Jafari, Farshad; Kahbazi, Manijeh; Rafiei, Mohammad; Pakniyat, AbdolGhader
2016-01-01
Background: The clinical reasoning process leads the clinician through purposeful steps from signs and symptoms toward diagnosis and treatment. This research investigates the effect of teaching clinical reasoning on the problem-solving skills of medical students. Methods: This is a semi-experimental study. Nineteen medical students of the pediatric ward participated as the case group in a two-day workshop for training clinical reasoning. Before the workshop, they filled out the Diagnostic Thinking Inventory (DTI) questionnaire. Fifteen days after the workshop, they completed the DTI questionnaire again and took the "key feature" (KF) test and the "clinical reasoning problem" (CRP) test. Twenty-three medical students served as the control group; without attending the clinical reasoning workshop, they completed the DTI questionnaire and took the KF and CRP tests. Results: The average DTI score was 162.04 in the control group, and in the case group it was 153.26 before the workshop and 181.68 after it. The difference between the average DTI scores before and after the workshop was significant. The difference between average KF test scores in the control and case groups was not significant, but the difference between average CRP test scores was significant. Conclusion: The clinical reasoning workshop is effective in promoting the problem-solving skills of students. PMID:27579286
Smith, Neil R; Kelly, Yvonne J; Nazroo, James Y
2016-05-01
Differences in cognitive development have been observed across a variety of ethnic minority groups but relatively little is known about the persistence of these developmental inequalities over time or generations. A repeat cross-sectional analysis assessed cognitive ability scores of children aged 3, 5 and 7 years from the longitudinal UK Millennium Cohort Study (white UK born n=7630; Indian n=248; Pakistani n=328; Bangladeshi n=87; black Caribbean n=172; and black African n=136). Linear regression estimated ethnic differences in age normed scores at each time point. Multivariable logistic regression estimated within-group generational differences in test scores at each age adjusting stepwise for sociodemographic factors, maternal health behaviours, indicators of the home learning environment and parenting styles. The majority of ethnic minority groups scored lower than the white UK born reference group at 3 years with these differences narrowing incrementally at ages 5 and 7 years. However, the black Caribbean group scored significantly lower than the white UK born reference group throughout early childhood. At 3 years, Pakistani, black Caribbean and black African children with UK born mothers had significantly higher test scores than those with foreign born mothers after baseline adjustment for maternal age and child gender. Controlling for social, behavioural and parenting factors attenuated this generational advantage. By 7 years there were no significant generational differences in baseline models. Ethnic differences in cognitive development diminish throughout childhood for the majority of groups. Cumulative exposure to the UK environment may be associated with higher cognitive development scores.
Integral criteria for large-scale multiple fingerprint solutions
NASA Astrophysics Data System (ADS)
Ushmaev, Oleg S.; Novikov, Sergey O.
2004-08-01
We propose the definition and analysis of the optimal integral similarity score criterion for large-scale multimodal civil ID systems. First, the general properties of score distributions for genuine and impostor matches for different systems and input devices are investigated. The empirical statistics were taken from real biometric tests. Then we carry out the analysis of simultaneous score distributions for a number of combined biometric tests, primarily for multiple fingerprint solutions. The explicit and approximate relations for the optimal integral score, which provides the least value of the FRR while the FAR is predefined, have been obtained. The results of a real multiple fingerprint test show good correspondence with the theoretical results over a wide range of False Acceptance and False Rejection Rates.
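The idea of minimizing the FRR at a predefined FAR can be sketched empirically. The example below is an illustration only: it uses a simple sum of two finger scores as the combined score, which is an assumption and not the optimal integral criterion derived in the paper, and all scores are invented. The function names (`far`, `frr`, `threshold_for_far`) are hypothetical helpers.

```python
def far(impostor, t):
    """False acceptance rate: share of impostor scores accepted at threshold t."""
    return sum(s >= t for s in impostor) / len(impostor)


def frr(genuine, t):
    """False rejection rate: share of genuine scores rejected at threshold t."""
    return sum(s < t for s in genuine) / len(genuine)


def threshold_for_far(genuine, impostor, target_far):
    """Lowest observed score usable as a threshold with FAR <= target_far.

    FAR never increases as the threshold rises, so the first qualifying
    candidate also gives the lowest FRR among the candidates.
    """
    for t in sorted(set(genuine) | set(impostor)):
        if far(impostor, t) <= target_far:
            return t
    return float("inf")  # no observed score meets the target


# Invented (finger 1, finger 2) similarity scores on a 0-100 scale.
genuine_pairs = [(90, 80), (70, 40), (30, 90), (80, 70)]
impostor_pairs = [(20, 10), (50, 20), (10, 60), (30, 30)]

single_gen = [a for a, _ in genuine_pairs]
single_imp = [a for a, _ in impostor_pairs]
fused_gen = [a + b for a, b in genuine_pairs]  # simple sum fusion (assumption)
fused_imp = [a + b for a, b in impostor_pairs]

t_single = threshold_for_far(single_gen, single_imp, 0.0)
t_fused = threshold_for_far(fused_gen, fused_imp, 0.0)
frr_single = frr(single_gen, t_single)
frr_fused = frr(fused_gen, t_fused)
print(t_single, frr_single)
print(t_fused, frr_fused)
```

On this toy data the combined score reaches the same FAR with a lower FRR than a single finger, which is the kind of gain a multiple-fingerprint criterion aims for.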
Covert, S. Alex
2001-01-01
The U.S. Geological Survey (USGS) and Ohio Environmental Protection Agency (OEPA) collected data on fish from 10 stream sites in 1996 and 3 stream sites in 1997 as part of a comparative study of fish community assessment methods. The sites sampled represent a wide range of basin sizes (ranging from 132 to 6,330 square kilometers) and surrounding land-use types (urban, agricultural, and mixed). Each agency used its own fish-sampling protocol. Using the Index of Biotic Integrity and Modified Index of Well-Being, differences between data sets were tested for significance by means of the Wilcoxon signed-ranks test (α = 0.05). Results showed that the median of Index of Biotic Integrity differences between data sets was not significantly different from zero (p = 0.2521); however, the same statistical test showed the median differences in the Modified Index of Well-Being scores to be significantly different from zero (p = 0.0158). The differences observed in the Index of Biotic Integrity scores are likely due to natural variability, increased variability at sites with degraded water quality, differences in sampling methods, and low-end adjustments in the Index of Biotic Integrity calculation when fewer than 50 fish were collected. The Modified Index of Well-Being scores calculated by OEPA were significantly higher than those calculated by the USGS. This finding was attributed to the comparatively large numbers and biomass of fish collected by the OEPA. By combining the two indices and viewing them in terms of the percentage attainment of Ohio Warmwater Habitat criteria, the two agencies' data seemed comparable, although the Index of Biotic Integrity scores were more similar than the Modified Index of Well-Being scores.
Van Hoozer, H; Brink, P J; Oppliger, R
1989-04-01
This experimental study tested the effects of overhead transparency design in conjunction with live lecture on retention, recall, and application of data analysis content over three occasions using a Solomon Four-Group, pretest-posttest design. Pretested subjects showed significant (p < .001) gains in test scores from pre to posttest. No significant differences were found in pretest scores between control and experimental treatment groups or among the posttest scores of either the experimental or control groups.
Test-retest stability of the Task and Ego Orientation Questionnaire.
Lane, Andrew M; Nevill, Alan M; Bowes, Neal; Fox, Kenneth R
2005-09-01
Establishing stability, defined as observing minimal measurement error in a test-retest assessment, is vital to validating psychometric tools. Correlational methods, such as Pearson product-moment, intraclass, and kappa are tests of association or consistency, whereas stability or reproducibility (regarded here as synonymous) assesses the agreement between test-retest scores. Indexes of reproducibility using the Task and Ego Orientation in Sport Questionnaire (TEOSQ; Duda & Nicholls, 1992) were investigated using correlational (Pearson product-moment, intraclass, and kappa) methods, repeated measures multivariate analysis of variance, and calculating the proportion of agreement within a referent value of +/-1 as suggested by Nevill, Lane, Kilgour, Bowes, and Whyte (2001). Two hundred thirteen soccer players completed the TEOSQ on two occasions, 1 week apart. Correlation analyses indicated a stronger test-retest correlation for the Ego subscale than the Task subscale. Multivariate analysis of variance indicated stability for ego items but with significant increases in four task items. The proportion of test-retest agreement scores indicated that all ego items reported relatively poor stability statistics with test-retest scores within a range of +/-1, ranging from 82.7-86.9%. By contrast, all task items showed test-retest difference scores ranging from 92.5-99%, although further analysis indicated that four task subscale items increased significantly. Findings illustrated that correlational methods (Pearson product-moment, intraclass, and kappa) are influenced by the range in scores, and calculating the proportion of agreement of test-retest differences with a referent value of +/-1 could provide additional insight into the stability of the questionnaire. It is suggested that the item-by-item proportion of agreement method proposed by Nevill et al. (2001) should be used to supplement existing methods and could be especially helpful in identifying rogue items in the initial stages of psychometric questionnaire validation.
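The contrast between correlation and proportion of agreement can be made concrete with a short sketch. The responses below are invented 5-point ratings for a single hypothetical item, chosen so that scores occupy a narrow range; the helper names are illustrative and the ±1 tolerance follows the referent value described in the abstract.

```python
from math import sqrt


def proportion_within(test, retest, tolerance=1):
    """Share of respondents whose retest score is within +/-tolerance of the test score."""
    diffs = [b - a for a, b in zip(test, retest)]
    return sum(abs(d) <= tolerance for d in diffs) / len(diffs)


def pearson_r(x, y):
    """Pearson product-moment correlation, computed from deviations about the means."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Invented ratings for eight respondents on one item, one week apart.
test_scores = [4, 5, 4, 5, 4, 5, 4, 5]
retest_scores = [5, 4, 5, 4, 4, 5, 5, 4]

agreement = proportion_within(test_scores, retest_scores)
r = pearson_r(test_scores, retest_scores)
print(f"agreement within +/-1: {agreement:.0%}")
print(f"Pearson r: {r:.2f}")
```

Every respondent here changes by at most one point, yet the correlation is negative because the scores span only two values. This is one way a restricted range can make a correlation coefficient misleading as a stability index.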
Social Development in Six-Year-Old Identical and Fraternal Twins.
ERIC Educational Resources Information Center
Schave, Barbara; And Others
Four null hypotheses were formulated to test for relationships between pairs of identical and fraternal twins and their parents on measures of locus of control. Two additional hypotheses were formulated to test for differences between mean scores of identical and fraternal twins and scores of their parents on these same constructs. Twenty pairs of…
Real Cost-Benefit Analysis Is Needed in American Public Education
ERIC Educational Resources Information Center
Stoneberg, Bert D.
2015-01-01
Public school critics often point to rising expenditures and relatively flat test scores to justify their school reform agendas. The claims are flawed because their analyses fail to account for the difference in data types between dollars (ratio) and test scores (interval). A cost-benefit analysis using dollars as a common metric for both costs…
An Analysis of Test Equating Models for the Alabama High School Graduation Examination.
ERIC Educational Resources Information Center
Glowacki, Margaret L.
The purpose of this study was to determine which equating models are appropriate for the Alabama High School Graduation Examination (AHSGE) by equating two previously administered fall forms for each subject area of the AHSGE and determining whether differences exist in the test score distributions or passing scores resulting from the equating…
Falling Behind: New Evidence on the Black-White Achievement Gap
ERIC Educational Resources Information Center
Levitt, Steven D.; Fryer, Roland G.
2004-01-01
On average, black students typically score one standard deviation below white students on standardized tests--roughly the difference in performance between the average 4th grader and the average 8th grader. Historically, what has come to be known as the black-white test-score gap has emerged before children enter kindergarten and has tended to…
Leveraging Gender Differences to Boost Test Scores
ERIC Educational Resources Information Center
Costello, Bill
2008-01-01
According to the 2004 National Assessment of Educational Progress, males who have made it through 12 years of school have significantly poorer reading skills than their female peers. In every age group, boys have been scoring lower than girls annually for more than three decades on U.S. Department of Education reading tests. The longer boys are in…
Test Scores, Dropout Rates, and Transfer Rates as Alternative Indicators of High School Performance
ERIC Educational Resources Information Center
Rumberger, Russell W.; Palardy, Gregory J.
2005-01-01
This study investigated the relationships among several different indicators of high school performance: test scores, dropout rates, transfer rates, and attrition rates. Hierarchical linear models were used to analyze panel data from a sample of 14,199 students who took part in the National Education Longitudinal Survey of 1988. The results…
ERIC Educational Resources Information Center
Dixon-Roman, Ezekiel J.; Everson, Howard T.; McArdle, John J.
2013-01-01
Background: Educational policy makers and test critics often assert that standardized test scores are strongly influenced by factors beyond individual differences in academic achievement such as family income and wealth. Unfortunately, few empirical studies consider the simultaneous and related influences of family income, parental education, and…
The Effect of Grade Norms in College Students: Using the Woodcock-Johnson III Tests of Achievement
ERIC Educational Resources Information Center
Cressman, Markus N.; Liljequist, Laura
2014-01-01
The "Woodcock-Johnson III" Tests of Achievement grade norms versus age norms were examined in the calculation of discrepancy scores in 202 college students. Difference scores were calculated between the "Wechsler Adult Intelligence Scale-3rd Edition" Full Scale IQ and the "Woodcock-Johnson III" Total Achievement,…
Does Television Rot Your Brain? New Evidence from the Coleman Study. NBER Working Paper No. 12021
ERIC Educational Resources Information Center
Gentzkow, Matthew; Shapiro, Jesse M.
2006-01-01
We use heterogeneity in the timing of television's introduction to different local markets to identify the effect of preschool television exposure on standardized test scores later in life. Our preferred point estimate indicates that an additional year of preschool television exposure raises average test scores by about .02 standard deviations. We…
Effects of handcuffs on neuropsychological testing: Implications for criminal forensic evaluations.
Biddle, Christine M; Fazio, Rachel L; Dyshniku, Fiona; Denney, Robert L
2018-01-01
Neuropsychological evaluations are increasingly performed in forensic contexts, including in criminal settings where security sometimes cannot be compromised to facilitate evaluation according to standardized procedures. Interpretation of nonstandardized assessment results poses significant challenges for the neuropsychologist. Research is limited in regard to the validation of neuropsychological test accommodation and modification practices that deviate from standard test administration; there is no published research regarding the effects of hand restraints upon neuropsychological evaluation results. This study provides preliminary results regarding the impact of restraints on motor functioning and common neuropsychological tests with a motor component. When restrained, performance on nearly all tests utilized was significantly impacted, including Trail Making Test A/B, a coding test, and several tests of motor functioning. Significant performance decline was observed in both raw scores and normative scores. Regression models are also provided in order to help forensic neuropsychologists adjust for the effect of hand restraints on raw scores of these tests, as the hand restraints also resulted in significant differences in normative scores; in the most striking case there was nearly a full standard deviation of discrepancy.
Estimation of Occupational Test Norms from Job Analysis Data.
ERIC Educational Resources Information Center
Mecham, Robert C.
Occupational norms exist for some tests, and differences in the distributions of test scores by occupation are evident. Sampling error (SE), situationally specific factors (SSFs), and differences in job content (DIJCs) were explored as possible reasons for the observed differences. SE was explored by analyzing 742 validity studies performed by the…
Comparison of individual answer and group answer with and without structured peer assessment
NASA Astrophysics Data System (ADS)
Kablan, Zeynel
2014-09-01
Background: Cooperative learning activities provide active participation of students leading to better learning. The literature suggests that cooperative learning activities need to be structured for a more effective and productive interaction. Purpose: This study aimed to test the differences among three instructional conditions in terms of science achievement. Sample: A total of 79 fifth-grade students, 42 males (53%) and 37 females (47%), participated in the study. Design and Methods: In the first condition, students answered the teacher's questions individually by raising hands. In the second condition, students discussed the answer in groups and came up with a single group answer. In this condition, the teacher provided only verbal directions to the groups without using any strategy or material. In the third condition, students used a 'peer assessment form' before giving the group answer. A pre-/post-test experimental design was used. Multiple-choice and open-ended tests were used for data collection. One-way analysis of variance (ANOVA) was conducted to test the differences in the test scores between the three groups (individual answer, unstructured group answer and structured group answer). Results: Results showed that there were no significant differences among the three learning conditions in terms of their multiple-choice test scores. In terms of the open-ended test scores, students in the structured group answer condition scored significantly higher than the students in the individual answer condition. Conclusions: Structuring the group work through peer assessment helped to monitor the group discussion, provided a better learning compared to the individual answer condition, and helped students to participate in the activity equally.
[Cognitive markers to discriminate between mild cognitive impairment and normal ageing].
Rodríguez Rodríguez, Nely; Juncos-Rabadán, Onésimo; Facal Mayo, David
2008-01-01
Mild cognitive impairment (MCI) has been characterized as a transitional stage between normal ageing and dementia. The aim of the present study was to examine differences between normal ageing and MCI in the performance of several cognitive tests. These differences might serve as differential markers. We performed a longitudinal study (24 months) with two evaluations at 12-monthly intervals using the CAMCOG-R and a verbal learning test [test de aprendizaje verbal España-Complutense (TAVEC)]. The sample was composed of 25 persons aged over 50 years (five men and 20 women), distributed into two groups: the control group and the MCI group. To assign persons to either of the two groups, Petersen's MCI criteria were applied to Mini-Mental State Examination (MMSE) scores. Repeated measures ANOVA (2 groups × 2 assessments) showed significant differences between the MCI and control groups in the CAMCOG-R scores for orientation, language, memory, abstract thinking, executive function and global score, and in the TAVEC scores for immediate recall and short- and long-term free and cued recall. No significant differences were found between the first and second assessments or in the group × assessment interaction. The results of the present study confirm that the CAMCOG-R and the TAVEC effectively discriminate between normal ageing and MCI and can be used complementarily.
Hessen, Erik
2011-10-01
A repeated observation during memory assessment with the Rey Auditory Verbal Learning Test (RAVLT) is that patients who spontaneously employ a memory rehearsal strategy by repeating the word list more than once achieve better scores than patients who only repeat the word list once. This observation led to concern about the ability of the standard test procedure of RAVLT and similar tests in eliciting the best possible recall scores. The purpose of the present study was to test the hypothesis that a rehearsal recall strategy of repeating the word list more than once would result in improved scores of recall on the RAVLT. We report on differences in outcome after standard administration and after experimental administration on Immediate and Delayed Recall measures from the RAVLT of 50 patients. The experimental administration resulted in significantly improved scores for all the variables employed. Additionally, it was found that patients who failed effort screening showed significantly poorer improvement on Delayed Recall compared with those who passed the effort screening. The general clear improvement both in raw scores and T-scores demonstrates that recall performance can be significantly influenced by the strategy of the patient or by small variations in instructions by the examiner.
Cultural and age differences of three groups of Taiwanese young children's creativity and drawing.
Wei, Mei-Hue; Dzeng, Annie
2013-06-01
This study investigated the cultural and age effects on children's overall creativity and drawing. 1,055 children ages 6 to 8 from three groups--urban and rural Taiwanese children and Taiwanese children of immigrant mothers, all in public schools--were given a creativity test, a people-drawing test, and a free-drawing test. The results showed that the older Taiwanese children scored higher than the young Taiwanese children on people-drawing and free-drawing, but not overall creativity. Drawing and creativity scores increased in accordance with age. In the six-year-old group, a group difference was found only on the scale of people-drawing. Urban Taiwanese children in the eight-year-old group scored higher than the other two groups of children on creativity and free-drawing. Results are discussed in terms of educational opportunities.
Lippert, Christoph; Xiang, Jing; Horta, Danilo; Widmer, Christian; Kadie, Carl; Heckerman, David; Listgarten, Jennifer
2014-11-15
Set-based variance component tests have been identified as a way to increase power in association studies by aggregating weak individual effects. However, the choice of test statistic has been largely ignored even though it may play an important role in obtaining optimal power. We compared a standard statistical test (a score test) with a recently developed likelihood ratio (LR) test. Further, when correction for hidden structure is needed, or gene-gene interactions are sought, state-of-the-art algorithms for both the score and LR tests can be computationally impractical. Thus we develop new computationally efficient methods. After reviewing theoretical differences in performance between the score and LR tests, we find empirically on real data that the LR test generally has more power. In particular, on 15 of 17 real datasets, the LR test yielded at least as many associations as the score test (up to 23 more associations), whereas the score test yielded at most one more association than the LR test in the two remaining datasets. On synthetic data, we find that the LR test yielded up to 12% more associations, consistent with our results on real data, but also observe a regime of extremely small signal where the score test yielded up to 25% more associations than the LR test, consistent with theory. Finally, our computational speedups now enable (i) efficient LR testing when the background kernel is full rank, and (ii) efficient score testing when the background kernel changes with each test, as for gene-gene interaction tests. The latter yielded a factor of 2000 speedup on a cohort of size 13 500. Software available at http://research.microsoft.com/en-us/um/redmond/projects/MSCompBio/Fastlmm/.
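For readers unfamiliar with the two test families, the difference between a score test and a likelihood ratio test can be seen in a much simpler setting than the variance component models studied here. The sketch below is a toy example under stated assumptions: it tests a single binomial proportion against a null value `p0`, uses invented counts, and relies on the usual chi-square (1 df) approximation for both statistics. It does not reproduce the paper's methods or the FaST-LMM software.

```python
from math import erfc, log, sqrt


def score_stat(x, n, p0):
    """Score (Rao) statistic: uses only the null value p0, no fitted estimate."""
    return (x - n * p0) ** 2 / (n * p0 * (1 - p0))


def lr_stat(x, n, p0):
    """Likelihood ratio statistic: compares the fitted p_hat with the null p0."""
    p_hat = x / n

    def term(count, p_alt, p_null):
        # A zero count contributes nothing to the log-likelihood ratio.
        return count * log(p_alt / p_null) if count > 0 else 0.0

    return 2 * (term(x, p_hat, p0) + term(n - x, 1 - p_hat, 1 - p0))


def chi2_1df_pvalue(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return erfc(sqrt(stat / 2))


# Invented data: 60 successes in 100 trials, null hypothesis p0 = 0.5.
s = score_stat(60, 100, 0.5)
lr = lr_stat(60, 100, 0.5)
print(f"score = {s:.3f}, p = {chi2_1df_pvalue(s):.4f}")
print(f"LR    = {lr:.3f}, p = {chi2_1df_pvalue(lr):.4f}")
```

The two statistics are close but not identical even in this toy case. The score test needs no model fit under the alternative, which is what makes it cheap, while the LR test requires the fitted estimate.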
Boone, Kyle Brauer; Victor, Tara L; Wen, Johnny; Razani, Jill; Pontón, Marcel
2007-03-01
The relationship between ethnicity and cognitive test performance was examined in a sample of 161 patients referred for evaluation at a public hospital-affiliated neuropsychology clinic; 83 patients were Caucasian (non-Hispanic), 31 were African-American, 30 were Hispanic, and 17 were Asian. Significant group differences were present on some measures of language (Boston Naming Test), attention (Digit Span ACSS), constructional ability (Rey-Osterrieth [RO] copy), nonverbal processing speed (Trails A), and executive skills (Wisconsin Card Sorting Test [WCST]). Comparison of those who spoke English as a first language (or who learned English concurrently with a second language) versus those who spoke English as a second language (ESL) revealed significantly higher performance in the non-ESL group for Digit Span, Boston Naming Test, and FAS, and a higher score in the ESL group for RO copy. Boston Naming Test scores were significantly related to years educated in the United States; Boston Naming Test and Digit Span scores were significantly correlated with age at which conversational English was first learned and number of years in the United States; and finally, FAS scores were also significantly related to number of years in the United States. These findings are consistent with data from published literature on ethnic differences and the effects of acculturation on cognitive test performance in nonpatients, and also indicate that these observations are not attenuated by the presence of psychiatric or neurologic illness. The results further caution that normative data derived on Caucasian samples may not be appropriate for use with other ethnic groups.
Galetta, Matthew S; Galetta, Kristin M; McCrossin, Jim; Wilson, James A; Moster, Stephen; Galetta, Steven L; Balcer, Laura J; Dorshimer, Gary W; Master, Christina L
2013-05-15
The Sports Concussion Assessment Tool 2 (SCAT2) and King-Devick (K-D) tests have both been proposed as sideline tools to detect sports-related concussion. We performed an exploratory analysis to determine the relation of SCAT2 components, particularly the Standardized Assessment of Concussion (SAC), to K-D test scores in a professional ice hockey team cohort during pre-season baseline testing. We also examined changes in scores for two athletes who developed concussion and had rinkside testing. A modified SCAT2 (no balance testing) and the K-D test, a brief measure of rapid number naming, were administered to 27 members of a professional ice hockey team during the 2011-2012 pre-season. Athletes with concussion also underwent rinkside testing. Lower (worse) scores for the SCAT2 SAC Immediate Memory Score and the overall SAC score were associated with greater (worse) times required to complete the K-D test at baseline. On average, for every 1-point reduction in SAC Immediate Memory Score, we found a corresponding increase (worsening) of K-D time score of 7.3s (95% CI 4.9, 9.7, p<0.001, R(2)=0.62, linear regression, accounting for age). For the overall SAC score, 1-point reductions were associated with K-D score worsening of 2.2s (95% CI 0.6, 3.8, p=0.01, R(2)=0.25, linear regression). In two players tested rinkside immediately following concussion, K-D test scores worsened from baseline by 4.2 and 6.4s. These athletes had no differences found for SCAT2 SAC components, but reported symptoms of concussion. In this study of professional athletes, scores for the K-D test, a measure for which saccadic (fast) eye movements are required for the task of rapid number naming, were associated with reductions in Immediate Memory at a pre-season baseline. Both working memory and saccadic eye movements share closely related anatomical structures, including the dorsolateral prefrontal cortex (DLPFC). 
A composite of brief rapid sideline tests, including SAC and K-D (and balance testing for non-ice hockey sports), is likely to provide an effective clinical tool to assess the athlete with suspected concussion.
Bernstein, Lynne E.; Eberhardt, Silvio P.; Auer, Edward T.
2014-01-01
Training with audiovisual (AV) speech has been shown to promote auditory perceptual learning of vocoded acoustic speech by adults with normal hearing. In Experiment 1, we investigated whether AV speech promotes auditory-only (AO) perceptual learning in prelingually deafened adults with late-acquired cochlear implants. Participants were assigned to learn associations between spoken disyllabic C(=consonant)V(=vowel)CVC non-sense words and non-sense pictures (fribbles), under AV and then AO (AV-AO; or counter-balanced AO then AV, AO-AV, during Periods 1 then 2) training conditions. After training on each list of paired-associates (PA), testing was carried out AO. Across all training, AO PA test scores improved (7.2 percentage points) as did identification of consonants in new untrained CVCVC stimuli (3.5 percentage points). However, there was evidence that AV training impeded immediate AO perceptual learning: During Period-1, training scores across AV and AO conditions were not different, but AO test scores were dramatically lower in the AV-trained participants. During Period-2 AO training, the AV-AO participants obtained significantly higher AO test scores, demonstrating their ability to learn the auditory speech. Across both orders of training, whenever training was AV, AO test scores were significantly lower than training scores. Experiment 2 repeated the procedures with vocoded speech and 43 normal-hearing adults. Following AV training, their AO test scores were as high as or higher than following AO training. Also, their CVCVC identification scores patterned differently than those of the cochlear implant users. In Experiment 1, initial consonants were most accurate, and in Experiment 2, medial consonants were most accurate. We suggest that our results are consistent with a multisensory reverse hierarchy theory, which predicts that, whenever possible, perceivers carry out perceptual tasks immediately based on the experience and biases they bring to the task. 
We point out that while AV training could be an impediment to immediate unisensory perceptual learning in cochlear implant patients, it was also associated with higher scores during training. PMID:25206344
Ballesteros-Peña, Sendoa; Vallejo-De la Hoz, Gorka; Fernández-Aedo, Irrintzi
2017-12-23
To analyse vein catheterisation and blood gas test-related pain among adult patients in the emergency department and to explore pain score-related factors. An observational, multicentre study was performed. Patients undergoing vein catheterisation or arterial puncture for blood gas testing were included consecutively. After each procedure, patients scored the pain experienced using the NRS-11. 780 vein catheterisations and 101 blood gas tests were analysed. Venipuncture received an average score of 2.8 (95% CI: 2.6-3), and arterial puncture 3.6 (95% CI: 3.1-4). Iatrogenic pain scores were associated with procedures of moderate to high difficulty (P<.001) and, in the gas test, with the choice of the humeral rather than the radial artery (P=.02); in venipunctures they correlated with baseline pain (P<.001). Pain scores did not differ significantly by other variables such as sex, place of origin or needle gauge. Vein catheterisation can be considered a mildly to moderately painful procedure, and blood gas testing a moderately painful one. The pain score is associated with certain variables such as the difficulty of the procedure, the anatomic area of the puncture or baseline pain. A better understanding of painful effects related to emergency nursing procedures and the factors associated with pain self-perception could help to determine when and how to act to mitigate this undesired effect. Copyright © 2017 Elsevier España, S.L.U. All rights reserved.
NASA Astrophysics Data System (ADS)
Madsen, Adrian; McKagan, Sarah B.; Sayre, Eleanor C.
2013-12-01
We review the literature on the gender gap on concept inventories in physics. Across studies of the most commonly used mechanics concept inventories, the Force Concept Inventory and Force and Motion Conceptual Evaluation, men’s average pretest scores are always higher than women’s, and in most cases men’s posttest scores are higher as well. The weighted average gender difference on these tests is 13% for pretest scores, 12% for posttest scores, and 6% for normalized gain. This difference is much smaller than the average difference in normalized gain between traditional lecture and interactive engagement (25%), but it is large enough that it could impact the results of studies comparing the effectiveness of different teaching methods. There is sometimes a gender gap on commonly used electricity and magnetism concept inventories, the Brief Electricity and Magnetism Assessment and Conceptual Survey of Electricity and Magnetism, but it is usually much smaller and sometimes is zero or favors women. The weighted average gender difference on these tests is 3.7% for pretest scores, 8.5% for posttest scores, and 6% for normalized gain. There are far fewer studies of the gender gap on electricity and magnetism concept inventories and much more variation in the existing studies. Based on our analysis of 26 published articles comparing the impact of 30 factors that could potentially influence the gender gap, no single factor is sufficient to explain the gap. Several high-profile studies that have claimed to account for or reduce the gender gap have failed to be replicated in subsequent studies, suggesting that isolated claims of explanations of the gender gap should be interpreted with caution. For example, claims that the gender gap could be eliminated through interactive engagement teaching methods or through a “values affirmation writing exercise” were not supported by subsequent studies. 
Suggestions that the gender gap might be reduced by changing the wording of “male-oriented” questions or refraining from asking demographic questions before administering the test are not supported by the evidence. Other factors, such as gender differences in background preparation, scores on different kinds of assessment, and splits between how students respond to test questions when answering for themselves or for a “scientist” do contribute to a difference between male and female responses, but the size of these differences is smaller than the size of the overall gender gap, suggesting that the gender gap is most likely due to the combination of many small factors rather than any one factor that can easily be modified.
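As a reading aid for the gain figures above: normalized gain is conventionally defined as the fraction of the available improvement that was actually achieved, g = (post - pre) / (100 - pre) for percentage scores. The abstract does not say which variant the reviewed studies used, so the sketch below assumes this common definition. The function name and the example scores are hypothetical, not data from the review.

```python
def normalized_gain(pre_pct: float, post_pct: float) -> float:
    """Fraction of the possible improvement achieved (scores in percent).

    Assumes the common definition g = (post - pre) / (100 - pre).
    """
    if not 0 <= pre_pct < 100:
        raise ValueError("pre_pct must be in [0, 100)")
    return (post_pct - pre_pct) / (100.0 - pre_pct)


# Hypothetical class averages, chosen only to illustrate the formula.
example_gain = normalized_gain(pre_pct=40.0, post_pct=70.0)
```

Because the denominator shrinks as the pretest score rises, two groups with the same raw improvement can have different normalized gains, which is one reason the review reports the pretest, posttest and gain gaps separately.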
Aslam, Tariq M; Tahir, Humza J; Parry, Neil R A; Murray, Ian J; Kwak, Kun; Heyes, Richard; Salleh, Mahani M; Czanner, Gabriela; Ashworth, Jane
2016-10-01
To report on the utility of a computer tablet-based method for automated testing of visual acuity in children based on the principles of game design. We describe the testing procedure and present repeatability as well as agreement of the score with accepted visual acuity measures. Reliability and validity study. Setting: Manchester Royal Eye Hospital Pediatric Ophthalmology Outpatients Department. Total of 112 sequentially recruited patients. For each patient 1 eye was tested with the Mobile Assessment of Vision by intERactIve Computer for Children (MAVERIC-C) system, consisting of a software application running on a computer tablet, housed in a bespoke viewing chamber. The application elicited touch screen responses using a game design to encourage compliance and automatically acquire visual acuity scores of participating patients. Acuity was then assessed by an examiner with a standard chart-based near ETDRS acuity test before the MAVERIC-C assessment was repeated. Reliability of MAVERIC-C near visual acuity score and agreement of MAVERIC-C score with near ETDRS chart for visual acuity. Altogether, 106 children (95%) completed the MAVERIC-C system without assistance. The vision scores demonstrated satisfactory reliability, with test-retest VA scores having a mean difference of 0.001 (SD ±0.136) and limits of agreement (LOA, 2 SD) of ±0.267. Comparison with the near ETDRS chart showed agreement with a mean difference of -0.0879 (±0.106) with LOA of ±0.208. This study demonstrates promising utility for software using a game design to enable automated testing of acuity in children with ophthalmic disease in an objective and accurate manner. Copyright © 2016 Elsevier Inc. All rights reserved.
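The limits of agreement quoted above can be reproduced from the reported standard deviations. The sketch below assumes a Bland-Altman style calculation with a multiplier of 1.96, which matches the reported half-widths after rounding; the abstract itself only says "2 SD". The function name and the paired example scores are hypothetical.

```python
from statistics import mean, stdev


def limits_of_agreement(first, second, z=1.96):
    """Return (bias, lower, upper) for paired measurements.

    bias is the mean paired difference; the limits are bias -/+ z * SD
    of the differences (z = 1.96 is an assumption, see lead-in).
    """
    diffs = [a - b for a, b in zip(first, second)]
    bias = mean(diffs)
    half_width = z * stdev(diffs)
    return bias, bias - half_width, bias + half_width


# Consistency check against the summary statistics in the abstract.
retest_half_width = round(1.96 * 0.136, 3)  # test-retest SD
chart_half_width = round(1.96 * 0.106, 3)   # MAVERIC-C vs near ETDRS SD

# Hypothetical paired acuity scores, for illustration only.
bias, lower, upper = limits_of_agreement([0.1, 0.2, 0.3], [0.0, 0.2, 0.4])
```

Under this assumption the two half-widths come out at 0.267 and 0.208, the values reported in the abstract.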
Patiraki, Elisabeth I; Papathanassoglou, Elizabeth D E; Tafas, Cheryl; Akarepi, Vasiliki; Katsaragakis, Stelios G; Kampitsi, Anjuleta; Lemonidou, Chrysoula
2006-12-01
The purpose of this randomized controlled study was to explore the effectiveness of an educational intervention on nurses' attitudes and knowledge regarding pain management and to explore associations with nurses' characteristics. A Solomon four-group experimental design was employed to assess the effect of the intervention and potential effects of pre-intervention testing. One hundred and twelve nurses were randomized to two intervention and two control groups. The intervention was based on viewing a series of educational videotapes and case scenarios. The Validated Hellenic version of the Nurses Knowledge and Attitudes Survey Regarding Pain (GV-NKASRP) was used. Pre-intervention scores revealed various limitations in regard to pain assessment and management. At the pre-test, the average number of correct answers was 17.58±7.58 (45.1%±19.3% of total questions). Pre-intervention scores differed significantly among participants with different educational backgrounds (P < 0.0001). A significant effect of pain education on total knowledge scores as well as on specific questions was detected. Intervention group participants provided 6.11±5.55 additional correct answers (15.66%±14.23% improvement, P < 0.0001), and they exhibited significantly improved post-test scores compared to controls (26.49±5.24 vs. 18.75±4.48; P < 0.0001). A potential negative effect of pre-test on knowledge gain for specific items and for total scores was detected. These findings suggest low pre-test knowledge scores among Hellenic oncology nurses and a significant effect of the intervention.
ERIC Educational Resources Information Center
Duong, Minh Quang
2011-01-01
Testing programs often use multiple test forms of the same test to control item exposure and to ensure test security. Although test forms are constructed to be as similar as possible, they often differ. Test equating techniques are those statistical methods used to adjust scores obtained on different test forms of the same test so that they are…
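To make the idea of adjusting scores across forms concrete, here is a minimal sketch of linear equating, one standard equating method. The truncated abstract does not say which techniques the study examines, so this is an illustrative assumption. The function name and the score lists are hypothetical.

```python
from statistics import mean, pstdev


def linear_equate(x, form_x_scores, form_y_scores):
    """Map a score x on form X onto the scale of form Y.

    Linear equating matches the mean and standard deviation of the two
    score distributions: y = mean_Y + (sd_Y / sd_X) * (x - mean_X).
    """
    mean_x, mean_y = mean(form_x_scores), mean(form_y_scores)
    sd_x, sd_y = pstdev(form_x_scores), pstdev(form_y_scores)
    return mean_y + (sd_y / sd_x) * (x - mean_x)


# Hypothetical score samples: form Y is uniformly 5 points easier.
form_x = [10, 20, 30]
form_y = [15, 25, 35]
equated = linear_equate(20, form_x, form_y)
```

In this toy case a form X score of 20 maps to 25 on the form Y scale, because the two forms differ only by a constant shift.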
Richardson, L I; Thurman, R L; Bassler, O C
1978-07-01
The Peabody Mathematics Readiness Test was developed to assess mathematics readiness and identify children who would encounter difficulty in first-grade mathematics. In the present study, we compared performances of mentally retarded subjects and first-grade subjects on this test. Retarded subjects' mean scores were significantly lower than those of the nonretarded subjects on the drawing test; however, there were no significant differences between the mean scores of the groups on the other five subscales.
Derakhshandeh, Zahra; Amini, Mitra; Kojuri, Javad; Dehbozorgian, Marziyeh
2018-01-01
Clinical reasoning is one of the most important skills in the process of training a medical student to become an efficient physician. Assessment of the reasoning skills in a medical school program is important to direct students' learning. One of the tests for measuring clinical reasoning ability is Clinical Reasoning Problems (CRPs). The major aim of this study was to measure the psychometric qualities of CRPs and to determine the correlation between this test and the routine MCQ test in the cardiology department of Shiraz Medical School. This descriptive study was conducted on all cardiology residents of Shiraz Medical School. The study population consisted of 40 residents in 2014. The CRPs and the routine MCQ tests were designed based on similar objectives and were carried out simultaneously. To check the psychometric properties of the CRPs test, reliability, item difficulty, item discrimination, and the correlation between each item and the total CRPs score were measured using Excel and SPSS software. Furthermore, we calculated the correlation between the CRPs test and the MCQ test. The mean differences in CRPs test scores between residents' academic years (second, third and fourth year) were also evaluated by one-way analysis of variance (ANOVA) using SPSS software (version 20) (α=0.05). The mean and standard deviation of scores in CRPs was 10.19±3.39 out of 20; in MCQ, it was 13.15±3.81 out of 20. Item difficulty was in the range of 0.27-0.72; item discrimination was 0.30-0.75, with question No. 3 being the exception (0.24). The correlation between each item and the total CRPs score was 0.26-0.87; the correlation between the CRPs test and the MCQ test was 0.68 (p<0.001). The reliability of the CRPs was 0.72, calculated using Cronbach's alpha. The mean CRPs score differed among residents based on their academic year, and this difference was statistically significant (p<0.001).
The results of this investigation revealed that CRPs could be a reliable test for measuring clinical reasoning in residents. It can be included in cardiology residency assessment programs.
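For readers unfamiliar with the reliability and item statistics reported above, the sketch below shows how Cronbach's alpha and item difficulty are commonly computed. It is not the authors' procedure (they used Excel and SPSS). It also assumes that item difficulty means the proportion of examinees answering correctly. The function names and the response matrix are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a matrix of examinee rows by item columns.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(totals))
    """
    k = len(scores[0])
    item_columns = list(zip(*scores))
    item_variance_sum = sum(pvariance(col) for col in item_columns)
    total_variance = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


def item_difficulty(scores, item_index):
    """Proportion of examinees who answered the item correctly (0/1 data)."""
    column = [row[item_index] for row in scores]
    return sum(column) / len(column)


# Hypothetical 0/1 responses: 4 examinees, 3 items.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
alpha = cronbach_alpha(responses)
first_item_difficulty = item_difficulty(responses, 0)
```

For this toy matrix alpha is 0.75 and the first item's difficulty is 0.75, which puts the study's reported alpha of 0.72 in context.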
Winton, Lisa M; Ferguson, Elizabeth M N; Hsu, Chiu-Hsieh; Agee, Neal; Eubanks, Ryan D; O'Neill, Patrick J; Goldberg, Ross F; Kopelman, Tammy R; Nodora, Jesse N; Caruso, Daniel M; Komenaka, Ian K
To determine whether use of self-assessment (SA) questions affects the effectiveness of weekly didactic grand rounds presentations. From 26 consecutive grand rounds presentations from August 2013 to April 2014, a 52-question multiple-choice test was administered based on 2 questions from each presentation. Community teaching institution. General surgery residents, students, and attending physicians. The test was administered to 66 participants. The mean score was 41.8%. There was no difference in test score based on experience, with similar scores for junior residents, senior residents, and attending surgeons (43%, 46%, and 44%; p = 0.13). Most participants felt they would be most interested in presentations directly related to their surgical specialty. Participants, however, did not score differently on topics that were the focus of the program (40% vs. 42%; p = 0.85). Journal club presentations (39% vs. others 42%; p = 0.33) also did not affect the score. The Pearson correlation coefficient for attendance was 0.49 (p < 0.0001), demonstrating that attendance was very important. Participation in the weekly SA was significantly associated with improved score, as those who participated in SA scored over 20 percentage points higher than those who did not (59% vs. 38%; p < 0.0001). Based on multiple linear regression for mean score, SA explained the variation in score more than attendance. The current study found that without preparation approximately 40% of material presented is retained after 10 months. Participation in weekly SA significantly improved retention of information from grand rounds presentations. Copyright © 2016 Association of Program Directors in Surgery. Published by Elsevier Inc. All rights reserved.
Engquist, Katherine D; Smith, Craig A; Chimera, Nicole J; Warren, Meghan
2015-08-01
Although various studies have assessed performance of athletes on the Functional Movement Screen (FMS) and the Y Balance Test (YBT), no study to date has directly evaluated a comparison of performance between athletes and members of the general population. Thus, to better understand the application of the FMS and the YBT to general college students, this study examined whether or not general college students performed similarly to student-athletes on the FMS (composite and movement pattern scores) and the YBT (composite and reach directions). This study evaluated 167 Division I student-athletes and 103 general college students from the same university on the FMS and the YBT. No difference was found in FMS composite scores between student-athletes and general college students. For FMS movement patterns, female student-athletes scored higher than general college students in the deep squat. No difference was found for men in any FMS movement pattern. Female student-athletes scored higher than female general college students in YBT composite scores; no difference was found for men in YBT composite scores. In analysis of YBT reach directions, female student-athletes scored higher than female general college students in all reach directions, whereas no difference was found in men. Existing research on the FMS composite score in athletic populations may apply to a general college population for the purposes of preparticipation screening, injury prediction, etc. Existing research on the YBT in male athletic populations is expected to apply equally to general college males for the purposes of preparticipation screening, injury prediction, etc.
Khan, Asaduzzaman; Chien, Chi-Wen; Bagraith, Karl S
2015-04-01
To investigate whether using a parametric statistic in comparing groups leads to different conclusions when using summative scores from rating scales compared with using their corresponding Rasch-based measures. A Monte Carlo simulation study was designed to examine between-group differences in the change scores derived from summative scores from rating scales, and those derived from their corresponding Rasch-based measures, using 1-way analysis of variance. The degree of inconsistency between the 2 scoring approaches (i.e. summative and Rasch-based) was examined, using varying sample sizes, scale difficulties and person ability conditions. This simulation study revealed scaling artefacts that could arise from using summative scores rather than Rasch-based measures for determining the changes between groups. The group differences in the change scores were statistically significant for summative scores under all test conditions and sample size scenarios. However, none of the group differences in the change scores were significant when using the corresponding Rasch-based measures. This study raises questions about the validity of the inference on group differences of summative score changes in parametric analyses. Moreover, it provides a rationale for the use of Rasch-based measures, which can allow valid parametric analyses of rating scale data.
What Makes Nations Intelligent?
Hunt, Earl
2012-05-01
Modern society is driven by the use of cognitive artifacts: physical instruments or styles of reasoning that amplify our ability to think. The artifacts range from writing systems to computers. In everyday life, a person demonstrates intelligence by showing skill in using these artifacts. Intelligence tests and their surrogates force examinees to exhibit some of these skills but not others. This is why test scores correlate substantially but not perfectly with a variety of measures of socioeconomic success. The same thing is true at the international level. Nations can be evaluated by the extent to which their citizens score well on cognitive tests, including both avowed intelligence tests and a variety of tests of academic achievement. The resulting scores are substantially correlated with various indices of national wealth, health, environmental quality, and schooling and with a vaguer variable, social commitment to innovation. These environmental variables are suggested as causes of the differences in general cognitive skills between national populations. It is conceivable that differences in gene pools also contribute to international and, within nations, group differences in cognitive skills, but at present it is impossible to evaluate the extent of genetic influences. © The Author(s) 2012.
Chen, Hui-Ya; Tang, Pei-Fang
2016-03-01
Dual-task Timed "Up & Go" (TUG) tests are likely to have applications different from those of a single-task TUG test and may have different contributing factors. The purpose of this study was to compare factors contributing to performance on single- and dual-task TUG tests. This investigation was a cross-sectional study. Sixty-four adults who were more than 50 years of age and dwelled in the community were recruited. Interviews and physical examinations were performed to identify potential contributors to TUG test performance. The time to complete the single-task TUG test (TUGsingle) or the dual-task TUG test, which consisted of completing the TUG test while performing a serial subtraction task (TUGcognitive) or while carrying water (TUGmanual), was measured. Age, hip extensor strength, walking speed, general mental function, and Stroop scores for word and color were significantly associated with performance on all TUG tests. Hierarchical multiple regression models, without the input of walking speed, revealed different independent factors contributing to TUGsingle performance (Mini-Mental Status Examination score, β=-0.32), TUGmanual performance (age, β=0.35), and TUGcognitive performance (Stroop word score, β=-0.40; Mini-Mental Status Examination score, β=-0.31). At least 40% of the variance in the performance on the 3 TUG tests was not explained by common clinical measures, even when the factor of walking speed was considered. However, this study successfully identified some important factors contributing to performance on different TUG tests, and other studies have reported similar findings for single-task TUG test and dual-task gait performance. Although the TUGsingle and the TUGcognitive shared general mental function as a common factor, the TUGmanual was uniquely influenced by age and the TUGcognitive was uniquely influenced by focused attention. 
These results suggest that both common and unique factors contribute to performance on single- and dual-task TUG tests and suggest important applications of the combined use of the 3 TUG tests. © 2016 American Physical Therapy Association.
Ott, Summer; Schatz, Philip; Solomon, Gary; Ryan, Joseph J
2014-03-01
This study documented baseline neurocognitive performance of 23,815 athletes on the Immediate Post-Concussion Assessment and Cognitive Testing (ImPACT) test. Specifically, 9,733 Hispanic, Spanish-speaking athletes who completed the ImPACT test in English and 2,087 Hispanic, Spanish-speaking athletes who completed the test in Spanish were compared with 11,955 English-speaking athletes who completed the test in English. Athletes were assigned to age groups (13-15, 16-18). Results revealed a significant effect of language group (p < .001; partial η² = 0.06) and age (p < .001; partial η² = 0.01) on test performance. Younger athletes performed more poorly than older athletes, and Spanish-speaking athletes completing the test in Spanish scored more poorly than Spanish-speaking and English-speaking athletes completing the test in English, on all Composite scores and Total Symptom scores. Spanish-speaking athletes completing the test in English also performed more poorly than English-speaking athletes completing the test in English on three Composite scores. These differences in performance and reported symptoms highlight the need for caution in interpreting ImPACT test data for Hispanic Americans.
Bayley-III: Cultural differences and language scale validity in a Danish sample.
Krogh, Marianne T; Vaever, Mette S
2016-12-01
The purpose of this study was to investigate cultural differences between Danish and American children at 2 and 3 years as measured with the developmental test Bayley-III, and to investigate the Bayley-III Language Scale validity. The Danish children (N = 43) were tested with the Bayley-III and their parents completed an additional language questionnaire (the MacArthur-Bates CDI). Results showed that scores from the Danish children did not differ significantly from the American norms on the Cognitive or Motor Scale, but the Danish sample scored significantly higher on the Language Scale. A comparison of the Bayley-III Language subtests with the CDI showed that the two measures correlated significantly, but the percentile score from the CDI was significantly higher than the percentile score from the Bayley-III Language subtests. This could be because the two instruments measure slightly different areas of language development, or because the Bayley-III overestimates language development in Danish children. However, due to the limitations of the current study, further research is needed to clarify this issue. © 2016 Scandinavian Psychological Associations and John Wiley & Sons Ltd.
A Tale of Two Curricula: The performance of two thousand students in introductory electromagnetism
NASA Astrophysics Data System (ADS)
Schatz, Michael; Kohlmyer, Matthew; Caballero, Marcos; Chabay, Ruth; Sherwood, Bruce; Catrambone, Richard; Marr, Marcus; Haugen, Mark; Ding, Lin
2009-03-01
Student performance in introductory calculus-based electromagnetism (E&M) courses at four large research universities was measured using the Brief Electricity and Magnetism Assessment (BEMA). Two different curricula were used at these universities: a traditional E&M curriculum and the Matter & Interactions (M&I) curriculum. At each university, post-instruction BEMA test averages were significantly higher for the M&I curriculum than for the traditional curriculum. The differences in post-test averages cannot be explained by differences in variables such as pre-instruction BEMA scores, grade point average, or SAT scores.
Gitau, Tabither M; Micklesfield, Lisa K; Pettifor, John M; Norris, Shane A
2014-01-01
This cross-sectional study of urban high schools in Johannesburg, South Africa, sought to examine eating attitudes, body image and self-esteem among male adolescents (n = 391). Anthropometric measurements, Eating Attitudes Test-26 (EAT-26), Rosenberg self-esteem, body image satisfaction and perception of females were collected at age 13, 15 and 17 years. Descriptive analysis was done to describe the sample, and the non-parametric Wilcoxon-Mann-Whitney test was used to test for significant differences between data that were not normally distributed (EAT-26). Spearman's rank correlation coefficient analyses were conducted to test for associations between self-esteem scores and eating attitudes, body mass indices and body image satisfaction scores. To assess the differences between groups that were normally distributed, chi-square tests were carried out. Ethnic differences significantly affected adolescent boys' body mass index (BMI), eating attitudes and self-esteem; White boys had higher self-esteem, BMI and normal eating attitudes than the Black boys did. BMI was positively associated with self-esteem (p = 0.01, r = 0.134) and negatively with dieting behaviour in White boys (p = 0.004, r = -0.257), and with lower EAT-26 bulimic and oral control scores in Black boys. In conclusion, the findings highlight ethnic differences and a need to better understand cultural differences that influence adolescent attitudes and behaviour.
Fukui, Yuriko; Noda, Saeko; Okada, Midori; Mihara, Nakako; Kawakami, Yoriko; Bore, Miles; Munro, Don; Powis, David
2014-01-01
The Personal Qualities Assessment (PQA), developed by the University of Newcastle, Australia to assess the aptitude of future medical professionals, has been used in Western countries. The objective was to investigate whether the PQA is appropriate for Japanese medical school applicants. Two of the PQA tests, Libertarian-Dual-Communitarian moral orientations (Mojac) and Narcissism, Aloofness, Confidence, and Empathy (NACE), were translated into Japanese, and administered at the Tokyo Women's Medical University entrance examinations from 2007 to 2009. The distributions of the applicants' Mojac and NACE scores were close to the normal distribution, and the mean scores did not exhibit a large difference from those in Western countries. The only significant difference was that the mean score of the NACE test was slightly lower than the Western norm. The translated PQA tests may be appropriate for use with Japanese applicants, though further research considering cultural differences is required.
Hale, Corinne R; Casey, Joseph E; Ricciardi, Philip W R
2014-02-01
Wechsler Intelligence Test for Children-IV core subtest scores of 472 children were cluster analyzed to determine if reliable and valid subgroups would emerge. Three subgroups were identified. Clusters were reliable across different stages of the analysis as well as across algorithms and samples. With respect to external validity, the Globally Low cluster differed from the other two clusters on Wechsler Individual Achievement Test-II Word Reading, Numerical Operations, and Spelling subtests, whereas the latter two clusters did not differ from one another. The clusters derived have been identified in studies using previous WISC editions. Clusters characterized by poor performance on subtests historically associated with the VIQ (i.e., VCI + WMI) and PIQ (i.e., POI + PSI) did not emerge, nor did a cluster characterized by low scores on PRI subtests. Picture Concepts represented the highest subtest score in every cluster, failing to vary in a predictable manner with the other PRI subtests.
Lin, Deng-Juin; Li, Ya-Hsin; Pai, Jar-Yuan; Sheu, Ing-Cheau; Glen, Robert; Chou, Ming-Jen; Lee, Ching-Yi
2009-12-19
Chronic kidney disease (CKD) is a serious public health problem in Taiwan and the world. The most effective, affordable treatments involve early prevention/detection/intervention, requiring screening. Successfully implementing CKD programs requires good patient participation, which is affected by patient perceptions of screening service quality. Service quality improvements can help make such programs more successful. Thus, good tools for assessing service quality perceptions are important. To investigate the use of a modified SERVQUAL questionnaire in assessing patient expectations, perceptions, and loyalty towards kidney disease screening service quality. 1595 kidney disease screening program patients in Taichung City were requested to complete and return a modified kidney disease screening SERVQUAL questionnaire. 1187 returned them. Incomplete ones (102) were culled and 1085 were chosen as effective for use. Paired t-tests, correlation tests, ANOVA, LSD test, and factor analysis identified the characteristics and factors of service quality. The paired t-test tested expectation score and perception score gaps. A structural equation modeling system examined satisfaction-based components' relationships. The effective response rate was 91.4%. Several methods verified validity. Cronbach's alpha on internal reliability was above 0.902. On patient satisfaction, expectation scores are high: 6.50 (0.82), but perception scores are significantly lower: 6.14 (1.02). Older patients' perception scores are lower than younger patients'. Expectation and perception scores for patients with different types of jobs are significantly different. Patients with more education have lower scores for expectation (r = -0.09) and perception (r = -0.26). Factor analysis identified three factors in the 22 item SERVQUAL form, which account for 80.8% of the total variance for the expectation scores and 86.9% of the total variance for the satisfaction scores. 
Expectation and perception score gaps in all 22 items are significant. The goodness-of-fit summary of the SEM results indicates that expectations and perceptions are positively correlated, perceptions and loyalty are positively correlated, but expectations and loyalty are not positively correlated. The results of this research suggest that the SERVQUAL instrument is a useful measurement tool in assessing and monitoring service quality in kidney disease screening services, enabling the staff to identify where service improvements are needed from the patients' perspectives.
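The headline figures in this abstract can be checked with simple arithmetic. The sketch below uses only numbers stated in the abstract; the variable names are hypothetical. It treats the "effective response rate" as usable questionnaires divided by returned questionnaires, an assumption that reproduces the reported 91.4%, and shows the share of distributed questionnaires that were returned for comparison.

```python
# Counts reported in the abstract.
distributed = 1595
returned = 1187
incomplete = 102

usable = returned - incomplete  # questionnaires kept for analysis

# Assumed definition: usable / returned (matches the reported 91.4%).
effective_rate_pct = round(100 * usable / returned, 1)

# For comparison: returned / distributed.
return_rate_pct = round(100 * returned / distributed, 1)

# SERVQUAL gap score: mean perception minus mean expectation.
mean_expectation = 6.50
mean_perception = 6.14
mean_gap = round(mean_perception - mean_expectation, 2)
```

A negative gap means that perceived service fell short of expectations, which is what the significant paired t-test results describe item by item.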
NASA Astrophysics Data System (ADS)
Black, Alice A. (Jill)
Research has shown the presence of many Earth science misconceptions and conceptual difficulties that may impede concept understanding, and has also identified a number of categories of spatial ability. Although spatial ability has been linked to high performance in science, some researchers believe it has been overlooked in traditional education. Evidence exists that spatial ability can be improved. This correlational study investigated the relationship among Earth science conceptual understanding, three types of spatial ability, and psychological gender, a self-classification that reflects socially-accepted personality and gender traits. A test of Earth science concept understanding, the Earth Science Concepts (ESC) test, was developed and field tested from 2001 to 2003 in 15 sections of university classes. Criterion validity was .60, significant at the .01 level. Spearman/Brown reliability was .74 and Kuder/Richardson reliability was .63. The Purdue Visualization of Rotations (PVOR) (mental rotation), the Group Embedded Figures Test (GEFT) (spatial perception), the Differential Aptitude Test: Space Relations (DAT) (spatial visualization), and the Bem Inventory (BI) (psychological gender) were administered to 97 non-major university students enrolled in undergraduate science classes. Spearman correlations revealed moderately significant correlations at the .01 level between ESC scores and each of the three spatial ability test scores. Stepwise regression analysis indicated that PVOR scores were the best predictor of ESC scores, and showed that spatial ability scores accounted for 27% of the total variation in ESC scores. Spatial test scores were moderately or weakly correlated with each other. No significant correlations were found among BI scores and other test scores. Scantron difficulty analysis of ESC items produced difficulty ratings ranging from 33.04 to 96.43, indicating the percentage of students who answered incorrectly. 
Mean score on the ESC was 34%, indicating that the non-majors tested exhibited many Earth science misconceptions and conceptual difficulties. A number of significant results were found when independent t-tests and correlations were conducted among test scores and demographic variables. The number of previous university Earth science courses was significantly related to ESC scores. Preservice elementary/middle majors differed significantly in several ways from other non-majors, and several earlier results were not supported. Results of this study indicate that an important opportunity may exist to improve Earth science conceptual understanding by focusing on spatial ability, a cognitive ability that has heretofore not been directly addressed in schools.
Association between lateral bias and personality traits in the domestic dog (Canis familiaris).
Barnard, Shanis; Wells, Deborah L; Hepper, Peter G; Milligan, Adam D S
2017-08-01
Behavioral laterality reflects the cerebral functional asymmetry. Measures of laterality have been associated with emotional stress, problem-solving, and personality in some vertebrate species. Thus far, the association between laterality and personality in the domestic dog has been largely overlooked. In this study, we investigated whether lateralized (left or right) and ambilateral dogs differed in their behavioral response to a standardized personality test. The dog's preferred paw to hold a Kong ball filled with food and the first paw used to step-off from a standing position were scored as laterality measures. The Dog Mentality Assessment (DMA) test was used to assess 5 personality traits (e.g., sociability, aggressiveness) and a broader shy-boldness dimension. No differences emerged between left- and right-biased dogs on any personality trait. Instead, ambilateral dogs, scored using the Kong test, scored higher on their playfulness (Z = -1.98, p = .048) and Aggressiveness (Z = -2.10, p = .036) trait scores than did lateralized (irrespective of side) dogs. Also, ambilateral dogs assessed by using the First-Stepping test scored higher than lateralized dogs on the Sociability (Z = -2.83, p = .005) and Shy-Boldness (Z = -2.34, p = .019) trait scores. Overall, we found evidence of a link between canine personality and behavioral laterality, and this was especially true for those traits relating to stronger emotional reactivity, such as aggressiveness, fearfulness, and sociability. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Gardner, Ryan M; Yengo-Kahn, Aaron; Bonfield, Christopher M; Solomon, Gary S
2017-02-01
Baseline and post-concussion neurocognitive testing is useful in managing concussed athletes. Attention deficit hyperactivity disorder (ADHD) and stimulant medications are recognized as potential modifiers of performance on neurocognitive testing by the Concussion in Sport Group. Our goal was to assess whether individuals with ADHD perform differently on post-concussion testing and if this difference is related to the use of stimulants. Retrospective case-control study in which 4373 athletes underwent baseline and post-concussion testing using the ImPACT battery. 277 athletes self-reported a history of ADHD, of which, 206 reported no stimulant treatment and 69 reported stimulant treatment. Each group was matched with participants reporting no history of ADHD or stimulant use on several biopsychosocial characteristics. Non-parametric tests were used to assess ImPACT composite score differences between groups. Participants with ADHD had worse verbal memory, visual memory, visual motor speed, and reaction time scores than matched controls at baseline and post-concussion, all with p ≤ .001 and |r|≥ 0.100. Athletes without stimulant treatment had lower verbal memory, visual memory, visual motor speed, and reaction time scores than controls at baseline (p ≤ 0.01, |r|≥ 0.100 [except verbal memory, r = -0.088]) and post-concussion (p = 0.000, |r|> 0.100). Athletes with stimulant treatment had lower verbal memory (Baseline: p = 0.047, r = -0.108; Post-concussion: p = 0.023, r = -0.124) and visual memory scores (Baseline: p = 0.013, r = -0.134; Post-concussion: p = 0.003, r = -0.162) but equivalent visual motor speed and reaction time scores versus controls at baseline and post-concussion. ADHD-specific baseline and post-concussion neuropsychological profiles, as well as stimulant medication status, may need to be considered when interpreting ImPACT test results. 
Further investigation into the effects of ADHD and stimulant use on recovery from sport-related concussion (SRC) is warranted.
Gros, Auriane; Manera, Valeria; Daumas, Anaïs; Guillemin, Sophie; Rouaud, Olivier; Martin, Martine Lemesle; Giroud, Maurice; Béjot, Yannick
2016-01-01
Objective: At present emotional experience and implicit emotion regulation (IER) abilities are mainly assessed through self-reports, which are subject to several biases. The aim of the present studies was to validate the Clock’N test, a recently developed time estimation task employing emotional priming to implicitly assess emotional reactivity and IER. Methods: In Study 1, the Clock’N test was administered to 150 healthy participants of different age, laterality and gender, in order to ascertain whether these factors affected the test results. In phase 1, participants were asked to judge the duration of seven sounds. In phase 2, before judging the duration of the same sounds, participants were presented with short arousing video clips used as emotional priming stimuli. Time warp was calculated as the difference in time estimation between phase 2 and phase 1, and used to assess how emotions affected subjective time estimations. In Study 2, a representative sample was selected to provide normative scores to be employed to assess emotional reactivity (Score 1) and IER (Score 2), and to calculate statistical cutoffs, based on the 10th and 90th score distribution percentiles. Results: Converging with previous findings, the results of Study 1 suggested that the Clock’N test can be employed to assess both emotional reactivity, as indexed by an initial time underestimation, and IER, as indexed by a progressive shift to time overestimation. No effects of gender, age and laterality were found. Conclusions: These results suggest that the Clock’N test is suited to assessing emotional reactivity and IER. After collection of data on the test's discriminant and convergent validity, this test may be employed to assess deficits in these abilities in different clinical populations. PMID:26903825
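The time-warp calculation and percentile cutoffs described in this abstract can be sketched as follows. This is only an illustration: the function names are hypothetical, the sign convention (phase 2 minus phase 1) is an assumption, and the abstract does not say which percentile interpolation the authors used.

```python
from statistics import quantiles


def time_warp(phase1_estimates, phase2_estimates):
    """Per-sound time warp, assumed here to be phase 2 minus phase 1."""
    if len(phase1_estimates) != len(phase2_estimates):
        raise ValueError("both phases must rate the same sounds")
    return [p2 - p1 for p1, p2 in zip(phase1_estimates, phase2_estimates)]


def percentile_cutoffs(normative_scores):
    """Return the (10th, 90th) percentile cutoffs of a normative sample.

    The 'inclusive' interpolation method is an assumption, not the
    authors' documented choice.
    """
    deciles = quantiles(normative_scores, n=10, method="inclusive")
    return deciles[0], deciles[-1]
```

Under this sign convention, a negative time warp would correspond to underestimating durations after emotional priming.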
Delgado, Cherlene; Bentley, Ellison; Hetzel, Scott; Smith, Lesley J
2015-01-01
Objective: To compare analgesia provided by carprofen or tramadol in dogs after enucleation. Design: Randomized, masked trial. Animals: Forty-three dogs. Procedures: Client-owned dogs admitted for routine enucleation were randomly assigned to receive either carprofen or tramadol orally 2 hours prior to surgery and 12 hours after the first dose. Dogs were scored for pain at baseline, and postoperatively at 0.25, 0.5, 1, 2, 4, 6, 8, 24, and 30 hours after extubation. Dogs received identical premedication and inhalation anesthesia regimens, including premedication with hydromorphone. If the total pain score was ≥9, if there was a score ≥3 in any one category, or if the visual analog scale score (VAS) was ≥35 combined with a palpation score of >0, rescue analgesia (hydromorphone) was administered and treatment failure was recorded. Characteristics between groups were compared with a Student’s t-test and Fisher’s exact test. The incidence of rescue was compared between groups using a log rank test. Pain scores and VAS scores between groups were compared using repeated measures ANOVA. Results: There was no difference in age (p=0.493), gender (p=0.366) or baseline pain scores (p=0.288) between groups. Significantly more dogs receiving tramadol required rescue analgesia (6/21) compared to dogs receiving carprofen (1/22; p=0.035). Pain and VAS scores decreased linearly over time (p=0.038, p<0.001, respectively). There were no significant differences in pain (p=0.915) or VAS scores (p=0.372) between groups at any time point (dogs were excluded from analysis after rescue). Conclusions and Clinical Relevance: This study suggests that carprofen, with opioid premedication, provides more effective post-operative analgesia than tramadol in dogs undergoing enucleation. PMID:25459482
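The rescue-analgesia rule stated in this abstract combines three thresholds, which a small predicate makes explicit. This is a minimal sketch: the function and parameter names are hypothetical, and the total pain score is passed in separately because the abstract does not say how it is derived from the category scores.

```python
def needs_rescue(total_pain_score, category_scores, vas_score, palpation_score):
    """Return True if any rescue criterion from the abstract is met.

    Criteria: total pain score >= 9, or any single category score >= 3,
    or VAS >= 35 combined with a palpation score > 0.
    """
    high_total = total_pain_score >= 9
    high_category = any(score >= 3 for score in category_scores)
    high_vas_with_palpation = vas_score >= 35 and palpation_score > 0
    return high_total or high_category or high_vas_with_palpation
```

Note that a high VAS score alone does not trigger rescue under this rule; it must coincide with a positive palpation score.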
Effects of Alternate Test Formats in Online Courses
ERIC Educational Resources Information Center
Francis, Alan
2010-01-01
The purpose of this study was to compare differences in methods of testing for two undergraduate online courses to determine the effect of alternate test formats in relation to participant grades. Specific purposes of this study were to determine whether a difference existed in student test scores between the control and treatment groups and…
A weighted generalized score statistic for comparison of predictive values of diagnostic tests.
Kosinski, Andrzej S
2013-03-15
Positive and negative predictive values are important measures of a medical diagnostic test performance. We consider testing equality of two positive or two negative predictive values within a paired design in which all patients receive two diagnostic tests. The existing statistical tests for testing equality of predictive values are either Wald tests based on the multinomial distribution or the empirical Wald and generalized score tests within the generalized estimating equations (GEE) framework. As presented in the literature, these test statistics have considerably complex formulas without clear intuitive insight. We propose their re-formulations that are mathematically equivalent but algebraically simple and intuitive. As is clearly seen with a new re-formulation we presented, the generalized score statistic does not always reduce to the commonly used score statistic in the independent samples case. To alleviate this, we introduce a weighted generalized score (WGS) test statistic that incorporates empirical covariance matrix with newly proposed weights. This statistic is simple to compute, always reduces to the score statistic in the independent samples situation, and preserves type I error better than the other statistics as demonstrated by simulations. Thus, we believe that the proposed WGS statistic is the preferred statistic for testing equality of two predictive values and for corresponding sample size computations. The new formulas of the Wald statistics may be useful for easy computation of confidence intervals for difference of predictive values. The introduced concepts have potential to lead to development of the WGS test statistic in a general GEE setting. Copyright © 2012 John Wiley & Sons, Ltd.
The effect of adhesive dressing edges on cutaneous irritancy and skin barrier function.
Dykes, P J
2007-03-01
To assess the effect of repeated application and removal of adhesive edges from wound-care products on cutaneous irritancy and barrier function in normal volunteer subjects. This was a study using a 'repeat-insult patch test'. Adhesive edges from six commonly used wound-care products were applied continuously to the same site (six applications over a 14-day period) in 30 normal volunteer subjects. The test sites were assessed clinically before product reapplication using established ranking scales for cutaneous erythema. The cumulative irritancy score (CIS) for each test site was determined by adding the erythema scores at days 3, 5, 8, 10, 12 and 15. At the study end the barrier function of each test site was assessed by measuring transepidermal water loss (TEWL). The CIS showed that the products fall into two distinct groups, with Mepilex, Tielle and Allevyn giving low scores and Biatain, Comfeel and DuoDERM higher scores. Statistical analysis indicated significant differences (p < 0.05) between Mepilex and Biatain, Mepilex and Comfeel, Mepilex and DuoDERM, Tielle and Biatain, Allevyn and Biatain. The mean TEWL values also indicated that the products fall into two distinct groups: Mepilex, Tielle and Allevyn with low mean values close to that of normal adjacent back skin and Biatain, Comfeel and DuoDERM with much higher mean values. Statistical analysis indicated that Mepilex, Tielle and Allevyn were not significantly different from normal skin (p < 0.05), whereas Biatain, Comfeel and DuoDERM were significantly higher than normal skin and the other products tested. The results show clear differences between products; the clinical scores and TEWL measurements indicate that the products fall into two distinct groups. This novel approach seems able to discriminate between adhesive borders and may be useful during product development and in selecting products for clinical trials.
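The cumulative irritancy score (CIS) defined in this abstract is a plain sum of erythema scores over the six assessment days. The sketch below assumes a hypothetical input layout (a mapping from study day to erythema score for one test site); the erythema ranking scale itself is not specified in the abstract.

```python
ASSESSMENT_DAYS = (3, 5, 8, 10, 12, 15)  # days listed in the abstract


def cumulative_irritancy_score(erythema_by_day):
    """Sum the erythema scores recorded on the six assessment days."""
    missing = [day for day in ASSESSMENT_DAYS if day not in erythema_by_day]
    if missing:
        raise ValueError(f"missing erythema scores for days: {missing}")
    return sum(erythema_by_day[day] for day in ASSESSMENT_DAYS)
```

Raising an error on a missing day, rather than treating it as zero, is a design choice here so that an incomplete record cannot pass as a low-irritancy result.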
Lippert, Christoph; Xiang, Jing; Horta, Danilo; Widmer, Christian; Kadie, Carl; Heckerman, David; Listgarten, Jennifer
2014-01-01
Motivation: Set-based variance component tests have been identified as a way to increase power in association studies by aggregating weak individual effects. However, the choice of test statistic has been largely ignored even though it may play an important role in obtaining optimal power. We compared a standard statistical test—a score test—with a recently developed likelihood ratio (LR) test. Further, when correction for hidden structure is needed, or gene–gene interactions are sought, state-of-the art algorithms for both the score and LR tests can be computationally impractical. Thus we develop new computationally efficient methods. Results: After reviewing theoretical differences in performance between the score and LR tests, we find empirically on real data that the LR test generally has more power. In particular, on 15 of 17 real datasets, the LR test yielded at least as many associations as the score test—up to 23 more associations—whereas the score test yielded at most one more association than the LR test in the two remaining datasets. On synthetic data, we find that the LR test yielded up to 12% more associations, consistent with our results on real data, but also observe a regime of extremely small signal where the score test yielded up to 25% more associations than the LR test, consistent with theory. Finally, our computational speedups now enable (i) efficient LR testing when the background kernel is full rank, and (ii) efficient score testing when the background kernel changes with each test, as for gene–gene interaction tests. The latter yielded a factor of 2000 speedup on a cohort of size 13 500. Availability: Software available at http://research.microsoft.com/en-us/um/redmond/projects/MSCompBio/Fastlmm/. Contact: heckerma@microsoft.com Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25075117
Constructing an Online Test Framework, Using the Example of a Sign Language Receptive Skills Test
ERIC Educational Resources Information Center
Haug, Tobias; Herman, Rosalind; Woll, Bencie
2015-01-01
This paper presents the features of an online test framework for a receptive skills test that has been adapted, based on a British template, into different sign languages. The online test includes features that meet the needs of the different sign language versions. Features such as usability of the test, automatic saving of scores, and score…
NASA Astrophysics Data System (ADS)
Chung-Schickler, Genevieve C.
The purpose of this study was to evaluate the effect of cooperative learning strategies on students' attitudes toward science and achievement in BSC 1005L, a non-science majors' general biology laboratory course at an urban community college. Data were gathered on the participants' attitudes toward science and cognitive biology level pre and post treatment in BSC 1005L. Elements of the Learning Together model developed by Johnson and Johnson and the Student Team-Achievement Divisions model created by Slavin were incorporated into the experimental sections of BSC 1005L. Four sections of BSC 1005L participated in this study. Participants were enrolled in the 1998 spring (January) term. Students met weekly in a two hour laboratory session. The treatment was administered to the experimental group over a ten week period. A quasi-experimental pretest-posttest control group design was used. Students in the cooperative learning group (n1 = 27) were administered the Test of Science-Related Attitudes (TOSRA) and the cognitive biology test at the same time as the control group (n2 = 19) (at the beginning and end of the term). Statistical analyses confirmed that both groups were equivalent regarding ethnicity, gender, college grade point average and number of absences. Independent sample t-tests performed on pretest mean scores indicated no significant differences in the TOSRA scale two or biology knowledge between the cooperative learning group and the control group. The scores of TOSRA scales one, three, four, five, six, and seven were significantly lower in the cooperative learning group. Independent sample t-tests of the mean score differences did not show any significant differences in posttest attitudes toward science or biology knowledge between the two groups. Paired t-tests did not indicate any significant differences on the TOSRA or biology knowledge within the cooperative learning group. 
Paired t-tests did show significant differences within the control group on TOSRA scale two and biology knowledge. ANCOVAs did not indicate any significant differences on the post mean scores of the TOSRA or biology knowledge adjusted by differences in the pretest mean scores. Analysis of the research data did not show any significant correlation between attitudes toward science and biology knowledge.
Clinical use of the ABO-Scoring Index: reliability and subtraction frequency.
Lieber, William S; Carlson, Sean K; Baumrind, Sheldon; Poulton, Donald R
2003-10-01
This study tested the reliability and subtraction frequency of the study model-scoring system of the American Board of Orthodontists (ABO). We used a sample of 36 posttreatment study models that were selected randomly from six different orthodontic offices. Intrajudge and interjudge reliability was calculated using nonparametric statistics (Spearman rank coefficient, Wilcoxon, Kruskal-Wallis, and Mann-Whitney tests). We found differences ranging from 3 to 6 subtraction points (total score) for intrajudge scoring between two sessions. For overall total ABO score, the average correlation was .77. Intrajudge correlation was greatest for occlusal relationships and least for interproximal contacts. Interjudge correlation for ABO score averaged r = .85. Correlation was greatest for buccolingual inclination and least for overjet. The data show that some judges, on average, were much more lenient than others and that this resulted in a range of total scores between 19.7 and 27.5. Most of the deductions were found in the buccal segments and most were related to the second molars. We present these findings in the context of clinicians preparing for the ABO phase III examination and for orthodontists in their ongoing evaluation of clinical results.
The criterion and discriminant validity of the Referential Thinking (REF) scale.
Startup, Mike; Sakrouge, Rebecca; Mason, Oliver J
2010-03-01
The Referential Thinking (REF) scale was designed to be a comprehensive self-report measure of both simple and guilty ideas of reference in the general population. One aim of the present study was to test the proposed interpretations of REF scores by comparing REF scores with ratings of delusions among psychotic patients. A 2nd aim was to test whether REF scores are better predicted by the severity of patients' delusions of reference (DoRs) than by the severity of their auditory verbal hallucinations (AVHs), thus supporting the scores' ability to discriminate between proneness to the 2 different symptoms. The REF scale was completed by 56 healthy controls and 53 acutely psychotic patients. The severity of the patients' DoRs and AVHs were assessed in structured clinical interviews. REF scores differed significantly not only between the patients and controls but also between patients with versus without DoRs. REF scores correlated significantly with the severity of the patients' DoRs but not their AVHs. The interpretation of REF scores as a measure of proneness to simple and guilty ideas of reference was supported. PsycINFO Database Record (c) 2010 APA, all rights reserved.
Testing Two Nutrient Profiling Models of Labelled Foods and Beverages Marketed in Turkey.
Dikmen, Derya; Kızıl, Mevlüde; Uyar, Muhemmet Fatih; Pekcan, Gülden
2015-06-01
The objective of this study was to evaluate the nutrient profile of labelled foods and also understand the application of two international nutrient profiling models of labelled foods and beverages. WXYfm and NRF 9.3 nutrient profiling models were used to evaluate 3,171 labelled foods and beverages of 38 food categories and 500 different brands. According to the WXYfm model, pasta, grains and legumes and frozen foods had the best scores whereas oils had the worst scores. According to the NRF 9.3 model per 100 kcal, the best scores were obtained for frozen foods, grains and legumes and milk products whereas the confectionery foods had the worst scores. According to NRF 9.3 per serving size, grains and legumes had the best scores and flavoured milks had the worst scores. A comparison of WXYfm and NRF 9.3 nutrient profiling models ranked scores showed a high positive correlation (p=0.01). The two nutrient models evaluated yielded similar results. Further studies are needed to test other category specific nutrient profiling models in order to understand how different models behave. Copyright© by the National Institute of Public Health, Prague 2015.
Malhotra, Sony; Sankar, Kannan; Sowdhamini, Ramanathan
2014-01-01
Interactions at the molecular level in the cellular environment play a crucial role in maintaining the physiological functioning of the cell. These molecular interactions exist at varied levels, viz. protein-protein interactions, protein-nucleic acid interactions or protein-small molecule interactions. These interactions and their mechanisms are currently areas of intensive study. Molecular interactions can also be studied computationally using an approach known as molecular docking. Molecular docking employs search algorithms to predict the possible conformations for interacting partners and then calculates interaction energies. However, docking proposes a number of solutions as different docked poses, which makes it a serious challenge to identify the native (or near-native) structures from the pool of these docked poses. Here, we propose a rigorous scoring scheme called DockScore which can be used to rank the docked poses and identify the best docked pose out of the many proposed by the docking algorithm employed. The scoring identifies the optimal interactions between the two protein partners utilising various features of the putative interface like area, short contacts, conservation, spatial clustering and the presence of positively charged and hydrophobic residues. DockScore was first trained on a set of 30 protein-protein complexes to determine the weights for different parameters. Subsequently, we tested the scoring scheme on 30 different protein-protein complexes, and native or near-native structures were assigned the top rank from a pool of docked poses in 26 of the tested cases. We tested the ability of DockScore to discriminate likely dimer interactions that differ substantially within a homologous family and also demonstrate that DockScore can distinguish the correct pose for all 10 recent CAPRI targets. PMID:24498255
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xie, Xueqian; Greuter, Marcel J. W.; Groen, Jaap M.
Purpose: Coronary artery calcium score, traditionally based on electrocardiography (ECG)-triggered computed tomography (CT), predicts cardiovascular risk. However, nontriggered CT is extensively utilized. The study-purpose is to evaluate the in vitro agreement in coronary calcium score between nontriggered thoracic CT and ECG-triggered cardiac CT. Methods: Three artificial coronary arteries containing calcifications of different densities (high, medium, and low), and sizes (large, medium, and small), were studied in a moving cardiac phantom. Two 64-detector CT systems were used. The phantom moved at 0–90 mm/s in nontriggered low-dose CT as index test, and at 0–30 mm/s in ECG-triggered CT as reference. Differences in calcium scores between nontriggered and ECG-triggered CT were analyzed by t-test and 95% confidence interval. The sensitivity to detect calcification was calculated as the percentage of positive calcium scores. Results: Overall, calcium scores in nontriggered CT were not significantly different to those in ECG-triggered CT (p > 0.05). Calcium scores in nontriggered CT were within the 95% confidence interval of calcium scores in ECG-triggered CT, except predominantly at higher velocities (≥50 mm/s) for the high-density and large-size calcifications. The sensitivity for a nonzero calcium score was 100% for large calcifications, but 46% ± 11% for small calcifications in nontriggered CT. Conclusions: When performing multiple measurements, good agreement in positive calcium scores is found between nontriggered thoracic and ECG-triggered cardiac CT. Agreement decreases with increasing coronary velocity. From this phantom study, it can be concluded that a high calcium score can be detected by nontriggered CT, and thus, that nontriggered CT likely can identify individuals at high risk of cardiovascular disease. On the other hand, a zero calcium score in nontriggered CT does not reliably exclude coronary calcification.
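The sensitivity measure used in this phantom study, the percentage of measurements with a positive calcium score, can be sketched in a few lines. The function name and input layout are hypothetical; the sketch assumes every measurement in the list was taken on a phantom artery that does contain a calcification, as in the study design.

```python
def detection_sensitivity(calcium_scores):
    """Percentage of repeated measurements with a positive calcium score."""
    if not calcium_scores:
        raise ValueError("at least one measurement is required")
    positives = sum(1 for score in calcium_scores if score > 0)
    return 100.0 * positives / len(calcium_scores)
```

For instance, four measurements of which two return a zero score give a sensitivity of 50%.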
Tariq, Nabia; Tayyab, Ali; Jaffery, Tara
2018-04-01
To measure mean empathy scores of Pakistani medical students and to explore any association of empathy scores with gender, medical school year and future career choice. Cross-sectional survey. Shifa College of Medicine, Shifa Tameer-e-Millat University, during the academic year 2015-2016. The student version of the Jefferson Scale of Physician Empathy (JSPE) was distributed to the students electronically via the student portal. Responses that were completed in full were included in the study. Descriptive statistics were used to analyse student demographic data. The student score on the JSPE was reported as the mean (out of 7) of each item. An independent samples t-test was employed to check for significant differences between genders. Empathy score with advancing year of study was investigated using ANOVA. ANOVA with post-hoc Tukey's test was used to study the relationship between career choice and empathy score. The response rate was 70.94%. The mean score was 4.51 ± 0.69. Female students obtained a higher empathy score (4.58) than male students (4.45), but the difference was not statistically significant (p=0.08). No statistically significant difference was seen between scores on the survey across the five academic years (F=0.88, p=0.47). Students who selected medicine and allied as career choice showed a significantly higher empathy score than those who opted for surgery. The internal consistency reliability (Cronbach's alpha) was 0.78. There were low levels of empathy in Pakistani medical students. Students with interest in medicine and allied showed higher empathy scores compared to surgical or technical specialties. No association of empathy scores with gender and medical school year was observed.
Khodaveisi, Masoud; Qaderian, Khosro; Oshvandi, Khodayar; Soltanian, Ali Reza; Vardanjani, Mehdi molavi
2017-01-01
Background and aims: Learning plays an important role in developing nursing skills and proper caregiving. The present study aimed to compare team-based learning and lecture-based learning in teaching nursing students to care for patients with diabetes. Method: In this quasi-experimental study, 64 term-4 students at the nursing college of Bukan and Miandoab were included. A knowledge and performance questionnaire, comprising 15 knowledge questions and 5 performance questions on the care of patients with diabetes, was used as the data collection tool; its reliability was confirmed by the researcher using Cronbach's alpha (r=0.83). The paired t-test was used to compare mean knowledge and performance scores within each group between pre-test and post-test, and the independent t-test was used to compare mean scores between the control and intervention groups. Results: There was no statistically significant difference between the two groups in pre-test knowledge and performance scores (p=0.784). There was a significant difference between the team-based learning group and the lecture-based learning group in mean post-test knowledge and performance scores (p=0.001). There was a significant difference between pre-test and post-test mean scores for knowledge of diabetes care in the learning groups (p=0.001). Conclusion: Both team-based and lecture-based learning improved student learning, but the improvement was greater with team-based learning, and it is recommended that this method be used in the higher education of students.
Toffan, Adam; Alexander, Marion J L; Peeler, Jason
2017-07-28
The purpose of the study was to compare the most effective joint movements, segment velocities and body positions to perform the fastest and most accurate pass of high school and university football quarterbacks. Secondary purposes were to develop a quarterback throwing test to assess skill level, to determine which kinematic variables were different between high school and university athletes, and to determine which variables were significant predictors of quarterback throwing test performance. Ten high school and ten university athletes were filmed for the study, performing nine passes at a target and two passes for maximum distance. Thirty variables were measured using the Dartfish Team Pro 4.5.2 video analysis system, and Microsoft Excel was used for statistical analysis. University athletes scored slightly higher than the high school athletes on the throwing test; however, this result was not statistically significant. Correlation analysis and forward stepwise multiple regression analysis were performed on both the high school players and the university players in order to determine which variables were significant predictors of throwing test score. Ball velocity was determined to have the strongest predictive effect on throwing test score (r = 0.900) for the high school athletes; however, position of the back foot at release was also determined to be important (r = 0.661) for the university group. Several significant differences in throwing technique between groups were noted during the pass; however, body position at release showed the greatest differences between the two groups. High school players could benefit from more complete weight transfer and decreased throw time to increase throwing test score. University athletes could benefit from increased throw time and greater range of motion in external shoulder rotation and trunk rotation to increase their throwing test score.
Coaches and practitioners will be able to use the findings of this research to help improve these and related throwing variables in their high school and university quarterbacks.
Cho, Jung-Jin; Kim, Ji-Yong
2011-09-01
The in-training examination (ITE) is a cognitive examination similar to the written test, but it is different from the Clinical Practice Examination of the Korean Academy of Family Medicine (KAFM) Certification Examination (CE). The objective of this study was to estimate the positive predictive value of the KAFM-ITE for identifying residents at risk for poor performance on the three types of KAFM-CE. 372 residents who completed the KAFM-CE in 2011 were included. We compared the mean KAFM-CE scores with ITE experience. We evaluated the correlation and the positive predictive value (PPV) of ITE for the multiple choice question (MCQ) scores of 1st written test & 2nd slide examination, the total clinical practice examination scores, and the total sum of 2nd test. 275 out of 372 residents completed ITE. Those who completed ITE had significantly higher MCQ scores of 1st written test than those who did not. The correlation of ITE scores with 1st written MCQ (0.627) was found to be the highest among the other kinds of CE. The PPV of the ITE score for 1st written MCQ scores was 0.672. The PPV of the ITE score ranged from 0.376 to 0.502. The score of the KAFM ITE has acceptable positive predictive value and could be used as part of a comprehensive evaluation system for residents in the cognitive field.
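For readers unfamiliar with the statistic reported above, the positive predictive value is the share of flagged cases that turn out to be true cases. The following is a minimal sketch only; the function name and the counts are hypothetical and are not taken from the KAFM data, which the abstract does not report.

```python
def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """Return PPV = TP / (TP + FP) for a screening result."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("no cases were flagged by the screening test")
    return true_positives / flagged


# Hypothetical counts, for illustration only (not from the study):
# 30 flagged residents later performed poorly, 10 flagged residents did not.
example_ppv = positive_predictive_value(true_positives=30, false_positives=10)
print(f"PPV = {example_ppv:.3f}")
```

With a low-scoring ITE result treated as the positive screen, a PPV near 0.67 would mean roughly two of every three flagged residents went on to score poorly on the corresponding examination.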
Tarescavage, Anthony M; Alosco, Michael L; Ben-Porath, Yossef S; Wood, Arcangela; Luna-Jones, Lynn
2015-04-01
We investigated the internal structure comparability of Minnesota Multiphasic Personality Inventory-2-Restructured Form (MMPI-2-RF) scores derived from the MMPI-2 and MMPI-2-RF booklets in a sample of 320 criminal defendants (229 males and 54 females). After exclusion of invalid protocols, the final sample consisted of 96 defendants who were administered the MMPI-2-RF booklet and 83 who completed the MMPI-2. No statistically significant differences in MMPI-2-RF invalidity rates were observed between the two forms. Individuals in the final sample who completed the MMPI-2-RF did not statistically differ on demographics or referral question from those who were administered the MMPI-2 booklet. Independent t tests showed no statistically significant differences between MMPI-2-RF scores generated with the MMPI-2 and MMPI-2-RF booklets on the test's substantive scales. Statistically significant small differences were observed on the revised Variable Response Inconsistency (VRIN-r) and True Response Inconsistency (TRIN-r) scales. Cronbach's alpha and standard errors of measurement were approximately equal between the booklets for all MMPI-2-RF scales. Finally, MMPI-2-RF intercorrelations produced from the two forms yielded mostly small and a few medium differences, indicating that discriminant validity and test structure are maintained. Overall, our findings reflect the internal structure comparability of MMPI-2-RF scale scores generated from MMPI-2 and MMPI-2-RF booklets. Implications of these results and limitations of these findings are discussed. © The Author(s) 2014.
Abram, Katrin; Bohne, Silvia; Bublak, Peter; Karvouniari, Panagiota; Klingner, Carsten M; Witte, Otto W; Guntinas-Lichius, Orlando; Axer, Hubertus
2016-01-01
Postural instability in patients with normal pressure hydrocephalus (NPH) is a most crucial symptom leading to falls with secondary complications. The aim of the current study was to evaluate the therapeutic effect of spinal tap on postural stability in these patients. Seventeen patients with clinical symptoms of NPH were examined using gait scale, computerized dynamic posturography (CDP), and neuropsychological assessment. Examinations were done before and after spinal tap test. The gait score showed a significant improvement 24 h after spinal tap test in all subtests and in the sum score (p < 0.003), while neuropsychological assessment did not reveal significant differences 72 h after spinal tap test. CDP showed significant improvements after spinal tap test in the Sensory Organization Tests 2 (p = 0.017), 4 (p = 0.001), and 5 (p = 0.009) and the composite score (p = 0.01). Patients showed best performance in somatosensory and worst performance in vestibular dominated tests. Vestibular dominated tests did not improve significantly after spinal tap test, while somatosensory and visual dominated tests did. Postural stability in NPH is predominantly affected by deficient vestibular functions, which did not improve after spinal tap test. Conditions which improved best were mainly independent from visual control and are based on proprioceptive functions.
Park, Eun Jung; Yoon, Young Tak; Hong, Chong Kun; Ha, Young Rock; Ahn, Jung Hwan
2017-07-01
This study evaluated the efficacy of a teaching method using simulated B-lines of hand ultrasound with a wet foam dressing material. This prospective, randomized, noninferiority study was conducted on emergency medical technician students without any relevant training in ultrasound. Following a lecture including simulated (SG) or real video clips (RG) of B-lines, a posttest was conducted and a retention test was performed after 2 months. The test consisted of questions about B-lines in 40 randomly mixed video clips (20 simulated and 20 real videos) with 4 answer scores (R-1 [the correct answer score for the real video clips] vs S-1 [the correct answer score for the simulated video clips] in the posttest, R-2 [the correct answer score for the real video clips] vs S-2 [the correct answer score for the simulated video clips] in the retention test). A total of 77 and 73 volunteers participated in the posttest (RG, 38; SG, 39) and retention test (RG, 36; SG, 37), respectively. There was no significant (P > .05) difference in scores of R-1, S-1, R-2, or S-2 between RG and SG. The mean score differences between RG and SG were -0.6 (95% confidence interval [CI]: -1.49 to 0.11) in R-1, -0.1 (95% CI: -1.04 to 0.86) in S-1, 0 (95% CI: -1.57 to 1.50) in R-2, and -0.2 (95% CI: -1.52 to 0.25) in S-2. The mean differences and 95% CIs for all parameters fell within the noninferiority margin of 2 points (10%). Simulated B-lines of hand ultrasound with a wet foam dressing material were not inferior to real B-lines. They were effective for teaching and simulations. The study was registered with the Clinical Trial Registry of Korea: https://cris.nih.go.kr/cris/index.jsp (KCT0002144).
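The noninferiority conclusion above rests on each 95% confidence interval lying inside the stated 2-point margin. A minimal sketch of that check follows, using the mean differences and intervals exactly as reported in the abstract. The abstract does not state the direction of the difference, so the symmetric check (both bounds inside plus or minus the margin) and the helper name `within_margin` are assumptions made here for illustration.

```python
MARGIN = 2.0  # noninferiority margin in points (10%), as stated in the abstract

# (mean difference, lower 95% CI bound, upper 95% CI bound) as reported
reported = {
    "R-1": (-0.6, -1.49, 0.11),
    "S-1": (-0.1, -1.04, 0.86),
    "R-2": (0.0, -1.57, 1.50),
    "S-2": (-0.2, -1.52, 0.25),
}


def within_margin(lower: float, upper: float, margin: float) -> bool:
    """True when the whole confidence interval lies inside (-margin, +margin)."""
    return -margin < lower and upper < margin


results = {
    name: within_margin(lower, upper, MARGIN)
    for name, (_mean, lower, upper) in reported.items()
}

for name, ok in results.items():
    print(f"{name}: interval within margin = {ok}")
```

A one-sided test would compare only the bound on the unfavourable side with the margin; the symmetric form is used here because it holds whichever way the difference was computed.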
The Assignment of Raters to Items: Controlling for Rater Effects.
ERIC Educational Resources Information Center
Sykes, Robert C.; Heidorn, Mark; Lee, Guemin
A study was conducted to evaluate the effect of different modes (modalities) of assigning raters to test items. The impact on total constructed response (c.r.) score, and subsequently on total test score, of assigning a single versus multiple raters to an examination reading of a student's set of c.r. responses was evaluated for several mixed-item…
Older Children Have a Greater Chance to Be Accepted to Gifted Student Programmes
ERIC Educational Resources Information Center
Segev, Elad; Cahan, Sorel
2014-01-01
Selection to programmes for gifted students in Israel, performed in the second grade, relies on raw ability and achievement test scores, irrespective of age, thereby ignoring the well-known effect of within-grade age differences on test scores. Employing the entire cohort of third graders of legal age (67,366 students, 1.4% of whom were enrolled…
ERIC Educational Resources Information Center
Kosik, Kenneth S.; Heschong, Lisa
An audiotape presents study analysis of the effect of daylighting on student performance. The study includes a focus on skylighting as a way to isolate daylight as an illumination source, and separate illumination effects from other qualities associated with daylighting from windows. Results from test scores of over 21,000 student records, along…
School Policies and the Black-White Test Score Gap. Working Paper Series. SAN08-03
ERIC Educational Resources Information Center
Ladd, Helen F.
2008-01-01
This paper examines school-related policies and strategies that have been proposed or justified, at least in part, on the basis of their potential for reducing black-white test score gaps. These include strategies, one of which is greater integration, to reduce differences in the quality of teachers faced by black and white students; school and…
The Effect of Poverty on the Verbal Scores of Gifted Students
ERIC Educational Resources Information Center
Kaya, Fatih; Stough, Laura M.; Juntune, Joyce
2016-01-01
A nonexperimental design was used to determine whether the verbal scores of low-income gifted fifth graders (n = 38) differed from those of their higher income peers (n = 83). The Otis-Lennon School Ability Test, Eighth Edition and the Stanford Achievement Test-Tenth Edition were used to collect student data. Results of a MANOVA showed a…
ERIC Educational Resources Information Center
Mead, Tim; Scibora, Lesley
2016-01-01
The purpose of the study was to determine if standardized math test scores improve by administering different types of exercise during math instruction. Three sixth grade classes were assessed on the Measures of Academic Progress (MAP) and the Minnesota Comprehensive Assessment (MCA) standardized math tests during the 2012 and 2013 academic year.…
Fang, Mingying; Oremus, Mark; Tarride, Jean-Eric; Raina, Parminder
2016-07-18
The use of the EQ-5D to assess the economic benefits of health technologies has led to questions about the cross-population transferability of preference weights to calculate health utility scores. The aim of this study is to investigate whether the use of UK and Canadian preference weights will lead to the calculation of different health utility scores in a sample of persons with Alzheimer's disease (AD) and their primary informal caregivers. We recruited 216 patient-caregiver dyads from nine geriatric and memory clinics across Canada. Participants used the EQ-5D-3L to rate their health-related quality-of-life (HRQoL). EQ-5D-3L responses were transformed into health utility scores using UK and Canadian preference weights. The levels of agreement between the two sets of scores were assessed using intraclass correlation coefficients (ICCs). Bland-Altman plots depicted individual-level differences between the two sets of scores. Differences in health utility scores were tested using the Wilcoxon signed rank sum test. A generalized linear model with a gamma distribution was used to examine whether participants' socio-demographic characteristics were associated with their health utility scores. The distributions of health utility scores derived from both the UK and Canadian preference weights were skewed to the left. The intraclass correlation coefficient was 0.94 (95 % CI: 0.92, 0.95) for persons with AD and 0.92 (95 % CI: 0.88, 0.94) for the caregivers. The Canadian weights yielded slightly higher median health utility scores than the UK weights for caregivers (median difference: 0.009; 95 % confidence interval: 0.007, 0.013). This finding persisted after stratifying by disease severity. Few socio-demographic characteristics were associated with the two sets of health utility scores. Health utility scores exhibited small and clinically unimportant differences when calculated with UK versus Canadian preference weights in persons with AD and their caregivers.
The original UK and Canadian population samples used to obtain the preference weights valued health states similarly.
Validation of the Narrowing Beam Walking Test in Lower Limb Prosthesis Users.
Sawers, Andrew; Hafner, Brian
2018-04-11
To evaluate the content, construct, and discriminant validity of the Narrowing Beam Walking Test (NBWT), a performance-based balance test for lower limb prosthesis users. Cross-sectional study. Research laboratory and prosthetics clinic. Unilateral transtibial and transfemoral prosthesis users (N=40). Not applicable. Content validity was examined by quantifying the percentage of participants receiving maximum or minimum scores (ie, ceiling and floor effects). Convergent construct validity was examined using correlations between participants' NBWT scores and scores or times on existing clinical balance tests regularly administered to lower limb prosthesis users. Known-groups construct validity was examined by comparing NBWT scores between groups of participants with different fall histories, amputation levels, amputation etiologies, and functional levels. Discriminant validity was evaluated by analyzing the area under each test's receiver operating characteristic (ROC) curve. No minimum or maximum scores were recorded on the NBWT. NBWT scores demonstrated strong correlations (ρ=.70‒.85) with scores/times on performance-based balance tests (timed Up and Go test, Four Square Step Test, and Berg Balance Scale) and a moderate correlation (ρ=.49) with the self-report Activities-specific Balance Confidence scale. NBWT performance was significantly lower among participants with a history of falls (P=.003), transfemoral amputation (P=.011), and a lower mobility level (P<.001). The NBWT also had the largest area under the ROC curve (.81) and was the only test to exhibit an area that was statistically significantly >.50 (ie, chance). The results provide strong evidence of content, construct, and discriminant validity for the NBWT as a performance-based test of balance ability. The evidence supports its use to assess balance impairments and fall risk in unilateral transtibial and transfemoral prosthesis users. Copyright © 2018 American Congress of Rehabilitation Medicine. 
Published by Elsevier Inc. All rights reserved.
The King-Devick test as a determinant of head trauma and concussion in boxers and MMA fighters.
Galetta, K M; Barrett, J; Allen, M; Madda, F; Delicata, D; Tennant, A T; Branas, C C; Maguire, M G; Messner, L V; Devick, S; Galetta, S L; Balcer, L J
2011-04-26
Sports-related concussion has received increasing attention as a cause of short- and long-term neurologic symptoms among athletes. The King-Devick (K-D) test is based on measurement of the speed of rapid number naming (reading aloud single-digit numbers from 3 test cards), and captures impairment of eye movements, attention, language, and other correlates of suboptimal brain function. We investigated the K-D test as a potential rapid sideline screening for concussion in a cohort of boxers and mixed martial arts fighters. The K-D test was administered prefight and postfight. The Military Acute Concussion Evaluation (MACE) was administered as a more comprehensive but longer test for concussion. Differences in postfight K-D scores and changes in scores from prefight to postfight were compared for athletes with head trauma during the fight vs those without. Postfight K-D scores (n = 39 participants) were significantly higher (worse) for those with head trauma during the match (59.1 ± 7.4 vs 41.0 ± 6.7 seconds, p < 0.0001, Wilcoxon rank sum test). Those with loss of consciousness showed the greatest worsening from prefight to postfight. Worse postfight K-D scores (r(s) = -0.79, p = 0.0001) and greater worsening of scores (r(s) = 0.90, p < 0.0001) correlated well with postfight MACE scores. Worsening of K-D scores by ≥5 seconds was a distinguishing characteristic noted only among participants with head trauma. High levels of test-retest reliability were observed (intraclass correlation coefficient 0.97 [95% confidence interval 0.90-1.0]). The K-D test is an accurate and reliable method for identifying athletes with head trauma, and is a strong candidate rapid sideline screening test for concussion.
The King-Devick test as a determinant of head trauma and concussion in boxers and MMA fighters
Galetta, K.M.; Barrett, J.; Allen, M.; Madda, F.; Delicata, D.; Tennant, A.T.; Branas, C.C.; Maguire, M.G.; Messner, L.V.; Devick, S.; Galetta, S.L.
2011-01-01
Objective: Sports-related concussion has received increasing attention as a cause of short- and long-term neurologic symptoms among athletes. The King-Devick (K-D) test is based on measurement of the speed of rapid number naming (reading aloud single-digit numbers from 3 test cards), and captures impairment of eye movements, attention, language, and other correlates of suboptimal brain function. We investigated the K-D test as a potential rapid sideline screening for concussion in a cohort of boxers and mixed martial arts fighters. Methods: The K-D test was administered prefight and postfight. The Military Acute Concussion Evaluation (MACE) was administered as a more comprehensive but longer test for concussion. Differences in postfight K-D scores and changes in scores from prefight to postfight were compared for athletes with head trauma during the fight vs those without. Results: Postfight K-D scores (n = 39 participants) were significantly higher (worse) for those with head trauma during the match (59.1 ± 7.4 vs 41.0 ± 6.7 seconds, p < 0.0001, Wilcoxon rank sum test). Those with loss of consciousness showed the greatest worsening from prefight to postfight. Worse postfight K-D scores (rs = −0.79, p = 0.0001) and greater worsening of scores (rs = 0.90, p < 0.0001) correlated well with postfight MACE scores. Worsening of K-D scores by ≥5 seconds was a distinguishing characteristic noted only among participants with head trauma. High levels of test-retest reliability were observed (intraclass correlation coefficient 0.97 [95% confidence interval 0.90–1.0]). Conclusions: The K-D test is an accurate and reliable method for identifying athletes with head trauma, and is a strong candidate rapid sideline screening test for concussion. PMID:21288984
ERIC Educational Resources Information Center
Laird, Robert D.; De Los Reyes, Andres
2013-01-01
Multiple informants commonly disagree when reporting child and family behavior. In many studies of informant discrepancies, researchers take the difference between two informants' reports and seek to examine the link between this difference score and external constructs (e.g., child maladjustment). In this paper, we review two reasons why…
ERIC Educational Resources Information Center
Yip, Din Yan; Chiu, Ming Ming; Ho, Esther Sui Chu
2004-01-01
This study examined gender differences in students' scientific literacy as measured by OECD-PISA. In particular, we focused on the 2437 students from 140 Hong Kong schools. Hong Kong boys' and girls' science scores did not differ overall. However, boys scored higher than girls at the higher percentiles (75th and above). Moreover, specific test…
A Preliminary Investigation of Dynamic Assessment With Native American Kindergartners.
Ukrainetz, Teresa A; Harpell, Stacey; Walsh, Chandra; Coyle, Catherine
2000-04-01
This study examined dynamic assessment as a less biased evaluation procedure for assessing the language-learning ability of Native American children. Twenty-three Arapahoe/Shoshone kindergartners were identified as stronger (n = 15) or weaker (n = 8) language learners through teacher report and examiner classroom observation. Through a test-teach-test protocol, participants were briefly taught the principles of categorization. Participant responses to learning were measured in terms of an index of modifiability and post-test categorization scores. The modifiability index, determined during the teaching phase, was a combined score reflecting the child's learning strategies, such as ability to attend, plan, and self-regulate, and the child's responses to the learning situation. Post-test scores consisted of performance on expressive and receptive subtests from a standardized categorization test after partialling out pretest score differences. Effect sizes and confidence intervals were also determined. Group and individual results indicated that modifiability and post-test scores were significantly greater for stronger than for weaker language learners. The response to modifiability components was a better discriminator than was the learner strategies components. These results provide support for the further development of dynamic assessment as a valid measure of language learning ability in minority children.
NASA Astrophysics Data System (ADS)
Gonçalves Nigro, Rogerio; Frateschi Trivelato, Silvia
2012-11-01
The purpose of this article is to assess the knowledge, application of knowledge, and attitudes associated with the reading of different genres of expository science texts. We assigned approximately half of a sample consisting of 220 students 14-15 years of age, chosen at random, to read an excerpt from a popular scientific text, and the other half to read an excerpt from a textbook addressing the same topic. Readers took knowledge and application tests immediately after the reading and again 15 days later. Students also took knowledge and reading proficiency pre-tests, and attitude tests related to the selected texts. Overall, girls scored higher than boys and readers of the popular scientific text scored higher than their colleagues who read the textbook excerpt. We noted interaction between 'reader gender' and 'genre of the text read' in terms of long-term learning based on the reading. Attitude regarding the text read appears as an important factor in explaining behavior of boys who read the popular scientific text. Surprisingly, knowledge and application test scores were not statistically different among girls with different degrees of reading proficiency who read the textbook excerpt. In addition, on the application tests, among the boys who read the popular scientific text, good readers scored lower than their colleagues who read the textbook excerpt. In our opinion, this study can serve to show that 'reading in science education' is not a trivial matter and we feel that the subject merits more in-depth investigation.
Beese, Mark E; Joy, Elizabeth; Switzler, Craig L; Hicks-Little, Charlie A
2015-08-01
Single-sport specialization (SSS) is becoming more prevalent in youth athletes. Deficits in functional movement have been shown to predispose athletes to injury. It is unclear whether a link exists between SSS and the development of functional movement deficits that predispose SSS athletes to an increased risk of knee injury. To determine whether functional movement deficits exist in SSS athletes compared with multi-sport (M-S) athletes. Cross-sectional study. Soccer practice fields. A total of 40 (21 SSS [age = 15.05 ± 1.2 years], 19 M-S [age = 15.32 ± 1.2 years]) female high school athlete volunteers were recruited through local soccer clubs. All SSS athletes played soccer. Participants were grouped into 2 categories: SSS and M-S. All participants completed 3 trials of the standard Landing Error Scoring System (LESS) jump-landing task. They performed a double-legged jump from a 30-cm platform, landing on a rubber mat at a distance of half their body height. Upon landing, participants immediately performed a maximal vertical jump. Values were assigned to each trial using the LESS scoring criteria. We averaged the 3 scored trials and then used a Mann-Whitney U test to test for differences between groups. Participant scores from the jump-landing assessment for each group were also placed into the 4 defined LESS categories for group comparison using a Pearson χ² test. The α level was set a priori at .05. Mean scores were 6.84 ± 1.81 for the SSS group and 6.07 ± 1.93 for the M-S group. We observed no differences between groups (z = -1.44, P = .15). A Pearson χ² analysis revealed that the proportions of athletes classified as having excellent, good, moderate, or poor LESS scores were not different between the SSS and M-S groups (χ² = 1.999, P = .57). Participation in soccer alone compared with multiple sports did not affect LESS scores in adolescent female soccer players.
However, the LESS scores indicated that most female adolescent athletes may be at an increased risk for knee injury, regardless of the number of sports played.
Hurks, P P M; Hendriksen, J G M; Dek, J E; Kooij, A P
2013-01-01
Intelligence tests are included in millions of assessments of children and adults each year (Watkins, Glutting, & Lei, 2007a, Applied Neuropsychology, 14, 13). Clinicians often interpret large amounts of subtest scatter, or large differences between the highest and lowest scaled subtest scores, on an intelligence test battery as an index for abnormality or cognitive impairment. The purpose of the present study is to characterize "normal" patterns of variability among subtests of the Dutch Wechsler Preschool and Primary Scale of Intelligence - Third Edition (WPPSI-III-NL; Wechsler, 2010). Therefore, the frequencies of WPPSI-III-NL scaled subtest scatter were reported for 1039 healthy children aged 4:0-7:11 years. Results indicated that large differences between highest and lowest scaled subtest scores (or subtest scatter) were common in this sample. Furthermore, degree of subtest scatter was related to: (a) the magnitude of the highest scaled subtest score, i.e., more scatter was seen in children with the highest WPPSI-III-NL scaled subtest scores, (b) Full Scale IQ (FSIQ) scores, i.e., higher FSIQ scores were associated with an increase in subtest scatter, and (c) sex differences, with boys showing a tendency to display more scatter than girls. In conclusion, viewing subtest scatter as an index for abnormality in WPPSI-III-NL scores is an oversimplification as this fails to recognize disparate subtest heterogeneity that occurs within a population of healthy children aged 4:0-7:11 years.
The power to detect linkage in complex disease by means of simple LOD-score analyses.
Greenberg, D A; Abreu, P; Hodge, S E
1998-01-01
Maximum-likelihood analysis (via LOD score) provides the most powerful method for finding linkage when the mode of inheritance (MOI) is known. However, because one must assume an MOI, the application of LOD-score analysis to complex disease has been questioned. Although it is known that one can legitimately maximize the maximum LOD score with respect to genetic parameters, this approach raises three concerns: (1) multiple testing, (2) effect on power to detect linkage, and (3) adequacy of the approximate MOI for the true MOI. We evaluated the power of LOD scores to detect linkage when the true MOI was complex but a LOD score analysis assumed simple models. We simulated data from 14 different genetic models, including dominant and recessive at high (80%) and low (20%) penetrances, intermediate models, and several additive two-locus models. We calculated LOD scores by assuming two simple models, dominant and recessive, each with 50% penetrance, then took the higher of the two LOD scores as the raw test statistic and corrected for multiple tests. We call this test statistic "MMLS-C." We found that the ELODs for MMLS-C are >=80% of the ELOD under the true model when the ELOD for the true model is >=3. Similarly, the power to reach a given LOD score was usually >=80% that of the true model, when the power under the true model was >=60%. These results underscore that a critical factor in LOD-score analysis is the MOI at the linked locus, not that of the disease or trait per se. Thus, a limited set of simple genetic models in LOD-score analysis can work well in testing for linkage. PMID:9718328
The power to detect linkage in complex disease by means of simple LOD-score analyses.
Greenberg, D A; Abreu, P; Hodge, S E
1998-09-01
Maximum-likelihood analysis (via LOD score) provides the most powerful method for finding linkage when the mode of inheritance (MOI) is known. However, because one must assume an MOI, the application of LOD-score analysis to complex disease has been questioned. Although it is known that one can legitimately maximize the maximum LOD score with respect to genetic parameters, this approach raises three concerns: (1) multiple testing, (2) effect on power to detect linkage, and (3) adequacy of the approximate MOI for the true MOI. We evaluated the power of LOD scores to detect linkage when the true MOI was complex but a LOD score analysis assumed simple models. We simulated data from 14 different genetic models, including dominant and recessive at high (80%) and low (20%) penetrances, intermediate models, and several additive two-locus models. We calculated LOD scores by assuming two simple models, dominant and recessive, each with 50% penetrance, then took the higher of the two LOD scores as the raw test statistic and corrected for multiple tests. We call this test statistic "MMLS-C." We found that the ELODs for MMLS-C are >=80% of the ELOD under the true model when the ELOD for the true model is >=3. Similarly, the power to reach a given LOD score was usually >=80% that of the true model, when the power under the true model was >=60%. These results underscore that a critical factor in LOD-score analysis is the MOI at the linked locus, not that of the disease or trait per se. Thus, a limited set of simple genetic models in LOD-score analysis can work well in testing for linkage.
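As background to the abstract above, a LOD score is the base-10 logarithm of the likelihood at a recombination fraction theta divided by the likelihood under no linkage (theta = 0.5). The sketch below covers only the textbook two-point, phase-known case with directly counted recombinants; it is not the MMLS-C procedure of the paper, which needs pedigree likelihoods under assumed dominant and recessive models plus a correction for multiple tests. The function names, the grid search, and the example counts are illustrative assumptions.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """Two-point LOD for phase-known meioses: log10 of L(theta) / L(0.5)."""
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    non_recombinants = meioses - recombinants
    log_l_theta = (recombinants * math.log10(theta)
                   + non_recombinants * math.log10(1.0 - theta))
    log_l_null = meioses * math.log10(0.5)
    return log_l_theta - log_l_null


def max_lod(recombinants: int, meioses: int, steps: int = 500):
    """Grid-search theta over (0, 0.5]; return (maximum LOD, theta at maximum)."""
    grid = [0.5 * i / steps for i in range(1, steps + 1)]
    return max((lod_score(recombinants, meioses, t), t) for t in grid)


# Hypothetical counts for illustration: 2 recombinants in 20 informative meioses.
best_lod, best_theta = max_lod(recombinants=2, meioses=20)
print(f"max LOD = {best_lod:.2f} at theta = {best_theta:.3f}")
```

In the paper's setting the same maximization idea is applied across two assumed inheritance models rather than across theta alone, with the higher of the two LOD scores taken as the raw statistic.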
Lous, Jørgen; Glenn Lauritsen, Maj-Britt
2018-06-01
To search for predictive factors for language development measured by two receptive language tests for children, the Galker test (a word-recognition-in-noise test) testing hearing and vocabulary, and the Danish version of the Reynell Developmental Language Scale (2nd revision, RDLS II) test, a language comprehension test. The study analysed if information about background variables and from parents and pre-school teachers was predictive for test scores; if earlier middle ear disease, actual hearing loss and tympanometry were important for language development; and if the two receptive tests differed in terms of the degree to which variables were able to predict test scores at the age of three to five years. All children aged three and five years attending 20 day-care centres for children without cognitive development issues from the Municipality of Hillerød, Denmark, were invited to participate. We used questionnaires to the parents and day-care teachers and examined the children using tympanometry, hearing test and the two receptive language tests. We performed unadjusted and adjusted analyses of raw and grouped scores and background variables, as well as stepwise regression analysis with group scores as outcome. The results of the two tests were surprisingly similar in relation to background variables. The same variables were predictive for scores in the two receptive language tests. The predictive variables were: age group (22-31%), having no sibling (2-3%), being a boy (1%), information from the parents about the child's vocabulary (3%), phonology (0-2%), information from the pre-school teachers on the child's vocabulary (4-6%), and hearing beyond 25 dB in best ear (mean of four frequencies) (1%). We found that nearly the same variables were predictive for the test score and the grouped score in pre-school children in the RDLS II and the Galker test. Information from the pre-school teachers was more predictive of the test score than information from the parents.
In the adjusted analysis, beside age group, information about the child's vocabulary was the most predictive information explaining 4-6% of the variation. Copyright © 2018 Elsevier B.V. All rights reserved.
External validation of the HIT Expert Probability (HEP) score.
Joseph, Lee; Gomes, Marcelo P V; Al Solaiman, Firas; St John, Julie; Ozaki, Asuka; Raju, Manjunath; Dhariwal, Manoj; Kim, Esther S H
2015-03-01
The diagnosis of heparin-induced thrombocytopenia (HIT) can be challenging. The HIT Expert Probability (HEP) Score has recently been proposed to aid in the diagnosis of HIT. We sought to externally and prospectively validate the HEP score. We prospectively assessed pre-test probability of HIT for 51 consecutive patients referred to our Consultative Service for evaluation of possible HIT between August 1, 2012 and February 1, 2013. Two Vascular Medicine fellows independently applied the 4T and HEP scores for each patient. Two independent HIT expert adjudicators rendered a diagnosis of HIT likely or unlikely. The median (interquartile range) of 4T and HEP scores were 4.5 (3.0, 6.0) and 5 (3.0, 8.5), respectively. There were no significant differences between area under receiver-operating characteristic curves of 4T and HEP scores against the gold standard, confirmed HIT [defined as positive serotonin release assay and positive anti-PF4/heparin ELISA] (0.74 vs 0.73, p = 0.97). HEP score ≥ 2 was 100% sensitive and 16% specific for determining the presence of confirmed HIT, while a 4T score > 3 was 93% sensitive and 35% specific. In conclusion, the HEP and 4T scores are excellent screening pre-test probability models for HIT; however, in this prospective validation study, test characteristics for the diagnosis of HIT based on confirmatory laboratory testing and expert opinion are similar. Given the complexity of the HEP scoring model compared to that of the 4T score, further validation of the HEP score is warranted prior to widespread clinical acceptance.
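The sensitivity and specificity figures quoted for each cut-off follow from a simple count of true and false positives and negatives. The sketch below shows that calculation for an inclusive cut-off (as in "score ≥ 2") or an exclusive one (as in "score > 3"). The function name and the ten example scores are hypothetical and are not data from the study.

```python
def sensitivity_specificity(scores, confirmed, cutoff, inclusive=True):
    """Return (sensitivity, specificity) for a score cut-off.

    scores:    numeric pre-test probability scores (hypothetical data).
    confirmed: booleans, True where the reference standard is positive.
    inclusive: True tests score >= cutoff, False tests score > cutoff.
    """
    tp = fn = tn = fp = 0
    for score, has_condition in zip(scores, confirmed):
        test_positive = score >= cutoff if inclusive else score > cutoff
        if has_condition and test_positive:
            tp += 1
        elif has_condition:
            fn += 1
        elif test_positive:
            fp += 1
        else:
            tn += 1
    # Sensitivity: share of confirmed cases flagged by the cut-off.
    # Specificity: share of non-cases left unflagged by the cut-off.
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical scores for ten patients; NOT data from the study.
example_scores = [1, 2, 3, 3, 4, 5, 6, 7, 8, 9]
example_confirmed = [False, False, False, True, False,
                     False, True, True, False, True]

sens_low, spec_low = sensitivity_specificity(
    example_scores, example_confirmed, cutoff=2, inclusive=True)
sens_high, spec_high = sensitivity_specificity(
    example_scores, example_confirmed, cutoff=3, inclusive=False)
print(sens_low, spec_low, sens_high, spec_high)
```

In this toy data set, lowering the cut-off raises sensitivity and lowers specificity, which is the same trade-off the abstract reports for the two scores.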
Hannon, Brenda
2012-11-01
This study uses analysis of co-variance in order to determine which cognitive/learning (working memory, knowledge integration, epistemic belief of learning) or social/personality factors (test anxiety, performance-avoidance goals) might account for gender differences in SAT-V, SAT-M, and overall SAT scores. The results revealed that none of the cognitive/learning factors accounted for gender differences in SAT performance. However, the social/personality factors of test anxiety and performance-avoidance goals each separately accounted for all of the significant gender differences in SAT-V, SAT-M, and overall SAT performance. Furthermore, when the influences of both of these factors were statistically removed simultaneously, all non-significant gender differences reduced further to become trivial by Cohen's (1988) standards. Taken as a whole, these results suggest that gender differences in SAT-V, SAT-M, and overall SAT performance are a consequence of social/learning factors.
[Effect of vanadium exposure on neurobehavioral function in workers].
Zhu, C W; Liu, Y X; Huang, C J; Gao, W; Hu, G L; Li, J; Zhang, Q; Lan, Y J
2016-02-20
To establish comprehensive indicators for neurobehavioral function testing, and to investigate the possible adverse effect of long-term vanadium exposure on neurobehavioral function and its features in workers. From July to November 2012, the Neurobehavioral Core Test Battery (NCTB) recommended by WHO was used to test 128 workers in the vanadium exposure group and 128 workers in the control group. The t-test and analysis of covariance were used to compare the differences in each NCTB indicator between the two groups, and principal component analysis was used to establish the comprehensive neurobehavioral index (NBI) and investigate the effect of vanadium on workers' neurobehavioral function. The vanadium exposure group had significantly lower visual retention score (6.9±1.9), digit span (order) score (8.9±2.9), lifting and turning dexterity (non-dominant hand) score (14.1±3.6), pursuit aiming test (number of correct dots) score (65.7±24.8), and digit symbol score (31.1±15.0) than the control group (8.2±1.3, 9.4±2.7, 15.5±3.0, 76.5±23.8, and 33.7±9.5) (all P<0.05). The vanadium exposure group also had a significantly lower NBI than the control group (-0.167±0.602 vs 0.168±0.564, P<0.05). Long-term vanadium exposure can influence workers' neurobehavioral function, with the manifestations of decreased hearing and visual memory, movement velocity, accuracy, and coordination.
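The reported NBI summary statistics are enough to compute a standardized effect size and a two-sample t statistic. The sketch below does so under two assumptions that the abstract does not state: that the "±" values are standard deviations, and that an equal-variance (Student) t-test is appropriate. The function names are illustrative.

```python
import math


def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2)
                     / (n1 + n2 - 2))


def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Standardized mean difference (group 1 minus group 2)."""
    return (mean1 - mean2) / pooled_sd(sd1, n1, sd2, n2)


def student_t(mean1, sd1, n1, mean2, sd2, n2):
    """Equal-variance two-sample t statistic from summary statistics."""
    sp = pooled_sd(sd1, n1, sd2, n2)
    return (mean1 - mean2) / (sp * math.sqrt(1 / n1 + 1 / n2))


# NBI values from the abstract: control 0.168±0.564, exposed -0.167±0.602,
# 128 workers per group. Treating "±" as a standard deviation is an assumption.
d = cohens_d(0.168, 0.564, 128, -0.167, 0.602, 128)
t = student_t(0.168, 0.564, 128, -0.167, 0.602, 128)
print(round(d, 3), round(t, 2))
```

Under these assumptions the NBI difference corresponds to a standardized difference of a little under 0.6, conventionally a medium-sized effect.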
Denehy, Linda; de Morton, Natalie A; Skinner, Elizabeth H; Edbrooke, Lara; Haines, Kimberley; Warrillow, Stephen; Berney, Sue
2013-12-01
Several tests have recently been developed to measure changes in patient strength and functional outcomes in the intensive care unit (ICU). The original Physical Function ICU Test (PFIT) demonstrates reliability and sensitivity. The aims of this study were to further develop the original PFIT, to derive an interval score (the PFIT-s), and to test the clinimetric properties of the PFIT-s. A nested cohort study was conducted. One hundred forty-four and 116 participants performed the PFIT at ICU admission and discharge, respectively. Original test components were modified using principal component analysis. Rasch analysis examined the unidimensionality of the PFIT, and an interval score was derived. Correlations tested validity, and multiple regression analyses investigated predictive ability. Responsiveness was assessed using the effect size index (ESI), and the minimal clinically important difference (MCID) was calculated. The shoulder lift component was removed. Unidimensionality of combined admission and discharge PFIT-s scores was confirmed. The PFIT-s displayed moderate convergent validity with the Timed "Up & Go" Test (r=-.60), the Six-Minute Walk Test (r=.41), and the Medical Research Council (MRC) sum score (rho=.49). The ESI of the PFIT-s was 0.82, and the MCID was 1.5 points (interval scale range=0-10). A higher admission PFIT-s score was predictive of: an MRC score of ≥48, increased likelihood of discharge home, reduced likelihood of discharge to inpatient rehabilitation, and reduced acute care hospital length of stay. Scoring of sit-to-stand assistance required is subjective, and cadence cutpoints used may not be generalizable. The PFIT-s is a safe and inexpensive test of physical function with high clinical utility. It is valid, responsive to change, and predictive of key outcomes. It is recommended that the PFIT-s be adopted to test physical function in the ICU.
Color vision defects in adrenomyeloneuropathy.
Sack, G H; Raven, M B; Moser, H W
1989-01-01
The relationship between abnormal color vision and adrenomyeloneuropathy (AMN) was investigated in 27 AMN patients and 31 age-matched controls by using the Farnsworth-Munsell 100 Hue test. Twelve (44%) of 27 patients showed test scores significantly above normal. The axes of bipolarity determined by the testing differed widely between the patients with abnormal scores, compatible with the notion that different alterations in visual pigment genes occur in different AMN kindreds. These observations confirm our earlier impression that the frequency of abnormal color vision is increased in these kindreds, and it supports our contentions that (1) AMN (and its companion, adrenoleukodystrophy) are very closely linked to the visual pigment loci at Xq28 and (2) this proximity might provide the opportunity to observe contiguous gene defects. PMID:2729274
Perrotin, Audrey; Isingrini, Michel; Souchay, Céline; Clarys, David; Taconnat, Laurence
2006-05-01
This research investigated adult age differences in a metamemory monitoring task (episodic feeling-of-knowing, FOK) and in an episodic memory task (cued recall). Executive functioning and processing speed were examined as mediators of these age differences. Young and elderly adults were administered an episodic FOK task, a cued recall task, executive tests and speed tests. Age-related decline was observed on all the measures. Correlation analyses revealed a pattern of double dissociation which indicates a specific relationship between executive score and FOK accuracy, and between speed score and cued recall. When executive functioning and processing speed were evaluated concurrently on FOK and cued recall variables, hierarchical regression analyses showed that executive score was a better mediator of age-related variance in FOK, and that speed score was the better mediator of age-related variance in cued recall.
Kenya, Amilliah W.; Hart, John F.; Vuyiya, Charles K.
2016-01-01
Objective: This study compared National Board of Chiropractic Examiners part I test scores between students who did and did not serve as tutors on the subject matter. Methods: Students who had a prior grade point average of 3.45 or above on a 4.0 scale just before taking part I of the board exams were eligible to participate. A 2-sample t-test was used to ascertain the difference in the mean scores on part I between the tutor group (n = 28) and nontutor (n = 29) group. Results: Scores were higher in all subjects for the tutor group compared to the nontutor group and the differences were statistically significant (p < .01) with large effect sizes. Conclusion: The tutors in this study performed better on part I of the board examination compared to nontutors, suggesting that tutoring results in an academic benefit for tutors themselves. PMID:26998665
Salamonsen, Matthew; McGrath, David; Steiler, Geoff; Ware, Robert; Colt, Henri; Fielding, David
2013-09-01
To reduce complications and increase success, thoracic ultrasound is recommended to guide all chest drainage procedures. Despite this, no tools currently exist to assess proceduralist training or competence. This study aims to validate an instrument to assess physician skill at performing thoracic ultrasound, including effusion markup, and examine its validity. We developed an 11-domain, 100-point assessment sheet in line with British Thoracic Society guidelines: the Ultrasound-Guided Thoracentesis Skills and Tasks Assessment Test (UGSTAT). The test was used to assess 22 participants (eight novices, seven intermediates, seven advanced) on two occasions while performing thoracic ultrasound on a pleural effusion phantom. Each test was scored by two blinded expert examiners. Validity was examined by assessing the ability of the test to stratify participants according to expected skill level (analysis of variance) and demonstrating test-retest and intertester reproducibility by comparison of repeated scores (mean difference [95% CI] and paired t test) and the intraclass correlation coefficient. Mean scores for the novice, intermediate, and advanced groups were 49.3, 73.0, and 91.5 respectively, which were all significantly different (P < .0001). There were no significant differences between repeated scores. Procedural training on mannequins prior to unsupervised performance on patients is rapidly becoming the standard in medical education. This study has validated the UGSTAT, which can now be used to determine the adequacy of thoracic ultrasound training prior to clinical practice. It is likely that its role could be extended to live patients, providing a way to document ongoing procedural competence.
A study on Korean nursing students' educational outcomes
Oh, Kasil; Lee, Hyang-Yeon; Lee, Sook-Ja; Kim, In-Ja; Choi, Kyung-Sook; Ko, Myung-Sook
2011-01-01
The purpose of this study was to describe outcome indicators of nursing education including critical thinking, professionalism, leadership, and communication and to evaluate differences among nursing programs and academic years. A descriptive research design was employed. A total of 454 students from four year baccalaureate (BS) nursing programs and two three-year associate degree (AD) programs consented to complete self-administered questionnaires. The variables were critical thinking, professionalism, leadership and communication. Descriptive statistics, χ2-test, t-tests, ANOVA, and the Tukey test were utilized for the data analysis. All the mean scores of the variables were above average for the test instruments utilized. Among the BS students, those in the upper classes tended to attain higher scores, but this tendency was not identified in AD students. There were significant differences between BS students and AD students for the mean scores of leadership and communication. These findings suggested the need for further research to define properties of nursing educational outcomes, and to develop standardized instruments for research replication and verification. PMID:21602914
Laganà, A S; Burgio, M A; Ciancimino, L; Sicilia, A; Pizzo, A; Magno, C; Butticè, S; Triolo, O
2015-08-01
The aim of the study was to assess the recovery and quality of sexual activity of women during the postpartum period, in relation to mode of delivery. We recruited 200 women at 8 weeks after delivery. For each patient we recorded mode of delivery, age, body mass index (BMI), parity and Female Sexual Function Index (FSFI) score. Sixty-four women (32%) had spontaneous deliveries without episiotomy, 48 (24%) had spontaneous deliveries with episiotomy, and 88 (44%) had caesarean sections. The analysis of variance (ANOVA) test showed no significant differences among the 3 groups for age, BMI, or parity. The FSFI identified 68 cases (34%) of Regular Female Sexual Function (RFSF) and 132 (66%) of Female Sexual Dysfunction (FSD). The ANOVA test showed significant differences among the 3 groups in RFSF (F [2, 14]=8.075, P=0.005), but not in FSD (F [2, 30]=2.646, P=0.087). In RFSF, FSFI score was higher in women who had vaginal delivery with episiotomy compared with the other two groups. Conversely, in FSD (both with or without resumed sexual activity at 8 weeks postpartum) we found that patients who had vaginal delivery with episiotomy showed lower FSFI scores than the other two groups, with a decrease in lubrication, orgasm and satisfaction scores. Furthermore, we observed that most of the RFSF patients had a job and breastfed. Our results did not show a direct and significant correlation between mode of delivery and onset of female postpartum sexual dysfunction, even though FSD patients who underwent episiotomy during delivery showed markedly low FSFI scores.
Van Nuland, Hanneke J C; Dusseldorp, Elise; Martens, Rob L; Boekaerts, Monique
2010-08-01
Different theoretical viewpoints on motivation make it hard to decide which model has the best potential to provide valid predictions on classroom performance. This study was designed to explore motivation constructs derived from different motivation perspectives that predict performance on a novel task best. Motivation constructs from self-determination theory, self-regulation theory, and achievement goal theory were investigated in tandem. Performance was measured by systematicity (i.e. how systematically students worked on a problem-solving task) and test score (i.e. score on a multiple-choice test). Hierarchical regression analyses on data from 259 secondary school students showed a quadratic relation between a performance avoidance orientation and both performance outcomes, indicating that extreme high and low performance avoidance resulted in the lowest performance. Furthermore, two three-way interaction effects were found. Intrinsic motivation seemed to play a key role in test score and systematicity performance, provided that effort regulation and metacognitive skills were both high. Results indicate that intrinsic motivation in itself is not enough to attain a good performance. Instead, a moderate score on performance avoidance, together with the ability to remain motivated and effectively regulate and control task behavior, is needed to attain a good performance. High time management skills also contributed to higher test score and systematicity performance and a low performance approach orientation contributed to higher systematicity performance. We concluded that self-regulatory skills should be trained in order to have intrinsically motivated students perform well on novel tasks in the classroom.
Abbreviated neuropsychological assessment in schizophrenia
Harvey, Philip D.; Keefe, Richard S. E.; Patterson, Thomas L.; Heaton, Robert K.; Bowie, Christopher R.
2008-01-01
The aim of this study was to identify the best subset of neuropsychological tests for prediction of several different aspects of functioning in a large (n = 236) sample of older people with schizophrenia. While the validity of abbreviated assessment methods has been examined before, there has never been a comparative study of the prediction of different elements of cognitive impairment, real-world outcomes, and performance-based measures of functional capacity. Scores on 10 different tests from a neuropsychological assessment battery were used to predict global neuropsychological (NP) performance (indexed with averaged scores or calculated general deficit scores), performance-based indices of everyday-living skills and social competence, and case-manager ratings of real-world functioning. Forward entry stepwise regression analyses were used to identify the best predictors for each of the outcomes measures. Then, the analyses were adjusted for estimated premorbid IQ, which reduced the magnitude, but not the structure, of the correlations. Substantial amounts (over 70%) of the variance in overall NP performance were accounted for by a limited number of NP tests. Considerable variance in measures of functional capacity was also accounted for by a limited number of tests. Different tests constituted the best predictor set for each outcome measure. A substantial proportion of the variance in several different NP and functional outcomes can be accounted for by a small number of NP tests that can be completed in a few minutes, although there is considerable unexplained variance. However, the abbreviated assessments that best predict different outcomes vary across outcomes. Future studies should determine whether responses to pharmacological and remediation treatments can be captured with brief assessments as well. PMID:18720182
Sargin, Mehmet Akif; Yassa, Murat; Taymur, Bilge Dogan; Taymur, Bulent; Akca, Gizem; Tug, Niyazi
2017-04-01
To compare the status of female sexual dysfunction (FSD) between women with a history of previous gestational diabetes mellitus (GDM) and those with follow-up of a healthy pregnancy, using the female sexual function index (FSFI) questionnaire. Cross-sectional study. Department of Obstetrics and Gynecology, Fatih Sultan Mehmet Training and Research Hospital, Istanbul, Turkey, from September to December 2015. Healthy sexually active adult parous females were included. Participants were asked to complete the validated Turkish versions of the FSFI and Hospital Anxiety and Depression Scale (HADS) questionnaires. Student's t-test was used for two-group comparisons of normally distributed variables and quantitative data. Mann-Whitney U-test was used for two-group comparisons of non-normally distributed variables. Pearson's chi-squared test, the Fisher-Freeman-Halton test, Fisher's exact test, and Yates' continuity correction test were used for comparison of qualitative data. The mean FSFI score of the 179 participants was 23.50 ± 3.94. FSFI scores and scores of desire, arousal, lubrication, orgasm, satisfaction, and pain were not statistically significantly different (p>0.05) according to a history of GDM and types of FSD (none, mild, severe). HADS scores and anxiety and depression types did not statistically significantly differ according to the history of GDM (p>0.05). No difference in FSFI scores was found between participants with a history of previous GDM and those with a healthy pregnancy; subclinical sexual dysfunction may be observed in the late postpartum period among women with a history of previous GDM. This may adversely affect their sexual health.
The effects of hands-on-science instruction on the science achievement of middle school students
NASA Astrophysics Data System (ADS)
Wiggins, Felita
Student achievement in the Twenty First Century demands a new rigor in student science knowledge, since advances in science and technology require students to think and act like scientists. As a result, students must acquire proficient levels of knowledge and skills to support a knowledge base that is expanding exponentially with new scientific advances. This study examined the effects of hands-on-science instruction on the science achievement of middle school students. More specifically, this study was concerned with the influence of hands-on science instruction versus traditional science instruction on the science test scores of middle school students. The subjects in this study were one hundred and twenty sixth-grade students in six classes. Instruction involved lecture/discussion and hands-on activities carried out for a three week period. Specifically, the study ascertained the influence of the variables gender, ethnicity, and socioeconomic status on the science test scores of middle school students. Additionally, this study assessed the effect of the variables gender, ethnicity, and socioeconomic status on the attitudes of sixth grade students toward science. The two instruments used to collect data for this study were the Prentice Hall unit ecosystem test and the Scientific Work Experience Programs for Teachers Study (SWEPT) student's attitude survey. Moreover, the data for the study was treated using the One-Way Analysis of Covariance and the One-Way Analysis of Variance. The following findings were made based on the results: (1) A statistically significant difference existed in the science performance of middle school students exposed to hands-on science instruction. These students had significantly higher scores than the science performance of middle school students exposed to traditional instruction. (2) A statistically significant difference did not exist between the science scores of male and female middle school students. 
(3) A statistically significant difference did not exist between the science scores of African American and non-African American middle school students. (4) A statistically significant difference existed by socioeconomic status: students with unassisted lunches had significantly higher science scores than those middle school students who were provided with assisted lunches. (5) A statistically significant difference was not found in the attitude scores of middle school students who were exposed to hands-on or traditional science instruction. (6) A statistically significant difference was not found in the observed attitude scores of middle school students who were exposed to either hands-on or traditional science instruction by their socioeconomic status. (7) A statistically significant difference was not found in the observed attitude scores of male and female students. (8) A statistically significant difference was not found in the observed attitude scores of African American and non-African American students.
Automated essay scoring and the future of educational assessment in medical education.
Gierl, Mark J; Latifi, Syed; Lai, Hollis; Boulais, André-Philippe; De Champlain, André
2014-10-01
Constructed-response tasks, which range from short-answer tests to essay questions, are included in assessments of medical knowledge because they allow educators to measure students' ability to think, reason, solve complex problems, communicate and collaborate through their use of writing. However, constructed-response tasks are also costly to administer and challenging to score because they rely on human raters. One alternative to the manual scoring process is to integrate computer technology with writing assessment. The process of scoring written responses using computer programs is known as 'automated essay scoring' (AES). An AES system uses a computer program that builds a scoring model by extracting linguistic features from a constructed-response prompt that has been pre-scored by human raters and then, using machine learning algorithms, maps the linguistic features to the human scores so that the computer can be used to classify (i.e. score or grade) the responses of a new group of students. The accuracy of the score classification can be evaluated using different measures of agreement. Automated essay scoring provides a method for scoring constructed-response tests that complements the current use of selected-response testing in medical education. The method can serve medical educators by providing the summative scores required for high-stakes testing. It can also serve medical students by providing them with detailed feedback as part of a formative assessment process. Automated essay scoring systems yield scores that consistently agree with those of human raters at a level as high, if not higher, as the level of agreement among human raters themselves. The system offers medical educators many benefits for scoring constructed-response tasks, such as improving the consistency of scoring, reducing the time required for scoring and reporting, minimising the costs of scoring, and providing students with immediate feedback on constructed-response tasks. 
© 2014 John Wiley & Sons Ltd.
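One common measure of agreement between machine and human scores is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The abstract does not say which agreement measures were used, so the sketch below is only one illustrative option; the function name and the eight example essay scores are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of category labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same, non-empty set of items")
    n = len(rater_a)
    # Proportion of items on which the two raters gave the same score.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Agreement expected if each rater assigned scores independently
    # according to their own marginal score frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # degenerate case: both raters used a single category
    return (observed - expected) / (1 - expected)


# Hypothetical essay scores on a 1-4 scale; NOT data from any study.
human_scores = [1, 2, 2, 3, 3, 3, 4, 4]
machine_scores = [1, 2, 3, 3, 3, 3, 4, 3]

kappa = cohens_kappa(human_scores, machine_scores)
print(round(kappa, 3))
```

For ordered score scales, a weighted kappa that penalizes large disagreements more than adjacent ones is often preferred; the unweighted form is shown here only because it is the shortest to state.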
Quantitative traits for the tail suspension test: automation, optimization, and BXD RI mapping.
Lad, Heena V; Liu, Lin; Payá-Cano, José L; Fernandes, Cathy; Schalkwyk, Leonard C
2007-07-01
Immobility in the tail suspension test (TST) is considered a model of despair in a stressful situation, and acute treatment with antidepressants reduces immobility. Inbred strains of mouse exhibit widely differing baseline levels of immobility in the TST and several quantitative trait loci (QTLs) have been nominated. The labor of manual scoring and various scoring criteria make obtaining robust data and comparisons across different laboratories problematic. Several studies have validated strain gauge and video analysis methods by comparison with manual scoring. We set out to find objective criteria for automated scoring parameters that maximize the biological information obtained, using a video tracking system on tapes of tail suspension tests of 24 lines of the BXD recombinant inbred panel and the progenitor strains C57BL/6J and DBA/2J. The maximum genetic effect size is captured using the highest time resolution and a low mobility threshold. Dissecting the trait further by comparing genetic association of multiple measures reveals good evidence for loci involved in immobility on chromosomes 4 and 15. These are best seen when using a high threshold for immobility, despite the overall better heritability at the lower threshold. A second trial of the test has greater duration of immobility and a completely different genetic profile. Frequency of mobility is also an independent phenotype, with a distal chromosome 1 locus.
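The dependence of the immobility measure on the mobility threshold and time resolution can be illustrated with a toy scoring routine. The sketch below assumes a tracking system that outputs one mobility value per time bin; the function names, units, and example trace are hypothetical and do not reproduce the authors' video tracking software.

```python
def immobility_seconds(mobility, threshold, bin_seconds):
    """Total time scored as immobile: bins whose mobility is below threshold."""
    return sum(bin_seconds for value in mobility if value < threshold)


def mobility_bouts(mobility, threshold):
    """Number of transitions from immobile to mobile (frequency of mobility)."""
    bouts = 0
    previously_mobile = False
    for value in mobility:
        mobile = value >= threshold
        if mobile and not previously_mobile:
            bouts += 1
        previously_mobile = mobile
    return bouts


# Hypothetical mobility trace, one value per 0.5 s bin; NOT real data.
trace = [0.0, 0.2, 1.5, 2.0, 0.1, 0.0, 3.0, 0.4]

high_threshold_immobility = immobility_seconds(trace, threshold=0.5, bin_seconds=0.5)
low_threshold_immobility = immobility_seconds(trace, threshold=0.15, bin_seconds=0.5)
bouts = mobility_bouts(trace, threshold=0.5)
print(high_threshold_immobility, low_threshold_immobility, bouts)
```

The same trace yields different immobility durations at the two thresholds, which is why the choice of threshold can change which genetic effects are detected.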
Gasquoine, Philip Gerard; Croyle, Kristin L; Cavazos-Gonzalez, Cynthia; Sandoval, Omar
2007-11-01
This study compared the performance of Hispanic American bilingual adults on Spanish and English language versions of a neuropsychological test battery. Language achievement test scores were used to divide 36 bilingual, neurologically intact, Hispanic Americans from south Texas into Spanish-dominant, balanced, and English-dominant bilingual groups. They were administered the eight subtests of the Bateria Neuropsicologica and the Matrix Reasoning subtest of the WAIS-III in Spanish and English. Half the participants were tested in Spanish first. Balanced bilinguals showed no significant differences in test scores between Spanish and English language administrations. Spanish and/or English dominant bilinguals showed significant effects of language of administration on tests with higher language compared to visual perceptual weighting (Woodcock-Munoz Language Survey-Revised, Letter Fluency, Story Memory, and Stroop Color and Word Test). Scores on tests with higher visual-perceptual weighting (Matrix Reasoning, Figure Memory, Wisconsin Card Sorting Test, and Spatial Span), were not significantly affected by language of administration, nor were scores on the Spanish/California Verbal Learning Test, and Digit Span. A problem was encountered in comparing false positive rates in each language, as Spanish norms fell below English norms, resulting in a much higher false positive rate in English across all bilingual groupings. Use of a comparison standard (picture vocabulary score) reduced false positive rates in both languages, but the higher false positive rate in English persisted.
Ferrie, Joseph P; Rolf, Karen; Troesken, Werner
2012-01-01
Higher prior exposure to water-borne lead among male World War Two U.S. Army enlistees was associated with lower intelligence test scores. Exposure was proxied by urban residence and the water pH levels of the cities where enlistees lived in 1930. Army General Classification Test scores were six points lower (nearly 1/3 standard deviation) where pH was 6 (so the water lead concentration for a given amount of lead piping was higher) than where pH was 7 (so the concentration was lower). This difference rose with time exposed. At this time, the dangers of exposure to lead in water were not widely known and lead was ubiquitous in water systems, so these results are not likely the effect of individuals selecting into locations with different levels of exposure. Copyright © 2011 Elsevier B.V. All rights reserved.
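The conversion from a raw score gap to standard-deviation units is a single division. The sketch below assumes a standard deviation of 20 points for the Army General Classification Test; that figure is not stated in the abstract and is an assumption used only for illustration.

```python
def standardized_difference(raw_difference, scale_sd):
    """Express a raw score difference in standard-deviation units."""
    return raw_difference / scale_sd


ASSUMED_AGCT_SD = 20.0  # assumption, not given in the abstract
RAW_GAP_POINTS = 6.0    # score gap between pH 6 and pH 7 cities (from the abstract)

effect_in_sd_units = standardized_difference(RAW_GAP_POINTS, ASSUMED_AGCT_SD)

# If the gap were exactly one third of a standard deviation, the implied
# standard deviation would be three times the raw gap.
implied_sd_if_exact_third = RAW_GAP_POINTS * 3
print(effect_in_sd_units, implied_sd_if_exact_third)
```

With the assumed standard deviation the gap is 0.3 SD, consistent with the abstract's "nearly 1/3 standard deviation"; an exact third would imply a standard deviation of 18 points.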
Reference values and equations reference of balance for children of 8 to 12 years.
Libardoni, Thiele de Cássia; Silveira, Carolina Buzzi da; Sinhorim, Larissa Milani Brognoli; Oliveira, Anamaria Siriani de; Santos, Márcio José Dos; Santos, Gilmar Moraes
2018-02-01
There are still no normative data on balance sway for school-age children in Brazil. We aimed to establish the reference ranges for balance scores and to develop prediction equations for estimation of balance scores in children aged 8 to 12 years old. The study included 165 healthy children (83 boys and 82 girls; age, 8-12 years) recruited from a public school in the city of Florianópolis, Santa Catarina, Brazil. We used the Sensory Organization Test to assess the balance scores and both a digital scale and a stadiometer to measure the anthropometric variables. We tested a stepwise multiple-regression model with sex, height, weight, and mid-thigh circumference of the dominant leg as predictors of the balance score. For all experimental conditions, girls' age accounted for over 85% of the variability in balance scores, whereas boys' age accounted for only 55% of the variability in balance scores. Therefore, balance scores increase with age for boys and girls. This study described the ranges of age- and sex-specific normative values for balance scores in children during 6 different testing conditions established by the Sensory Organization Test. We confirmed that age was the predictor that best explained the variability in balance scores in children between 8 and 12 years old. This study encourages a new and more comprehensive study to estimate balance scores from prediction equations for the overall Brazilian pediatric population. Copyright © 2017 Elsevier B.V. All rights reserved.
Wrzus, Cornelia; Egloff, Boris; Riediger, Michaela
2017-08-01
Implicit association tests (IATs) are increasingly used to indirectly assess people's traits, attitudes, or other characteristics. In addition to measuring traits or attitudes, IAT scores also reflect differences in cognitive abilities because scores are based on reaction times (RTs) and errors. As cognitive abilities change with age, questions arise concerning the usage and interpretation of IATs for people of different age. To address these questions, the current study examined how cognitive abilities and cognitive processes (i.e., quad model parameters) contribute to IAT results in a large age-heterogeneous sample. Participants (N = 549; 51% female) in an age-stratified sample (range = 12-88 years) completed different IATs and 2 tasks to assess cognitive processing speed and verbal ability. From the IAT data, D2-scores were computed based on RTs, and quad process parameters (activation of associations, overcoming bias, detection, guessing) were estimated from individual error rates. Substantial IAT scores and quad processes except guessing varied with age. Quad processes AC and D predicted D2-scores of the content-specific IAT. Importantly, the effects of cognitive abilities and quad processes on IAT scores were not significantly moderated by participants' age. These findings suggest that IATs seem suitable for age-heterogeneous studies from adolescence to old age when IATs are constructed and analyzed appropriately, for example with D-scores and process parameters. We offer further insight into how D-scoring controls for method effects in IATs and what IAT scores capture in addition to implicit representations of characteristics. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
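As a rough illustration of how a D-type score standardizes reaction-time differences, the sketch below divides the mean latency difference between the two critical blocks by the standard deviation of all trials pooled across both blocks. It is a simplified version that omits the trial exclusions and error handling that distinguish the published D-score variants, including the D2 variant named in the abstract. The function name and the reaction times are hypothetical.

```python
import statistics


def simple_d_score(compatible_rts, incompatible_rts):
    """Simplified D-type score from two lists of reaction times (ms).

    Mean latency difference (incompatible minus compatible) divided by the
    standard deviation of all trials from both blocks pooled together.
    """
    pooled = list(compatible_rts) + list(incompatible_rts)
    inclusive_sd = statistics.stdev(pooled)
    difference = statistics.mean(incompatible_rts) - statistics.mean(compatible_rts)
    return difference / inclusive_sd


# Hypothetical reaction times in milliseconds; NOT data from the study.
compatible = [600, 650, 700, 750]
incompatible = [800, 850, 900, 950]

d_score = simple_d_score(compatible, incompatible)
print(round(d_score, 3))
```

Because the difference is scaled by each participant's own latency variability, generally slower responders do not automatically receive larger scores. This is one way such scoring can reduce the influence of overall processing speed.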
No better moment to score a goal than just before half time? A soccer myth statistically tested
Amez, Simon
2018-01-01
We test the soccer myth suggesting that a particularly good moment to score a goal is just before half time. To this end, rich data on 1,179 games played in the UEFA Champions League and UEFA Europa League are analysed. In contrast to the myth, we find that, conditional on the goal difference and other game characteristics at half time, the final goal difference at the advantage of the home team is 0.520 goals lower in case of a goal just before half time by this team. We show that this finding relates to this team’s lower probability of scoring a goal during the second half. PMID:29518165
Comparison of Lecture-Based Learning vs Discussion-Based Learning in Undergraduate Medical Students.
Zhao, Beiqun; Potter, Donald D
2016-01-01
To compare lecture-based learning (LBL) and discussion-based learning (DBL) by assessing immediate and long-term knowledge retention and application of practical knowledge in third- and fourth-year medical students. A prospective, randomized control trial was designed to study the effects of DBL. Medical students were randomly assigned to intervention (DBL) or control (LBL) groups. Both the groups were instructed regarding the management of gastroschisis. The control group received a PowerPoint presentation, whereas the intervention group was guided only by an objectives list and a gastroschisis model. Students were evaluated using a multiple-choice pretest (Pre-Test MC) immediately before the teaching session, a posttest (Post-Test MC) following the session, and a follow-up test (Follow-Up MC) at 3 months. A practical examination (PE), which tested simple skills and management decisions, was administered at the end of the clerkship (Initial PE) and at 3 months after clerkship (Follow-Up PE). Students were also given a self-evaluation immediately following the Post-Test MC to gauge satisfaction and comfort level in the management of gastroschisis. University of Iowa Hospitals and Clinics and the Carver College of Medicine, Iowa City, IA. A total of 49 third- and fourth-year medical students who were enrolled in the general surgery clerkship were eligible for this study. Enrollment into the study was completely voluntary. Of the 49 eligible students, 36 students agreed to participate in the study, and 27 completed the study. Mean scores for the Pre-Test MC, Post-Test MC, and Follow-Up MC were similar between the control and intervention groups. In the control group, the Post-Test MC scores were significantly greater than Pre-Test MC scores (8.92 ± 0.79 vs 4.00 ± 1.04, p < 0.0001), whereas the Follow-Up MC scores were significantly lower than Post-Test MC scores (7.17 ± 1.75 vs 8.92 ± 0.79, p = 0.005). 
In the control group, the Follow-Up MC scores were significantly greater than Pre-Test MC scores (7.17 ± 1.75 vs 4.00 ± 1.04, p < 0.0001). Analysis of variance for all control group MC examinations had a p < 0.0001. In the intervention group, the Post-Test MC scores were significantly greater than Pre-Test MC scores (8.33 ± 1.23 vs 4.60 ± 1.55, p < 0.0001), whereas the Follow-Up MC scores were significantly lower than Post-Test MC scores (7.13 ± 1.77 vs 8.33 ± 1.23, p = 0.04). In the intervention group, the Follow-Up MC scores were significantly greater than Pre-Test MC scores (7.13 ± 1.77 vs 4.60 ± 1.55, p = 0.0002). Analysis of variance for all intervention group MC examinations had a p < 0.0001. Mean scores for the Initial PE were significantly higher for the intervention group compared with the control group's score (7.47 ± 1.68 vs 5.25 ± 2.34, p = 0.008). Mean scores for the Follow-Up PE were significantly higher for the intervention group compared with the control group's score (7.87 ± 1.77 vs 5.83 ± 2.04, p = 0.005). A comparison of Initial PE vs Follow-Up PE was not significant in either group. Students in the intervention group were more comfortable in the immediate management of gastroschisis and placement of a silo and felt that the educational experience was more worthwhile than students in the control group did. After a single instructional session, there was a significant difference in the students' scores between the control and the intervention groups on both administrations of the PEs. There were no significant differences between the 2 groups in any administration of the MC examinations. This seems to suggest that DBL may lead to better practical knowledge and potentially improved long-term knowledge retention when compared with LBL. Students in the DBL group also felt more comfortable with the management of gastroschisis and were more satisfied with the educational session. Copyright © 2015 Association of Program Directors in Surgery. 
Published by Elsevier Inc. All rights reserved.
Testing a computer-based ostomy care training resource for staff nurses.
Bales, Isabel
2010-05-01
Fragmented teaching and ostomy care provided by nonspecialized clinicians unfamiliar with state-of-the-art care and products have been identified as problems in teaching ostomy care to the new ostomate. After conducting a literature review of theories and concepts related to the impact of nurse behaviors and confidence on ostomy care, the author developed a computer-based learning resource and assessed its effect on staff nurse confidence. Of 189 staff nurses with a minimum of 1 year acute-care experience employed in the acute care, emergency, and rehabilitation departments of an acute care facility in the Midwestern US, 103 agreed to participate and returned completed pre- and post-tests, each comprising the same eight statements about providing ostomy care. F and P values were computed for differences between pre- and post-test scores. Based on a scale where 1 = totally disagree and 5 = totally agree with the statement, baseline confidence and perceived mean knowledge scores averaged 3.8; after viewing the resource program, post-test mean scores averaged 4.51, a statistically significant improvement (P < 0.001). The largest difference between pre- and post-test scores involved feeling confident in having the resources to learn ostomy skills independently. The availability of an electronic ostomy care resource was rated highly in both pre- and post-testing. Studies to assess the effects of increased confidence and knowledge on the quality and provision of care are warranted.
Forging the Basis for Developing Protein-Ligand Interaction Scoring Functions.
Liu, Zhihai; Su, Minyi; Han, Li; Liu, Jie; Yang, Qifan; Li, Yan; Wang, Renxiao
2017-02-21
In structure-based drug design, scoring functions are widely used for fast evaluation of protein-ligand interactions. They are often applied in combination with molecular docking and de novo design methods. Since the early 1990s, a whole spectrum of protein-ligand interaction scoring functions have been developed. Regardless of their technical difference, scoring functions all need data sets combining protein-ligand complex structures and binding affinity data for parametrization and validation. However, data sets of this kind used to be rather limited in terms of size and quality. On the other hand, standard metrics for evaluating scoring function used to be ambiguous. Scoring functions are often tested in molecular docking or even virtual screening trials, which do not directly reflect the genuine quality of scoring functions. Collectively, these underlying obstacles have impeded the invention of more advanced scoring functions. In this Account, we describe our long-lasting efforts to overcome these obstacles, which involve two related projects. On the first project, we have created the PDBbind database. It is the first database that systematically annotates the protein-ligand complexes in the Protein Data Bank (PDB) with experimental binding data. This database has been updated annually since its first public release in 2004. The latest release (version 2016) provides binding data for 16 179 biomolecular complexes in PDB. Data sets provided by PDBbind have been applied to many computational and statistical studies on protein-ligand interaction and various subjects. In particular, it has become a major data resource for scoring function development. On the second project, we have established the Comparative Assessment of Scoring Functions (CASF) benchmark for scoring function evaluation. Our key idea is to decouple the "scoring" process from the "sampling" process, so scoring functions can be tested in a relatively pure context to reflect their quality. 
In our latest work on this track, i.e. CASF-2013, the performance of a scoring function was quantified in four aspects, including "scoring power", "ranking power", "docking power", and "screening power". All four performance tests were conducted on a test set containing 195 high-quality protein-ligand complexes selected from PDBbind. A panel of 20 standard scoring functions were tested as demonstration. Importantly, CASF is designed to be an open-access benchmark, with which scoring functions developed by different researchers can be compared on the same grounds. Indeed, it has become a popular choice for scoring function validation in recent years. Despite the considerable progress that has been made so far, the performance of today's scoring functions still does not meet people's expectations in many aspects. There is a constant demand for more advanced scoring functions. Our efforts have helped to overcome some obstacles underlying scoring function development so that the researchers in this field can move forward faster. We will continue to improve the PDBbind database and the CASF benchmark in the future to keep them as useful community resources.
ERIC Educational Resources Information Center
Bitsika, Vicki; Sharpley, Chris F.; Melham, Therese C.
2010-01-01
Anxiety and depression inventory scores from 200 male and female university students attending a private university in Australia were examined for their factor structure. Once established, the two sets of factors were tested for gender-based differences, revealing that females were more likely than males to report symptomatology associated with…
The Effect of Sex-Dependent Norms on Aggregated Reading and Mathematics Test Scores.
ERIC Educational Resources Information Center
Gramenz, Gary W.; And Others
1986-01-01
Differences between school reading and mathematics means, rank orderings, and change scores obtained from total-group and sex-dependent norms were examined for students in grades 2 through 9 in a Florida school district. A preliminary study investigated sex-related differences between verbal and quantitative performance of boys and girls on seven…
ERIC Educational Resources Information Center
Crossley, Scott A.; Allen, Laura K.; Snow, Erica L.; McNamara, Danielle S.
2016-01-01
This study investigates a novel approach to automatically assessing essay quality that combines natural language processing approaches that assess text features with approaches that assess individual differences in writers such as demographic information, standardized test scores, and survey results. The results demonstrate that combining text…
A Response to Some Questions Raised About the Woodcock-Johnson: I. The Mean Score Discrepancy Issue.
ERIC Educational Resources Information Center
Woodcock, Richard W.
1984-01-01
Twenty-one studies that reported mean score differences between the Woodcock-Johnson Tests of Cognitive Ability (WJTCA) and the Wechsler Intelligence Scale for Children-Revised (WISC-R) Full Scale are summarized. The differences are found to be minimal and are attributed to data bias and WJTCA's higher correlation with achievement. (EGS)
Meaningfulness of Sex Differences in Selected Interest-Values Test Scores.
ERIC Educational Resources Information Center
Plant, Walter T.; Southern, Mara L.
This research paper examines the meaningfulness of sex differences in the Allport, Vernon and Lindzey (AVL) Study of Values Scale and in selected scales of the Strong-Campbell Interest Inventory (SCII), using somewhat diverse groups of men and women. By comparing men's and women's scores on the two measures, it was found that little accuracy in…
Peck, Karen Y; DiStefano, Lindsay J; Marshall, Stephen W; Padua, Darin A; Beutler, Anthony I; de la Motte, Sarah J; Frank, Barnett S; Martinez, Jessica C; Cameron, Kenneth L
2017-11-01
Peck, KY, DiStefano, LJ, Marshall, SW, Padua, DA, Beutler, AI, de la Motte, SJ, Frank, BS, Martinez, JC, and Cameron, KL. Effect of a lower extremity preventive training program on physical performance scores in military recruits. J Strength Cond Res 31(11): 3146-3157, 2017-Exercise-based preventive training programs are designed to improve movement patterns associated with lower extremity injury risk; however, the impact of these programs on general physical fitness has not been evaluated. The purpose of this study was to compare fitness scores between participants in a preventive training program and a control group. One thousand sixty-eight freshmen from a U.S. Service Academy were cluster-randomized into either the intervention or control group during 6 weeks of summer training. The intervention group performed a preventive training program, specifically the Dynamic Integrated Movement Enhancement (DIME), which is designed to improve lower extremity movement patterns. The control group performed the Army Preparation Drill (PD), a warm-up designed to prepare soldiers for training. Main outcome measures were the Army Physical Fitness Test (APFT) raw and scaled (for age and sex) scores. Independent t tests were used to assess between-group differences. Multivariable logistic regression models were used to control for the influence of confounding variables. Dynamic Integrated Movement Enhancement group participants completed the APFT 2-mile run 20 seconds faster compared with the PD group (p < 0.001), which corresponded with significantly higher scaled scores (p < 0.001). Army Physical Fitness Test push-up scores were significantly higher in the DIME group (p = 0.041), but there were no significant differences in APFT sit-up scores. The DIME group had significantly higher total APFT scores compared with the PD group (p < 0.001). Similar results were observed in multivariable models after controlling for sex and body mass index (BMI). 
Committing time to the implementation of a preventive training program does not appear to negatively affect fitness test scores.
A web-based normative calculator for the uniform data set (UDS) neuropsychological test battery.
Shirk, Steven D; Mitchell, Meghan B; Shaughnessy, Lynn W; Sherman, Janet C; Locascio, Joseph J; Weintraub, Sandra; Atri, Alireza
2011-11-11
With the recent publication of new criteria for the diagnosis of preclinical Alzheimer's disease (AD), there is a need for neuropsychological tools that take premorbid functioning into account in order to detect subtle cognitive decline. Using demographic adjustments is one method for increasing the sensitivity of commonly used measures. We sought to provide a useful online z-score calculator that yields estimates of percentile ranges and adjusts individual performance based on sex, age and/or education for each of the neuropsychological tests of the National Alzheimer's Coordinating Center Uniform Data Set (NACC, UDS). In addition, we aimed to provide an easily accessible method of creating norms for other clinical researchers for their own, unique data sets. Data from 3,268 clinically cognitively-normal older UDS subjects from a cohort reported by Weintraub and colleagues (2009) were included. For all neuropsychological tests, z-scores were estimated by subtracting the raw score from the predicted mean and then dividing this difference score by the root mean squared error term (RMSE) for a given linear regression model. For each neuropsychological test, an estimated z-score was calculated for any raw score based on five different models that adjust for the demographic predictors of SEX, AGE and EDUCATION, either concurrently, individually or without covariates. The interactive online calculator allows the entry of a raw score and provides five corresponding estimated z-scores based on predictions from each corresponding linear regression model. The calculator produces percentile ranks and graphical output. 
An interactive, regression-based, normative score online calculator was created to serve as an additional resource for UDS clinical researchers, especially in guiding interpretation of individual performances that appear to fall in borderline realms and may be of particular utility for operationalizing subtle cognitive impairment present according to the newly proposed criteria for Stage 3 preclinical Alzheimer's disease.
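The regression-based z-score described in this abstract can be sketched as follows. This is a minimal illustration, not the calculator's actual code: the function names, the coefficient values, and the covariate coding are hypothetical. It assumes the conventional sign (observed minus predicted), so that a score below the demographically predicted mean gives a negative z, and it assumes a standard normal distribution when converting z to a percentile, which the abstract does not specify.

```python
from statistics import NormalDist


def predicted_mean(intercept, coefs, covariates):
    """Predicted score from a linear regression model.

    coefs and covariates are dicts keyed by predictor name
    (for example "age", "education", "sex"); all values here are hypothetical.
    """
    return intercept + sum(coefs[name] * covariates[name] for name in coefs)


def regression_z(raw, predicted, rmse):
    """Estimated z-score: (raw - predicted) / RMSE of the regression model."""
    if rmse <= 0:
        raise ValueError("rmse must be positive")
    return (raw - predicted) / rmse


def percentile_rank(z):
    """Percentile under a standard normal distribution (an assumption)."""
    return 100.0 * NormalDist().cdf(z)


# Hypothetical model adjusting for age, education, and sex.
coefs = {"age": -0.05, "education": 0.2, "sex": 0.5}
person = {"age": 70, "education": 16, "sex": 1}
pred = predicted_mean(30.0, coefs, person)
z = regression_z(raw=27.0, predicted=pred, rmse=2.0)
print(round(pred, 2), round(z, 2), round(percentile_rank(z), 1))
```

Running the same raw score through models with different covariate sets, as the abstract's five models do, would amount to calling `regression_z` once per model with that model's predicted mean and RMSE.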
Construction of an Exome-Wide Risk Score for Schizophrenia Based on a Weighted Burden Test.
Curtis, David
2018-01-01
Polygenic risk scores obtained as a weighted sum of associated variants can be used to explore association in additional data sets and to assign risk scores to individuals. The methods used to derive polygenic risk scores from common SNPs are not suitable for variants detected in whole exome sequencing studies. Rare variants, which may have major effects, are seen too infrequently to judge whether they are associated and may not be shared between training and test subjects. A method is proposed whereby variants are weighted according to their frequency, their annotations and the genes they affect. A weighted sum across all variants provides an individual risk score. Scores constructed in this way are used in a weighted burden test and are shown to be significantly different between schizophrenia cases and controls using a five-way cross-validation procedure. This approach represents a first attempt to summarise exome sequence variation into a summary risk score, which could be combined with risk scores from common variants and from environmental factors. It is hoped that the method could be developed further. © 2017 John Wiley & Sons Ltd/University College London.
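The idea of a weighted sum across variants can be sketched as follows. The abstract states only that variants are weighted by frequency, annotation, and affected gene; the specific weighting function, the multiplicative combination, and every name and number below are hypothetical choices for illustration, not the author's method.

```python
def frequency_weight(maf, max_weight=10.0):
    """Hypothetical frequency weight: rarer variants receive more weight.

    Falls linearly from max_weight at a minor allele frequency (MAF) of 0
    to 1.0 at a MAF of 0.5.
    """
    if not 0.0 <= maf <= 0.5:
        raise ValueError("maf must lie in [0, 0.5]")
    return max_weight - (max_weight - 1.0) * (maf / 0.5)


def risk_score(genotypes, variants):
    """Weighted sum over the variants a subject carries.

    genotypes: dict mapping variant id -> minor allele count (0, 1, or 2).
    variants:  dict mapping variant id -> dict with keys "maf",
               "annotation_weight", and "gene_weight" (all hypothetical).
    """
    total = 0.0
    for variant_id, allele_count in genotypes.items():
        v = variants[variant_id]
        weight = (frequency_weight(v["maf"])
                  * v["annotation_weight"]
                  * v["gene_weight"])
        total += allele_count * weight
    return total


variants = {
    "v1": {"maf": 0.0, "annotation_weight": 2.0, "gene_weight": 1.0},
    "v2": {"maf": 0.5, "annotation_weight": 1.0, "gene_weight": 3.0},
}
print(risk_score({"v1": 1, "v2": 2}, variants))
```

Per-subject scores produced this way could then be compared between cases and controls; the cross-validation procedure mentioned in the abstract is not reproduced here.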
Preterm birth, social disadvantage, and cognitive competence in Swedish 18- to 19-year-old men.
Ekeus, Cecilia; Lindström, Karolina; Lindblad, Frank; Rasmussen, Finn; Hjern, Anders
2010-01-01
The aim was to study the impact of a range of gestational ages (GAs) on cognitive competence in late adolescence and how this effect is modified by contextual social adversity in childhood. This was a register study based on a national cohort of 119664 men born in Sweden from 1973 to 1976. Data on GA and other perinatal factors were obtained from the Medical Birth Register, and information on cognitive test scores was extracted from military conscription at the ages of 18 to 19 years. Test scores were analyzed as z scores on a 9-point stanine scale, whereby each unit is equivalent to 0.5 SD. Socioeconomic indicators of the childhood household were obtained from the Population and Housing Census of 1990. The data were analyzed by multivariate linear regression. The mean cognitive test scores decreased in a stepwise manner with GA. In unadjusted analysis, the test scores were 0.63 stanine unit lower in men who were born after 24 to 32 gestational weeks than in those who were born at term. The difference in global scores between the lowest and highest category of socioeconomic status was 1.57. Adjusting the analysis for the childhood socioeconomic indicators decreased the effect of GA on cognitive test scores by 26% to 33%. There was also a multiplicative interaction effect of social adversity and moderately preterm birth on cognitive test scores. This study confirms previous claims of an incremental association of cognitive competence with GA. Socioeconomic indicators in childhood modified this effect at all levels of preterm birth.
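Because each stanine unit is stated to equal 0.5 SD, the reported differences can be restated in SD units with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the z-to-stanine mapping assumes the conventional stanine scale (mean 5, band width 0.5 SD, clamped to 1-9), which the abstract does not describe in detail.

```python
import math

SD_PER_STANINE = 0.5  # stated in the abstract


def stanine_diff_to_sd(diff_in_stanines):
    """Convert a difference in stanine units to standard deviation units."""
    return diff_in_stanines * SD_PER_STANINE


def z_to_stanine(z):
    """Map a z-score to a stanine on the conventional scale (an assumption)."""
    stanine = math.floor(2.0 * z + 5.5)
    return max(1, min(9, stanine))


# Differences reported in the abstract, restated in SD units.
print(round(stanine_diff_to_sd(0.63), 3))  # preterm (24-32 weeks) vs term
print(round(stanine_diff_to_sd(1.57), 3))  # lowest vs highest socioeconomic status
```

On this conversion, the unadjusted preterm difference is roughly a third of a standard deviation, while the socioeconomic difference is closer to four fifths of one.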
Impact of a weekly reading program on orthopedic surgery residents' in-training examination.
Weglein, Daniel G; Gugala, Zbigniew; Simpson, Suzanne; Lindsey, Ronald W
2015-05-01
In response to a decline in individual residents' performance and overall program performance on the Orthopaedic In-Training Examination (OITE), the authors' department initiated a daily literature reading program coupled with weekly tests on the assigned material. The goal of this study was to assess the effect of the reading program on individual residents' scores and the training program's OITE scores. The reading program consisted of daily review articles from the Journal of the American Academy of Orthopaedic Surgeons, followed by a weekly written examination consisting of multiple-choice or fill-in-the-blank questions. All articles were selected and all questions were written by the departmental chair. A questionnaire was given to assess residents' perceptions of the weekly tests. As a result of implementing the reading program for a 10-month period, residents' subsequent performance on the OITE significantly improved (mean score increase, 4, P<.0001; percentile score increase, 11, P=.0007). The difference in mean score was significant for residents in postgraduate years 3, 4, and 5. A statistically significant correlation was found between weekly test scores and performance on the OITE, with a significant correlation between weekly test scores and OITE percentile ranking. The study results also showed a positive correlation between reading test attendance and weekly test scores. Residents' anonymous questionnaire responses also demonstrated the reading program to be a valuable addition to the residency training curriculum. In conclusion, the study strongly supports the benefits of a weekly reading and examination program in enhancing the core knowledge of orthopedic surgery residents. Copyright 2015, SLACK Incorporated.
Emotional intelligence and psychological health in a sample of Kuwaiti college students.
Alkhadher, Othman
2007-06-01
This summary investigated correlations between emotional intelligence and psychological health amongst 191 Kuwaiti undergraduate students in psychology, 98 men and 93 women (M age=20.6 yr., SD=2.8). There were two measures of emotional intelligence, one based on the ability model, the Arabic Test for Emotional Intelligence, and the other on the mixed model, the Emotional Intelligence Questionnaire. Participants' psychological health was assessed using scales from the Personality Assessment Inventory. A weak relationship between the two types of emotional intelligence was found. A correlation for scores on the Emotional Intelligence Questionnaire with the Personality Assessment Inventory was found but not with those of the Arabic Test for Emotional Intelligence. Regression analysis indicated scores on Managing Emotions and Self-awareness accounted for most of the variance in the association with the Personality Assessment Inventory. Significant sex differences were found only on the Arabic Test for Emotional Intelligence; women scored higher than men. On Emotional Intelligence Questionnaire measures, men had significantly higher means on Managing Emotions and Self-motivation. However, no significant differences were found between the sexes on the Total Emotional Intelligence Questionnaire scores.
ERIC Educational Resources Information Center
Wheelock, Anne
Scores on the Massachusetts Comprehensive Assessment System (MCAS) tests are used to select exemplary schools in Massachusetts, and the schools thus identified can receive awards from three different programs. This study examined the evidence about the use of MCAS scores to assess school quality. These three programs use MCAS to identify exemplary…
The Usefulness of the Bock Model for Scoring with Information from Incorrect Responses.
ERIC Educational Resources Information Center
Huynh, Huynh; Casteel, Jim
1987-01-01
In the context of pass/fail decisions, using the Bock multi-nominal latent trait model for moderate-length tests does not produce decisions that differ substantially from those based on the raw scores. The Bock decisions appear to relate less strongly to outside criteria than those based on the raw scores. (Author/JAZ)
Assessment of theory of mind in children with communication disorders: role of presentation mode.
van Buijsen, Marit; Hendriks, Angelique; Ketelaars, Mieke; Verhoeven, Ludo
2011-01-01
Children with communication disorders have problems with both language and social interaction. The theory-of-mind hypothesis provides an explanation for these problems, and different tests have been developed to test this hypothesis. However, different modes of presentation are used in these tasks, which make the results difficult to compare. In the present study, the performances of typically developing children, children with specific language impairments (SLI), and children with autism spectrum disorders (ASD) were therefore compared using three theory-of-mind tests (the Charlie test, the Smarties test, and the Sally-and-Anne test) presented in three different manners each (spoken, video, and line drawing modes). The results showed differential outcomes for the three types of tests and a significant interaction between group of children and mode of presentation. For the typically developing children, no differential effects of presentation mode were detected. For the children with SLI, the highest test scores were consistently evidenced in the line-drawing mode. For the children with ASD, test performance depended on the mode of presentation. Just how the children's non-verbal age, verbal age, and short-term memory related to their test scores was also explored for each group of children. The test scores of the SLI group correlated significantly with their short-term memory, those of the ASD group with their verbal age. These findings demonstrate that performance on theory-of-mind tests clearly depends upon mode of test presentation as well as the children's cognitive and linguistic abilities. Copyright © 2011 Elsevier Ltd. All rights reserved.
Proposal for a new categorization of aseptic processing facilities based on risk assessment scores.
Katayama, Hirohito; Toda, Atsushi; Tokunaga, Yuji; Katoh, Shigeo
2008-01-01
Risk assessment of aseptic processing facilities was performed using two published risk assessment tools. Calculated risk scores were compared with experimental test results, including environmental monitoring and media fill run results, in three different types of facilities. The two risk assessment tools used gave a generally similar outcome. However, depending on the tool used, variations were observed in the relative scores between the facilities. For the facility yielding the lowest risk scores, the corresponding experimental test results showed no contamination, indicating that these ordinal testing methods are insufficient to evaluate this kind of facility. A conventional facility having acceptable aseptic processing lines gave relatively high risk scores. The facility showing a rather high risk score demonstrated the usefulness of conventional microbiological test methods. Considering the significant gaps observed in calculated risk scores and in the ordinal microbiological test results between advanced and conventional facilities, we propose a facility categorization based on risk assessment. The most important risk factor in aseptic processing is human intervention. When human intervention is eliminated from the process by advanced hardware design, the aseptic processing facility can be classified into a new risk category that is better suited for assuring sterility based on a new set of criteria rather than on currently used microbiological analysis. To fully benefit from advanced technologies, we propose three risk categories for these aseptic facilities.
A Web-based course on infection control for physicians in training: an educational intervention.
Fakih, Mohamad G; Enayet, Iram; Minnick, Steven; Saravolatz, Louis D
2006-07-01
To evaluate the effectiveness of a Web-based course on infection control accessed by physicians in training. Educational intervention. A 607-bed urban teaching hospital. A total of 55 physicians in training beginning their first postgraduate year (the iPGY1 group) and 59 physicians completing their first, second, or third postgraduate year (the oPGY group). Individuals in the iPGY1 group took a Web-based course on infection control practices. Persons in the iPGY1 group who took the Web-based course completed an evaluation test consisting of 15 multiple-choice questions (total possible score, 15 points). The same test was given to persons in the oPGY group, who did not take the Web-based course. We compared scores of the Web-based test taken by subjects in the iPGY1 group immediately after the course with scores of the test they took 3 months after the course and with test scores of subjects in the oPGY group. The mean score (+/-SD) for subjects in the iPGY1 group who took the Web-based course was 10.6+/-2.2, compared with 8.0+/-2.5 for subjects in the oPGY group (P<.001). The mean score (+/-SD) for subjects in the iPGY1 group 3 months after completing the course decreased to 8.0+/-2.4 (P<.001 by the paired t test). For the oPGY group, significant differences were found between the scores (+/-SD) for subjects in the internal medicine (9.9+/-2.3), emergency medicine (8.4+/-1.7), pediatrics (7.0+/-1.7), and family medicine (5.8+/-1.6) residency programs (P<.001); there were no significant differences in scores according to the year of residency. Web-based infection control courses are an attractive teaching tool for physicians in training and need to be considered for teaching infection control. The evaluation of information retention will help identify physicians in training who require further training.
Physiological and behavioral responses of horses during police training.
Munsters, C C B M; Visser, E K; van den Broek, J; Sloet van Oldruitenborgh-Oosterbaan, M M
2013-05-01
Mounted police horses have to cope with challenging, unpredictable situations when on duty and it is essential to gain insight into how these horses handle stress to warrant their welfare. The aim of the study was to evaluate physiological and behavioral responses of 12 (six experienced and six inexperienced) police horses during police training. Horses were evaluated during four test settings at three time points over a 7-week period: outdoor track test, street track test, indoor arena test and smoke machine test. Heart rate (HR; beats/min), HR variability (HRV; root mean square of successive differences; ms), behavior score (BS; scores 0 to 5) and standard police performance score (PPS; scores 1 to 0) were obtained per test. All data were statistically evaluated using a linear mixed model (Akaike's information criterion; t > 2.00) or logistic regression (P < 0.05). HR of horses was increased at indoor arena test (98 ± 26) and smoke machine test (107 ± 25) compared with outdoor track (80 ± 12, t = 2.83 and t = 3.91, respectively) and street track tests (81 ± 14, t = 2.48 and t = 3.52, respectively). HRV of horses at the indoor arena test (42.4 ± 50.2) was significantly lower compared with street track test (85.7 ± 94.3; t = 2.78). BS did not show significant differences between tests and HR of horses was not always correlated with the observed moderate behavioral responses. HR, HRV, PPS and BS did not differ between repetition of tests and there were no significant differences in any of the four tests between experienced and inexperienced horses. No habituation occurred during the test weeks, and experience as a police horse does not seem to be a key factor in how these horses handle stress. All horses showed only modest behavioral responses, and HR may provide complementary information for individual evaluation and welfare assessment of these horses. Overall, little evidence of stress was observed during these police training tests. 
As three of these tests (excluding the indoor arena test) reflect normal police work, it is suggested that this kind of police work is not significantly stressful for horses and will have no negative impact on the horse's welfare.
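The HRV measure named in this abstract, the root mean square of successive differences (RMSSD), has a standard definition that can be sketched as follows. The function name and the sample interval values are hypothetical; the abstract does not say how the beat-to-beat intervals were recorded or preprocessed.

```python
import math


def rmssd(intervals_ms):
    """Root mean square of successive differences between beat intervals.

    intervals_ms: sequence of consecutive inter-beat intervals in milliseconds.
    """
    if len(intervals_ms) < 2:
        raise ValueError("need at least two intervals")
    diffs = [later - earlier
             for earlier, later in zip(intervals_ms, intervals_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Hypothetical inter-beat intervals (ms), for illustration only.
print(round(rmssd([800, 810, 790, 800]), 2))
```

A lower RMSSD indicates less beat-to-beat variability, which is the direction reported for the indoor arena test relative to the street track test.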
Kellis, Eleftherios; Ellinoudis, Athanasios; Kofotolis, Nikolaos
2015-06-01
Although the straight leg raise (SLR) test frequently is used to assess hamstring extensibility in individuals with low back pain (LBP), evidence relating LBP, SLR, and hamstring extensibility remains unclear. The SLR measures the angle between the lifted leg and the horizontal, however, and, as such, it is not a direct measure of the elongation capacity of the hamstrings. To examine the differences in hamstring elongation (quantified via ultrasonography) and SLR score between individuals with LBP and asymptomatic controls and to determine the relationship between hamstring elongation, SLR, and functional disability scores. Cross-sectional study. University laboratory. Forty men and women with chronic LBP (mean ± SD, age 43.51 ± 3.71 years) and 40 control subjects (age 45.11 ± 4.01 years) participated in this study. Passive SLR, elongation assessed via ultrasonography, and functional disability. SLR score, elongation of tendinous tissue within the semitendinosus muscle, and Oswestry Disability Index. Two-way analysis of variance tests indicated a significantly lower SLR score and a greater Oswestry score in the LBP group compared with control subjects (P < .05). In contrast, there were no significant group differences in hamstring elongation (P > .05). Gender did not have an effect on any dependent measure (P > .05). Hamstring elongation showed a low correlation with SLR score and a minimal correlation with Oswestry score. These results indicate that the SLR score is not determined by hamstring elongation (quantified via ultrasonography). Copyright © 2015 American Academy of Physical Medicine and Rehabilitation. Published by Elsevier Inc. All rights reserved.
Serin, Gürdeniz; Karabulut, Gonca; Kabasakal, Yasemin; Kandiloğlu, Gülşen; Akalin, Taner
2016-01-01
Minor salivary gland biopsy is one of the objective tests used in the diagnosis of Sjögren syndrome. The aim of our study was to compare the clinical and laboratory data of primary and secondary Sjögren syndrome cases with a lymphocyte score of 3 or 4 in the minor salivary gland biopsy. Data from a total of 2346 consecutive minor salivary gland biopsies were retrospectively evaluated in this study. Clinical and autoantibody characteristics of 367 cases with lymphocyte score 3 or 4 and diagnosed with primary or secondary Sjögren syndrome were compared. There was no difference between lymphocyte score 3 and 4 primary Sjögren syndrome patients in terms of dry mouth, dry eye symptoms and Schirmer test results, but Anti-Ro and Antinuclear Antibody positivity was statistically significantly higher in cases with lymphocyte score 4 (p = 0.025, p = 0.001). Anti-Ro test results were also found to be statistically significantly higher in secondary Sjögren syndrome patients with lymphocyte score 4 (p = 0.048). In this study, the high proportion of cases with negative autoantibody but positive lymphocyte score is significant in terms of showing the contribution of minor salivary gland biopsy to Sjögren syndrome diagnosis. Lymphocyte score 3 and 4 cases were found to have similar clinical findings but a difference regarding antibody positivity in primary Sjögren syndrome. We believe that cases with lymphocyte score 4 may be Sjögren syndrome cases whose clinical manifestations are relatively established and higher autoantibody levels are therefore found.
New graduate students' baseline knowledge of the responsible conduct of research.
Heitman, Elizabeth; Olsen, Cara H; Anestidou, Lida; Bulger, Ruth Ellen
2007-09-01
To assess (1) new biomedical science graduate students' baseline knowledge of core concepts and standards in responsible conduct of research (RCR), (2) differences in graduate students' baseline knowledge overall and across the Office of Research Integrity's nine core areas, and (3) demographic and educational factors in these differences. A 30-question, computer-scored multiple-choice test on core concepts and standards of RCR was developed following content analysis of 20 United States-published RCR texts, and combined with demographic questions on undergraduate experience with RCR developed from graduate student focus groups. Four hundred two new graduate students at three health science universities were recruited for Scantron and online testing before beginning RCR instruction. Two hundred fifty-one of 402 eligible trainees (62%) at three universities completed the test; scores ranged from 26.7% to 83.3%, with a mean of 59.5%. Only seven (3%) participants scored 80% or above. Students who received their undergraduate education outside the United States scored significantly lower (mean 52.0%) than those with U.S. bachelor's degrees (mean 60.5%, P < .001). Participants with prior graduate biomedical or health professions education scored marginally higher than new students, but both groups' mean scores were well below 80%. The mean score of 16 participants who reported previous graduate-level RCR instruction was 67.7%. Participants' specific knowledge varied, but overall scores were universally low. New graduate biomedical sciences students have inadequate and inconsistent knowledge of RCR, irrespective of their prior education or experience. Incoming trainees with previous graduate RCR education may also have gaps in core knowledge.
Rescorla, Leslie; Ivanova, Masha Y; Achenbach, Thomas M; Begovac, Ivan; Chahed, Myriam; Drugli, May Britt; Emerich, Deisy Ribas; Fung, Daniel S S; Haider, Mariam; Hansson, Kjell; Hewitt, Nohelia; Jaimes, Stefanny; Larsson, Bo; Maggiolini, Alfio; Marković, Jasminka; Mitrović, Dragan; Moreira, Paulo; Oliveira, João Tiago; Olsson, Martin; Ooi, Yoon Phaik; Petot, Djaouida; Pisa, Cecilia; Pomalima, Rolando; da Rocha, Marina Monzani; Rudan, Vlasta; Sekulić, Slobodan; Shahini, Mimoza; de Mattos Silvares, Edwiges Ferreira; Szirovicza, Lajos; Valverde, José; Vera, Luis Anderssen; Villa, Maria Clara; Viola, Laura; Woo, Bernardine S C; Zhang, Eugene Yuqing
2012-12-01
To build on Achenbach, Rescorla, and Ivanova (2012) by (a) reporting new international findings for parent, teacher, and self-ratings on the Child Behavior Checklist, Youth Self-Report, and Teacher's Report Form; (b) testing the fit of syndrome models to new data from 17 societies, including previously underrepresented regions; (c) testing effects of society, gender, and age in 44 societies by integrating new and previous data; (d) testing cross-society correlations between mean item ratings; (e) describing the construction of multisociety norms; (f) illustrating clinical applications. Confirmatory factor analyses (CFAs) of parent, teacher, and self-ratings, performed separately for each society; tests of societal, gender, and age effects on dimensional syndrome scales, DSM-oriented scales, Internalizing, Externalizing, and Total Problems scales; tests of agreement between low, medium, and high ratings of problem items across societies. CFAs supported the tested syndrome models in all societies according to the primary fit index (Root Mean Square Error of Approximation [RMSEA]), but less consistently according to other indices; effect sizes were small-to-medium for societal differences in scale scores, but very small for gender, age, and interactions with society; items received similarly low, medium, or high ratings in different societies; problem scores from 44 societies fit three sets of multisociety norms. Statistically derived syndrome models fit parent, teacher, and self-ratings when tested individually in all 44 societies according to RMSEAs (but less consistently according to other indices). Small to medium differences in scale scores among societies supported the use of low-, medium-, and high-scoring norms in clinical assessment of individual children. Copyright © 2012 American Academy of Child and Adolescent Psychiatry. Published by Elsevier Inc. All rights reserved.
Kaido, Minako; Ishida, Reiko; Dogru, Murat; Tsubota, Kazuo
2011-09-01
To investigate the relation of functional visual acuity (FVA) measurements with dry eye test parameters and to compare the testing methods with and without blink suppression and anesthetic instillation. A prospective comparative case series. Thirty right eyes of 30 dry eye patients and 25 right eyes of 25 normal subjects seen at Keio University School of Medicine, Department of Ophthalmology were studied. FVA testing was performed using a FVA measurement system with two different approaches, one in which measurements were made under natural blinking conditions without topical anesthesia (FVA-N) and the other in which the measurements were made under the blink suppression condition with topical anesthetic eye drops (FVA-BS). Tear function examinations, such as the Schirmer test, tear film break-up time, and fluorescein and Rose Bengal vital staining as ocular surface evaluation, were performed. The mean logMAR FVA-N scores and logMAR Landolt visual acuity scores were significantly lower in the dry eye subjects than in the healthy controls (p < 0.05), while there were no statistical differences between the logMAR FVA-BS scores of the dry eye subjects and those of the healthy controls. There was a significant correlation between the logMAR Landolt visual acuities and the logMAR FVA-N and logMAR FVA-BS scores. The FVA-N scores correlated significantly with tear quantities, tear stability and, especially, the ocular surface vital staining scores. FVA measurements performed under natural blinking significantly reflected the tear functions and ocular surface status of the eye and would appear to be a reliable method of FVA testing. FVA measurement is also an accurate predictor of dry eye status.
Wang, L Y; Peng, H; Huang, W N; Gao, B
2016-04-20
Objective: This study was designed to observe dizziness handicap inventory (DHI) scores in patients with benign paroxysmal positional vertigo (BPPV) before and after repositioning maneuvers, and to discuss the value of DHI scores in the diagnosis and treatment of BPPV. Method: Charts of 72 patients with BPPV diagnosed by positioning test were reviewed. Four DHI scores were used: the total score (DHIT), the functional score (DHIF), the emotional score (DHIE), and the physical score (DHIP). We compared patients' pre-repositioning and post-repositioning DHI scores, and also compared the DHI scores of patients with and without residual dizziness. Result: All 72 patients underwent repositioning maneuvers and had their DHI scores recorded. The mean post-repositioning scores were markedly lower than the pre-repositioning scores, and the difference was significant (P < 0.01). The difference in DHIP scores between the residual dizziness group and the non-residual dizziness group was not significant, while the DHIF, DHIE, and DHIT scores differed significantly between the two groups. Conclusion: After repositioning maneuvers, the dizziness handicap of BPPV patients improved significantly. Subsequent treatment for patients with residual dizziness after successful repositioning could target the functional and emotional aspects of dizziness. Copyright © by the Editorial Department of Journal of Clinical Otorhinolaryngology Head and Neck Surgery.
D'Eon, Marcel F
2006-01-01
Background: Many senior undergraduate students from the University of Saskatchewan indicated informally that they did not remember much from their first year courses and wondered why we were teaching content that did not seem relevant to later clinical work or studies. To determine the extent of the problem, a course evaluation study that measured the knowledge loss of medical students on selected first year courses was conducted. This study replicates previous memory decrement studies with three first year medicine basic science courses, something that was not found in the literature. It was expected that some courses would show more and some courses would show less knowledge loss. Methods: In the spring of 2004 over 20 students were recruited to retake questions from three first year courses: Immunology, Physiology, and Neuroanatomy. Student scores on the selected questions at the time of the final examination in May 2003 (the 'test') were compared with their scores on the questions 10 or 11 months later (the 're-test') using paired samples t-tests. A repeated-measures MANOVA was used to compare the test and re-test scores among the three courses. The re-test scores were matched with the overall student ratings of the courses and the student scores on the May 2003 examinations. Results: A statistically significant main effect of knowledge loss (F = 297.385; p < .001) and an interaction effect by course (F = 46.081; p < .001) were found. The students' scores dropped 13.1% in Immunology, 46.5% in Neuroanatomy, and 16.1% in Physiology. Bonferroni post hoc comparisons showed a significant difference between Neuroanatomy and Physiology (mean difference of 10.7, p = .004). Conclusion: There was considerable knowledge loss among medical students in the three basic science courses tested and this loss was not uniform across courses. Knowledge loss does not seem to be related to the marks on the final examination or the assessment of course quality by the students.
PMID:16412241
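The test versus re-test comparison described in this abstract relies on paired samples t-tests. As a rough illustration of that technique only, the Python sketch below computes the paired t statistic and its degrees of freedom from two matched score lists. The function name `paired_t` and the scores are hypothetical and are not data from the study; obtaining a p-value would additionally require the t distribution, which is left out here.

```python
import math
from statistics import mean, stdev


def paired_t(test_scores, retest_scores):
    """Return (t statistic, degrees of freedom) for matched score pairs."""
    if len(test_scores) != len(retest_scores) or len(test_scores) < 2:
        raise ValueError("need two equally long lists with at least two pairs")
    # Per-student loss: original score minus re-test score.
    diffs = [a - b for a, b in zip(test_scores, retest_scores)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    n = len(diffs)
    return mean(diffs) / (sd / math.sqrt(n)), n - 1


# Hypothetical percentage scores for four students (illustration only).
may_exam = [80, 75, 90, 85]
retest = [70, 60, 85, 65]
t_value, df = paired_t(may_exam, retest)
print(round(t_value, 3), df)
```

A positive t value here means scores fell between the two sittings, which is the direction of the knowledge loss the abstract reports.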
Aşkar, Petek; Altun, Arif; Cangöz, Banu; Cevik, Vildan; Kaya, Galip; Türksoy, Hasan
2012-04-01
The purpose of this study was to assess whether a computerized battery of neuropsychological tests could produce similar results as the conventional forms. Comparisons on 77 volunteer undergraduates were carried out with two neuropsychological tests: Line Orientation Test and Enhanced Cued Recall Test. Firstly, students were assigned randomly across the test medium (paper-and-pencil versus computerized). Secondly, the groups were given the same test in the other medium after a 30-day interval between tests. Results showed that the Enhanced Cued Recall Test-Computer-based did not correlate with the Enhanced Cued Recall Test-Paper-and-pencil results. Line Orientation Test-Computer-based scores, on the other hand, did correlate significantly with the Line Orientation Test-Paper-and-pencil version. In both tests, scores were higher on paper-and-pencil tests compared to computer-based tests. Total score difference between modalities was statistically significant for both Enhanced Cued Recall Tests and for the Line Orientation Test. In both computer-based tests, it took less time for participants to complete the tests.
Ries, Julie D; Echternach, John L; Nof, Leah; Gagnon Blodgett, Michelle
2009-06-01
With the increasing incidence of Alzheimer disease (AD), determining the validity and reliability of outcome measures for people with this disease is necessary. The goals of this study were to assess test-retest reliability of data for the Timed "Up & Go" Test (TUG), the Six-Minute Walk Test (6MWT), and gait speed and to calculate minimal detectable change (MDC) scores for each outcome measure. Performance differences between groups with mild to moderate AD and moderately severe to severe AD (as determined by the Functional Assessment Staging [FAST] scale) were studied. This was a prospective, nonexperimental, descriptive methodological study. Background data collected for 51 people with AD included: use of an assistive device, Mini-Mental Status Examination scores, and FAST scale scores. Each participant engaged in 2 test sessions, separated by a 30- to 60-minute rest period, which included 2 TUG trials, 1 6MWT trial, and 2 gait speed trials using a computerized gait assessment system. A specific cuing protocol was followed to achieve optimal performance during test sessions. Test-retest reliability values for the TUG, the 6MWT, and gait speed were high for all participants together and for the mild to moderate AD and moderately severe to severe AD groups separately (intraclass correlation coefficients ≥ .973); however, individual variability of performance also was high. Calculated MDC scores at the 90% confidence interval were: TUG = 4.09 seconds, 6MWT = 33.5 m (110 ft), and gait speed = 9.4 cm/s. The 2 groups were significantly different in performance of clinical tests, with the participants who were more cognitively impaired being more physically and functionally impaired. A single researcher for data collection limited sample numbers and prohibited blinding to dementia level. The TUG, the 6MWT, and gait speed are reliable outcome measures for use with people with AD, recognizing that individual variability of performance is high. Minimal detectable change scores at the 90% confidence interval can be used to assess change in performance over time and the impact of treatment.
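The abstract does not state how the minimal detectable change was derived. A convention often used in rehabilitation research is SEM = SD × √(1 − ICC) and MDC90 = 1.645 × SEM × √2, and the Python sketch below assumes that convention rather than the authors' exact procedure. The function names and the input values are hypothetical and are not the study's data.

```python
import math

Z_90 = 1.645  # z value commonly used for a 90% confidence level


def standard_error_of_measurement(sd, icc):
    """SEM from a score standard deviation and a reliability coefficient."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC must lie between 0 and 1")
    return sd * math.sqrt(1.0 - icc)


def mdc90(sd, icc):
    """Minimal detectable change at 90% confidence (assumed convention)."""
    # sqrt(2) accounts for measurement error in both the test and the retest.
    return Z_90 * standard_error_of_measurement(sd, icc) * math.sqrt(2.0)


# Hypothetical inputs: SD of 10 score units and an ICC of .973.
print(round(mdc90(10.0, 0.973), 2))
```

Under this convention, an observed change smaller than the MDC cannot be distinguished from measurement error, which is how the abstract proposes using the values when judging change over time.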
The effects of academic grouping on student performance in science
NASA Astrophysics Data System (ADS)
Scoggins, Sally Smykla
The current action research study explored how student placement in heterogeneous or homogeneous classes in seventh-grade science affected students' eighth-grade Science State of Texas Assessment of Academic Readiness (STAAR) scores, and how ability grouping affected students' scores based on race and socioeconomic status. The population included all eighth-grade students in the target district who took the regular eighth-grade science STAAR over four academic school years. The researcher ran three statistical tests: a t-test for independent samples, a one-way between-subjects analysis of variance (ANOVA) and a two-way between-subjects ANOVA. The results showed no statistically significant difference between eighth-grade Pre-AP students from seventh-grade Pre-AP classes and eighth-grade Pre-AP students from heterogeneous seventh-grade classes and no statistically significant difference between Pre-AP students' scores based on socioeconomic status. There was no statistically significant interaction between socioeconomic status and the seventh-grade science classes. The scores of regular eighth-grade students who were in heterogeneous seventh-grade classes were statistically significantly higher than the scores of regular eighth-grade students who were in regular seventh-grade classes. The results also revealed that the scores of students who were White were statistically significantly higher than the scores of students who were Black and Hispanic. Black and Hispanic scores did not differ significantly. Further results indicated that the STAAR Level II and Level III scores were statistically significantly higher for the Pre-AP eighth-grade students who were in heterogeneous seventh-grade classes than the STAAR Level II and Level III scores of Pre-AP eighth-grade students who were in Pre-AP seventh-grade classes.
ERIC Educational Resources Information Center
Meir, Rudi; Newton, Robert; Curtis, Edgar; Fardell, Matthew; Butler, Benjamin
2001-01-01
Australian and English professional rugby players completed various physical fitness performance tests to determine differences when grouping players into three different rugby positional categories. Results found minimal differences in test scores on the basis of players' specific positions on a team, however, when players were grouped according…
Eslamian, Ladan; Borzabadi-Farahani, Ali; Gholami, Hadi
2016-05-01
To compare the analgesic effect of topical benzocaine (5%) and ketoprofen (1.60 mg/mL) after 2 mm activation of 7 mm long delta loops used for maxillary en-masse orthodontic space closure. Twenty patients (seven males, 13 females, 15-25 years of age, mean age of 19.5 years) participated in a randomised crossover, double-blind trial. After appliance activation, participants were instructed to use analgesic gels and record pain perception at 2, 6, 24 hours and 2, 3 and 7 days (at 18.00 hrs), using a visual analogue scale ruler (VAS, 0-4). Each patient received all three gels (benzocaine, ketoprofen, and a control (placebo)) randomly, but at three different appliance activation visits following a wash-over gap of one month. After the first day, the patients were instructed to repeat gel application twice a day at 10:00 and 18:00 hrs for three days. The recorded pain scores were subjected to non-parametric analysis. The highest pain was recorded at 2 and 6 hours. Pain scores were significantly different between the three groups (Kruskal-Wallis test, p < 0.01). The overall mean (SD) pain scores for the benzocaine 5%, ketoprofen, and control (placebo) groups were 0.89 (0.41), 0.68 (0.34), and 1.15 (0.81), respectively. The pain scores were significantly different between the ketoprofen and control groups (mean difference = 0.47, p = 0.005). All groups demonstrated significant differences in pain scores at the six different time intervals (p < 0.05) and there was no gender difference (p > 0.05). A significant pain reduction was observed following the use of ketoprofen when tested against a control gel (placebo). The highest pain scores were experienced in patients administered the placebo and the lowest scores in patients who applied ketoprofen gel. Benzocaine had an effect mid-way between ketoprofen and the placebo. The highest pain scores were recorded 2 hours following force application, which decreased to the lowest scores after 7 days.
Allam, Eman; Ghoneima, Ahmed; Tholpady, Sunil S; Kula, Katherine
2018-06-19
The aim of this study was to determine whether molar incisor hypomineralization (MIH) is greater in patients with cleft lip and palate (CLP) who underwent primary alveolar grafting (PAG) as compared with CLP waiting for secondary alveolar grafting (SAG) and with controls. A retrospective analysis of intraoral photographs of 13 CLP patients who underwent a PAG, 28 CLP prior to SAG, and 60 controls without CLP was performed. Mantel-Haenszel χ² tests were used to compare the 3 groups for differences in MIH scores, and Wilcoxon rank sum tests were used to compare the groups for differences in average MIH scores. A 5% significance level was used for all tests. Molar incisor hypomineralization scores were significantly higher for the PAG and SAG groups compared with the control group (P < 0.001). The PAG group had significantly higher incisor MIH (P = 0.016) compared with the SAG group. Molar incisor hypomineralization average scores were significantly higher for the 2 graft groups compared with the controls (P < 0.0001). The PAG group had significantly higher average MIH score and average MIH score for incisors compared with the SAG group (P = 0.03). Cleft lip and palate patients have significantly greater MIH compared with controls, and CLP patients with PAGs have significantly greater MIH in the incisor region compared with CLP patients with SAGs, indicating that subjects with PAGs have more severely affected dentition.
Changiz, Tahereh; Haghani, Fariba; Nowroozi, Nasim
2013-01-01
Appropriate instructional design plays a crucial role in e-learning success, and analyzing learners is the cornerstone of the instructional design process. Students' readiness for e-learning was assessed in the present study as an example of learner analysis for a distance course in a medical education master's program. A census sample of 23 students who applied for a distance master's program in medical education completed the "Students' E-Learning Readiness Scale" developed by Watkins via email. The reliability and validity of the scale have been confirmed previously. Average scores in total and in 6 subscales were calculated. The score range was 1-5 and scores above 3 indicated good readiness. Data were interpreted using descriptive and non-parametric tests (Mann-Whitney U and Kruskal-Wallis). Response rate was 100%. The students' readiness scores in total and in all subscales ("technology access", "online skills and relationships", "motivation", "online audio/video", "readiness for online discussions", and "importance of e-learning to your success") were above 3. Comparing different subscales, students' mean scores in the "motivation" and "internet discussion" subscales were lower than in the others, although the difference was not significant. There were no significant gender differences in the readiness scores. Students who were academic staff had significantly higher scores than others in total and in the "motivation" and "online skills and relationship" subscales. The good learner readiness observed in the present study may imply that the instructional designer can rely on e-learning strategies and build the course upon them. However, given the slightly lower scores in the "motivation" and "online discussion" subscales, it is recommended to place more emphasis on strategies that improve these two components. To generalize the results, students' readiness needs to be tested in a wider range of degree programs.
Glaviano, Neal R; Benson, Shari; Goodkin, Howard P; Broshek, Donna K; Saliba, Susan
2015-07-01
To compare baseline scores of middle and high school students on the Sport Concussion Assessment Tool 2 (SCAT2) by sex and age. Cross-sectional study. Single private school athletic program. Three hundred sixty-one middle and high school student-athletes. Preseason SCAT2 was administered to student-athletes before athletic participation. Total SCAT2 score, symptoms, symptom severity, Glasgow coma scale, modified Balance Error Scoring System (BESS), coordination, and Standardized Assessment of Concussion (SAC) with subsections: Orientation, Immediate Memory, Concentration, and Delayed Recall. No differences were found in total SCAT2 scores between sex (P = 0.463) or age (P = 0.21). Differences were found in subcomponents of the SCAT2. Twelve year olds had significantly lower concentration scores (3.3 ± 1.2) than 15 and 18 year olds (3.9 ± 1.0 and 4.2 ± 1.0, respectively). The 12 year olds also had the lowest percentage of correct responses for the SAC's concentration 5-digit (46%), 6-digit (21%), and months' backward (67%) tasks. Females presented with more symptoms (20.0 ± 2.2 vs. 20.6 ± 2.1, P = 0.007), better immediate memory (14.6 ± 0.9 vs. 14.3 ± 1.0, P = 0.022), and better BESS scores (27.2 ± 2.3 vs. 26.6 ± 2.6, P = 0.043) than their male counterparts. Normative values for total SCAT2 and subscale scores show differences in concentration between ages, whereas symptoms, BESS, and immediate memory differed between sexes. We also found that 12 year olds have increased difficulty with the advanced concentration tasks, which lends support to the development of a separate instrument, such as the Child-SCAT3. The presence of developmental differences in the younger age groups suggests the need for annual baseline testing. Subtle differences between age and sex have been identified in many components of the SCAT2 assessment. These differences may support the current evolution of concussion assessment tools to provide the most appropriate test. Baseline testing should be used when available, and clinicians should be aware of potential differences when using normalized values.
Deng, Nina; Anatchkova, Milena D; Waring, Molly E; Han, Kyung T; Ware, John E
2015-08-01
The Quality-of-life (QOL) Disease Impact Scale (QDIS®) standardizes the content and scoring of QOL impact attributed to different diseases using item response theory (IRT). This study examined the IRT invariance of the QDIS-standardized IRT parameters in an independent sample. The differential functioning of items and test (DFIT) of a static short-form (QDIS-7) was examined across two independent sources: patients hospitalized for acute coronary syndrome (ACS) in the TRACE-CORE study (N = 1,544) and chronically ill US adults in the QDIS standardization sample. "ACS-specific" IRT item parameters were calibrated and linearly transformed to compare to "standardized" IRT item parameters. Differences in IRT model-expected item, scale and theta scores were examined. The DFIT results were also compared in a standard logistic regression differential item functioning analysis. Item parameters estimated in the ACS sample showed lower discrimination parameters than the standardized discrimination parameters, but only small differences were found for threshold parameters. In DFIT, results on the non-compensatory differential item functioning index (range 0.005-0.074) were all below the threshold of 0.096. Item differences were further canceled out at the scale level. IRT-based theta scores for ACS patients using standardized and ACS-specific item parameters were highly correlated (r = 0.995, root-mean-square difference = 0.09). Using standardized item parameters, ACS patients scored one-half standard deviation higher (indicating greater QOL impact) compared to chronically ill adults in the standardization sample. The study showed sufficient IRT invariance to warrant the use of standardized IRT scoring of QDIS-7 for studies comparing the QOL impact attributed to acute coronary disease and other chronic conditions.
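The agreement statistics quoted in this abstract for the two sets of theta scores, a Pearson correlation and a root-mean-square difference, can be illustrated with a small calculation. The Python sketch below is a generic implementation of those two summary measures and makes no claim about the study's software; the function names and the score values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equally long lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long lists with at least two values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant list")
    return sxy / math.sqrt(sxx * syy)


def rms_difference(x, y):
    """Root-mean-square difference between paired scores."""
    if len(x) != len(y) or not x:
        raise ValueError("need two equally long, non-empty lists")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Hypothetical theta scores from two parameter sets (illustration only).
theta_standardized = [-1.2, -0.4, 0.1, 0.8, 1.5]
theta_sample_specific = [-1.1, -0.5, 0.2, 0.9, 1.4]
print(round(pearson_r(theta_standardized, theta_sample_specific), 3))
print(round(rms_difference(theta_standardized, theta_sample_specific), 3))
```

The two measures answer different questions: the correlation shows whether patients are ranked alike under both parameter sets, while the root-mean-square difference shows how far individual scores shift in absolute terms.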
Van Damme, Benedicte; Stevens, Veerle; Van Tiggelen, Damien; Perneel, Christiaan; Crombez, Geert; Danneels, Lieven
2014-10-01
The influence of psychosocial components on back and abdominal endurance tests in patients with persistent non-specific low back pain should be investigated to ensure the correct interpretation of these measures. Three hundred thirty-two patients (291 men and 41 women) aged 19 to 63 years performed an abdominal and back muscle endurance test after completing some psychosocial questionnaires. During the endurance tests, surface electromyography signals of the internal obliques, the external obliques, the lumbar multifidus and the iliocostalis were recorded. Patients were dichotomized as underperformers and good performers by comparing their real endurance time to the expected time of endurance derived from the normalized median frequency slope. Independent t-tests were performed to examine the differences on the outcome of the questionnaires. In the back muscle endurance test, the underperformers had significantly lower (p < 0.05) scores on some of the physical subscales of the SF-36. The underperformers group of the AE test scored significantly higher on the DRAM MZDI (p = 0.018) and on the PCS scale (p = 0.020) and also showed significantly lower scores on the SF-36 (p < 0.05). Back muscle endurance tests are influenced by physical components, while abdominal endurance tests seem influenced by psychosocial components. Copyright © 2014 Elsevier Ltd. All rights reserved.
ERIC Educational Resources Information Center
Bartik, Timothy J.; Gormley, William; Adelstein, Shirley
2011-01-01
This paper estimates future adult earnings effects associated with a universal pre-K program in Tulsa, Oklahoma. These informed projections help to compensate for the lack of long-term data on universal pre-K programs, while using metrics that relate test scores to valued social benefits. Combining test-score data from the fall of 2006 and recent…
ERIC Educational Resources Information Center
Freund, Philipp Alexander; Holling, Heinz
2011-01-01
The interpretation of retest scores is problematic because they are potentially affected by measurement and predictive bias, which impact construct validity, and because their size differs as a function of various factors. This paper investigates the construct stability of scores on a figural matrices test and models retest effects at the level of…
ERIC Educational Resources Information Center
Mery,Yvonne; Newby, Jill; Peng, Ke
2012-01-01
This paper reports on a research project that examined the test scores of students who took part in an online information literacy course. Researchers analyzed the pre- and post-test scores of students who received different types of instruction including a traditional one-shot library session and an online course. Results show that students who…
ERIC Educational Resources Information Center
Pucel, David J.; And Others
Using post-secondary vocational education students as the populations, these two sub-studies of the Project MINI-SCORE sought to determine the extent to which pre-enrollment standardized test data can be used to predict vocational success. For the purpose of the study, vocational success was defined either as successful graduation or successful…
ERIC Educational Resources Information Center
Moses, Tim; Liu, Jinghua
2011-01-01
In equating research and practice, equating functions that are smooth are typically assumed to be more accurate than equating functions with irregularities. This assumption presumes that population test score distributions are relatively smooth. In this study, two examples were used to reconsider common beliefs about smoothing and equating. The…
Son, Eun Jin; Lee, Dong-Hee; Oh, Jeong-Hoon; Seo, Jae-Hyun; Jeon, Eun-Ju
2015-01-01
The dizziness handicap inventory (DHI) is widely used to evaluate self-perceived handicap due to dizziness, and is known to correlate with vestibular function tests in chronic dizziness. However, whether DHI reflects subjective symptoms during the acute phase has not been studied. This study aims to investigate the correlations of subjective and objective measurements to highlight parameters that reflect the severity of dizziness during the first week of acute unilateral vestibulopathy. Thirty-seven patients with acute unilateral vestibulopathy were examined. Patients' subjective perceptions of dizziness were measured using the DHI, Vertigo Visual Analog Scale (VVAS), Disability Scale (DS), and Activity-Specific Balance Scale (ABC). Additionally, the oculomotor tests, Romberg and sharpened Romberg tests, functional reach test, and dynamic visual acuity tests were performed. The correlation between the DHI and other tests was evaluated. DHI-total scores exhibited a moderately positive correlation with VVAS and DS, and a moderately negative correlation with ABC. However, DHI-total score did not correlate with results of the Romberg, sharpened Romberg, or functional reach tests. When compared among four groups divided according to DHI scores, VVAS and DS scores exhibited statistically significant differences, but no significant differences were detected for other test results. Our findings revealed that the DHI correlated significantly with self-perceived symptoms measured by VVAS and DS, but not ABC. There was no significant correlation with other balance function tests during the first week of acute vestibulopathy. The results suggest that DHI, VVAS and DS may be more useful to measure the severity of acute dizziness symptoms. Copyright © 2015 Elsevier Inc. All rights reserved.
Pallett, Edward; Rentowl, Patricia; Hanning, Christopher
2009-09-01
An Electronic Portable Information Collection audio device (EPIC-Vox) has been developed to deliver questionnaires in spoken word format via headphones. Patients respond by pressing buttons on the device. The aims of this study were to determine limits of agreement between, and test-retest reliability of audio (A) and paper (P) versions of the Brief Fatigue Inventory (BFI). Two hundred sixty outpatients (204 male, mean age 55.7 years) attending a sleep disorders clinic were allocated to four groups using block randomization. All completed the BFI twice, separated by a one-minute distracter task. Half the patients completed paper and audio versions, then an evaluation questionnaire. The remainder completed either paper or audio versions to compare test-retest reliability. BFI global scores were analyzed using Bland-Altman methodology. Agreement between categorical fatigue severity scores was determined using Cohen's kappa. The mean (SD) difference between paper and audio scores was -0.04 (0.48). The limits of agreement (mean difference+/-2SD) were -0.93 to +1.00. Test-retest reliability of the paper BFI showed a mean (SD) difference of 0.17 (0.32) between first and second presentations (limits -0.46 to +0.81). For audio, the mean (SD) difference was 0.17 (0.48) (limits -0.79 to +1.14). For agreement between categorical scores, Cohen's kappa=0.73 for P and A, 0.67 (P at test and retest) and 0.87 (A at test and retest). Evaluation preferences (n=128): 36.7% audio; 18.0% paper; and 45.3% no preference. A total of 99.2% found EPIC-Vox "easy to use." These data demonstrate that the English audio version of the BFI provides an acceptable alternative to the paper questionnaire.
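The limits of agreement reported in the abstract above follow the Bland-Altman convention of mean difference ± 2 SD. A minimal sketch of that arithmetic is given here; the function names and the `k=2.0` default are illustrative assumptions, not taken from the study. With the reported paper test-retest summary (mean difference 0.17, SD 0.32) the formula gives 0.17 − 0.64 = −0.47 and 0.17 + 0.64 = 0.81, close to the published limits of −0.46 to +0.81; the small gap is presumably rounding in the published summary values.

```python
from statistics import mean, stdev


def limits_from_summary(mean_diff, sd_diff, k=2.0):
    """Limits of agreement from summary statistics: mean_diff +/- k * sd_diff.

    k=2.0 mirrors the "mean difference +/- 2SD" convention quoted in the abstract.
    """
    return mean_diff - k * sd_diff, mean_diff + k * sd_diff


def limits_of_agreement(first, second, k=2.0):
    """Return (mean difference, lower limit, upper limit) for paired scores.

    `first` and `second` are hypothetical paired measurements, e.g. the same
    questionnaire completed twice by each participant.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    diffs = [a - b for a, b in zip(first, second)]
    d_bar = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    lower, upper = limits_from_summary(d_bar, sd, k)
    return d_bar, lower, upper
```

Working from the paired differences rather than from the two score columns separately is the point of the method: it describes how far an individual's two scores are likely to differ, which a correlation coefficient alone does not show.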
Chen, Shi; Pan, Zhouxian; Wu, Yanyan; Gu, Zhaoqi; Li, Man; Liang, Ze; Zhu, Huijuan; Yao, Yong; Shui, Wuyang; Shen, Zhen; Zhao, Jun; Pan, Hui
2017-04-03
Three-dimensional (3D) printed models represent educational tools of high quality compared with traditional teaching aids. Colored skull models were produced by 3D printing technology. A randomized controlled trial (RCT) was conducted to compare the learning efficiency of 3D printed skulls with that of cadaveric skulls and an atlas. Seventy-nine medical students who had never studied anatomy were randomized into three groups by drawing lots, using 3D printed skulls, cadaveric skulls, and an atlas, respectively, to study the anatomical structures of the skull through an introductory lecture and small group discussions. All students completed identical tests, which consisted of a theory test and a lab test, before and after a lecture. Pre-test scores showed no differences between the three groups. In the post-test, the 3D group was better than the other two groups in total score (cadaver: 29.5 [IQR: 25-33], 3D: 31.5 [IQR: 29-36], atlas: 27.75 [IQR: 24.125-32]; p = 0.044) and lab test scores (cadaver: 14 [IQR: 10.5-18], 3D: 16.5 [IQR: 14.375-21.625], atlas: 14.5 [IQR: 10-18.125]; p = 0.049). Theory test scores, however, showed no difference between the three groups. In this RCT, an inexpensive, precise and rapidly produced skull model had advantages in assisting anatomy study, especially in structure recognition, compared with traditional education materials.
Atay, Selma; Karabacak, Ukke
2012-06-01
It is expected that nursing education improves the abilities of students in problem solving, decision making and critical thinking in different circumstances. This study was performed to analyse the effects of care plans prepared using concept maps on the critical thinking dispositions of students. An experimental group and a control group were made up of a total of 80 freshman and sophomore students from the nursing department of a health school. The study used a pre-test post-test control group design. The critical thinking dispositions of the groups were measured using the California Critical Thinking Disposition Inventory. In addition, the care plans prepared by the experimental group students were evaluated using the criteria for evaluating care plans with concept maps. A t-test was used to analyse the data. The results showed that there were no statistically significant differences in the total and sub-scale pre-test scores between the experimental group and control group students. However, there were significant differences in the total and sub-scale post-test scores between the experimental group and control group students. There were significant differences between concept map care plan evaluation criteria mean scores of the experimental students. In the light of these findings, it could be argued that the concept mapping strategy improves critical thinking skills of students. © 2012 Blackwell Publishing Asia Pty Ltd.
Emotional intelligence among nursing students: Findings from a cross-sectional study.
Štiglic, Gregor; Cilar, Leona; Novak, Žiga; Vrbnjak, Dominika; Stenhouse, Rosie; Snowden, Austyn; Pajnkihar, Majda
2018-07-01
Emotional intelligence in nursing is of global interest. International studies identify that emotional intelligence influences nurses' work and relationships with patients. It is associated with compassion and care. Nursing students scored higher on measures of emotional intelligence compared to students of other study programmes. The level of emotional intelligence increases with age and tends to be higher in women. This study aims to measure the differences in emotional intelligence between nursing students with previous caring experience and those without; to examine the effects of gender on emotional intelligence scores; and to test whether nursing students score higher than engineering colleagues on emotional intelligence measures. A cross-sectional descriptive study design was used. The study included 113 nursing and 104 engineering students at the beginning of their first year of study at a university in Slovenia. Emotional intelligence was measured using the Trait Emotional Intelligence Questionnaire (TEIQue) and Schutte Self Report Emotional Intelligence Test (SSEIT). Shapiro-Wilk's test of normality was used to test the sample distribution, while the differences in mean values were tested using Student t-test of independent samples. Emotional intelligence was higher in nursing students (n = 113) than engineering students (n = 104) in both measures [TEIQue t = 3.972; p < 0.001; SSEIT t = 8.288; p < 0.001]. Although nursing female students achieved higher emotional intelligence scores than male students on both measures, the difference was not statistically significant [TEIQue t = -0.839; p = 0.403; SSEIT t = -1.159; p = 0.249]. EI scores in nursing students with previous caring experience were not higher compared to students without such experience for any measure [TEIQue t = -1.633; p = 0.105; SSEIT t = -0.595; p = 0.553]. Emotional intelligence was higher in nursing than engineering students, and slightly higher in women than men. 
It was not associated with previous caring experience. Copyright © 2018 Elsevier Ltd. All rights reserved.
ERIC Educational Resources Information Center
Tatsuoka, Kikumi K.; Baillie, Robert
A 40-item free response test on signed-number subtraction was administered to 172 eighth graders. Their responses are viewed as consisting of two different components, the sign and absolute value. Each component is scored zero for wrong or one for correct, yielding a score of one only when both components have scores of one. By taking the values…
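The two-component scoring rule described in the abstract above (sign and absolute value each scored 0 or 1, with an item score of 1 only when both are correct) can be sketched as follows. The function names and the way a response is compared with the answer key are illustrative assumptions; the original report's actual procedure is not visible in the truncated abstract.

```python
def _sign(x: int) -> int:
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def component_scores(response: int, key: int) -> tuple[int, int]:
    """Score the sign and the absolute value separately (1 = correct, 0 = wrong)."""
    sign_ok = int(_sign(response) == _sign(key))
    abs_ok = int(abs(response) == abs(key))
    return sign_ok, abs_ok


def item_score(response: int, key: int) -> int:
    """The item counts as correct only when both components are correct."""
    sign_ok, abs_ok = component_scores(response, key)
    return sign_ok * abs_ok
```

For a hypothetical item whose key is -5, a response of 5 has the right absolute value but the wrong sign, so its item score is 0 even though one component is correct.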
PKMζ Differentially Utilized between Sexes for Remote Long-Term Spatial Memory
Sebastian, Veronica; Vergel, Tatyana; Baig, Raheela; Schrott, Lisa M.; Serrano, Peter A.
2013-01-01
It is well established that male rats have an advantage in acquiring place-learning strategies, allowing them to learn spatial tasks more readily than female rats. However, many of these differences have been examined solely during acquisition or in 24h memory retention. Here, we investigated whether sex differences exist in remote long-term memory, lasting 30d after training, and whether there are differences in the expression pattern of molecular markers associated with long-term memory maintenance. Specifically, we analyzed the expression of protein kinase M zeta (PKMζ) and the α-amino-3-hydroxy-5-methyl-4-isoxazolepropionic acid (AMPA) receptor subunit GluA2. To adequately evaluate memory retention, we used a robust training protocol to attenuate sex differences in acquisition and found differential effects in memory retention 1d and 30d after training. Female cohorts tested for memory retention 1d after 60 training trials outperformed males by making significantly fewer reference memory errors at test. In contrast, male cohorts tested 30d after 60 training trials outperformed females of the same condition, making fewer reference memory errors and achieving significantly higher retention test scores. Furthermore, given 60 training trials, females tested 30d later showed significantly worse memory compared to females tested 1d later, while males tested 30d later did not differ from males tested 1d later. Together these data suggest that with robust training males do not retain spatial information as well as females do 24h post-training but maintain this spatial information for longer. Males also showed a significant increase in synaptic PKMζ expression and a positive correlation with retention test scores, while females did not. Interestingly, both sexes showed a positive correlation between retention test scores and synaptic GluA2 expression.
Furthermore, the increased expression of synaptic PKMζ, associated with male memory but not with female memory, identifies another potential sex-mediated difference in memory processing. PMID:24244733
Hellemann, G S; Green, M F; Kern, R S; Sitarenios, G; Nuechterlein, K H
2017-10-01
Measures of social cognition are increasingly being applied to psychopathology, including studies of schizophrenia and other psychotic disorders. Tests of social cognition present unique challenges for international adaptations. The Mayer-Salovey-Caruso Emotional Intelligence Test, Managing Emotions Branch (MSCEIT-ME) is a commonly-used social cognition test that involves the evaluation of social scenarios presented in vignettes. This paper presents evaluations of translations of this test in six different languages based on representative samples from the relevant countries. The goal was to identify items from the MSCEIT-ME that show different response patterns across countries using indices of discrepancy and content validity criteria. An international version of the MSCEIT-ME scoring was developed that excludes items that showed undesirable properties across countries. We then confirmed that this new version had better performance (i.e. less discrepancy across regions) in international samples than the version based on the original norms. Additionally, it provides scores that are comparable to ratings based on local norms. This paper shows that it is possible to adapt complex social cognitive tasks so they can provide valid data across different cultural contexts.
The BioMedical Admissions Test for medical student selection: issues of fairness and bias.
Emery, Joanne L; Bell, John F; Vidal Rodeiro, Carmen L
2011-01-01
The BioMedical Admissions Test (BMAT) forms part of the undergraduate medical admission process at the University of Cambridge. The fairness of admissions tests is an important issue. Aims were to investigate the relationships between applicants' background variables and BMAT scores, whether they were offered a place or rejected and, for those admitted, performance on the first year course examinations. Multilevel regression models were employed with data from three combined applicant cohorts. Admission rates for different groups were investigated with and without controlling for BMAT performance. The fairness of the BMAT was investigated by determining, for those admitted, whether scores predicted examination performance equitably. Despite some differences in applicants' BMAT performance (e.g. by school type and gender), BMAT scores predicted mean examination marks equitably for all background variables considered. The probability of achieving a 1st class examination result, however, was slightly under-predicted for those admitted from schools and colleges entering relatively few applicants. Not all differences in admission rates were accounted for by BMAT performance. However, the test constitutes only one part of a compensatory admission system in which other factors, such as interview performance, are important considerations. Results are in support of the equity of the BMAT.
Frengopoulos, Courtney; Burley, Joshua; Viana, Ricardo; Payne, Michael W; Hunter, Susan W
2017-03-01
To determine whether scores on a cognitive measure are associated with walking endurance and functional mobility of individuals with transfemoral or transtibial amputations at discharge from inpatient prosthetic rehabilitation. Retrospective cohort study. Rehabilitation hospital. Consecutive admissions (N=176; mean age ± SD, 64.27±13.23y) with transfemoral or transtibial amputation that had data at admission and discharge from an inpatient prosthetic rehabilitation program. Not applicable. Cognitive status was assessed using the Montreal Cognitive Assessment (MoCA). The L Test and the 2-minute walk test (2MWT) were used to estimate functional mobility and walking endurance. The mean ± SD MoCA score was 24.05±4.09 (range, 6-30), and 56.3% of patients had scores <26. MoCA scores had a small positive correlation with the 2MWT (r=.29, P<.01), and a small negative correlation with the L Test (r=-.24, P<.01). In multivariable linear regression, compared with people in the highest MoCA score quartile, there was no difference on the 2MWT, but people in the lowest 2 quartiles took longer to complete the L Test. Cognitive impairment was very prevalent. The association between MoCA and functional mobility was statistically significant. These results highlight the potential for differences on complex motor tasks for individuals with cognitive impairment but do not indicate a need to exclude them from rehabilitation on the basis of cognitive impairment alone. Copyright © 2016 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Description, measurement and evaluation of tertiary-education food environments.
Roy, R; Hebden, L; Kelly, B; De Gois, T; Ferrone, E M; Samrout, M; Vermont, S; Allman-Farinelli, M
2016-05-01
Obesity in young adults is an increasing health problem in Australia and many other countries. Evidence-based information is needed to guide interventions that reduce the obesity-promoting elements in tertiary-education environments. In a food environmental audit survey, 252 outlets were audited across seven institutions: three universities and four technical and further education institution campuses. A scoring instrument called the food environment-quality index was developed and used to assess all food outlets on these campuses. Information was collated on the availability, accessibility and promotion of foods and beverages and a composite score (maximum score=148; higher score indicates healthier outlets) was calculated. Each outlet and the overall campus were ranked into tertiles based on their 'healthiness'. Differences in median scores for each outcome measure were compared between institutions and outlet types using one-way ANOVA with post hoc Scheffe's testing, χ² tests, Kruskal-Wallis H test and the Mann-Whitney U test. Binomial logistic regressions were used to compare the proportion of healthy v. unhealthy food categories across different types of outlets. Overall, the most frequently available items were sugar-sweetened beverages (20 % of all food/drink items) followed by chocolates (12 %), high-energy (>600 kJ/serve) foods (10 %), chips (10 %) and confectionery (10 %). Healthy food and beverages were observed to be less available, accessible and promoted than unhealthy options. The median score across all outlets was 72 (interquartile range=7). Tertiary-education food environments are dominated by high-energy, nutrient-poor foods and beverages. Interventions to decrease availability, accessibility and promotion of unhealthy foods are needed.
Test anxiety in mathematics among early undergraduate students in a British university in Malaysia
NASA Astrophysics Data System (ADS)
Karjanto, Natanael; Yong, Su Ting
2013-03-01
The level of test anxiety in mathematics subjects among early undergraduate students at the University of Nottingham Malaysia Campus is studied in this article. The sample consists of 206 students taking several mathematics modules who completed the questionnaires on test anxiety just before they entered the venue for midterm examinations. The sample data include the differences in the context of academic levels, gender groups and nationality backgrounds. The level of test anxiety in mathematics is measured using seven Likert questionnaire statements adapted from the Test Anxiety Inventory describing one's emotional feeling before the start of an examination. In general, the result shows that the students who had a lower score expectation were more anxious than those who had a higher score expectation, but that they obtained a better score than the expected score. In the context of academic levels, gender groups and nationality backgrounds, there were no significant correlations between the level of test anxiety and the students' academic performance. The effect size of the correlation values ranged from extremely small to moderate.
Peláez-García, Alberto; Yébenes, Laura; Berjón, Alberto; Angulo, Antonia; Zamora, Pilar; Sánchez-Méndez, José Ignacio; Espinosa, Enrique; Redondo, Andrés; Heredia-Soto, Victoria; Mendiola, Marta; Feliú, Jaime
2017-01-01
Purpose To compare the concordance in risk classification between the EndoPredict and the MammaPrint scores obtained for the same cancer samples on 40 estrogen-receptor positive/HER2-negative breast carcinomas. Methods Formalin-fixed, paraffin-embedded invasive breast carcinoma tissues that were previously analyzed with MammaPrint as part of routine care of the patients, and were classified as high-risk (20 patients) and low-risk (20 patients), were selected to be analyzed by the EndoPredict assay, a second generation gene expression test that combines expression of 8 genes (EP score) with two clinicopathological variables (tumor size and nodal status, EPclin score). Results The EP score classified 15 patients as low-risk and 25 patients as high-risk. EPclin re-classified 5 of the 25 EP high-risk patients into low-risk, resulting in a total of 20 high-risk and 20 low-risk tumors. EP score and MammaPrint score were significantly correlated (p = 0.008). Twelve of 20 samples classified as low-risk by MammaPrint were also low-risk by EP score (60%). Seventeen of 20 MammaPrint high-risk tumors were also high-risk by EP score. The overall concordance between EP score and MammaPrint was 72.5% (κ = 0.45, (95% CI, 0.182 to 0.718)). EPclin score also correlated with MammaPrint results (p = 0.004). Discrepancies between both tests occurred in 10 cases: 5 MammaPrint low-risk patients were classified as EPclin high-risk and 5 high-risk MammaPrint were classified as low-risk by EPclin, giving an overall concordance of 75% (κ = 0.5, (95% CI, 0.232 to 0.768)). Conclusions This pilot study demonstrates a limited concordance between MammaPrint and EndoPredict. Differences in results could be explained by the inclusion of different gene sets in each platform, the use of different methodology, and the inclusion of clinicopathological parameters, such as tumor size and nodal status, in the EndoPredict test. PMID:28886093
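The concordance and kappa values in the abstract above can be checked from the counts it reports. The sketch below is an illustrative reconstruction, not the authors' analysis code: the 2×2 tables are assembled from the stated counts (12 of 20 MammaPrint low-risk and 17 of 20 MammaPrint high-risk samples agreeing with the EP score; 5 discrepancies in each direction for EPclin), with the off-diagonal cells for the EP score derived by subtraction (20 − 12 = 8 and 20 − 17 = 3). The function name `cohen_kappa` is an assumption.

```python
def cohen_kappa(table):
    """Return (observed agreement, Cohen's kappa) for a square agreement table.

    table[i][j] = number of samples placed in category i by the first test
    and in category j by the second test.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_observed = sum(table[i][i] for i in range(k)) / n
    # Agreement expected by chance from the marginal totals.
    p_expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    return p_observed, (p_observed - p_expected) / (1 - p_expected)


# Rows: MammaPrint (low, high); columns: EndoPredict (low, high).
EP_VS_MAMMAPRINT = [[12, 8], [3, 17]]      # off-diagonals derived by subtraction
EPCLIN_VS_MAMMAPRINT = [[15, 5], [5, 15]]  # 5 discrepancies in each direction
```

For the EP score the observed agreement is 29/40 = 0.725 and the chance agreement is (20·15 + 20·25)/40² = 0.5, so κ = (0.725 − 0.5)/(1 − 0.5) = 0.45. For EPclin the observed agreement is 30/40 = 0.75 with the same chance agreement, so κ = 0.5. Both match the reported figures.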
Effects of ozone (O3) therapy on cisplatin-induced ototoxicity in rats.
Koçak, Hasan Emre; Taşkın, Ümit; Aydın, Salih; Oktay, Mehmet Faruk; Altınay, Serdar; Çelik, Duygu Sultan; Yücebaş, Kadir; Altaş, Bengül
2016-12-01
The aim of this study is to investigate the effect of rectal ozone and intratympanic ozone therapy on cisplatin-induced ototoxicity in rats. Eighteen female Wistar albino rats were included in our study. External auditory canal and tympanic membrane examinations were normal in all rats. The rats were randomly divided into three groups. Initially, all the rats were tested with distortion product otoacoustic emissions (DPOAE), and emissions were measured normally. All rats were injected with 5-mg/kg/day cisplatin for 3 days intraperitoneally. Ototoxicity had developed in all rats, as confirmed with DPOAE after 1 week. Group 1 received rectal and intratympanic ozone therapy. The rats in Group 2, the control group, received no treatment. The rats in Group 3 were treated with rectal ozone. All the rats were tested with DPOAE under general anesthesia, and all were sacrificed for pathological examination 1 week after ozone administration. Their cochleas were removed. The outer hair cell damage and stria vascularis damage were examined. In the statistical analysis, a statistically significant difference between Group 1 and Group 2 was observed in all frequencies according to the DPOAE test. In addition, between Group 2 and Group 3, a statistically significant difference was observed in the DPOAE test. However, a statistically significant difference was not observed between Group 1 and Group 3 according to the DPOAE test. According to histopathological scoring, the outer hair cell damage score was statistically significantly higher in Group 2 than in Group 1. In addition, the outer hair cell damage score was also statistically significantly higher in Group 2 than in Group 3. Outer hair cell damage scores were low in Group 1 and Group 3, but there was no statistically significant difference between these groups. There was no statistically significant difference between the groups in terms of stria vascularis damage score examinations.
Systemic ozone gas therapy is effective in the treatment of cell damage in cisplatin-induced ototoxicity. The intratympanic administration of ozone gas does not have any additional advantage over the rectal administration.
The influence of four different anticoagulants on dynamic light scattering of platelets.
Raczat, T; Kraemer, L; Gall, C; Weiss, D R; Eckstein, R; Ringwald, J
2014-08-01
For testing of dynamic light scattering of platelets with ThromboLUX (TLX) in platelet-rich plasma (PRP) derived from venous whole blood (vWB), anticoagulation is needed. We compared TLX score in PRPs containing citrate, ethylene-diamine-tetraacetic-acid (EDTA), citrate-phosphate-dextrose-adenine (CPDA) or citrate-theophylline-adenosine-dipyridamole. Initial and late TLX scores were measured after 30-120 min or four to six hours, respectively. Compared with citrate, mean differences in initial TLX score were only significant for CPDA. Also, mean differences between initial and late TLX scores were only significant for CPDA. TLX failed to detect EDTA-induced platelet alterations. The clinical relevance of TLX needs further studies. © 2014 International Society of Blood Transfusion.
Physicians’ attitudes toward pharmacogenetic testing before and after pharmacogenetic education
Luzum, Jasmine A; Luzum, Matthew J
2016-01-01
Aim: Our aim was to evaluate physicians’ attitudes toward pharmacogenetic testing before and after pharmacogenetic education. Methods: In total, 12 physicians (~40% response rate) completed a survey with eight questions on 10-point scales on their attitudes toward pharmacogenetic testing before and after a 1-h grand rounds presentation on pharmacogenetics. Differences in question scores overall, among training levels (resident/fellow/attending), and specific drugs (clopidogrel/simvastatin/warfarin) were assessed using Wilcoxon signed-rank and exact Kruskal–Wallis tests. Results & conclusion: The scores for all eight questions increased, with statistically significant (p < 0.05) increases for four out of eight questions. The scores were similar among training levels, but the postscores for clopidogrel were significantly higher than for simvastatin and warfarin. In conclusion, brief pharmacogenetic education can significantly affect physicians’ attitudes toward pharmacogenetic testing. PMID:29749904
Kim, Ji-Yong
2011-01-01
Background In-training examination (ITE) is a cognitive examination similar to the written test, but it is different from the Clinical Practice Examination of the Korean Academy of Family Medicine (KAFM) Certification Examination (CE). The objective of this study is to estimate the positive predictive value of the KAFM-ITE for identifying residents at risk for poor performance on the three types of KAFM-CE. Methods 372 residents who completed the KAFM-CE in 2011 were included. We compared the mean KAFM-CE scores with ITE experience. We evaluated the correlation and the positive predictive value (PPV) of ITE for the multiple choice question (MCQ) scores of 1st written test & 2nd slide examination, the total clinical practice examination scores, and the total sum of 2nd test. Results 275 out of 372 residents completed ITE. Those who completed ITE had significantly higher MCQ scores of 1st written test than those who did not. The correlation of ITE scores with 1st written MCQ (0.627) was found to be the highest among the other kinds of CE. The PPV of the ITE score for 1st written MCQ scores was 0.672. The PPV of the ITE score ranged from 0.376 to 0.502. Conclusion The score of the KAFM ITE has acceptable positive predictive value that could be used as a part of comprehensive evaluation system for residents in cognitive field. PMID:22745873
Pharmacy students' test-taking motivation-effort on a low-stakes standardized test.
Waskiewicz, Rhonda A
2011-04-11
To measure third-year pharmacy students' level of motivation while completing the Pharmacy Curriculum Outcomes Assessment (PCOA) administered as a low-stakes test to better understand use of the PCOA as a measure of student content knowledge. Student motivation was manipulated through an incentive (ie, personal letter from the dean) and a process of statistical motivation filtering. Data were analyzed to determine any differences between the experimental and control groups in PCOA test performance, motivation to perform well, and test performance after filtering for low motivation-effort. Incentivizing students diminished the need for filtering PCOA scores for low effort. Where filtering was used, performance scores improved, providing a more realistic measure of aggregate student performance. To ensure that PCOA scores are an accurate reflection of student knowledge, incentivizing and/or filtering for low motivation-effort among pharmacy students should be considered fundamental best practice when the PCOA is administered as a low-stakes test.
Furnham, A; Adam-Saib, S
2001-09-01
Previous studies have found significantly higher scores on the Eating Attitudes Test (EAT-26), which measures eating disorders, among second-generation British-Asian schoolgirls in comparison to their White counterparts. Further, high EAT-26 scores (an indication of unhealthy eating attitudes and behaviours) are positively associated with parental overprotection scores on the Parental Bonding Instrument (PBI). This study aimed to replicate and extend previous findings, comparing British-Asian schoolgirls to White schoolgirls and considering 'intra-Asian' differences on the same measures, including factor scores. Participants completed three questionnaires: EAT-26, PBI and BSS (Body Satisfaction Scale). There were 168 participants: 46 White, 40 Indian, 44 Pakistani and 38 Bengali. Previous findings were supported; the Asian scores were significantly higher than the White scores on the EAT-26 and PBI, but not the BSS. The Bengali sample had significantly higher EAT-26 total and 'oral control' scores than the other groups. There were no intra-Asian differences for the overprotection scores. PBI scores were not associated with EAT-26 scores. The BSS score was the only significant predictor of EAT scores, when entered into a regression along with PBI scores and the body mass index. Results demonstrated sociocultural factors in the development of eating disorders. The results suggest that there are important psychological differences between second-generation migrants from different countries on the Indian subcontinent. In line with previous studies, significant differences were found between the four ethnic groups' parenting styles, but these did not relate to actual eating disorders.
Gedikoglu, U; Coskun, O; Inan, L E; Ucler, S; Tunc, T; Emre, U
2005-06-01
The Migraine Disability Assessment (MIDAS) questionnaire is a brief, self-administered questionnaire which is designed to quantify headache-related disability in a 3-month period. We have tested a Turkish version of the MIDAS questionnaire in 60 migraine patients. Sixty clinically diagnosed migraine headache sufferers were enrolled in a 90-day diary study and completed the MIDAS questionnaire on the first, 21st and last days of the 90-day study. The scores taken from the diary and the MIDAS scores taken at different times were evaluated with both Pearson and Spearman correlation tests for each question and for total scores. Cronbach's alpha was evaluated for the diary scores and for the MIDAS administered at different times. Pearson's correlation on the responses in the initial MIDAS questions was between 0.44 (reduced productivity in household chores) and 0.78 (missed work or school days). The Spearman correlations were similar to the Pearson values. As a result, we found that the overall score of the MIDAS has a good reliability and its internal consistency is also good (Cronbach's alpha 0.87). These findings support the use of the MIDAS questionnaire as a clinical and research tool on Turkish patients.
Test-Taking Strategy as a Mediator between Race and Academic Performance
ERIC Educational Resources Information Center
Dollinger, Stephen J.; Clark, M. H.
2012-01-01
The issue of race differences in standardized test scores and academic achievement continues to be a vexing one for behavioral scientists and society at large. Ellis and Ryan (2003) suggested that a portion of the cognitive-ability test performance differences between White/Caucasian-American and Black/African-American college students could be…
The Effects of a Translation Bias on the Scores for the "Basic Economics Test"
ERIC Educational Resources Information Center
Hahn, Jinsoo; Jang, Kyungho
2012-01-01
International comparisons of economic understanding generally require a translation of a standardized test written in English into another language. Test results can differ based on how researchers translate the English written exam into one in their own language. To confirm this hypothesis, two differently translated versions of the "Basic…
Merchant, Roland C; Clark, Melissa A; Mayer, Kenneth H; Seage III, George R; DeGruttola, Victor G; Becker, Bruce M
2009-02-01
Video-based delivery of human immunodeficiency virus (HIV) pretest information might assist in streamlining HIV screening and testing efforts in the emergency department (ED). The objectives of this study were to determine if the video "Do you know about rapid HIV testing?" is an acceptable alternative to an in-person information session on rapid HIV pretest information, in regard to comprehension of rapid HIV pretest fundamentals, and to identify patients who might have difficulties in comprehending pretest information. This was a noninferiority trial of 574 participants in an ED opt-in rapid HIV screening program who were randomly assigned to receive identical pretest information from either an animated and live-action 9.5-minute video or an in-person information session. Pretest information comprehension was assessed using a questionnaire. The video would be accepted as not inferior to the in-person information session if the 95% confidence interval (CI) of the difference (Delta) in mean scores on the questionnaire between the two information groups was less than a 10% decrease in the in-person information session arm's mean score. Linear regression models were constructed to identify patients with lower mean scores based upon study arm assignment, demographic characteristics, and history of prior HIV testing. The questionnaire mean scores were 20.1 (95% CI = 19.7 to 20.5) for the video arm and 20.8 (95% CI = 20.4 to 21.2) for the in-person information session arm. The difference in mean scores compared to the mean score for the in-person information session met the noninferiority criterion for this investigation (Delta = 0.68; 95% CI = 0.18 to 1.26). In a multivariable linear regression model, Blacks/African Americans, Hispanics, and those with Medicare and Medicaid insurance exhibited slightly lower mean scores, regardless of the pretest information delivery format. 
There was a strong relationship between fewer years of formal education and lower mean scores on the questionnaire. Age, gender, type of insurance, partner/marital status, and history of prior HIV testing were not predictive of scores on the questionnaire. In terms of patient comprehension of rapid HIV pretest information fundamentals, the video was an acceptable substitute to pretest information delivered by an HIV test counselor. Both the video and the in-person information session were less effective in providing pretest information for patients with fewer years of formal education.
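The noninferiority criterion in this abstract can be checked with a few lines of arithmetic. The sketch below assumes one reading of the criterion: the margin is 10% of the in-person arm's mean score, and the video is accepted if the upper bound of the 95% CI for the difference (in-person minus video) falls below that margin. The function names are hypothetical; the numbers are those reported in the abstract.

```python
def noninferiority_margin(reference_mean, fraction=0.10):
    """Margin expressed as a fraction of the reference (in-person) arm's mean score."""
    return reference_mean * fraction


def is_noninferior(ci_upper, reference_mean, fraction=0.10):
    """True if the upper CI bound of the mean difference stays below the margin."""
    return ci_upper < noninferiority_margin(reference_mean, fraction)


in_person_mean = 20.8   # reported mean score, in-person arm
ci_upper = 1.26         # reported upper bound of the 95% CI for Delta

margin = noninferiority_margin(in_person_mean)       # about 2.08 points
print(margin, is_noninferior(ci_upper, in_person_mean))
```

Under this reading the margin is about 2.08 points, and the reported upper bound of 1.26 lies below it, which is consistent with the abstract's conclusion.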
Kelly, Maureen E; Regan, Daniel; Dunne, Fidelma; Henn, Patrick; Newell, John; O'Flynn, Siun
2013-05-10
Background Internationally, tests of general mental ability are used in the selection of medical students. Examples include the Medical College Admission Test, Undergraduate Medicine and Health Sciences Admission Test and the UK Clinical Aptitude Test. The most widely used measure of their efficacy is predictive validity. A new tool, the Health Professions Admission Test-Ireland (HPAT-Ireland), was introduced in 2009. Traditionally, selection to Irish undergraduate medical schools relied on academic achievement. Since 2009, Irish and EU applicants have been selected on a combination of their secondary school academic record (measured predominately by the Leaving Certificate Examination) and HPAT-Ireland score. This is the first study to report on the predictive validity of the HPAT-Ireland for early undergraduate assessments of communication and clinical skills. Method Students enrolled at two Irish medical schools in 2009 were followed up for two years. Data collected were gender; HPAT-Ireland total and subsection scores; Leaving Certificate Examination plus HPAT-Ireland combined score; Year 1 Objective Structured Clinical Examination (OSCE) scores (total score, communication and clinical subtest scores); Year 1 Multiple Choice Questions (MCQ); and Year 2 OSCE and subset scores. We report descriptive statistics, Pearson correlation coefficients and multiple linear regression models. Results Data were available for 312 students. In Year 1, none of the selection criteria were significantly related to student OSCE performance. The Leaving Certificate Examination and Leaving Certificate plus HPAT-Ireland combined scores correlated with MCQ marks. In Year 2, a series of significant correlations emerged between the HPAT-Ireland (and subsections thereof) and OSCE Communication Z-scores, OSCE Clinical Z-scores, and Total OSCE Z-scores. However, on multiple regression, only the relationship between Total OSCE Score and the Total HPAT-Ireland score remained significant, albeit with modest predictive power. Conclusion We found that none of our selection criteria strongly predict clinical and communication skills. The HPAT-Ireland appears to measure ability in domains different from those assessed by the Leaving Certificate Examination. While some significant associations did emerge in Year 2 between HPAT-Ireland and total OSCE scores, further evaluation is required to establish whether this pattern continues during the senior years of the medical course. PMID:23663266
ERIC Educational Resources Information Center
Ladyshewsky, Richard K.
2015-01-01
This research explores differences in multiple choice test (MCT) scores in a cohort of post-graduate students enrolled in a management and leadership course. A total of 250 students completed the MCT in either a supervised in-class paper and pencil test or an unsupervised online test. The only statistically significant difference between the nine…
Zhao, Xiaohui; Oppler, Scott; Dunleavy, Dana; Kroopnick, Marc
2010-10-01
This study investigated the validity of four approaches (the average, most recent, highest-within-administration, and highest-across-administration approaches) of using repeaters' Medical College Admission Test (MCAT) scores to predict Step 1 scores. Using the differential prediction method, this study investigated the magnitude of differences in the expected Step 1 total scores between MCAT nonrepeaters and three repeater groups (two-time, three-time, and four-time test takers) for the four scoring approaches. For the average score approach, matriculants with the same MCAT average are expected to achieve similar Step 1 total scores regardless of whether the individual attempted the MCAT exam once or multiple times. For the other three approaches, repeaters are expected to achieve lower Step 1 scores than nonrepeaters; for a given MCAT score, as the number of attempts increases, the expected Step 1 score decreases. The effect was strongest for the highest-across-administration approach, followed by the highest-within-administration approach, and then the most recent approach. Using the average score is the best approach for considering repeaters' MCAT scores in medical school admission decisions.
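The four ways of summarizing a repeater's scores named in this abstract can be illustrated with a short sketch. The abstract does not define the approaches, so the definitions below are assumptions: "highest-within-administration" is taken to be the best total from any single sitting, and "highest-across-administration" the sum of the best score on each section across sittings. Function names and the example scores are hypothetical.

```python
def average_total(attempts):
    """Mean of the total scores over all sittings."""
    return sum(sum(a) for a in attempts) / len(attempts)


def most_recent_total(attempts):
    """Total score from the last sitting (attempts are in chronological order)."""
    return sum(attempts[-1])


def highest_within_total(attempts):
    """Best total score achieved in any single sitting (assumed definition)."""
    return max(sum(a) for a in attempts)


def highest_across_total(attempts):
    """Sum of the best score per section across all sittings (assumed definition)."""
    return sum(max(section) for section in zip(*attempts))


# Hypothetical two-time test taker: one tuple of section scores per sitting.
attempts = [(9, 10, 8), (10, 9, 10)]
print(average_total(attempts))         # 28.0
print(most_recent_total(attempts))     # 29
print(highest_within_total(attempts))  # 29
print(highest_across_total(attempts))  # 30
```

In this example the highest-across total exceeds every total the test taker actually achieved in one sitting, which gives some intuition for why the abstract reports the largest overprediction for that approach.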
ERIC Educational Resources Information Center
Olsen, Marilyn
A study conducted in suburban central New Jersey, using 218 second graders' California Achievement Test (CAT) scores from 1986-1988, compared the effectiveness of two well-known reading programs. Results indicated that although there was no statistically significant difference in the scores, the mean difference suggested that children who were…
ERIC Educational Resources Information Center
Root, Melissa M.; Marchis, Lavinia; White, Erica; Courville, Troy; Choi, Dowon; Bray, Melissa A.; Pan, Xingyu; Wayte, Jessica
2017-01-01
This study investigated the differences in error factor scores on the Kaufman Test of Educational Achievement-Third Edition between individuals with mild intellectual disabilities (Mild IDs), those with low achievement scores but average intelligence, and those with low intelligence but without a Mild ID diagnosis. The two control groups were…
ERIC Educational Resources Information Center
Crocker, Linda M.; Mehrens, William A.
Four new methods of item analysis were used to select subsets of items which would yield measures of attitude change. The sample consisted of 263 students at Michigan State University who were tested on the Inventory of Beliefs as freshmen and retested on the same instrument as juniors. Item change scores and total change scores were computed for…
ERIC Educational Resources Information Center
Ozdemir, Burhanettin
2017-01-01
The purpose of this study is to equate Trends in International Mathematics and Science Study (TIMSS) mathematics subtest scores obtained from TIMSS 2011 to scores obtained from TIMSS 2007 form with different nonlinear observed score equating methods under Non-Equivalent Anchor Test (NEAT) design where common items are used to link two or more test…
ERIC Educational Resources Information Center
Resendes, John; Lecci, Len
2012-01-01
MMPI-2 scores from a parent competency sample (N = 136 parents) are compared with a previously published data set of MMPI-2 scores for child custody litigants (N = 508 parents; Bathurst et al., 1997). Independent samples t tests yielded significant and in some cases substantial differences on the standard MMPI-2 clinical scales (especially Scales…
Intelligent Use of Intelligence Tests: Empirical and Clinical Support for Canadian WAIS-IV Norms
ERIC Educational Resources Information Center
Miller, Jessie L.; Weiss, Lawrence G.; Beal, A. Lynne; Saklofske, Donald H.; Zhu, Jianjun; Holdnack, James A.
2015-01-01
It is well established that Canadians produce higher raw scores than their U.S. counterparts on intellectual assessments. As a result of these differences in ability along with smaller variability in the population's intellectual performance, Canadian normative data will yield lower standard scores for most raw score points compared to U.S. norms.…
Effect of basic laparoscopic skills courses on essential knowledge of equipment.
van Hove, P Diederick; Verdaasdonk, Emiel G G; van der Harst, Erwin; Jansen, Frank Willem; Dankelman, Jenny; Stassen, Laurents P S
2012-12-01
This study aims to evaluate the effect of laparoscopic skills courses on the knowledge of laparoscopic equipment. A knowledge test on laparoscopic equipment was developed, and participants of 3 separate basic laparoscopic skills courses in the Netherlands completed the test at the beginning and end of these courses. All lectures and demonstrations during the courses were recorded on video to assess how well their content matched the items in the test. As a reference, the test was also completed by a group of laparoscopic experts by e-mail. In total, 36 participants (64.3%) completed both the pretest and posttest. Overall, the mean test score improved from 60.4% of the maximum possible score for the pretest to 68.4% for the posttest. There were no significant differences in test scores between the 3 separate courses. However, the actual content varied among the courses. The correspondence of the test items with the course content varied from 47% to 69%. Although 30% of the participants had already received training for laparoscopic equipment in their own hospital, 92.5% wanted to receive more training. Twenty-eight experts completed the test with a mean score of 75.7%, which was significantly better than the posttest score of the course participants. The laparoscopic skills courses evaluated in this study had a modest positive effect on the acquisition of knowledge about laparoscopic equipment. The content of the courses varied.
Avşar, Fatma; Ayaz Alkaya, Sultan
The aim of this study was to determine the effectiveness of assertiveness training for school-aged children on peer bullying and assertiveness. A quasi-experimental design with pre- and post-testing was used. Data were collected using a demographic questionnaire, an assertiveness scale, and the peer victimization scale. The training program comprised eight sessions, which were delivered to the intervention group. Descriptive characteristics were not statistically different between the groups (p>0.05). On the victim dimension of the peer victimization scale, the post-test mean scores of the students in the intervention group were lower than their pre-test mean scores (p<0.05). For the control group, no significant change was found between the pre-test and post-test mean scores (p>0.05). On the bully dimension, a comparison of the mean pre-test and post-test scores of the intervention and control groups showed that the mean post-test scores decreased in each group, but not significantly (p>0.05). The assertiveness training program increased the assertiveness level and reduced the state of being victims, but did not affect the state of being bullies. The results of this study can help children acquire assertive behaviors instead of negative behaviors such as aggression and shyness, and help them to build effective social communication. Copyright © 2017 Elsevier Inc. All rights reserved.
Root Kustritz, Margaret V
2014-01-01
Third-year veterinary students in a required theriogenology diagnostics course were allowed to self-select attendance at a lecture in either the evening or the next morning. One group was presented with PowerPoint slides in a traditional format (T group), and the other group was presented with PowerPoint slides in the assertion-evidence format (A-E group), which uses a single sentence and a highly relevant graphic on each slide to ensure attention is drawn to the most important points in the presentation. Students took a multiple-choice pre-test, attended lecture, and then completed a take-home assignment. All students then completed an online multiple-choice post-test and, one month later, a different online multiple-choice test to evaluate retention. Groups did not differ on pre-test, assignment, or post-test scores, and both groups showed significant gains from pre-test to post-test and from pre-test to retention test. However, the T group showed significant decline from post-test to retention test, while the A-E group did not. Short-term differences between slide designs were most likely unaffected due to required coursework immediately after lecture, but retention of material was superior with the assertion-evidence slide design.
Vasconcelos-Moreno, Mirela P; Bücker, Joana; Bürke, Kelen P; Czepielewski, Leticia; Santos, Barbara T; Fijtman, Adam; Passos, Ives C; Kunz, Mauricio; Bonnín, Caterina Del Mar; Vieta, Eduard; Kapczinski, Flavio; Rosa, Adriane R; Kauer-Sant'Anna, Marcia
2016-01-01
To assess cognitive performance and psychosocial functioning in patients with bipolar disorder (BD), in unaffected siblings, and in healthy controls. Subjects were patients with BD (n=36), unaffected siblings (n=35), and healthy controls (n=44). Psychosocial functioning was assessed using the Functioning Assessment Short Test (FAST). A sub-group of patients with BD (n=21), unaffected siblings (n=14), and healthy controls (n=22) also underwent a battery of neuropsychological tests: California Verbal Learning Test (CVLT), Stroop Color and Word Test, and Wisconsin Card Sorting Test (WCST). Clinical and sociodemographic characteristics were analyzed using one-way analysis of variance or the chi-square test; multivariate analysis of covariance was used to examine differences in neuropsychological variables. Patients with BD showed higher FAST total scores (23.90±11.35) than healthy controls (5.86±5.47; p < 0.001) and siblings (12.60±11.83; p 0.001). Siblings and healthy controls also showed statistically significant differences in FAST total scores (p = 0.008). Patients performed worse than healthy controls on all CVLT sub-tests (p < 0.030) and in the number of correctly completed categories on WCST (p = 0.030). Siblings did not differ from healthy controls in cognitive tests. Unaffected siblings of patients with BD may show poorer functional performance compared to healthy controls. FAST scores may contribute to the development of markers of vulnerability and endophenotypic traits in at-risk populations.
Relation of anosognosia to frontal lobe dysfunction in Alzheimer's disease.
Michon, A; Deweer, B; Pillon, B; Agid, Y; Dubois, B
1994-07-01
A self-rating scale of memory functions was administered to 24 non-depressed patients with probable Alzheimer's disease, divided into two groups according to the overall severity of dementia (mild, mini-mental state (MMS) > 21; moderate, MMS between 10 and 20). These groups did not significantly differ in their self-rating of memory functions. The same questionnaire was submitted to a member of each patient's family, who had to rate the patient's memory. An "anosognosia score" was defined as the difference between patient's and family's ratings. This score was highly variable, and covered, in the two groups, the full range between complete awareness of deficits and total anosognosia. Correlations between the anosognosia score and several neuropsychological data were searched for. No significant correlation was found with either the Wechsler memory scale, the MMS, or linguistic abilities and gestures. In contrast, this score was highly correlated with the "frontal score", defined as the sum of scores on the Wisconsin card sorting test (WCST), verbal fluency, Luria's graphic series, and "frontal behaviours" (prehension, utilisation, imitation behaviours, inertia, indifference). Among these tests of executive functions, the highest correlation with the anosognosia score was obtained on the WCST. This suggests that anosognosia in Alzheimer's disease is not related to the degree of cognitive deterioration but results, at least in part, from frontal dysfunction.
Relationship between Learning Style and Academic Status of Babol Dental Students
Nasiri, Zahra; Gharekhani, Samane; Ghasempour, Maryam
2016-01-01
Introduction Identifying and employing students’ learning styles could play an important role in selecting appropriate teaching methods in order to improve education. The aim of this study was to determine the relationship between the students’ final exam scores and the learning style preferences of dental students at Babol University of Medical Sciences. Methods This cross-sectional study was conducted on 88 dental students studying in their fourth, fifth, and sixth years using the visual–aural–reading/writing–kinesthetic (VARK) learning styles’ questionnaire. The data were analyzed with IBM SPSS, version 21, using the chi-squared test and the t-test. Results Of the 88 participants who responded to the questionnaire, 87 preferred multimodal learning styles. There was no significant difference between the mean of the final exam scores in students who did and did not prefer the aural learning style (p = 0.86), the reading/writing learning style (p = 0.20), and the kinesthetic learning style (p = 0.32). In addition, there was no significant difference between the scores on the final clinical course among the students who had different preferences for learning style. However, there was a significant difference between the mean of the final exam scores in students with and without visual learning style preference (p = 0.03), with the former having higher mean scores. There was no significant relationship between preferred learning styles and gender (p > 0.05). Conclusion The majority of dental students preferred multimodal learning styles, and there was a significant difference between the mean of the final exam scores for students with and without a preference for the visual learning style. In addition, there were no differences in the preferred learning styles between male and female students. PMID:27382442
Dahlke, Jeffrey A; Kostal, Jack W; Sackett, Paul R; Kuncel, Nathan R
2018-05-03
We explore potential explanations for validity degradation using a unique predictive validation data set containing up to four consecutive years of high school students' cognitive test scores and four complete years of those students' college grades. This data set permits analyses that disentangle the effects of predictor-score age and timing of criterion measurements on validity degradation. We investigate the extent to which validity degradation is explained by criterion dynamism versus the limited shelf-life of ability scores. We also explore whether validity degradation is attributable to fluctuations in criterion variability over time and/or GPA contamination from individual differences in course-taking patterns. Analyses of multiyear predictor data suggest that changes to the determinants of performance over time have much stronger effects on validity degradation than does the shelf-life of cognitive test scores. The age of predictor scores had only a modest relationship with criterion-related validity when the criterion measurement occasion was held constant. Practical implications and recommendations for future research are discussed. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
ERIC Educational Resources Information Center
Schooler, Douglas L.; Anderson, Robert L.
1979-01-01
Analyzes preschoolers' scores on the Developmental Test of Visual Motor Integration (VMI), the Slosson Intelligence Test (SIT), and the ABC Inventory (ABCI). Separate ANOVAs reveal no race effect on the VMI. Race differences favoring Whites are found for SIT and ABCI. There were no effects for sex on any measure. (Author)
ERIC Educational Resources Information Center
Jensen, Arthur R.
Charles Spearman originally suggested in 1927 that the varying magnitudes of the mean differences between whites and blacks in standardized scores on a variety of mental tests are directly related to the size of the tests' loadings on g, the general factor common to all complex tests of mental ability. Several independent large-scale studies…
Tee, Jason C; Klingbiel, Jannie F G; Collins, Robert; Lambert, Mike I; Coopoo, Yoga
2016-11-01
Tee, JC, Klingbiel, JFG, Collins, R, Lambert, MI, and Coopoo, Y. Preseason Functional Movement Screen component tests predict severe contact injuries in professional rugby union players. J Strength Cond Res 30(11): 3194-3203, 2016. Rugby union is a collision sport with a relatively high risk of injury. The ability of the Functional Movement Screen (FMS) or its component tests to predict the occurrence of severe (≥28 days) injuries in professional players was assessed. Ninety FMS test observations from 62 players across 4 different time periods were compared with severe injuries sustained during the 6 months after FMS testing. Mean composite FMS scores were significantly lower in players who sustained severe injury (injured 13.2 ± 1.5 vs. noninjured 14.5 ± 1.4, Effect Size = 0.83, large) because of differences in in-line lunge (ILL) and active straight leg raise (ASLR) scores. Receiver operating characteristic curves and 2 × 2 contingency tables were used to determine that ASLR (cut-off 2/3) was the injury predictor with the greatest sensitivity (0.96, 95% confidence interval [CI] = 0.79-1.0). Adding the ILL in combination with ASLR (ILL + ASLR) improved the specificity of the injury prediction model (ASLR specificity = 0.29, 95% CI = 0.18-0.43 vs. ASLR + ILL specificity = 0.53, 95% CI = 0.39-0.66, p ≤ 0.05). Further analysis was performed to determine whether FMS tests could predict contact and noncontact injuries. The FMS composite score and various combinations of component tests (deep squat [DS] + ILL, ILL + ASLR, and DS + ILL + ASLR) were all significant predictors of contact injury. The FMS composite score also predicted noncontact injury, but no component test or combination thereof produced a similar result. These findings indicate that low scores on various FMS component tests are risk factors for injury in professional rugby players.
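The sensitivity and specificity figures quoted above come from 2 × 2 contingency tables. The sketch below shows how the two quantities are derived from such a table; the function names and the counts are hypothetical and are not taken from the study.

```python
def sensitivity(true_pos, false_neg):
    """Proportion of injured players that the screen flagged as at risk."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Proportion of non-injured players that the screen cleared."""
    return true_neg / (true_neg + false_pos)


# Hypothetical 2 x 2 table (illustrative counts only):
#                 injured   not injured
# screen positive    18          28
# screen negative     2          12
print(sensitivity(true_pos=18, false_neg=2))   # 0.9
print(specificity(true_neg=12, false_pos=28))  # 0.3
```

A screen with high sensitivity but low specificity, as in this illustration, misses few injured players but flags many who stay healthy, which is the trade-off the abstract describes when ILL is combined with ASLR to raise specificity.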
Eslami, Ahmad Ali; Rabiei, Leili; Afzali, Seyed Mohammad; Hamidizadeh, Saeed; Masoudi, Reza
2016-01-01
Background: Adolescence is a transition period from childhood to early adulthood. Because of the immense pressure imposed on adolescents by the complications and ambiguities of this transition, their level of excitement increases, and it sometimes appears in the form of sensitivity and intense excitement. Objectives: This study aimed at determining the effectiveness of assertiveness training on the levels of stress, anxiety, and depression of high school students. Materials and Methods: This quasi-experimental study was conducted on high school students of Isfahan in the academic year 2012-13. A total of 126 second-grade high school students were selected by simple random sampling and divided into two groups: an experimental group with 63 participants and a control group with the same number. Data gathering instruments included a demographic questionnaire, the Gambrill-Richey assertiveness scale, and the depression anxiety stress scales (DASS-21). Assertiveness training was carried out on the experimental group in 8 sessions; after 8 weeks, a posttest was carried out on both groups. The independent t test, repeated measures ANOVA, the Chi-square test, and the Mann-Whitney test were used to analyze the data. Results: The Chi-square and Mann-Whitney tests did not show statistically significant differences between the two groups in terms of demographic variables (P ≥ 0.05). Repeated measures ANOVA showed no significant difference between the mean scores for assertiveness before (100.23 ± 7.37), immediately after (101.57 ± 16.06), and 2 months after (100.77 ± 12.50) the intervention in the control group. However, the same test found a significant difference between the mean scores for assertiveness in the experimental group before (101.6 ± 9.1), immediately after (96.47 ± 10.84), and 2 months after (95.41 ± 8.37) implementing the training program (P = 0.002). The independent t test showed no significant difference in the mean scores for anxiety and stress between the two groups before the assertiveness training program; however, 2 months after the intervention, the mean score for anxiety in the experimental group was significantly lower than in the control group. As for the mean score for depression, the independent t test showed no significant difference between the two groups before training; however, despite the decrease in the mean scores for depression in the experimental group following the intervention, the difference was not significant (P = 0.09). Conclusions: The results of the current study show that conducting assertiveness training in high school students decreases their anxiety, stress, and depression. Given that the high school years are among the most sensitive stages of one’s life and that such training programs are safe, low-cost, effective, and practical, it is highly recommended that such programs be carried out among high school adolescents. PMID:26889390
DERAKHSHANDEH, ZAHRA; AMINI, MITRA; KOJURI, JAVAD; DEHBOZORGIAN, MARZIYEH
2018-01-01
Introduction: Clinical reasoning is one of the most important skills in the process of training a medical student to become an efficient physician. Assessment of the reasoning skills in a medical school program is important to direct students’ learning. One of the tests for measuring the clinical reasoning ability is Clinical Reasoning Problems (CRPs). The major aim of this study is to measure psychometric qualities of CRPs and determine the correlation between this test and the routine MCQ in the cardiology department of Shiraz medical school. Methods: This study was a descriptive study conducted on all cardiology residents of Shiraz Medical School. The study population consisted of 40 residents in 2014. The routine CRPs and the MCQ tests were designed based on similar objectives and were carried out simultaneously. Reliability, item difficulty, item discrimination, and correlation between each item and the total score of CRPs were all measured with Excel and SPSS software to check the psychometric properties of the CRPs test. Furthermore, we calculated the correlation between the CRPs test and the MCQ test. The mean differences of CRPs test score between residents’ academic year [second, third and fourth year] were also evaluated by one-way analysis of variance (ANOVA) using SPSS software (version 20) (α=0.05). Results: The mean and standard deviation of score in CRPs was 10.19±3.39 out of 20; in MCQ, it was 13.15±3.81 out of 20. Item difficulty was in the range of 0.27-0.72; item discrimination was 0.30-0.75 with question No.3 being the exception (that was 0.24). The correlation between each item and the total score of CRP was 0.26-0.87; the correlation between CRPs test and MCQ test was 0.68 (p<0.001). The reliability of the CRPs was 0.72 as calculated by using Cronbach's alpha. The mean score of CRPs was different among residents based on their academic year and this difference was statistically significant (p<0.001).
Conclusion: The results of this investigation revealed that CRPs could be a reliable test for measuring clinical reasoning in residents. It can be included in cardiology residency assessment programs. PMID:29344528
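The abstract above reports item difficulty and a Cronbach's alpha reliability but, being an abstract, does not show how such figures are obtained. The following is a minimal sketch of the textbook formulas only, not the authors' Excel/SPSS procedure; the function names and the small 0/1 score matrix are hypothetical and are not data from the study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a matrix with one row per examinee and one column per item."""
    k = len(scores[0])                      # number of items
    item_columns = list(zip(*scores))       # transpose: one tuple per item
    item_variance_sum = sum(variance(col) for col in item_columns)
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


def item_difficulty(scores):
    """Proportion of examinees answering each item correctly (items scored 0 or 1)."""
    n = len(scores)
    return [sum(col) / n for col in zip(*scores)]


# Hypothetical data: 4 examinees, 3 dichotomously scored items.
example = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(item_difficulty(example))   # one proportion per item
print(cronbach_alpha(example))
```

Sample variances are used for both the items and the total score; using population variances instead gives the same alpha, because the ratio is unchanged.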
Hebbal, M; Ankola, A V
2012-10-01
To develop a special oral health education technique and compare plaque scores before and after health education. Non-randomised before and after comparison trial without controls. The final study population comprised 96 visually impaired children aged 6-18 years. Silness and Loe plaque index scores were recorded at baseline. The 'Audio tactile performance technique' (ATP Technique), a specially designed health education method, was used to educate these children regarding oral hygiene maintenance. Periodic reinforcement of health education was performed at an interval of 9 months. Re-examination was carried out after 18 months of health education to assess plaque scores. Wilcoxon's signed-rank test and the paired t test were used to assess the difference between the scores before and after health education. There was an increase in the frequency of tooth brushing after health education. The mean plaque scores pre- and post-health education were 1.41 (+/-0.58) and 0.63 (+/-0.39) respectively. The difference was statistically significant (p<0.001). Visually impaired children could maintain an acceptable level of oral hygiene when taught using special customised methods.
Characterizing the gender gap in introductory physics
NASA Astrophysics Data System (ADS)
Kost, Lauren E.; Pollock, Steven J.; Finkelstein, Noah D.
2009-06-01
Previous research [S. J. Pollock, Phys. Rev. ST Phys. Educ. Res. 3, 1 (2007)] showed that despite the use of interactive engagement techniques, the gap in performance between males and females on a conceptual learning survey persisted from pretest to post-test at the University of Colorado at Boulder. Such findings were counter to previously published work [M. Lorenzo, Am. J. Phys. 74, 118 (2006)]. This study begins by identifying a variety of other gender differences. There is a small but significant difference in the course grades of males and females. Males and females have significantly different prior understandings of physics and mathematics. Females are less likely to take high school physics than males, although they are equally likely to take high school calculus. Males and females also differ in their incoming attitudes and beliefs about physics. This collection of background factors is analyzed to determine the extent to which each factor correlates with performance on a conceptual post-test and with gender. Binned by quintiles, we observe that males and females with similar pretest scores do not have significantly different post-test scores (p>0.2). The post-test data are then modeled using two regression models (multiple regression and logistic regression) to estimate the gender gap in post-test scores after controlling for these important prior factors. These prior factors account for about 70% of the observed gender gap. The results indicate that the gender gap exists in interactive physics classes at our institution but is largely associated with differences in previous physics and math knowledge and incoming attitudes and beliefs.
Assessing Freshman Engineering Students' Understanding of Ethical Behavior.
Henslee, Amber M; Murray, Susan L; Olbricht, Gayla R; Ludlow, Douglas K; Hays, Malcolm E; Nelson, Hannah M
2017-02-01
Academic dishonesty, including cheating and plagiarism, is on the rise in colleges, particularly among engineering students. While students decide to engage in these behaviors for many different reasons, academic integrity training can help improve their understanding of ethical decision making. The two studies outlined in this paper assess the effectiveness of an online module in increasing academic integrity among first semester engineering students. Study 1 tested the effectiveness of an academic honesty tutorial by using a between groups design with a Time 1- and Time 2-test. An academic honesty quiz assessed participants' knowledge at both time points. Study 2, which incorporated an improved version of the module and quiz, utilized a between groups design with three assessment time points. The additional Time 3-test allowed researchers to test for retention of information. Results were analyzed using ANCOVA and t tests. In Study 1, the experimental group exhibited significant improvement on the plagiarism items, but not the total score. However, at Time 2 there was no significant difference between groups after controlling for Time 1 scores. In Study 2, between- and within-group analyses suggest there was a significant improvement in total scores, but not plagiarism scores, after exposure to the tutorial. Overall, the academic integrity module impacted participants as evidenced by changes in total score and on specific plagiarism items. Although future implementation of the tutorial and quiz would benefit from modifications to reduce ceiling effects and improve assessment of knowledge, the results suggest such tutorial may be one valuable element in a systems approach to improving the academic integrity of engineering students.
NASA Astrophysics Data System (ADS)
Powers, Angela R.
2000-10-01
This study explored the relationship between secondary chemistry students' conceptual representations of acid-base chemistry, as shown in student-constructed concept maps, and their ability to solve acid-base problems, represented by their score on an 18-item paper and pencil test, the Acid-Base Concept Assessment (ABCA). The ABCA, consisting of both multiple-choice and short-answer items, was originally designed using a question-type by subtopic matrix, validated by a panel of experts, and refined through pilot studies and factor analysis to create the final instrument. The concept map task included a short introduction to concept mapping, a prototype concept map, a practice concept-mapping activity, and the instructions for the acid-base concept map task. The instruments were administered to chemistry students at two high schools; 108 subjects completed both instruments for this study. Factor analysis of ABCA results indicated that the test was unifactorial for these students, despite the intention to create an instrument with multiple "question-type" scales. Concept maps were scored both holistically and by counting valid concepts. The two approaches were highly correlated (r = 0.75). The correlation between ABCA score and concept-map score was 0.29 for holistically-scored concept maps and 0.33 for counted-concept maps. Although both correlations were significant, they accounted for only 8.8 and 10.2% of variance in ABCA scores, respectively. However, when the reliability of the instruments used is considered, more than 20% of the variance in ABCA scores may be explained by concept map scores. MANOVAs for ABCA and concept map scores by instructor, student gender, and year in school showed significant differences for both holistic and counted concept-map scores. Discriminant analysis revealed that the source of these differences was the instruction variable. 
Significant differences between classes receiving different instruction were found in the frequency of concepts listed by students for 9 of 10 concepts evaluated. Mean ABCA scores did not differ significantly between the two instruction groups. The results of this study failed to provide evidence of conceptual distinctions among different "types" of problem-solving items. The results suggested that several factors influence success in chemistry problem solving, including concept knowledge and organization. Further research into the nature of chemistry problems and problem solving is recommended.
Pera-Guardiola, Vanessa; Batalla, Iolanda; Bosque, Javier; Kosson, David; Pifarré, Josep; Hernández-Ribas, Rosa; Goldberg, Ximena; Contreras-Rodríguez, Oren; Menchón, José M; Soriano-Mas, Carles; Cardoner, Narcís
2016-01-30
Neuropsychological deficits in executive functions (EF) have been linked to antisocial behavior and considered to be cardinal to the onset and persistence of severe antisocial and aggressive behavior. However, when psychopathy is present, prior evidence suggests that the dorsolateral prefrontal cortex is unaffected leading to intact EF. Ninety-one male offenders with Antisocial Personality Disorder (ASPD) and 24 controls completed the Wisconsin Card Sorting Test (WCST). ASPD individuals were grouped in three categories according to Psychopathy Checklist-Revised (PCL-R) scores (low, medium and high). We hypothesized that ASPD offenders with high PCL-R scores will not differ from healthy controls in EF and will show better EF performance in comparison with subjects with low PCL-R scores. Results showed that ASPD offenders with low PCL-R scores committed more perseverative errors and responses than controls and offenders with high PCL-R scores, which did not differ from healthy controls. Moreover, scores on Factor 1 and the interpersonal facet of the PCL-R were predictors of better WCST performance. Our results suggest a modulatory role of psychopathy in the cognitive performance of ASPD offenders, and provide further evidence supporting that offenders with ASPD and psychopathy are characterized by a cognitive profile different from those with ASPD without psychopathy. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Effect of teaching mathematics using GeoGebra on students' with dissimilar spatial visualisation
NASA Astrophysics Data System (ADS)
Bakar, Kamariah Abu; Ayub, Ahmad Fauzi Mohd; Tarmizi, Rohani Ahmad; Luan, Wong Su
2015-10-01
This study examined the effects of GeoGebra on the mathematics performance of students with different spatial visualization. A quasi-experimental, pretest-posttest control group design was used. A total of 71 students from two intact groups were involved in the study. Each group was randomly assigned to either the experimental group (36 students) or the control group (35 students). A spatial visual test to identify students with high or low visualization, and a mathematics performance pre-test were administered at the initial stage of this study. A post-test was administered after 12 weeks of treatment using GeoGebra. Analysis of covariance (ANCOVA) was used to adjust for the pre-test score. Findings showed that the group with access to GeoGebra achieved significantly better test scores in the posttest as compared to the group which followed the traditional teaching method. A two-way ANCOVA used to analyse the effect of students' spatial visualization on post-test performance showed that there was no effect. The results from this study suggested that using GeoGebra had helped the students to score better in the posttest. However, there was no significant difference in mathematics performance among students with different types of spatial visualisation. This study indicates that GeoGebra is useful in enhancing the teaching and learning of mathematics.
McCaffrey, Ruth; Bishop, Mary; Adonis-Rizzo, Marie; Williamson, Ellen; McPherson, Melanie; Cruikshank, Alice; Carrier, Vicki Jo; Sands, Simone; Pigano, Diane; Girard, Patricia; Lauzon, Cathy
2007-01-01
Hospital-acquired deep vein thrombosis (DVT) and pulmonary embolisms (PE) are preventable problems that can increase mortality. Early assessment and recognition of risk as well as initiating appropriate prevention measures can prevent DVT or PE. The purpose of this research project was to develop a DVT risk assessment tool and test the tool for validity and reliability. Three phases were undertaken in developing and testing the JFK Medical Center DVT risk assessment tool. First, risk and predisposing factors for DVT were identified and clarified from the literature, expert nursing knowledge, and medical staff input. Second, item development and weighting were undertaken. Third, parametric testing for content validity measured the differences in mean assessment tool scores between a group of patients who developed DVT in the hospital and a demographically similar group who did not develop DVT. Interrater reliability was measured by having three different nurses score each patient and comparing the differences in scores among the three. The DVT group had significantly higher scores on the JFK DVT assessment scale than did those who did not experience DVT. Interrater reliability showed a strong correlation among the scores of the three nurses (.98). Providing a valid and reliable tool for measuring the risk for DVT or PE in hospitalized patients will enable nurses to intervene early in patients at risk. Basing DVT risk assessment on the evidence provided in this study will assist nurses in becoming more confident in recognizing the necessity for interventions in hospitalized patients and decreasing risk. Nurses can now evaluate patients at risk for DVT or PE using the JFK Medical Center's risk assessment tool.
Cold chain monitoring of OPV at transit levels in India: correlation of VVM and potency status.
Jain, R; Sahu, A K; Tewari, S; Malik, N; Singh, S; Khare, S; Bhatia, R
2003-12-01
We have conducted a study to analyze monitoring of the cold chain of 674 OPV field samples collected at four different levels of vaccine distribution viz., immunization clinics, district stores, hospitals and Primary Health Centers (PHC) from states of Uttar Pradesh, Madhya Pradesh, and Delhi. The study design included: collection and scoring of vaccine vial monitor (VVM) status of the samples and testing for total oral polio virus concentration (TOPV) by standard WHO protocol. Ten samples each were exposed to 25 degrees C and 37 degrees C, and 10 samples as controls were kept at -20 degrees C. VVM were scored daily till they attained grade 4 and each sample was subsequently subjected to potency testing for individual polio serotypes 1, 2 and 3, and TOPV. Of the 674 samples tested it was observed that: samples from immunization clinics and district stores had an acceptable VVM score of grade 1 and 2; however the probable risk that a sub potent vaccine could have been administered was 2.15%. In 2.5% samples received from district stores vaccine had a VVM score of grade 3 (i.e., discard point), although vaccine when tested was found to be potent (i.e., leading to the vaccine wastage). With exposure to higher temperatures, VVM changed score to grade 2 and 3 when the vaccine was kept at 25 degrees C/37 degrees C, and the titres of individual serotypes 1, 2 and 3 and TOPV were beyond the acceptable limits. Important observations at the different levels of vaccine distribution network and correlation of VVM and potency status of OPV are discussed in the paper which will be of help to the EPI program managers at different transit levels.
Desai, Arti D; Burkhart, Q; Parast, Layla; Simon, Tamara D; Allshouse, Carolyn; Britto, Maria T; Leyenaar, JoAnna K; Gidengil, Courtney A; Toomey, Sara L; Elliott, Marc N; Schneider, Eric C; Mangione-Smith, Rita
Few measures exist to assess pediatric transition quality between care settings. The study objective was to develop and pilot test caregiver-reported quality measures for pediatric hospital and emergency department (ED) to home transitions. On the basis of an evidence review, we developed draft caregiver-reported quality measures for transitions between sites of care. Using the RAND-UCLA Modified Delphi method, a multistakeholder panel endorsed measures for further development. Measures were operationalized into 2 surveys, which were administered to caregivers of patients (n = 2839) discharged from Seattle Children's Hospital between July 1 and September 1, 2014. Caregivers were randomized to mail or telephone survey mode. Measure scores were computed as a percentage of eligible caregivers who endorsed receiving the indicated care. Differences in scores were examined according to survey mode and caregiver characteristics. The Delphi panel endorsed 6 of 8 hospital to home transition measures and 2 of 3 ED to home transitions measures. Scores differed significantly according to mode for 1 measure. Caregivers with lower levels of educational attainment and/or Spanish-speaking caregivers reported significantly higher scores on 3 of the measures. The largest difference was reported for the measure that assessed whether caregivers received assistance with scheduling follow-up appointments; 92% score for caregivers with lower educational attainment versus 79% for caregivers with higher educational attainment (P < .001). We developed 8 new, evidence-based quality measures to assess transition quality from the perspective of caregivers. Pilot testing of these measures in a single institution yielded valuable insights for future testing and implementation of these measures. Copyright © 2016 Academic Pediatric Association. Published by Elsevier Inc. All rights reserved.
Lodeiro-Fernández, Leire; Lorenzo-López, Laura; Maseda, Ana; Núñez-Naveira, Laura; Rodríguez-Villamil, José Luis; Millán-Calenti, José Carlos
2015-01-01
Purpose: The possible relationship between audiometric hearing thresholds and cognitive performance on language tests was analyzed in a cross-sectional cohort of older adults aged ≥65 years (N=98) with different degrees of cognitive impairment. Materials and methods: Participants were distributed into two groups according to Reisberg’s Global Deterioration Scale (GDS): a normal/predementia group (GDS scores 1–3) and a moderate/moderately severe dementia group (GDS scores 4 and 5). Hearing loss (pure-tone audiometry) and receptive and production-based language function (Verbal Fluency Test, Boston Naming Test, and Token Test) were assessed. Results: Results showed that the dementia group achieved significantly lower scores than the predementia group in all language tests. A moderate negative correlation between hearing loss and verbal comprehension (r=−0.298; P<0.003) was observed in the predementia group (r=−0.363; P<0.007). However, no significant relationship between hearing loss and verbal fluency and naming scores was observed, regardless of cognitive impairment. Conclusion: In the predementia group, reduced hearing level partially explains comprehension performance but not language production. In the dementia group, hearing loss cannot be considered as an explanatory factor of poor receptive and production-based language performance. These results are suggestive of cognitive rather than simply auditory problems to explain the language impairment in the elderly. PMID:25914528
Using a genetic/clinical risk score to stop smoking (GeTSS): randomised controlled trial.
Nichols, John A A; Grob, Paul; Kite, Wendy; Williams, Peter; de Lusignan, Simon
2017-10-23
As genetic tests become cheaper, the possibility of their widespread availability must be considered. This study involves a risk score for lung cancer in smokers that is roughly 50% genetic (50% clinical criteria). The risk score has been shown to be effective as a smoking cessation motivator in hospital recruited subjects (not actively seeking cessation services). This was an RCT set in a United Kingdom National Health Service (NHS) smoking cessation clinic. Smokers were identified from medical records. Subjects who wanted to participate were randomised to a test group that was administered a gene-based risk test and given a lung cancer risk score, or a control group for which no risk score was calculated. Each group had 8 weeks of weekly smoking cessation sessions involving group therapy and advice on smoking cessation pharmacotherapy and follow-up at 6 months. The primary endpoint was smoking cessation at 6 months. Secondary outcomes included ranking of the risk score and other motivators. 67 subjects attended the smoking cessation clinic. The 6-month quit rates were 29.4% (10/34; 95% CI 14.1-44.7%) for the test group and 42.9% (12/28; 95% CI 24.6-61.2%) for the controls. The difference is not significant. However, the quit rate for test group subjects with a "very high" risk score was 89% (8/9; 95% CI 68.4-100%), which was significant when compared with the control group (p = 0.023), and test group subjects with moderate risk scores had a 9.5% quit rate (2/21; 95% CI 2.7-28.9%), which was significantly lower than that for subjects with an above moderate risk score, 61.5% (8/13; 95% CI 35.5-82.3; p = 0.03). Only the sub-group with the highest risk score showed an increased quit rate. Controls and test group subjects with a moderate risk score were relatively unlikely to have achieved and maintained non-smoker status at 6 months. ClinicalTrials.gov ID NCT01176383 (date of registration: 3 August 2010).
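The quit rates above are reported with 95% confidence intervals, but the abstract does not say how the intervals were computed. As a sketch under the assumption of a simple normal-approximation (Wald) interval, the calculation for the test group's 10 quitters out of 34 could look like this; the function name `wald_ci` is hypothetical, and other interval methods would give somewhat different bounds.

```python
import math


def wald_ci(successes, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


# Test group from the abstract: 10 of 34 subjects had quit at 6 months.
low, high = wald_ci(10, 34)
print(f"{10 / 34:.1%} (95% CI {low:.1%} to {high:.1%})")
```

The Wald interval is used here only because it is the simplest to show; it can extend below 0% or above 100% for small samples and extreme proportions, so in practice the bounds would need truncating or a different method.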
Effects of correcting for prematurity on cognitive test scores in childhood.
Wilson-Ching, Michelle; Pascoe, Leona; Doyle, Lex W; Anderson, Peter J
2014-03-01
The American Academy of Pediatrics recommends that test scores should be corrected for prematurity up to 3 years of age, but this practice varies greatly in both clinical and research settings. The aim of this study was to contrast the effects of using chronological age and those of using corrected age on measures of cognitive outcome across childhood. A theoretical model was constructed using norms from the Bayley Scales of Infant and Toddler Development, Third Edition; the Wechsler Preschool and Primary Scale of Intelligence, Third Edition Australian; and the Wechsler Intelligence Scales for Children, Fourth Edition Australian. Baseline scores representing different levels of functioning (70, below average; 85, borderline; and 100, average) were recalculated using the normative data for ages 6 months to 16 years to account for 1, 2, 3 and 4 months of prematurity. The model created depicted the difference in standardised scores between chronological and corrected age. Compared with scores corrected for prematurity, the absolute reduction in scores using chronological age was greater for increasing degree of prematurity, younger ages at assessment and higher baseline scores and was substantial even beyond 3 years of age. However, the pattern was erratic, with considerable fluctuation evident across different ages and baseline scores. Chronological age results in a lowering of scores at all ages for preterm-born subjects that is greater in the first few years and in those born at earlier gestational ages. Whether or not to correct for prematurity depends upon the context of the assessment. © 2014 The Authors. Journal of Paediatrics and Child Health © 2014 Paediatrics and Child Health Division (Royal Australasian College of Physicians).
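The distinction between chronological and corrected age that drives this abstract is simple arithmetic: corrected age is chronological age minus the degree of prematurity. A minimal sketch follows, using the abstract's 1 to 4 months of prematurity as inputs; the function name and the example age of 36 months are hypothetical, and the sketch does not reproduce the authors' norm-based score model.

```python
def corrected_age_months(chronological_age_months, months_premature):
    """Age corrected for prematurity: chronological age minus months born early."""
    return chronological_age_months - months_premature


# A child assessed at 36 months chronological age, for each degree of prematurity
# considered in the abstract (1 to 4 months).
for months_early in (1, 2, 3, 4):
    corrected = corrected_age_months(36, months_early)
    print(f"{months_early} month(s) premature: corrected age {corrected} months")
```

Which age is used decides which normative table a raw score is compared against, which is why the choice shifts standardised scores.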
A comparison of WISC-IV and SB-5 intelligence scores in adolescents with autism spectrum disorder.
Baum, Katherine T; Shear, Paula K; Howe, Steven R; Bishop, Somer L
2015-08-01
In autism spectrum disorders, results of cognitive testing inform clinical care, theories of neurodevelopment, and research design. The Wechsler Intelligence Scale for Children and the Stanford-Binet are commonly used in autism spectrum disorder evaluations and scores from these tests have been shown to be highly correlated in typically developing populations. However, they have not been compared in individuals with autism spectrum disorder, whose core symptoms can make testing challenging, potentially compromising test reliability. We used a within-subjects research design to evaluate the convergent validity between the Wechsler Intelligence Scale for Children, 4th ed., and Stanford-Binet, 5th ed., in 40 youth (ages 10-16 years) with autism spectrum disorder. Corresponding intelligence scores were highly correlated (r = 0.78 to 0.88), but full-scale intelligence quotient (IQ) scores (t(38) = -2.27, p = 0.03, d = -0.16) and verbal IQ scores (t(36) = 2.23, p = 0.03; d = 0.19) differed between the two tests. Most participants obtained higher full-scale IQ scores on the Stanford-Binet, 5th ed., compared to Wechsler Intelligence Scale for Children, 4th ed., with 14% scoring more than one standard deviation higher. In contrast, verbal indices were higher on the Wechsler Intelligence Scale for Children, 4th ed. Verbal-nonverbal discrepancy classifications were only consistent for 60% of the sample. Comparisons of IQ test scores in autism spectrum disorder and other special groups are important, as it cannot necessarily be assumed that convergent validity findings in typically developing children and adolescents hold true across all pediatric populations. © The Author(s) 2014.
ERIC Educational Resources Information Center
Ling, Guangming; Powers, Donald E.; Adler, Rachel M.
2014-01-01
One fundamental way to determine the validity of standardized English-language test scores is to investigate the extent to which they reflect anticipated learning effects in different English-language programs. In this study, we investigated the extent to which the "TOEFL iBT"® practice test reflects the learning effects of students at…
ERIC Educational Resources Information Center
Olneck, Michael R.
This study used five data sets to investigate the effects of measured cognitive skills on educational attainment, and the effects of cognitive skills and educational attainment on occupational status and earning among men with low test scores, as compared to men with high test scores, and among men with blue-collar fathers, as compared to men with…