Do Examinees Understand Score Reports for Alternate Methods of Scoring Computer Based Tests?
ERIC Educational Resources Information Center
Whittaker, Tiffany A.; Williams, Natasha J.; Dodd, Barbara G.
2011-01-01
This study assessed the interpretability of scaled scores based on either number correct (NC) scoring for a paper-and-pencil test or one of two methods of scoring computer-based tests: an item pattern (IP) scoring method and a method based on equated NC scoring. The equated NC scoring method for computer-based tests was proposed as an alternative…
Gavett, Brandon E
2015-03-01
The base rates of abnormal test scores in cognitively normal samples have been a focus of recent research. The goal of the current study is to illustrate how Bayes' theorem uses these base rates--along with the same base rates in cognitively impaired samples and prevalence rates of cognitive impairment--to yield probability values that are more useful for making judgments about the absence or presence of cognitive impairment. Correlation matrices, means, and standard deviations were obtained from the Wechsler Memory Scale--4th Edition (WMS-IV) Technical and Interpretive Manual and used in Monte Carlo simulations to estimate the base rates of abnormal test scores in the standardization and special groups (mixed clinical) samples. Bayes' theorem was applied to these estimates to identify probabilities of normal cognition based on the number of abnormal test scores observed. Abnormal scores were common in the standardization sample (65.4% scoring below a scaled score of 7 on at least one subtest) and more common in the mixed clinical sample (85.6% scoring below a scaled score of 7 on at least one subtest). Probabilities varied according to the number of abnormal test scores, base rates of normal cognition, and cutoff scores. The results suggest that interpretation of base rates obtained from cognitively healthy samples must also account for data from cognitively impaired samples. Bayes' theorem can help neuropsychologists answer questions about the probability that an individual examinee is cognitively healthy based on the number of abnormal test scores observed.
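As a rough illustration of the Bayes' theorem step described in this abstract, the sketch below combines the two reported base rates (65.4% of the standardization sample and 85.6% of the mixed clinical sample scoring below a scaled score of 7 on at least one subtest) with a prior probability of normal cognition. The 0.80 prior and the function name are illustrative assumptions, not values or code from the study.

```python
def prob_normal_given_finding(p_finding_normal, p_finding_impaired, prior_normal):
    """Posterior probability of normal cognition given an observed finding.

    p_finding_normal   -- base rate of the finding in cognitively normal people
    p_finding_impaired -- base rate of the finding in cognitively impaired people
    prior_normal       -- assumed prevalence of normal cognition in the setting
    """
    numerator = p_finding_normal * prior_normal
    denominator = numerator + p_finding_impaired * (1.0 - prior_normal)
    return numerator / denominator


# Reported base rates for "at least one subtest scaled score below 7";
# the 0.80 prior is a hypothetical value chosen only for illustration.
posterior = prob_normal_given_finding(0.654, 0.856, 0.80)
print(round(posterior, 3))
```

Under that assumed prior, the posterior is about 0.75. A single low score therefore shifts the probability of normal cognition only slightly, which is in line with the abstract's point that base rates from healthy samples must be read alongside those from impaired samples.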
Karr, Justin E; Garcia-Barrera, Mauricio A; Holdnack, James A; Iverson, Grant L
2018-01-01
Multivariate base rates allow for the simultaneous statistical interpretation of multiple test scores, quantifying the normal frequency of low scores on a test battery. This study provides multivariate base rates for the Delis-Kaplan Executive Function System (D-KEFS). The D-KEFS consists of 9 tests with 16 Total Achievement scores (i.e. primary indicators of executive function ability). Stratified by education and intelligence, multivariate base rates were derived for the full D-KEFS and an abbreviated four-test battery (i.e. Trail Making, Color-Word Interference, Verbal Fluency, and Tower Test) using the adult portion of the normative sample (ages 16-89). Multivariate base rates are provided for the full and four-test D-KEFS batteries, calculated using five low score cutoffs (i.e. ≤25th, 16th, 9th, 5th, and 2nd percentiles). Low scores occurred commonly among the D-KEFS normative sample, with 82.6 and 71.8% of participants obtaining at least one score ≤16th percentile for the full and four-test batteries, respectively. Intelligence and education were inversely related to low score frequency. The base rates provided herein allow clinicians to interpret multiple D-KEFS scores simultaneously for the full D-KEFS and an abbreviated battery of commonly administered tests. The use of these base rates will support clinicians when differentiating between normal variations in cognitive performance and true executive function deficits.
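The study derived its base rates directly from the normative sample. The simulation below is only a sketch of why low scores become common when many scores are interpreted together. It assumes standard-normal scores with one common pairwise correlation, generated from a one-factor model; the correlation value, sample size, and function name are all hypothetical.

```python
import random
from statistics import NormalDist


def simulate_base_rate(n_tests, rho, percentile, n_people=20000, seed=0):
    """Estimate the share of simulated healthy examinees with >= 1 low score.

    Each score is sqrt(rho) * shared + sqrt(1 - rho) * unique, which gives
    unit variance and pairwise correlation rho (a simplifying assumption).
    """
    rng = random.Random(seed)
    cutoff = NormalDist().inv_cdf(percentile)  # z-score at the cutoff percentile
    shared_w = rho ** 0.5
    unique_w = (1.0 - rho) ** 0.5
    hits = 0
    for _ in range(n_people):
        shared = rng.gauss(0.0, 1.0)  # ability factor common to all tests
        if any(shared_w * shared + unique_w * rng.gauss(0.0, 1.0) <= cutoff
               for _ in range(n_tests)):
            hits += 1
    return hits / n_people


# 16 scores and a 16th-percentile cutoff, with and without correlation.
independent = simulate_base_rate(16, 0.0, 0.16)
correlated = simulate_base_rate(16, 0.5, 0.16)
print(independent, correlated)
```

With uncorrelated scores the rate approaches 1 - 0.84^16. Positive correlation among tests lowers it, which is one reason empirically derived multivariate base rates are preferred to an independence assumption.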
Jaiprakash, Heethal; Min, Aung Ko Ko; Ghosh, Sarmishtha
2016-03-01
This paper examines whether the correlation between written test scores and tutors' performance test scores changed in the assessment of medical students during a problem-based learning (PBL) course in Malaysia. This cross-sectional observational study was conducted among 264 medical students in two groups from November 2010 to November 2012. The first group's tutors did not receive tutor training, while the second group's tutors were trained in the PBL process. Each group was divided into high, middle, and low achievers based on end-of-semester exam scores. The PBL scores collected included written test scores and tutors' performance test scores. The Pearson correlation coefficient between the two kinds of scores was calculated in each group. The correlation coefficient between the written scores and tutors' scores was 0.099 (p<0.001) in group 1 and 0.305 (p<0.001) in group 2. The higher correlation coefficient in the group whose tutors received PBL training reinforces the importance of training tutors before they participate in a PBL course.
ERIC Educational Resources Information Center
Lee, Guemin; Park, In-Yong
2012-01-01
Previous assessments of the reliability of test scores for testlet-composed tests have indicated that item-based estimation methods overestimate reliability. This study was designed to address issues related to the extent to which item-based estimation methods overestimate the reliability of test scores composed of testlets and to compare several…
A Comment on Early Student Blunders on Computer-Based Adaptive Tests
ERIC Educational Resources Information Center
Green, Bert F.
2011-01-01
This article refutes a recent claim that computer-based tests produce biased scores for very proficient test takers who make mistakes on one or two initial items and that the "bias" can be reduced by using a four-parameter IRT model. Because the same effect occurs with pattern scores on nonadaptive tests, the effect results from IRT scoring, not…
Strom, Suzanne L; Anderson, Craig L; Yang, Luanna; Canales, Cecilia; Amin, Alpesh; Lotfipour, Shahram; McCoy, C Eric; Osborn, Megan Boysen; Langdorf, Mark I
2015-11-01
Traditional Advanced Cardiac Life Support (ACLS) courses are evaluated using written multiple-choice tests. High-fidelity simulation is a widely used adjunct to didactic content, and has been used in many specialties as a training resource as well as an evaluative tool. There are no data to our knowledge that compare simulation examination scores with written test scores for ACLS courses. To compare and correlate a novel high-fidelity simulation-based evaluation with traditional written testing for senior medical students in an ACLS course. We performed a prospective cohort study to determine the correlation between simulation-based evaluation and traditional written testing in a medical school simulation center. Students were tested on a standard acute coronary syndrome/ventricular fibrillation cardiac arrest scenario. Our primary outcome measure was correlation of exam results for 19 volunteer fourth-year medical students after a 32-hour ACLS-based Resuscitation Boot Camp course. Our secondary outcome was comparison of simulation-based vs. written outcome scores. The composite average score on the written evaluation was substantially higher (93.6%) than the simulation performance score (81.3%, absolute difference 12.3%, 95% CI [10.6-14.0%], p<0.00005). We found a statistically significant moderate correlation between simulation scenario test performance and traditional written testing (Pearson r=0.48, p=0.04), validating the new evaluation method. Simulation-based ACLS evaluation methods correlate with traditional written testing and demonstrate resuscitation knowledge and skills. Simulation may be a more discriminating and challenging testing method, as students scored higher on written evaluation methods compared to simulation.
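The reported correlation (Pearson r=0.48, 19 students, p=0.04) can be sanity-checked with the usual t transformation of a correlation coefficient. The sketch below assumes a two-tailed test of zero correlation with n - 2 degrees of freedom; the abstract does not say which test the authors used, and the helper name is hypothetical.

```python
import math


def t_from_r(r, n):
    """t statistic for testing zero correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


t_stat = t_from_r(0.48, 19)
T_CRIT_DF17 = 2.110  # two-tailed 5% critical value of Student's t for df = 17
print(round(t_stat, 2), t_stat > T_CRIT_DF17)
```

The statistic comes out near 2.26, just above the 5% critical value, which agrees with a p-value slightly under 0.05.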
Confidence Intervals for Weighted Composite Scores under the Compound Binomial Error Model
ERIC Educational Resources Information Center
Kim, Kyung Yong; Lee, Won-Chan
2018-01-01
Reporting confidence intervals with test scores helps test users make important decisions about examinees by providing information about the precision of test scores. Although a variety of estimation procedures based on the binomial error model are available for computing intervals for test scores, these procedures assume that items are randomly…
Self-Monitoring Assessments for Educational Accountability Systems
ERIC Educational Resources Information Center
Koretz, Daniel; Beguin, Anton
2010-01-01
Test-based accountability is now the cornerstone of U.S. education policy, and it is becoming more important in many other nations as well. Educators sometimes respond to test-based accountability in ways that produce score inflation. In the past, score inflation has usually been evaluated by comparing trends in scores on a high-stakes test to…
Reliability of Total Test Scores When Considered as Ordinal Measurements
ERIC Educational Resources Information Center
Biswas, Ajoy Kumar
2006-01-01
This article studies the ordinal reliability of (total) test scores. This study is based on a classical-type linear model of observed score (X), true score (T), and random error (E). Based on the idea of Kendall's tau-a coefficient, a measure of ordinal reliability for small-examinee populations is developed. This measure is extended to large…
ERIC Educational Resources Information Center
Feldt, Leonard S.
2004-01-01
In some settings, the validity of a battery composite or a test score is enhanced by weighting some parts or items more heavily than others in the total score. This article describes methods of estimating the total score reliability coefficient when differential weights are used with items or parts.
ERIC Educational Resources Information Center
Helwig, Robert; Anderson, Lisbeth; Tindal, Gerald
2002-01-01
An 11-item math concept curriculum-based measure (CBM) was administered to 171 eighth grade students. Scores were correlated with scores from a computer adaptive test designed in conjunction with the state to approximate the official statewide mathematics achievement tests. Correlations for general education students and students with learning…
Verification of learner’s differences by team-based learning in biochemistry classes
2017-01-01
Purpose: We tested the effect of team-based learning (TBL) on medical education using second-year premedical students’ TBL scores in biochemistry classes over 5 years. Methods: We analyzed test scores obtained before and after the students’ debate. For statistical analysis, students were divided as follows: group 1 comprised the top-ranked students, group 2 the medium-ranked students, and group 3 the low-ranked students. Group T comprised all 382 students (the total of groups 1, 2, and 3). To calibrate for test difficulty, original scores were converted into standardized scores. We assessed differences between the tests using Student’s t-test and the relationship between scores before and after TBL using linear regression. Results: Although there was a decrease in the lowest score, groups T and 3 showed a significant increase in both original and standardized scores; there was also an increase in the standardized score of group 3. There was a positive correlation between the pre- and post-debate scores in groups T and 2. The beta values of the pre-debate scores and “the changes between the pre- and post-debate scores” were statistically significant in both original and standardized scores. Conclusion: TBL is one of the educational methods that help students improve their grades, particularly low-ranked students. PMID:29207457
ERIC Educational Resources Information Center
Meijer, Rob R.
2004-01-01
Two new methods have been proposed to determine unexpected sum scores on sub-tests (testlets) both for paper-and-pencil tests and computer adaptive tests. A method based on a conservative bound using the hypergeometric distribution, denoted p, was compared with a method where the probability for each score combination was calculated using a…
ERIC Educational Resources Information Center
George-Ezzelle, Carol E.; Skaggs, Gary
2004-01-01
Current testing standards call for test developers to provide evidence that testing procedures and test scores, and the inferences made based on the test scores, show evidence of validity and are comparable across subpopulations (American Educational Research Association [AERA], American Psychological Association [APA], & National Council on…
MANUSCRIPT IN PRESS: DEMENTIA & GERIATRIC COGNITIVE DISORDERS
O’Bryant, Sid E.; Xiao, Guanghua; Barber, Robert; Cullum, C. Munro; Weiner, Myron; Hall, James; Edwards, Melissa; Grammas, Paula; Wilhelmsen, Kirk; Doody, Rachelle; Diaz-Arrastia, Ramon
2015-01-01
Background: Prior work on the link between blood-based biomarkers and cognitive status has largely been based on dichotomous classifications rather than detailed neuropsychological functioning. The current project was designed to create serum-based biomarker algorithms that predict neuropsychological test performance. Methods: A battery of neuropsychological measures was administered. Random forest analyses were utilized to create neuropsychological test-specific biomarker risk scores in a training set that were entered into linear regression models predicting the respective test scores in the test set. Serum multiplex biomarker data were analyzed on 108 proteins from 395 participants (197 AD cases and 198 controls) from the Texas Alzheimer’s Research and Care Consortium. Results: The biomarker risk scores were significant predictors (p<0.05) of scores on all neuropsychological tests. With the exception of premorbid intellectual status (6.6%), the biomarker risk scores alone accounted for a minimum of 12.9% of the variance in neuropsychological scores. Biomarker algorithms (biomarker risk scores + demographics) accounted for substantially more variance in scores. Review of the variable importance plots indicated differential patterns of biomarker significance for each test, suggesting the possibility of domain-specific biomarker algorithms. Conclusions: Our findings provide proof-of-concept for a novel area of scientific discovery, which we term “molecular neuropsychology.” PMID:24107792
Further Study of the Choice of Anchor Tests in Equating
ERIC Educational Resources Information Center
Trierweiler, Tammy J.; Lewis, Charles; Smith, Robert L.
2016-01-01
In this study, we describe what factors influence the observed score correlation between an (external) anchor test and a total test. We show that the anchor to full-test observed score correlation is based on two components: the true score correlation between the anchor and total test, and the reliability of the anchor test. Findings using an…
Jones, Loretta; Bazargan, Mohsen; Lucas-Wright, Anna; Vadgama, Jaydutt V; Vargas, Roberto; Smith, James; Otoukesh, Salman; Maxwell, Annette E
2013-01-01
Most theoretical formulations acknowledge that knowledge and awareness of cancer screening and prevention recommendations significantly influence health behaviors. This study compares perceived knowledge of cancer prevention and screening with test-based knowledge in a community sample. We also examine demographic variables and self-reported cancer screening and prevention behaviors as correlates of both knowledge scores, and consider whether cancer-related knowledge can be accurately assessed using just a few simple questions in a short and easy-to-complete survey. We used a community-partnered participatory research approach to develop our study aims and a survey. The study sample was composed of 180 predominantly African American and Hispanic community members who participated in a full-day cancer prevention and screening promotion conference in South Los Angeles, California, in July 2011. Participants completed a self-administered survey in English or Spanish at the beginning of the conference. Our data indicate that perceived and test-based knowledge scores are only moderately correlated. The perceived knowledge score shows a stronger association with demographic characteristics and other cancer-related variables than the test-based score. Thirteen of the twenty variables examined in our study showed a statistically significant correlation with the perceived knowledge score; however, only four variables showed a statistically significant correlation with the test-based knowledge score. Perceived knowledge of cancer prevention and screening was assessed with fewer items than test-based knowledge. Thus, using this assessment could potentially reduce respondent burden. However, our data demonstrate that perceived and test-based knowledge are separate constructs.
ERIC Educational Resources Information Center
Moses, Tim
2013-01-01
The purpose of this report is to review ETS psychometric contributions that focus on test scores. Two major sections review contributions based on assessing test scores' measurement characteristics and other contributions about using test scores as predictors in correlational and regression relationships. An additional section reviews additional…
The Comparison of Accuracy Scores on the Paper and Pencil Testing vs. Computer-Based Testing
ERIC Educational Resources Information Center
Retnawati, Heri
2015-01-01
This study aimed to compare the accuracy of the test scores as results of Test of English Proficiency (TOEP) based on paper and pencil test (PPT) versus computer-based test (CBT). Using the participants' responses to the PPT documented from 2008-2010 and data of CBT TOEP documented in 2013-2014 on the sets of 1A, 2A, and 3A for the Listening and…
ERIC Educational Resources Information Center
Kim, Seonghoon
2013-01-01
With known item response theory (IRT) item parameters, Lord and Wingersky provided a recursive algorithm for computing the conditional frequency distribution of number-correct test scores, given proficiency. This article presents a generalized algorithm for computing the conditional distribution of summed test scores involving real-number item…
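A minimal sketch of the Lord and Wingersky recursion for dichotomous items, mentioned in the first sentence of this abstract, is given below. It assumes the probability of a correct response to each item has already been computed from an IRT model at one fixed proficiency value. The function name and example probabilities are hypothetical, and the generalized algorithm for real-number item scores that the article presents is not shown.

```python
def lord_wingersky(probs):
    """Return P(number-correct = x | proficiency) for x = 0..len(probs).

    `probs` holds each dichotomous item's probability of a correct
    response at one fixed proficiency value.
    """
    dist = [1.0]  # with zero items, a score of 0 has probability 1
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for score, mass in enumerate(dist):
            new[score] += mass * (1.0 - p)  # this item answered incorrectly
            new[score + 1] += mass * p      # this item answered correctly
        dist = new
    return dist


# Hypothetical item probabilities at a single proficiency value.
distribution = lord_wingersky([0.2, 0.7, 0.9])
print(distribution)
```

Each pass adds one item and updates the whole score distribution, so the cost grows quadratically with the number of items instead of enumerating every response pattern.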
ERIC Educational Resources Information Center
Truell, Allen D.; Zhao, Jensen J.; Alexander, Melody W.
2005-01-01
The purposes of this study were to determine if there is a significant difference in postsecondary business student scores and test completion time based on settable test item exposure control interface format, and to determine if there is a significant difference in student scores and test completion time based on settable test item exposure…
ERIC Educational Resources Information Center
Davis, Holly S.
This study examines the correlation between absence, cognitive skills index (CSI), and various achievement indicators such as the Indiana Statewide Testing for Educational Progress (ISTEP) test scores, discrepancies, and school-based English and mathematics tests for 64 seventh-grade students from one middle school. Scores for each of the subtests…
ERIC Educational Resources Information Center
Needham, Martha Elaine
2010-01-01
This research compares differences between standardized test scores in problem-based learning (PBL) classrooms and a traditional classroom for 6th grade students using a mixed-method, quasi-experimental and qualitative design. The research shows that problem-based learning is as effective as traditional teaching methods on standardized tests. The…
Validity and Reliability of Baseline Testing in a Standardized Environment.
Higgins, Kathryn L; Caze, Todd; Maerlender, Arthur
2017-08-11
The Immediate Postconcussion Assessment and Cognitive Testing (ImPACT) is a computerized neuropsychological test battery commonly used to determine cognitive recovery from concussion by comparing post-injury scores to baseline scores. This model is based on the premise that ImPACT baseline test scores are a valid and reliable measure of optimal cognitive function at baseline. Growing evidence suggests that this premise may not be accurate, and that a large contributor to invalid and unreliable baseline test scores may be the protocol and environment in which baseline tests are administered. This study examined the effects of a standardized environment and administration protocol on the reliability and performance validity of athletes' baseline test scores on ImPACT by comparing scores obtained in two different group-testing settings. Baseline data from three hundred sixty-one Division 1 cohort-matched collegiate athletes were assessed using a variety of indicators of potential performance invalidity; internal reliability was also examined. Thirty-one to thirty-nine percent of the baseline cases had at least one indicator of low performance validity, but there were no significant differences in validity indicators based on the environment in which the testing was conducted. Internal consistency reliability scores were in the acceptable to good range, with no significant differences between administration conditions. These results suggest that athletes may be reliably performing at levels lower than their best effort would produce.
Eckner, James T; Rettmann, Ashley; Narisetty, Naveen; Greer, Jacob; Moore, Brandon; Brimacombe, Susan; He, Xuming; Broglio, Steven P
2016-01-01
To determine test-re-test reliabilities of novel Evoked Response Potential (ERP)-based Brain Network Activation (BNA) scores in healthy athletes. Observational, repeated-measures study. Forty-two healthy male and female high school and collegiate athletes completed auditory oddball and go/no-go ERP assessments at baseline, 1 week, 6 weeks and 1 year. The BNA algorithm was applied to the ERP data, considering electrode location, frequency band, peak latency and normalized amplitude to generate seven unique BNA scores for each testing session. Mean BNA scores, intra-class correlation coefficient (ICC) values and reliable change (RC) values were calculated for each of the seven BNA networks. BNA scores ranged from 46.3 ± 34.9 to 69.9 ± 22.8, ICC values ranged from 0.46-0.65 and 95% RC values ranged from 38.3-68.1 across the seven networks. The wide range of BNA scores observed in this population of healthy athletes suggests that a single BNA score or set of BNA scores from a single after-injury test session may be difficult to interpret in isolation without knowledge of the athlete's own baseline BNA score(s) and/or the results of serial tests performed at additional time points. The stability of each BNA network should be considered when interpreting test-re-test BNA score changes.
Binetruy, M; Mauny, F; Lavaux, M; Meyer, A; Sylvestre, G; Puyraveau, M; Berger, E; Magnin, E; Vandel, P; Galmiche, J; Chopard, G
Cognitive evaluation of young subjects is now widely carried out for non-traumatic diseases such as multiple sclerosis, HIV, or sleep disorders. This evaluation requires normative data based on healthy adult samples. However, most clinicians use a set of tests that were normed separately on different samples using different cutoff criteria. Thus, the score of an individual may be considered either normal or impaired depending on the norms used. It is well established that healthy adults obtain some low test scores when a battery of tests is administered. Thus, knowledge of the base rates of low scores is required to minimize false diagnoses of cognitive impairment. The aim of this study was twofold: (1) to provide normative data for the RAPID-II battery in healthy adults, and (2) to estimate the proportion of healthy adults having low scores across this battery. Norms for the 44 test scores of the RAPID-II test battery were developed using the overall sample of 335 individuals based on three age categories (20 to 29, 30 to 39, and 40 to 49 years) and two educational levels: baccalaureate or higher degree (high educational level) and lower than baccalaureate (low educational level). The 5th, 25th, 50th, and 75th percentiles were calculated from the six age and education subsamples and used to define norms. The frequency of low scores on the RAPID-II battery was calculated by simultaneously examining performance on 33 primary scores. A low score was defined as less than or equal to the 5th percentile drawn from the six age and education normative subsamples. In addition, the percentages of low scores were determined when all possible combinations of two-test scores across the RAPID-II were considered in the overall normative sample. Our data showed that 59.4% of subjects in the normative sample obtained at least one low score. With more than 9 test scores, this percentage was equal to 0% in the normative sample. 
Among all combinations of two-test scores, 96% had a false positive rate<2%. Low scores are very common in young healthy subjects and are more obvious when simultaneously analyzing test scores across a battery of tests and are thus not necessarily indicative of cognitive impairment. The combinations of two-test scores can be a useful tool to improve the interpretation of low scores.
ERIC Educational Resources Information Center
Sahin, Füsun
2017-01-01
Examining the testing processes, as well as the scores, is needed for a complete understanding of validity and fairness of computer-based assessments. Examinees' rapid-guessing and insufficient familiarity with computers have been found to be major issues that weaken the validity arguments of scores. This study has three goals: (a) improving…
Butler, Bennet A; Lawton, Cort D; Burgess, Jamie; Balderama, Earvin S; Barsness, Katherine A; Sarwark, John F
2017-12-06
Simulation-based education has been integrated into many orthopaedic residency programs to augment traditional teaching models. Here we describe the development and implementation of a combined didactic and simulation-based course for teaching medical students and interns how to properly perform a closed reduction and percutaneous pinning of a pediatric supracondylar humeral fracture. Subjects included in the study were either orthopaedic surgery interns or subinterns at our institution. Subjects all completed a combined didactic and simulation-based course on pediatric supracondylar humeral fractures. The first part of this course was an electronic (e)-learning module that the subjects could complete at home in approximately 40 minutes. The second part of the course was a 20-minute simulation-based skills learning session completed in the simulation center. Subject knowledge of closed reduction and percutaneous pinning of supracondylar humeral fractures was tested using a 30-question, multiple-choice, written test. Surgical skills were tested in the operating room or in a simulated operating room. Subject pre-intervention and post-intervention scores were compared to determine if and how much they had improved. A total of 21 subjects were tested. These subjects significantly improved their scores on both the written, multiple-choice test and skills test after completing the combined didactic and simulation module. Prior to the module, intern and subintern multiple-choice test scores were significantly worse than postgraduate year (PGY)-2 to PGY-5 resident scores (p < 0.01); after completion of the module, there was no significant difference in the multiple-choice test scores. After completing the module, there was no significant difference in skills test scores between interns and PGY-2 to PGY-5 residents. Both tests were validated using the scores obtained from PGY-2 to PGY-5 residents. 
Our combined didactic and simulation course significantly improved intern and subintern understanding of supracondylar humeral fractures and their ability to perform a closed reduction and percutaneous pinning of these fractures.
Effects of Student Population Density on Academic Achievement in Georgia Elementary Schools.
ERIC Educational Resources Information Center
Swift, Diane O'Rourke
The purpose of this study was to determine the relationship between school density and achievement test scores. The study utilized a bipolar sample in order to include schools whose achievement scores were at the top and bottom of the population spectrum when considering Iowa Tests of Basic Skills (ITBS) scores. Based on comparing test scores and…
The Essay Scoring and Scorer Reliability in TOEFL CBT.
ERIC Educational Resources Information Center
Lee, Yong-Won
An essay test is now an integral part of the computer based Test of English as a Foreign Language (TOEFL-CBT). This paper provides a brief overview of the current TOEFL-CBT essay test, describes the operational procedures for essay scoring, including the Online Scoring Network (OSN) of the Educational Testing Service (ETS), and discusses major…
ERIC Educational Resources Information Center
Schochet, Peter Z.; Chiang, Hanley S.
2010-01-01
This paper addresses likely error rates for measuring teacher and school performance in the upper elementary grades using value-added models applied to student test score gain data. Using realistic performance measurement system schemes based on hypothesis testing, we develop error rate formulas based on OLS and Empirical Bayes estimators.…
Aşkar, Petek; Altun, Arif; Cangöz, Banu; Cevik, Vildan; Kaya, Galip; Türksoy, Hasan
2012-04-01
The purpose of this study was to assess whether a computerized battery of neuropsychological tests could produce results similar to those of the conventional forms. Comparisons were carried out on 77 volunteer undergraduates with two neuropsychological tests: the Line Orientation Test and the Enhanced Cued Recall Test. First, students were randomly assigned to a test medium (paper-and-pencil versus computerized). Second, the groups were given the same test in the other medium after a 30-day interval. Results showed that the computer-based Enhanced Cued Recall Test scores did not correlate with the paper-and-pencil Enhanced Cued Recall Test scores. The computer-based Line Orientation Test scores, on the other hand, did correlate significantly with the paper-and-pencil version. In both tests, scores were higher on the paper-and-pencil version than on the computer-based version. The total score difference between modalities was statistically significant for both the Enhanced Cued Recall Test and the Line Orientation Test. In both computer-based tests, participants took less time to complete the tests.
Menkes, Daniel L; Reed, Mary
2008-01-01
To determine the effectiveness of didactic case-based instruction methodology to improve medical student comprehension of common neurological illnesses and neurological emergencies. Neurology department, academic university. 415 third and fourth year medical students performing a required four week neurology clerkship. Raw test scores on a 1 hour, 50-item clinical vignette based examination and open-ended questions in a post-clerkship feedback session. There was a statistically significant improvement in overall test scores (p<0.001). Didactic teaching sessions have a significant positive impact on neurology student clerkship test score performance and perception of their educational experience. Confirmation of these results across multiple specialties in a multi-center trial is warranted.
ERIC Educational Resources Information Center
Jacob, Brian A.
2016-01-01
Contrary to popular belief, modern cognitive assessments--including the new Common Core tests--produce test scores based on sophisticated statistical models rather than the simple percent of items a student answers correctly. While there are good reasons for this, it means that reported test scores depend on many decisions made by test designers,…
A Web-based course on infection control for physicians in training: an educational intervention.
Fakih, Mohamad G; Enayet, Iram; Minnick, Steven; Saravolatz, Louis D
2006-07-01
To evaluate the effectiveness of a Web-based course on infection control accessed by physicians in training. Educational intervention. A 607-bed urban teaching hospital. A total of 55 physicians in training beginning their first postgraduate year (the iPGY1 group) and 59 physicians completing their first, second, or third postgraduate year (the oPGY group). Individuals in the iPGY1 group took a Web-based course on infection control practices. Persons in the iPGY1 group who took the Web-based course completed an evaluation test consisting of 15 multiple-choice questions (total possible score, 15 points). The same test was given to persons in the oPGY group, who did not take the Web-based course. We compared scores of the Web-based test taken by subjects in the iPGY1 group immediately after the course with scores of the test they took 3 months after the course and with test scores of subjects in the oPGY group. The mean score (+/-SD) for subjects in the iPGY1 group who took the Web-based course was 10.6+/-2.2, compared with 8.0+/-2.5 for subjects in the oPGY group (P<.001). The mean score (+/-SD) for subjects in the iPGY1 group 3 months after completing the course decreased to 8.0+/-2.4 (P<.001 by the paired t test). For the oPGY group, significant differences were found between the scores (+/-SD) for subjects in the internal medicine (9.9+/-2.3), emergency medicine (8.4+/-1.7), pediatrics (7.0+/-1.7), and family medicine (5.8+/-1.6) residency programs (P<.001); there were no significant differences in scores according to the year of residency. Web-based infection control courses are an attractive teaching tool for physicians in training and need to be considered for teaching infection control. The evaluation of information retention will help identify physicians in training who require further training.
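The between-group comparison reported here (10.6+/-2.2 versus 8.0+/-2.5, P<.001) can be roughly reproduced from the summary statistics alone. The sketch below uses Welch's unequal-variance t statistic and assumes that all 55 iPGY1 and all 59 oPGY subjects contributed a score. The abstract names the paired t test only for the 3-month follow-up, so the choice of test here is an assumption, as is the function name.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic for two independent groups from summary statistics."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


# Group sizes are assumed equal to the enrolled counts given in the abstract.
t_stat = welch_t(10.6, 2.2, 55, 8.0, 2.5, 59)
print(round(t_stat, 2))
```

A statistic near 5.9 with more than 100 subjects corresponds to a very small p-value, consistent with the reported P<.001.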
Scoring Yes-No Vocabulary Tests: Reaction Time vs. Nonword Approaches
ERIC Educational Resources Information Center
Pellicer-Sanchez, Ana; Schmitt, Norbert
2012-01-01
Despite a number of research studies investigating the Yes-No vocabulary test format, one main question remains unanswered: What is the best scoring procedure to adjust for testee overestimation of vocabulary knowledge? Different scoring methodologies have been proposed based on the inclusion and selection of nonwords in the test. However, there…
A Comparison of Decision-Making Methods for Criterion-Referenced Tests.
ERIC Educational Resources Information Center
Haladyna, Tom; Roid, Gale
The problems associated with misclassifying students when pass-fail decisions are based on test scores are discussed. One protection against misclassification is to set a confidence interval around the cutting score. Those whose scores fall above the interval are passed; those whose scores fall below the interval are failed; and those whose scores…
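The confidence-band rule described in this abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' procedure: the function name `classify`, the use of a standard error of measurement `sem`, and the multiplier `z` are all assumptions, and because the abstract is cut off before it says what happens to scores inside the interval, the label returned for that case is only a placeholder.

```python
def classify(score, cut_score, sem, z=1.96):
    """Classify a score against a cut score surrounded by a symmetric band.

    Hypothetical sketch: the band is cut_score +/- z * sem.
    """
    half_width = z * sem
    if score > cut_score + half_width:
        return "pass"   # score falls above the interval
    if score < cut_score - half_width:
        return "fail"   # score falls below the interval
    # The abstract is truncated before describing this case,
    # so this label is a placeholder, not the authors' decision rule.
    return "within interval"


# Illustrative values only (cut score 70, SEM 2 -> band of about 66.1 to 73.9).
labels = [classify(s, cut_score=70, sem=2) for s in (80, 71, 60)]
```

A wider band (larger `z` or `sem`) reduces confident misclassifications at the cost of leaving more examinees inside the interval.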
Khodaveisi, Masoud; Qaderian, Khosro; Oshvandi, Khodayar; Soltanian, Ali Reza; Vardanjani, Mehdi molavi
2017-01-01
Background and aims Learning plays an important role in developing nursing skills and proper caregiving. The present study aims to evaluate two learning methods, team-based learning and lecture-based learning, in teaching nursing students to care for patients with diabetes. Method In this quasi-experimental study, 64 term-4 students at the nursing colleges of Bukan and Miandoab were included. A knowledge and performance questionnaire, comprising 15 knowledge questions and 5 performance questions on the care of patients with diabetes, was used as the data collection tool; its reliability was confirmed by the researcher using Cronbach's alpha (r=0.83). The paired t-test was used to compare mean knowledge and performance scores within each group between the pre-test and the post-test, and the independent t-test was used to compare mean scores between the control and intervention groups. Results There was no statistically significant difference between the two groups in pre-test knowledge and performance scores (p=0.784). There was a significant difference in mean post-test knowledge and performance scores between the team-based learning group and the lecture-based learning group (p=0.001). There was a significant difference between mean pre-test and post-test diabetes care knowledge scores in both learning groups (p=0.001). Conclusion Both the team-based and the lecture-based learning approaches improved student learning, but the improvement was greater with team-based learning, and it is recommended that this method be used in the higher education of students.
Ertmer, David J.
2012-01-01
Purpose This investigation sought to determine whether scores from a commonly used word-based articulation test are closely associated with speech intelligibility in children with hearing loss. If the scores are closely related, articulation testing results might be used to estimate intelligibility. If not, the importance of direct assessment of intelligibility would be reinforced. Methods Forty-four children with hearing losses produced words from the Goldman-Fristoe Test of Articulation-2 and sets of 10 short sentences. Correlation analyses were conducted between scores for seven word-based predictor variables and percent-intelligible scores derived from listener judgments of stimulus sentences. Results Six of seven predictor variables were significantly correlated with percent-intelligible scores. However, regression analysis revealed that no single predictor variable or multivariable model accounted for more than 25% of the variability in intelligibility scores. Implications The findings confirm the importance of assessing connected speech intelligibility directly. PMID:20220022
ERIC Educational Resources Information Center
Educational Testing Service, 2008
2008-01-01
The Test of English as a Foreign Language[TM], better known as TOEFL[R], is designed to measure the English-language proficiency of people whose native language is not English. TOEFL scores are accepted by more than 6,000 colleges, universities, and licensing agencies in 130 countries. The test is also used by governments, and scholarship and…
Validation of the Narrowing Beam Walking Test in Lower Limb Prosthesis Users.
Sawers, Andrew; Hafner, Brian
2018-04-11
To evaluate the content, construct, and discriminant validity of the Narrowing Beam Walking Test (NBWT), a performance-based balance test for lower limb prosthesis users. Cross-sectional study. Research laboratory and prosthetics clinic. Unilateral transtibial and transfemoral prosthesis users (N=40). Not applicable. Content validity was examined by quantifying the percentage of participants receiving maximum or minimum scores (ie, ceiling and floor effects). Convergent construct validity was examined using correlations between participants' NBWT scores and scores or times on existing clinical balance tests regularly administered to lower limb prosthesis users. Known-groups construct validity was examined by comparing NBWT scores between groups of participants with different fall histories, amputation levels, amputation etiologies, and functional levels. Discriminant validity was evaluated by analyzing the area under each test's receiver operating characteristic (ROC) curve. No minimum or maximum scores were recorded on the NBWT. NBWT scores demonstrated strong correlations (ρ=.70‒.85) with scores/times on performance-based balance tests (timed Up and Go test, Four Square Step Test, and Berg Balance Scale) and a moderate correlation (ρ=.49) with the self-report Activities-specific Balance Confidence scale. NBWT performance was significantly lower among participants with a history of falls (P=.003), transfemoral amputation (P=.011), and a lower mobility level (P<.001). The NBWT also had the largest area under the ROC curve (.81) and was the only test to exhibit an area that was statistically significantly >.50 (ie, chance). The results provide strong evidence of content, construct, and discriminant validity for the NBWT as a performance-based test of balance ability. The evidence supports its use to assess balance impairments and fall risk in unilateral transtibial and transfemoral prosthesis users. Copyright © 2018 American Congress of Rehabilitation Medicine. 
Published by Elsevier Inc. All rights reserved.
Analysis of Added Value of Subscores with Respect to Classification
ERIC Educational Resources Information Center
Sinharay, Sandip
2014-01-01
Brennan noted that users of test scores often want (indeed, demand) that subscores be reported, along with total test scores, for diagnostic purposes. Haberman suggested a method based on classical test theory (CTT) to determine if subscores have added value over the total score. One way to interpret the method is that a subscore has added value…
Proposal for a new categorization of aseptic processing facilities based on risk assessment scores.
Katayama, Hirohito; Toda, Atsushi; Tokunaga, Yuji; Katoh, Shigeo
2008-01-01
Risk assessment of aseptic processing facilities was performed using two published risk assessment tools. Calculated risk scores were compared with experimental test results, including environmental monitoring and media fill run results, in three different types of facilities. The two risk assessment tools used gave a generally similar outcome. However, depending on the tool used, variations were observed in the relative scores between the facilities. For the facility yielding the lowest risk scores, the corresponding experimental test results showed no contamination, indicating that these ordinal testing methods are insufficient to evaluate this kind of facility. A conventional facility having acceptable aseptic processing lines gave relatively high risk scores. The facility showing a rather high risk score demonstrated the usefulness of conventional microbiological test methods. Considering the significant gaps observed in calculated risk scores and in the ordinal microbiological test results between advanced and conventional facilities, we propose a facility categorization based on risk assessment. The most important risk factor in aseptic processing is human intervention. When human intervention is eliminated from the process by advanced hardware design, the aseptic processing facility can be classified into a new risk category that is better suited for assuring sterility based on a new set of criteria rather than on currently used microbiological analysis. To fully benefit from advanced technologies, we propose three risk categories for these aseptic facilities.
Maddry, Joseph K; Varney, Shawn M; Sessions, Daniel; Heard, Kennon; Thaxton, Robert E; Ganem, Victoria J; Zarzabal, Lee A; Bebarta, Vikhyat S
2014-12-01
Simulation-based teaching (SIM) is a common method for medical education. SIM exposes residents to uncommon scenarios that require critical, timely actions. SIM may be a valuable training method for critically ill poisoned patients whose diagnosis and treatment depend on key clinical findings. Our objective was to compare medical simulation (SIM) to traditional lecture-based instruction (LEC) for training emergency medicine (EM) residents in the acute management of critically ill poisoned patients. EM residents completed two pre-intervention questionnaires: (1) a 24-item multiple-choice test of four toxicological emergencies and (2) a questionnaire using a five-point Likert scale to rate the residents' comfort level in diagnosing and treating patients with specific toxicological emergencies. After completing the pre-intervention questionnaires, residents were randomized to SIM or LEC instruction. Two toxicologists and three EM physicians presented four toxicology topics to both groups in four 20-min sessions. One group was in the simulation center, and the other in a lecture hall. Each group then repeated the multiple-choice test and questionnaire immediately after instruction and again at 3 months after training. Answers were not discussed. The primary outcome was comparison of immediate mean post-intervention test scores and final scores 3 months later between SIM and LEC groups. Test score outcomes between groups were compared at each time point (pre-test, post-instruction, 3-month follow-up) using Wilcoxon rank sum test. Data were summarized by descriptive statistics. Continuous variables were characterized by means (SD) and tested using t tests or Wilcoxon rank sum. Categorical variables were summarized by frequencies (%) and compared between training groups with chi-square or Fisher's exact test. Thirty-two EM residents completed pre- and post-intervention tests and comfort questionnaires on the study day. 
Both groups had higher post-intervention mean test scores (p < 0.001), but the LEC group showed a greater improvement compared to the SIM group (5.6 [2.3] points vs. 3.6 [2.4], p = 0.02). At the 3-month follow-up, 24 (75 %) tests and questionnaires were completed. There was no improvement in 3-month mean test scores in either group compared to immediate post-test scores. The SIM group had higher final mean test scores than the LEC group (16.6 [3.1] vs. 13.3 [2.2], p = 0.009). SIM and LEC groups reported similar diagnosis and treatment comfort level scores at baseline and improved equally after instruction. At 3 months, there was no difference between groups in comfort level scores for diagnosis or treatment. Lecture-based teaching was more effective than simulation-based instruction immediately after intervention. At 3 months, the SIM group showed greater retention than the LEC group. Resident comfort levels for diagnosis and treatment were similar regardless of the type of education.
Ruvinsky, Anatoly M
2007-06-01
We present results of testing the ability of eleven popular scoring functions to predict native docked positions using a recently developed method (Ruvinsky and Kozintsev, J Comput Chem 2005, 26, 1089) for estimating the entropy contributions of relative motions to protein-ligand binding affinity. The method is based on the integration of the configurational integral over clusters obtained from multiple docked positions. We use a test set of 100 PDB protein-ligand complexes and ensembles of 101 docked positions generated by (Wang et al. J Med Chem 2003, 46, 2287) for each ligand in the test set. To test the suggested method we compared the averaged root-mean square deviations (RMSD) of the top-scored ligand docked positions, accounting and not accounting for entropy contributions, relative to the experimentally determined positions. We demonstrate that the method increases docking accuracy by 10-21% when used in conjunction with the AutoDock scoring function, by 2-25% with G-Score, by 7-41% with D-Score, by 0-8% with LigScore, by 1-6% with PLP, by 0-12% with LUDI, by 2-8% with F-Score, by 7-29% with ChemScore, by 0-9% with X-Score, by 2-19% with PMF, and by 1-7% with DrugScore. We also compared the performance of the suggested method with the method based on ranking by cluster occupancy only. We analyze how the choice of a clustering-RMSD and a lower bound on dense clusters affects the docking accuracy of the scoring methods. We derive optimal intervals of the clustering-RMSD for 11 scoring functions.
Dichotomous scoring of Trails B in patients referred for a dementia evaluation.
Schmitt, Andrew L; Livingston, Ronald B; Smernoff, Eric N; Waits, Bethany L; Harris, James B; Davis, Kent M
2010-04-01
The Trail Making Test is a popular neuropsychological test and its interpretation has traditionally used time-based scores. This study examined an alternative approach to scoring that is simply based on the examinees' ability to complete the test. If an examinee is able to complete Trails B successfully, they are coded as "completers"; if not, they are coded as "noncompleters." To assess this approach to scoring Trails B, the performance of 97 diagnostically heterogeneous individuals referred for a dementia evaluation was examined. In this sample, 55 individuals successfully completed Trails B and 42 individuals were unable to complete it. Point-biserial correlations indicated a moderate-to-strong association (r(pb)=.73) between the Trails B completion variable and the Total Scale score of the Repeatable Battery for the Assessment of Neurological Status (RBANS), which was larger than the correlation between the Trails B time-based score and the RBANS Total Scale score (r(pb)=.60). As a screen for dementia status, Trails B completion showed a sensitivity of 69% and a specificity of 100% in this sample. These results suggest that dichotomous scoring of Trails B might provide a brief and clinically useful measure of dementia status.
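The sensitivity and specificity figures in this abstract follow from a standard 2x2 table, sketched below. The abstract does not report the cell counts, so the counts used here are hypothetical: they were chosen only because they are consistent with the reported totals (97 individuals, 55 completers, 42 noncompleters) and the reported percentages. Treating noncompletion as the "positive" screen result is likewise an assumption of this sketch.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from 2x2 table counts."""
    sensitivity = tp / (tp + fn)   # share of true cases the screen flags
    specificity = tn / (tn + fp)   # share of non-cases the screen clears
    return sensitivity, specificity


# Hypothetical counts, NOT taken from the paper:
#   noncompleters with dementia (tp) = 42, noncompleters without (fp) = 0
#   completers with dementia (fn)    = 19, completers without (tn)    = 36
sens, spec = sensitivity_specificity(tp=42, fn=19, tn=36, fp=0)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Under these illustrative counts, every noncompleter is a true case, which is what a specificity of 100% implies, while some completers are still cases, which is why sensitivity is lower.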
Maenner, Matthew J; Greenberg, Jan S; Mailick, Marsha R
2015-05-01
Lower (versus higher) IQ scores have been shown to increase the risk of early mortality; however, the underlying mechanisms are poorly understood and previous studies underrepresent individuals with intellectual disability (ID) and women. This study followed one third of all senior-year students (approximately aged 17) attending public high school in Wisconsin, U.S. in 1957 (n = 10,317) until 2011. Men and women with the lowest IQ test scores (i.e., IQ scores ≤ 85) had increased rates of mortality compared to people with the highest IQ test scores, particularly for cardiovascular disease. Importantly, when educational attainment was held constant, people with lower IQ test scores did not have higher mortality by age 70 than people with higher IQ test scores. Individuals with lower IQ test scores likely experience multiple disadvantages throughout life that contribute to increased risk of early mortality.
A Summary Score for the Framingham Heart Study Neuropsychological Battery
Downer, Brian; Fardo, David W.; Schmitt, Frederick A.
2015-01-01
Objective To calculate three summary scores of the Framingham Heart Study neuropsychological battery and determine which score best differentiates between subjects classified as having normal cognition, test-based impaired learning and memory, test-based multidomain impairment, and dementia. Method The final sample included 2,503 participants. Three summary scores were assessed: (a) composite score that provided equal weight to each subtest, (b) composite score that provided equal weight to each cognitive domain assessed by the neuropsychological battery, and (c) abbreviated score comprised of subtests for learning and memory. Receiver operating characteristic analysis was used to determine which summary score best differentiated between the four cognitive states. Results The summary score that provided equal weight to each subtest best differentiated between the four cognitive states. Discussion A summary score that provides equal weight to each subtest is an efficient way to utilize all of the cognitive data collected by a neuropsychological battery. PMID:25804903
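The difference between the first two summary scores, equal weight per subtest versus equal weight per cognitive domain, can be illustrated with a short sketch. It assumes subtest results have already been standardized to z-scores; the function names, domain names, subtest names, and values are hypothetical and are not taken from the Framingham battery.

```python
from statistics import mean


def subtest_weighted(z_scores):
    """Mean of all subtest z-scores, so every subtest counts equally."""
    return mean(z for domain in z_scores.values() for z in domain.values())


def domain_weighted(z_scores):
    """Mean of per-domain means, so every domain counts equally."""
    return mean(mean(domain.values()) for domain in z_scores.values())


# Hypothetical participant: three subtests in one domain, one in another.
example = {
    "memory": {"subtest_a": -1.0, "subtest_b": -2.0, "subtest_c": -3.0},
    "attention": {"subtest_d": 0.0},
}

by_subtest = subtest_weighted(example)  # (-1 - 2 - 3 + 0) / 4
by_domain = domain_weighted(example)    # (-2 + 0) / 2
```

The two composites diverge whenever domains contain different numbers of subtests: a domain with many subtests dominates the subtest-weighted score but not the domain-weighted one.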
A Summary Score for the Framingham Heart Study Neuropsychological Battery.
Downer, Brian; Fardo, David W; Schmitt, Frederick A
2015-10-01
To calculate three summary scores of the Framingham Heart Study neuropsychological battery and determine which score best differentiates between subjects classified as having normal cognition, test-based impaired learning and memory, test-based multidomain impairment, and dementia. The final sample included 2,503 participants. Three summary scores were assessed: (a) composite score that provided equal weight to each subtest, (b) composite score that provided equal weight to each cognitive domain assessed by the neuropsychological battery, and (c) abbreviated score comprised of subtests for learning and memory. Receiver operating characteristic analysis was used to determine which summary score best differentiated between the four cognitive states. The summary score that provided equal weight to each subtest best differentiated between the four cognitive states. A summary score that provides equal weight to each subtest is an efficient way to utilize all of the cognitive data collected by a neuropsychological battery. © The Author(s) 2015.
An Exploration of the Base Rate Scores of the Millon Clinical Multiaxial Inventory-III
ERIC Educational Resources Information Center
Grove, William M.; Vrieze, Scott I.
2009-01-01
The Millon Clinical Multiaxial Inventory (3rd ed.; MCMI-III) is a widely used psychological assessment of clinical and personality disorders. Unlike typical tests, the MCMI-III uses a base-rate score transformation to incorporate prior probabilities of disorder (i.e., base rates) in test output and diagnostic thresholds. The authors describe the…
A comparison of interteaching and lecture in the college classroom.
Saville, Bryan K; Zinn, Tracy E; Neef, Nancy A; Van Norman, Renee; Ferreri, Summer J
2006-01-01
Interteaching is a new method of classroom instruction that is based on behavioral principles but offers more flexibility than other behaviorally based methods. We examined the effectiveness of interteaching relative to a traditional form of classroom instruction: the lecture. In Study 1, participants in a graduate course in special education took short quizzes after alternating conditions of interteaching and lecture. Quiz scores following interteaching were higher than quiz scores following lecture, although both methods improved performance relative to pretest measures. In Study 2, we also alternated interteaching and lecture but counterbalanced the conditions across two sections of an undergraduate research methods class. After each unit of information, participants from both sections took the same test. Again, test scores following interteaching were higher than test scores following lecture. In addition, students correctly answered more interteaching-based questions than lecture-based questions on a cumulative final test. In both studies, the majority of students reported a preference for interteaching relative to traditional lecture. In sum, the results suggest that interteaching may be an effective alternative to traditional lecture-based methods of instruction.
How Do Students Experience Testing on the University Computer?
ERIC Educational Resources Information Center
Whittington, Dale; And Others
1995-01-01
Reports a study of the administration mode, scores, and testing experiences of students taking the PreProfessional Skills Test (PPST) under differing conditions (computer based and paper and pencil). PPST scores and surveys of the students revealed varied test-taking strategies and computer-related alterations in test difficulty, construct,…
Building and Supporting a Case for Test Use
ERIC Educational Resources Information Center
Bachman, Lyle F.
2005-01-01
The fields of language testing and educational and psychological measurement have not, as yet, developed a set of principles and procedures for linking test scores and score-based inferences to test use and the consequences of test use. Although Messick (1989) discusses test use and consequences, his framework provides virtually no guidance on how…
The video-based test of communication skills: description, development, and preliminary findings.
Mazor, Kathleen M; Haley, Heather-Lyn; Sullivan, Kate; Quirk, Mark E
2007-01-01
The importance of assessing physician-patient communication skills is widely recognized, but assessment methods are limited. Objective structured clinical examinations are time-consuming and resource intensive. For practicing physicians, patient surveys may be useful, but these also require substantial resources. Clearly, it would be advantageous to develop alternative or supplemental methods for assessing communication skills of medical students, residents, and physicians. The Video-based Test of Communication Skills (VTCS) is an innovative, computer-administered test, consisting of 20 very short video vignettes. In each vignette, a patient makes a statement or asks a question. The examinee responds verbally, as if it was a real encounter and he or she were the physician. Responses are recorded for later scoring. Test administration takes approximately 1 h. Generalizability studies were conducted, and scores for two groups of physicians predicted to differ in their communication skills were compared. Preliminary results are encouraging; the estimated g coefficient for the communication score for 20-vignette test (scored by five raters) is 0.79; g for the personal/affective score under the same conditions is 0.62. Differences between physicians were in the predicted direction, with physicians considered "at risk" for communication difficulties scoring lower than those not so identified. The VTCS is a short, portable test of communication skills. Results reported here suggest that scores reflect differences in skill levels and are generalizable. However, these findings are based on very small sample sizes and must be considered preliminary. Additional work is required before it will be possible to argue confidently that this test in particular, and this approach to testing communication skills in general, is valuable and likely to make a substantial contribution to assessment in medical education.
Using Raters from India to Score a Large-Scale Speaking Test
ERIC Educational Resources Information Center
Xi, Xiaoming; Mollaun, Pam
2011-01-01
We investigated the scoring of the Speaking section of the Test of English as a Foreign Language[TM] Internet-based (TOEFL iBT[R]) test by speakers of English and one or more Indian languages. We explored the extent to which raters from India, after being trained and certified, were able to score the TOEFL examinees with mixed first languages…
Boevé, Anja J; Meijer, Rob R; Albers, Casper J; Beetsma, Yta; Bosker, Roel J
2015-01-01
The introduction of computer-based testing in high-stakes examining in higher education is developing rather slowly due to institutional barriers (the need for extra facilities, ensuring test security) and teacher and student acceptance. From the existing literature it is unclear whether computer-based exams will yield results similar to those of paper-based exams and whether student acceptance can change as a result of administering computer-based exams. In this study, we compared results from a computer-based and paper-based exam in a sample of psychology students and found no differences in total scores across the two modes. Furthermore, we investigated student acceptance and change in acceptance of computer-based examining. After taking the computer-based exam, fifty percent of the students preferred paper-and-pencil exams over computer-based exams and about a quarter preferred a computer-based exam. We conclude that computer-based exam total scores are similar to paper-based exam scores, but that for the acceptance of high-stakes computer-based exams it is important that students practice and become familiar with this new mode of test administration.
Can Machine Scoring Deal with Broad and Open Writing Tests as Well as Human Readers?
ERIC Educational Resources Information Center
McCurry, Doug
2010-01-01
This article considers the claim that machine scoring of writing test responses agrees with human readers as much as humans agree with other humans. These claims about the reliability of machine scoring of writing are usually based on specific and constrained writing tasks, and there is reason for asking whether machine scoring of writing requires…
Magyari, N; Szakács, V; Bartha, C; Szilágyi, B; Galamb, K; Magyar, M O; Hortobágyi, T; Kiss, R M; Tihanyi, J; Négyesi, J
2017-09-01
Aims The aim of this study was to examine the effects of gender on the relationship between Functional Movement Screen (FMS) and treadmill-based gait parameters. Methods Twenty elite junior athletes (10 women and 10 men) performed the FMS tests and gait analysis at a fixed speed. Between-gender differences were calculated for the relationship between FMS test scores and gait parameters, such as foot rotation, step length, and length of gait line. Results Gender did not affect the relationship between FMS and treadmill-based gait parameters. The nature of correlations between FMS test scores and gait parameters was different in women and men. Furthermore, different FMS test scores predicted different gait parameters in female and male athletes. FMS asymmetry and movement asymmetries measured by treadmill-based gait parameters did not correlate in either gender. Conclusion There were no interactions between FMS, gait parameters, and gender; however, correlation analyses support the idea that strength and conditioning coaches need to pay attention not only to how to score but also how to correctly use FMS.
Evidence-based practice knowledge, attitudes, and practice of online graduate nursing students.
Rojjanasrirat, Wilaiporn; Rice, Jan
2017-06-01
This study aimed to evaluate changes in evidence-based practice (EBP) knowledge, attitudes, and practice of nursing students before and after completing an online, graduate level, introductory research/EBP course. A prospective one-group pretest-posttest design. A private university in the Midwestern, USA. Sixty-three online nurse practitioner students in a Master's program. A convenience sample of online graduate nursing students who enrolled in the research/EBP course was invited to participate in the study. Study outcomes were measured using the Evidence-Based Practice Questionnaire (EBPQ) before and after completing the course. Descriptive statistics and paired-samples t-tests were used to assess the mean differences between pre- and post-test scores. Overall, students' post-test EBP scores were significantly improved over pre-test scores (t(63)=-9.034, p<0.001). Statistically significant differences were found for practice of EBP mean scores (t(63)=-12.78, p=0.001). No significant differences were found between pre- and post-tests on knowledge and attitudes toward EBP scores. Most frequently cited barriers to EBP were lack of understanding of statistics, interpretation of findings, lack of time, and lack of library resources. Copyright © 2017 Elsevier Ltd. All rights reserved.
Digital education and dynamic assessment of tongue diagnosis based on Mashup technique.
Tsai, Chin-Chuan; Lo, Yen-Cheng; Chiang, John Y; Sainbuyan, Natsagdorj
2017-01-24
To assess the digital education and dynamic assessment of tongue diagnosis based on Mashup technique (DEDATD) according to specific user's answering pattern, and provide pertinent information tailored to user's specific needs supplemented by the teaching materials constantly updated through the Mashup technique. Fifty-four undergraduate students were tested with DEDATD developed. The efficacy of the DEDATD was evaluated based on the pre- and post-test performance, with interleaving training sessions targeting on the weakness of the student under test. The t-test demonstrated that significant difference was reached in scores gained during pre- and post-test sessions, and positive correlation between scores gained and length of time spent on learning, while no significant differences between the gender and post-test score, and the years of students in school and the progress in score gained. DEDATD, coupled with Mashup technique, could provide updated materials filtered through diverse sources located across the network. The dynamic assessment could tailor each individual learner's needs to offer custom-made learning materials. DEDATD poses as a great improvement over the traditional teaching methods.
Dividing the Force Concept Inventory into two equivalent half-length tests
NASA Astrophysics Data System (ADS)
Han, Jing; Bao, Lei; Chen, Li; Cai, Tianfang; Pi, Yuan; Zhou, Shaona; Tu, Yan; Koenig, Kathleen
2015-06-01
The Force Concept Inventory (FCI) is a 30-question multiple-choice assessment that has been a building block for much of the physics education research done today. In practice, there are often concerns regarding the length of the test and possible test-retest effects. Since many studies in the literature use the mean score of the FCI as the primary variable, it would be useful then to have different shorter tests that can produce FCI-equivalent scores while providing the benefits of being quicker to administer and overcoming the test-retest effects. In this study, we divide the 1995 version of the FCI into two half-length tests; each contains a different subset of the original FCI questions. The two new tests are shorter, still cover the same set of concepts, and produce mean scores equivalent to those of the FCI. Using a large quantitative data set collected at a large midwestern university, we statistically compare the assessment features of the two half-length tests and the full-length FCI. The results show that the mean error of equivalent scores between any two of the three tests is within 3%. Scores from all tests are well correlated. Based on the analysis, it appears that the two half-length tests can be a viable option for score based assessment that need to administer tests quickly or need to measure short-term gains where using identical pre- and post-test questions is a concern.
Spofford, Christina M; Bayman, Emine O; Szeluga, Debra J; From, Robert P
2012-01-01
Novel methods for teaching are needed to enhance the efficiency of academic anesthesia departments as well as provide approaches to learning that are aligned with current trends and advances in technology. A video was produced that taught the key elements of anesthesia machine checkout and room set up. Novice learners were randomly assigned to receive either the new video format or traditional lecture-based format for this topic during their regularly scheduled lecture series. Primary outcome was the difference in written examination score before and after teaching between the two groups. Secondary outcome was the satisfaction score of the trainees in the two groups. Forty-two students assigned to the video group and 36 students assigned to the lecture group completed the study. Students in each group had similar interest in anesthesia, pre-test scores, post-test scores, and final exam scores. The median posttest to pretest difference was greater in the video group (3.5 (3.0-5.0) vs 2.5 (2.0-3.0), for video and lecture groups respectively, p 0.002). Despite improved test scores, students reported higher satisfaction with the traditional, lecture-based format (22.0 (18.0-24.0) vs 24.0 (20.0-28.0), for video and lecture groups respectively, p <0.004). Higher pre-test to post-test improvements were observed among students in the video-based teaching group; however, students rated traditional, live lectures higher than newer video-based teaching.
Oosting, Ellen; Hoogeboom, Thomas J; Appelman-de Vries, Suzan A; Swets, Adam; Dronkers, Jaap J; van Meeteren, Nico L U
2016-01-01
The aim of this study was to evaluate the value of conventional factors, the Risk Assessment and Predictor Tool (RAPT) and performance-based functional tests as predictors of delayed recovery after total hip arthroplasty (THA). A prospective cohort study was conducted in a regional hospital in the Netherlands with 315 patients attending for THA in 2012. The dependent variable recovery of function was assessed with the Modified Iowa Levels of Assistance scale. Delayed recovery was defined as taking more than 3 days to walk independently. Independent variables were age, sex, BMI, Charnley score, RAPT score and scores for four performance-based tests [2-minute walk test, timed up and go test (TUG), 10-meter walking test (10 mW) and hand grip strength]. Regression analysis with all variables identified older age (>70 years), Charnley score C, slow walking speed (10 mW >10.0 s) and poor functional mobility (TUG >10.5 s) as the best predictors of delayed recovery of function. This model (AUC 0.85, 95% CI 0.79-0.91) performed better than a model with conventional factors and RAPT scores, and significantly better (p = 0.04) than a model with only conventional factors (AUC 0.81, 95% CI 0.74-0.87). The combination of performance-based tests and conventional factors predicted inpatient functional recovery after THA. Two simple functional performance-based tests have a significant added value to a more conventional screening with age and comorbidities to predict recovery of functioning immediately after total hip surgery. Patients over 70 years old, with comorbidities, with a TUG score >10.5 s and a walking speed <1.0 m/s are at risk for delayed recovery of functioning. Those high risk patients need an accurate discharge plan and could benefit from targeted pre- and postoperative therapeutic exercise programs.
Boivin, Michael J; Weiss, Jonathan; Chhaya, Ronak; Seffren, Victoria; Awadu, Jorem; Sikorskii, Alla; Giordani, Bruno
2017-07-01
Tobii eye tracking was compared with webcam-based observer scoring on an animation viewing measure of attention (Early Childhood Vigilance Test; ECVT) to evaluate the feasibility of automating measurement and scoring. Outcomes from both scoring approaches were compared with the Mullen Scales of Early Learning (MSEL), Color-Object Association Test (COAT), and Behavior Rating Inventory of Executive Function for preschool children (BRIEF-P). A total of 44 children 44 to 65 months of age were evaluated with the ECVT, COAT, MSEL, and BRIEF-P. Tobii X2-30 portable infrared cameras were programmed to monitor pupil direction during the ECVT 6-min animation and compared with observer-based PROCODER webcam scoring. Children watched 78% of the cartoon (Tobii) compared with 67% (webcam scoring), although the 2 measures were highly correlated (r = .90, p = .001). It is possible for 2 such measures to be highly correlated even if one is consistently higher than the other (Bergemann et al., 2012). Both ECVT Tobii and webcam ECVT measures significantly correlated with COAT immediate recall (r = .37, p = .02 vs. r = .38, p = .01, respectively) and total recall (r = .33, p = .06 vs. r = .42, p = .005) measures. However, neither the Tobii eye tracking nor PROCODER webcam ECVT measures of attention correlated with MSEL composite cognitive performance or BRIEF-P global executive composite. ECVT scoring using Tobii eye tracking is feasible with at-risk very young African children and consistent with webcam-based scoring approaches in their correspondence to one another and other neurocognitive performance-based measures. By automating measurement and scoring, eye tracking technologies can improve the efficiency and help better standardize ECVT testing of attention in younger children.
This holds promise for other neurodevelopmental tests where eye movements, tracking, and gaze length can provide important behavioral markers of neuropsychological and neurodevelopmental processes associated with such tests. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Karr, Justin E; Garcia-Barrera, Mauricio A; Holdnack, James A; Iverson, Grant L
2017-05-01
Executive function consists of multiple cognitive processes that operate as an interactive system to produce volitional goal-oriented behavior, governed in large part by frontal microstructural and physiological networks. Identification of deficits in executive function in those with neurological or psychiatric conditions can be difficult because the normal variation in executive function test scores, in healthy adults when multiple tests are used, is largely unknown. This study addresses that gap in the literature by examining the prevalence of low scores on a brief battery of executive function tests. The sample consisted of 1,050 healthy individuals (ages 16-89) from the standardization sample for the Delis-Kaplan Executive Function System (D-KEFS). Seven individual test scores from the Trail Making Test, Color-Word Interference Test, and Verbal Fluency Test were analyzed. Low test scores, as defined by commonly used clinical cut-offs (i.e., ≤25th, 16th, 9th, 5th, and 2nd percentiles), occurred commonly among the adult portion of the D-KEFS normative sample (e.g., 62.8% of the sample had one or more scores ≤16th percentile, 36.1% had one or more scores ≤5th percentile), and the prevalence of low scores increased with lower intelligence and fewer years of education. The multivariate base rates (BR) in this article allow clinicians to understand the normal frequency of low scores in the general population. By use of these BRs, clinicians and researchers can improve the accuracy with which they identify executive dysfunction in clinical groups, such as those with traumatic brain injury or neurodegenerative diseases. © The Author 2016. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com
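The point that low scores become common when several tests are read together can be illustrated with a short calculation. The sketch below is not the study's method: it assumes, purely for illustration, that the seven scores are statistically independent, and the function name is hypothetical. Real D-KEFS scores are correlated, which is one plausible reason the observed rate (62.8% with one or more scores at or below the 16th percentile) is lower than the independence figure.

```python
def prob_at_least_one_low(n_scores: int, cutoff_rate: float) -> float:
    """Chance that a healthy examinee has at least one low score.

    Illustrative assumption (not from the study): the n_scores are
    independent, and each falls at or below the cutoff with
    probability cutoff_rate.
    """
    if n_scores < 0:
        raise ValueError("n_scores must be non-negative")
    if not 0.0 <= cutoff_rate <= 1.0:
        raise ValueError("cutoff_rate must lie between 0 and 1")
    # Complement of "every score is above the cutoff".
    return 1.0 - (1.0 - cutoff_rate) ** n_scores


# Seven scores, each with a 16% chance of falling at or below the cutoff.
seven_at_16th = prob_at_least_one_low(7, 0.16)
print(f"{seven_at_16th:.3f}")  # about 0.705 under independence
```

Even under this simplified model, roughly seven in ten healthy examinees would show at least one score in the "low" range, which is why a single low score carries little diagnostic weight on its own.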
Equating Scores from Adaptive to Linear Tests
ERIC Educational Resources Information Center
van der Linden, Wim J.
2006-01-01
Two local methods for observed-score equating are applied to the problem of equating an adaptive test to a linear test. In an empirical study, the methods were evaluated against a method based on the test characteristic function (TCF) of the linear test and traditional equipercentile equating applied to the ability estimates on the adaptive test…
Kumar, Avinash B; Hata, J Steven; Bayman, Emine O; Krishnan, Sundar
2013-01-01
To determine whether a hybrid traditional and web-based curriculum improves test scores and enrollment among senior medical students in an elective critical care rotation. Retrospective study in a surgical ICU at a major academic center. One hundred twenty-one fourth year medical students completing an elective ICU clerkship between 2007 and 2010. Pre-test and post-test during a 4-week rotation. We implemented a hybrid curriculum that involved both traditional teaching methods and a new online core curriculum that incorporated audio, video, and text using screen capture technology. The curriculum was hosted on a secure online portal called ICON (Desire2Learn Inc., Ontario, Canada). The core curriculum covered topics that were considered essential to meet the didactic objectives of the rotation. MEASUREMENTS AND EVALUATIONS: A pre-test was administered online on day 1 of the rotation. A post-test was administered on the second to last day of the rotation. Both tests were composed of 20 questions randomly chosen from a question bank of 100 questions. The tests are managed (administering, grading, and reporting) exclusively online. One hundred twenty-one medical students have successfully completed the clerkship since implementing the new curriculum. Each group of students showed an improvement in the mean post-test score by at least 17%+ to 10%. The satisfaction scores of the clerkship have improved consistently since 2007 and are currently rated at 4.31 ± 0.85 (on a 5-point scale). The rotation is in the top 25th percentile of all clinical clerkships offered at the University of Iowa. A systematically implemented hybrid web-based critical care curriculum can improve knowledge-based test scores and overall clerkship satisfaction scores in a busy surgical ICU. Copyright © 2013 Association of Program Directors in Surgery. Published by Elsevier Inc. All rights reserved.
ERIC Educational Resources Information Center
Dee, Thomas S.; Dobbie, Will; Jacob, Brian A.; Rockoff, Jonah
2016-01-01
In this paper, we show that the design and decentralized, school-based scoring of New York's high school exit exams--the Regents Examinations--led to the systematic manipulation of test scores just below important proficiency cutoffs. Our estimates suggest that teachers inflate approximately 40 percent of test scores near the proficiency cutoffs.…
AP Trends: Tests Soar, Scores Slip--Gaps between Groups Spur Equity Concerns
ERIC Educational Resources Information Center
Cech, Scott J.
2008-01-01
More students are taking Advanced Placement tests, but the proportion of tests receiving what is deemed a passing score has dipped, and the mean score is down for the fourth year in a row. Data released here this week by the New York City-based nonprofit organization that owns the AP brand shows that a greater-than-ever proportion of students…
Development of the Enlisted Panel Research Data Base
1990-01-01
Loss Files, Accession File, Army Classification Battery Composite Scores pertaining to accession, the Skills Qualifying Test (SQT) data from the SQT...inclusive. Specific accession data variables, including composite score data from the Army Classification Battery Test (ACB), are captured for each...included. To broaden the scope of information for each individual, Skill Qualifying Test (SQT) scores were kept beginning in 1980 and, as of fiscal year
Gunner, Jessica H; Miele, Andrea S; Lynch, Julie K; McCaffrey, Robert J
2012-06-01
There is currently no standard criterion for determining abnormal test scores in neuropsychology; thus, a number of different criteria are commonly used. We investigated base rates of abnormal scores in healthy older adults using raw and T-scores from indices of the Wisconsin Card Sorting Test and Stroop Color-Word Test. Abnormal scores were examined cumulatively at seven cutoffs including >1.0, >1.5, >2.0, >2.5, and >3.0 standard deviations (SD) from the mean as well as those below the 10th and 5th percentiles. In addition, the number of abnormal scores at each of the seven cutoffs was also examined. Results showed that, when considering raw scores, ∼15% of individuals obtained scores >1.0 SD from the mean, around 10% were below the 10th percentile, and 5% fell >1.5 SD from the mean or below the 5th percentile. Using T-scores, approximately 15%-20% and 5%-10% of scores were >1.0 and >1.5 SD from the mean, respectively. Roughly 15% and 5% fell at the <10th and <5th percentiles, respectively. Both raw and T-scores >2.0 SD from the mean were infrequent. Although the presence of a single abnormal score at 1.0 and 1.5 SD from the mean or at the 10th and 5th percentiles was not unusual, the presence of ≥2 abnormal scores using any criterion was uncommon. Consideration of base rate data regarding the percentage of healthy individuals scoring in the abnormal range should help avoid classifying normal variability as neuropsychological impairment.
Incorporating IStation into Early Childhood Classrooms to Improve Reading Comprehension
ERIC Educational Resources Information Center
Luo, Tian; Lee, Guang-Lea; Molina, Cynthia
2017-01-01
Aim/Purpose: IStation is an adaptive computer-based reading program that adapts to the learner's academic needs. This study investigates if the IStation computer-based reading program promotes reading improvement scores as shown on the STAR Reading test and the IStation test scaled scores for elementary school third-grade learners on different…
ERIC Educational Resources Information Center
Clariana, Roy B.; Wallace, Patricia
2007-01-01
This proof-of-concept investigation describes a computer-based approach for deriving the knowledge structure of individuals and of groups from their written essays, and considers the convergent criterion-related validity of the computer-based scores relative to human rater essay scores and multiple-choice test scores. After completing a…
A web-based normative calculator for the uniform data set (UDS) neuropsychological test battery.
Shirk, Steven D; Mitchell, Meghan B; Shaughnessy, Lynn W; Sherman, Janet C; Locascio, Joseph J; Weintraub, Sandra; Atri, Alireza
2011-11-11
With the recent publication of new criteria for the diagnosis of preclinical Alzheimer's disease (AD), there is a need for neuropsychological tools that take premorbid functioning into account in order to detect subtle cognitive decline. Using demographic adjustments is one method for increasing the sensitivity of commonly used measures. We sought to provide a useful online z-score calculator that yields estimates of percentile ranges and adjusts individual performance based on sex, age and/or education for each of the neuropsychological tests of the National Alzheimer's Coordinating Center Uniform Data Set (NACC, UDS). In addition, we aimed to provide an easily accessible method of creating norms for other clinical researchers for their own, unique data sets. Data from 3,268 clinically cognitively-normal older UDS subjects from a cohort reported by Weintraub and colleagues (2009) were included. For all neuropsychological tests, z-scores were estimated by subtracting the raw score from the predicted mean and then dividing this difference score by the root mean squared error term (RMSE) for a given linear regression model. For each neuropsychological test, an estimated z-score was calculated for any raw score based on five different models that adjust for the demographic predictors of SEX, AGE and EDUCATION, either concurrently, individually or without covariates. The interactive online calculator allows the entry of a raw score and provides five corresponding estimated z-scores based on predictions from each corresponding linear regression model. The calculator produces percentile ranks and graphical output. 
An interactive, regression-based, normative score online calculator was created to serve as an additional resource for UDS clinical researchers, especially in guiding interpretation of individual performances that appear to fall in borderline realms and may be of particular utility for operationalizing subtle cognitive impairment present according to the newly proposed criteria for Stage 3 preclinical Alzheimer's disease.
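The regression-based z-score described in this abstract can be sketched in a few lines. This is a minimal illustration, not the published calculator: the coefficients, the RMSE, and the function names are all hypothetical, and the sign convention (raw score minus predicted mean, so that a score above expectation yields a positive z) is an assumption. The percentile conversion further assumes normally distributed residuals.

```python
from statistics import NormalDist


def regression_z(raw, intercept, coefs, predictors, rmse):
    """Demographically adjusted z-score from a linear regression norm.

    predicted = intercept + sum(coef * predictor value)
    z         = (raw - predicted) / rmse   # sign convention is assumed
    """
    if rmse <= 0:
        raise ValueError("rmse must be positive")
    predicted = intercept + sum(coefs[name] * predictors[name] for name in coefs)
    return (raw - predicted) / rmse


def z_to_percentile(z):
    """Percentile rank implied by z, assuming normal residuals."""
    return 100.0 * NormalDist().cdf(z)


# Hypothetical model and examinee; none of these numbers come from the UDS norms.
coefs = {"age": -0.1, "education": 0.5, "sex": 1.0}
person = {"age": 70, "education": 16, "sex": 1}
z = regression_z(raw=26, intercept=30.0, coefs=coefs, predictors=person, rmse=4.0)
print(round(z, 2), round(z_to_percentile(z), 1))
```

Fitting separate models with different covariate sets (for example age only, or age plus education) and calling `regression_z` once per model would mirror the idea of reporting several adjusted z-scores for the same raw score.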
The Impact of Conditional Scores on the Performance of DETECT.
ERIC Educational Resources Information Center
Zhang, Yanwei Oliver; Yu, Feng; Nandakumar, Ratna
DETECT is a nonparametric, conditional covariance-based procedure to identify dimensional structure and the degree of multidimensionality of test data. The ability composite or conditional score used to estimate conditional covariance plays a significant role in the performance of DETECT. The number correct score of all items in the test (T) and…
IQ Scores Should Be Corrected for the Flynn Effect in High-Stakes Decisions
ERIC Educational Resources Information Center
Fletcher, Jack M.; Stuebing, Karla K.; Hughes, Lisa C.
2010-01-01
IQ test scores should be corrected for high stakes decisions that employ these assessments, including capital offense cases. If scores are not corrected, then diagnostic standards must change with each generation. Arguments against corrections, based on standards of practice, information present and absent in test manuals, and related issues,…
Standard Errors and Confidence Intervals of Norm Statistics for Educational and Psychological Tests.
Oosterhuis, Hannah E M; van der Ark, L Andries; Sijtsma, Klaas
2016-11-14
Norm statistics allow for the interpretation of scores on psychological and educational tests, by relating the test score of an individual test taker to the test scores of individuals belonging to the same gender, age, or education groups, et cetera. Given the uncertainty due to sampling error, one would expect researchers to report standard errors for norm statistics. In practice, standard errors are seldom reported; they are either unavailable or derived under strong distributional assumptions that may not be realistic for test scores. We derived standard errors for four norm statistics (standard deviation, percentile ranks, stanine boundaries and Z-scores) under the mild assumption that the test scores are multinomially distributed. A simulation study showed that the standard errors were unbiased and that corresponding Wald-based confidence intervals had good coverage. Finally, we discuss the possibilities for applying the standard errors in practical test use in education and psychology. The procedure is provided via the R function check.norms, which is available in the mokken package.
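To make the idea of a standard error for a norm statistic concrete, the sketch below computes a percentile rank and a Wald-type interval from a norm sample. It is an illustration only and does not reproduce the authors' derivations or the check.norms function: it assumes the mid-point definition of the percentile rank (proportion below plus half the proportion equal) and uses the standard multinomial variances and covariances of the sample proportions. All names are hypothetical.

```python
from collections import Counter
from math import sqrt


def percentile_rank_with_se(scores, x):
    """Percentile rank of score x in a norm sample, with a standard error.

    Assumed definition: PR = 100 * (p_below + 0.5 * p_equal).
    Under multinomial sampling, Var(p_hat) = p(1 - p)/n and
    Cov(p_below_hat, p_equal_hat) = -p_below * p_equal / n.
    """
    n = len(scores)
    if n == 0:
        raise ValueError("scores must not be empty")
    counts = Counter(scores)
    p_below = sum(c for s, c in counts.items() if s < x) / n
    p_equal = counts.get(x, 0) / n
    pr = p_below + 0.5 * p_equal
    variance = (
        p_below * (1 - p_below)
        + 0.25 * p_equal * (1 - p_equal)
        - p_below * p_equal
    ) / n
    return 100 * pr, 100 * sqrt(max(variance, 0.0))


def wald_interval(estimate, se, z=1.96):
    """Symmetric Wald interval; z = 1.96 gives roughly 95% coverage."""
    return estimate - z * se, estimate + z * se


sample = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]  # toy norm sample, not real data
pr, se = percentile_rank_with_se(sample, 3)
print(pr, round(se, 2), wald_interval(pr, se))
```

With only ten observations the interval is very wide, which illustrates the abstract's point that norm statistics reported without standard errors can convey more precision than the norm sample supports.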
An analysis of a digital variant of the Trail Making Test using machine learning techniques.
Dahmen, Jessamyn; Cook, Diane; Fellows, Robert; Schmitter-Edgecombe, Maureen
2017-01-01
The goal of this work is to develop a digital version of a standard cognitive assessment, the Trail Making Test (TMT), and assess its utility. This paper introduces a novel digital version of the TMT and introduces a machine learning based approach to assess its capabilities. Using digital Trail Making Test (dTMT) data collected from (N = 54) older adult participants as feature sets, we use machine learning techniques to analyze the utility of the dTMT and evaluate the insights provided by the digital features. Predicted TMT scores correlate well with clinical digital test scores (r = 0.98) and paper time to completion scores (r = 0.65). Predicted TICS exhibited a small correlation with clinically derived TICS scores (r = 0.12 Part A, r = 0.10 Part B). Predicted FAB scores exhibited a small correlation with clinically derived FAB scores (r = 0.13 Part A, r = 0.29 for Part B). Digitally derived features were also used to predict diagnosis (AUC of 0.65). Our findings indicate that the dTMT is capable of measuring the same aspects of cognition as the paper-based TMT. Furthermore, the dTMT's additional data may be able to help monitor other cognitive processes not captured by the paper-based TMT alone.
Lippert, Christoph; Xiang, Jing; Horta, Danilo; Widmer, Christian; Kadie, Carl; Heckerman, David; Listgarten, Jennifer
2014-11-15
Set-based variance component tests have been identified as a way to increase power in association studies by aggregating weak individual effects. However, the choice of test statistic has been largely ignored even though it may play an important role in obtaining optimal power. We compared a standard statistical test (a score test) with a recently developed likelihood ratio (LR) test. Further, when correction for hidden structure is needed, or gene-gene interactions are sought, state-of-the-art algorithms for both the score and LR tests can be computationally impractical. Thus we develop new computationally efficient methods. After reviewing theoretical differences in performance between the score and LR tests, we find empirically on real data that the LR test generally has more power. In particular, on 15 of 17 real datasets, the LR test yielded at least as many associations as the score test (up to 23 more associations), whereas the score test yielded at most one more association than the LR test in the two remaining datasets. On synthetic data, we find that the LR test yielded up to 12% more associations, consistent with our results on real data, but also observe a regime of extremely small signal where the score test yielded up to 25% more associations than the LR test, consistent with theory. Finally, our computational speedups now enable (i) efficient LR testing when the background kernel is full rank, and (ii) efficient score testing when the background kernel changes with each test, as for gene-gene interaction tests. The latter yielded a factor of 2000 speedup on a cohort of size 13 500. Software available at http://research.microsoft.com/en-us/um/redmond/projects/MSCompBio/Fastlmm/. Contact: heckerma@microsoft.com. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.
Mitra, Nilesh Kumar; Barua, Ankur
2015-03-03
The impact of web-based formative assessment practices on performance of undergraduate medical students in summative assessments is not widely studied. This study was conducted among third-year undergraduate medical students of a designated university in Malaysia to compare the effect, on performance in summative assessment, of repeated computer-based formative assessment with automated feedback with that of single paper-based formative assessment with face-to-face feedback. This quasi-randomized trial was conducted among two groups of undergraduate medical students who were selected by stratified random technique from a cohort undertaking the Musculoskeletal module. The control group C (n = 102) was subjected to a paper-based formative MCQ test. The experimental group E (n = 65) was provided three online formative MCQ tests with automated feedback. The summative MCQ test scores for both these groups were collected after the completion of the module. In this study, no significant difference was observed between the mean summative scores of the two groups. However, Band 1 students from group E with higher entry qualification showed higher mean score in the summative assessment. A trivial, but significant and positive correlation (r² = +0.328) was observed between the online formative test scores and summative assessment scores of group E. The proportionate increase of performance in group E was found to be almost double that of group C. The use of computer-based formative test with automated feedback improved the performance of the students with better academic background in the summative assessment. Computer-based formative test can be explored as an optional addition to the curriculum of pre-clinical integrated medical program to improve the performance of the students with higher academic ability.
Auditing for Score Inflation Using Self-Monitoring Assessments: Findings from Three Pilot Studies
ERIC Educational Resources Information Center
Koretz, Daniel; Jennings, Jennifer L.; Ng, Hui Leng; Yu, Carol; Braslow, David; Langi, Meredith
2016-01-01
Test-based accountability often produces score inflation. Most studies have evaluated inflation by comparing trends on a high-stakes test and a lower stakes audit test. However, Koretz and Beguin (2010) noted weaknesses of audit tests and suggested self-monitoring assessments (SMAs), which incorporate audit items into high-stakes tests. This…
Wolf, Timothy J; Dahl, Abigail; Auen, Colleen; Doherty, Meghan
2017-07-01
The objective of this study was to evaluate the inter-rater reliability, test-retest reliability, concurrent validity, and discriminant validity of the Complex Task Performance Assessment (CTPA): an ecologically valid performance-based assessment of executive function. Community control participants (n = 20) and individuals with mild stroke (n = 14) participated in this study. All participants completed the CTPA and a battery of cognitive assessments at initial testing. The control participants completed the CTPA at two different times one week apart. The intra-class correlation coefficient (ICC) for inter-rater reliability for the total score on the CTPA was .991. The ICCs for all of the sub-scores of the CTPA were also high (.889-.977). The CTPA total score was significantly correlated to Condition 4 of the DKEFS Color-Word Interference Test (ρ = -.425), and the Wechsler Test of Adult Reading (ρ = -.493). Finally, there were significant differences between control subjects and individuals with mild stroke on the total score of the CTPA (p = .007) and all sub-scores except interpretation failures and total items incorrect. These results are also consistent with other current executive function performance-based assessments and indicate that the CTPA is a reliable and valid performance-based measure of executive function.
ERIC Educational Resources Information Center
Fahle, Erin M.; Reardon, Sean F.
2017-01-01
This paper provides the first population-based evidence on how much standardized test scores vary among public school districts within each state and how segregation explains that variation. Using roughly 300 million standardized test score records in math and ELA for grades 3 through 8 from every U.S. public school district during the 2008-09 to…
Boevé, Anja J.; Meijer, Rob R.; Albers, Casper J.; Beetsma, Yta; Bosker, Roel J.
2015-01-01
The introduction of computer-based testing in high-stakes examining in higher education is developing rather slowly due to institutional barriers (the need for extra facilities, ensuring test security) and teacher and student acceptance. From the existing literature it is unclear whether computer-based exams will yield results similar to those of paper-based exams and whether student acceptance can change as a result of administering computer-based exams. In this study, we compared results from a computer-based and paper-based exam in a sample of psychology students and found no differences in total scores across the two modes. Furthermore, we investigated student acceptance and change in acceptance of computer-based examining. After taking the computer-based exam, fifty percent of the students preferred paper-and-pencil exams over computer-based exams and about a quarter preferred a computer-based exam. We conclude that computer-based exam total scores are similar to paper-based exam scores, but that for the acceptance of high-stakes computer-based exams it is important that students practice and get familiar with this new mode of test administration. PMID:26641632
Measures of Partial Knowledge and Unexpected Responses in Multiple-Choice Tests
ERIC Educational Resources Information Center
Chang, Shao-Hua; Lin, Pei-Chun; Lin, Zih-Chuan
2007-01-01
This study investigates differences in the partial scoring performance of examinees in elimination testing and conventional dichotomous scoring of multiple-choice tests implemented on a computer-based system. Elimination testing that uses the same set of multiple-choice items rewards examinees with partial knowledge over those who are simply…
Interactional Competence: Challenges for Validity.
ERIC Educational Resources Information Center
Young, Richard F.
One of the ways in which language testing interfaces with applied linguistics is in the definition and validation of the constructs that underlie language tests. When language testers and score users interpret scores on a test, they do so by implicit and explicit reference to the construct on which the test is based. Equally, when applied to new…
Wu, JC; Lai, LC; Sheets, CG; Earthman, J; Newcomb, R
2011-01-01
Statement of problem: A new fabrication process has been developed where a titanium coping, which has a gold-colored titanium nitride outer layer, can be reliably fused to porcelain, but the marginal adaptation characteristics are still undetermined. Purpose: The primary purpose of this study is to compare the rate of Clinically Acceptable Marginal Adaptation (CAMA, defined as a marginal gap mean ≤60 μm) of cathode-arc vapor-deposited titanium with the CAMA rate for the cast base metal copings. In addition, the study will evaluate the marginal gap scores themselves to assess their mean difference between the two study groups. Finally, the study will present two analyses of group differences in variability to support the contention that the titanium copings perform more consistently than their base metal counterparts. Material and methods: Thirty-seven cathode-arc vapor-deposited titanium copings and 40 cast base metal copings were evaluated by computer-based image analysis using an optical microscope. The conventional lost wax technique was used to fabricate the 40 cast base metal copings that were 0.3 mm thick. The titanium copings were 0.3 mm thick and were formed by a collection of atomic titanium vapor onto a refractory die duplicate in a high vacuum chamber. Fifty vertical marginal gap measurements were collected from each of the 77 copings and the mean of these measurements was computed to form a gap score for each coping. Next, the gap score was compared to the 60 μm criterion to classify each coping as to whether it did or did not achieve Clinically Acceptable Marginal Adaptation (CAMA). A comparison of the CAMA rates for each type of coping was used to address the primary purpose of this study. In addition, the gap scores themselves were used to test the (one-sided) hypothesis that the mean of the titanium gap scores is smaller than the mean of the base metal gap scores.
Finally, the assertion that the titanium copings provide more consistency in their marginal gap performance was tested in two ways. First, the means of the titanium gap scores were compared to the means of the marginal gap scores for the base metal copings. Second, the standard deviations of the marginal gap scores for the titanium copings were compared with those for the base metal copings. Results: Statistical comparison of the CAMA rates for each type of coping showed that the CAMA criterion was achieved by 24 of the 37 (64.86%) titanium copings, while 19 of the 40 (47.50%) base metal copings met this same standard. Noninferiority of the titanium copings was established by the 2-sided 90% Confidence Interval for the 17.36% difference in these rates (−0.95%, 35.68%) and noninferiority of titanium coping adaptation was also demonstrated by the Wald Test rejection of the tentative hypothesis of inferiority (Z-score=1.9191, one-sided p=0.0275). The mean of the vertical marginal gap scores for the titanium copings (56.9025) was significantly less than the mean of the marginal gap scores for the base metal copings (71.9041) as shown by the Satterthwaite t-score=−2.29 (one-sided p=0.0126). To compare the adaptation consistency of the titanium copings to the base metal counterparts the difference between the variance of the marginal gap scores for the titanium copings (594.843) and the variance of the marginal gap scores for the base metal copings (1510.901) was found to be statistically significant (Folded-F test score=2.63, p=0.0042). Our second method for showing that the titanium copings performed more consistently than the base metal comparisons was to use a one-sided test to show that the mean of the standard deviations of the vertical gap measurements for each titanium coping (29.9835) was significantly lower than the mean of the standard deviations of the vertical gap measurements for each base metal coping (36.1332).
This test produced a Satterthwaite’s t-score of −2.24 (one-sided p=0.0141), indicating the titanium adaptation was significantly more consistent. Conclusions: Cathode-arc vapor-deposited titanium copings exhibited a higher rate of Clinically Acceptable Marginal Adaptation (CAMA) than the comparison base metal copings. Comparison of the coping marginal adaptation score variances and direct assessment of the coping marginal adaptation scores provided additional evidence that the titanium copings performed better and with more consistency than their base metal counterparts. PMID:21640242
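The confidence interval reported for the difference in CAMA rates can be checked with a short calculation. The sketch below assumes an unpooled Wald interval for the difference of two independent proportions; the abstract does not state which interval formula was used, so this choice and the function name are assumptions, although the result agrees with the reported figures to two decimals.

```python
from math import sqrt
from statistics import NormalDist


def diff_of_proportions_ci(x1, n1, x2, n2, level=0.90):
    """Two-sided Wald interval for p1 - p2 with unpooled variance.

    Assumed method (not confirmed by the abstract):
    SE = sqrt(p1(1 - p1)/n1 + p2(1 - p2)/n2).
    """
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.645 for a 90% interval
    return diff, diff - z * se, diff + z * se


# 24 of 37 titanium copings vs 19 of 40 base metal copings met the criterion.
diff, lower, upper = diff_of_proportions_ci(24, 37, 19, 40)
print(f"{100 * diff:.2f}% ({100 * lower:.2f}%, {100 * upper:.2f}%)")
```

Because the lower bound sits only slightly below zero, the noninferiority conclusion depends on the margin chosen, which the abstract does not report.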
MacDonald, Karen V; Bombard, Yvonne; Deal, Ken; Trudeau, Maureen; Leighl, Natasha; Marshall, Deborah A
2016-07-01
Women with early-stage breast cancer, of whom only 15% will experience a recurrence, are often conflicted or uncertain about taking chemotherapy. Gene expression profiling (GEP) of tumours informs risk prediction, potentially affecting treatment decisions. We examined whether receiving a GEP test score reduces decisional conflict in chemotherapy treatment decision making. A general population sample of 200 women completed the decisional conflict scale (DCS) at baseline (no GEP test score scenario) and after (scenario with GEP test score added) completing a discrete choice experiment survey for early-stage breast cancer chemotherapy. We scaled the 16-item DCS total scores and subscores from 0 to 100 and calculated means, standard deviations and change in scores, with significance (p < 0.05) based on matched pairs t-tests. We identified five respondent subgroups based on preferred treatment option; almost 40% did not change their chemotherapy decision after receiving GEP testing information. Total score and all subscores (uncertainty, informed, values clarity, support, and effective decision) decreased significantly in the respondent subgroup who were unsure about taking chemotherapy initially but changed to no chemotherapy (n = 33). In the subgroup of respondents (n = 25) who chose chemotherapy initially but changed to unsure, effective decision subscore increased significantly. In the overall sample, changes in total and all subscores were non-significant. GEP testing adds value for women initially unsure about chemotherapy treatment by decreasing decisional conflict. However, for women who are confident about their treatment decisions, GEP testing may not add value. Decisions to request GEP testing should be personalised based on patient preferences. Copyright © 2016 Elsevier Ltd. All rights reserved.
Winton, Lisa M; Ferguson, Elizabeth M N; Hsu, Chiu-Hsieh; Agee, Neal; Eubanks, Ryan D; O'Neill, Patrick J; Goldberg, Ross F; Kopelman, Tammy R; Nodora, Jesse N; Caruso, Daniel M; Komenaka, Ian K
To determine whether use of self-assessment (SA) questions affects the effectiveness of weekly didactic grand rounds presentations. From 26 consecutive grand rounds presentations from August 2013 to April 2014, a 52-question multiple-choice test was administered based on 2 questions from each presentation. Community teaching institution. General surgery residents, students, and attending physicians. The test was administered to 66 participants. The mean score was 41.8%. Test scores did not differ by experience, with similar scores for junior residents, senior residents, and attending surgeons (43%, 46%, and 44%; p = 0.13). Most participants felt they would be most interested in presentations directly related to their surgical specialty. Participants, however, did not score differently on topics which were the focus of the program (40% vs. 42%; p = 0.85). Journal club presentations (39% vs. others 42%; p = 0.33) also did not affect the score. The Pearson correlation coefficient for attendance was 0.49 (p < 0.0001), demonstrating that attendance was very important. Participation in the weekly SA was significantly associated with improved score, as those who participated in SA scored over 20% higher than those who did not (59% vs. 38%; p < 0.0001). Based on multiple linear regression for mean score, SA explained the variation in score more than attendance. The current study found that without preparation approximately 40% of material presented is retained after 10 months. Participation in weekly SA significantly improved retention of information from grand rounds presentations. Copyright © 2016 Association of Program Directors in Surgery. Published by Elsevier Inc. All rights reserved.
Lippert, Christoph; Xiang, Jing; Horta, Danilo; Widmer, Christian; Kadie, Carl; Heckerman, David; Listgarten, Jennifer
2014-01-01
Motivation: Set-based variance component tests have been identified as a way to increase power in association studies by aggregating weak individual effects. However, the choice of test statistic has been largely ignored even though it may play an important role in obtaining optimal power. We compared a standard statistical test—a score test—with a recently developed likelihood ratio (LR) test. Further, when correction for hidden structure is needed, or gene–gene interactions are sought, state-of-the art algorithms for both the score and LR tests can be computationally impractical. Thus we develop new computationally efficient methods. Results: After reviewing theoretical differences in performance between the score and LR tests, we find empirically on real data that the LR test generally has more power. In particular, on 15 of 17 real datasets, the LR test yielded at least as many associations as the score test—up to 23 more associations—whereas the score test yielded at most one more association than the LR test in the two remaining datasets. On synthetic data, we find that the LR test yielded up to 12% more associations, consistent with our results on real data, but also observe a regime of extremely small signal where the score test yielded up to 25% more associations than the LR test, consistent with theory. Finally, our computational speedups now enable (i) efficient LR testing when the background kernel is full rank, and (ii) efficient score testing when the background kernel changes with each test, as for gene–gene interaction tests. The latter yielded a factor of 2000 speedup on a cohort of size 13 500. Availability: Software available at http://research.microsoft.com/en-us/um/redmond/projects/MSCompBio/Fastlmm/. Contact: heckerma@microsoft.com Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25075117
NASA Astrophysics Data System (ADS)
Saigo, Barbara Woodworth
The researcher collaborated with four high school biology teachers who had been involved for 2-1/2 years in a constructivism-based professional development experience that emphasized teaching for conceptual change and using classroom-based inquiry as a basis for making instructional decisions. The researcher and teachers designed a five-day instructional unit on biosystematics using two contrasting approaches, comprising the treatment variable. The "traditional" unit emphasized lecture, written materials, and some laboratory activities. The "constructivist" unit emphasized a specific, inquiry-based, conceptual change strategy and collaborative learning. The study used a quasi-experimental, factorial design to explore impact of instructional approach (the treatment variable) on student performance (the dependent variable) on repeated measures (three) of a biology concept test. Additional independent variables considered were gender, cumulative GPA, and the section in which students were enrolled. Scores on the biology concept test were compiled for the 3 constructivist sections (N = 44) and the 3 traditional sections (N = 42). Analysis of Covariance (ANCOVA) was applied. The main findings in regard to the primary research question were that instructional approach did not have a significant relationship to immediate post test scores or gain, but that one month after instruction students in the constructivist group demonstrated less loss of gain than those in the traditional group; i.e., their longer-term retention was greater. Also, GPA*instructional approach effects were detected for post-post-test gain. GPA and gender were significantly associated with pre-test, post-test, and post-post scores; however, in terms of change (gain) from pre-test to post-test and pre-test to post-post-test, GPA and gender were not significant effects. Section was a significant effect for all three tests, in terms of both score and gain. 
Gender*section effects were detected for post-test gain and post-post-test scores.
Precision analysis of a quantitative CT liver surface nodularity score.
Smith, Andrew; Varney, Elliot; Zand, Kevin; Lewis, Tara; Sirous, Reza; York, James; Florez, Edward; Abou Elkassem, Asser; Howard-Claudio, Candace M; Roda, Manohar; Parker, Ellen; Scortegagna, Eduardo; Joyner, David; Sandlin, David; Newsome, Ashley; Brewster, Parker; Lirette, Seth T; Griswold, Michael
2018-04-26
To evaluate precision of a software-based liver surface nodularity (LSN) score derived from CT images. An anthropomorphic CT phantom was constructed with simulated liver containing smooth and nodular segments at the surface and simulated visceral and subcutaneous fat components. The phantom was scanned multiple times on a single CT scanner with adjustment of image acquisition and reconstruction parameters (N = 34) and on 22 different CT scanners from 4 manufacturers at 12 imaging centers. LSN scores were obtained using a software-based method. Repeatability and reproducibility were evaluated by intraclass correlation (ICC) and coefficient of variation. Using abdominal CT images from 68 patients with various stages of chronic liver disease, inter-observer agreement and test-retest repeatability among 12 readers assessing LSN by software- vs. visual-based scoring methods were evaluated by ICC. There was excellent repeatability of LSN scores (ICC:0.79-0.99) using the CT phantom and routine image acquisition and reconstruction parameters (kVp 100-140, mA 200-400, and auto-mA, section thickness 1.25-5.0 mm, field of view 35-50 cm, and smooth or standard kernels). There was excellent reproducibility (smooth ICC: 0.97; 95% CI 0.95, 0.99; CV: 7%; nodular ICC: 0.94; 95% CI 0.89, 0.97; CV: 8%) for LSN scores derived from CT images from 22 different scanners. Inter-observer agreement for the software-based LSN scoring method was excellent (ICC: 0.84; 95% CI 0.79, 0.88; CV: 28%) vs. good for the visual-based method (ICC: 0.61; 95% CI 0.51, 0.69; CV: 43%). Test-retest repeatability for the software-based LSN scoring method was excellent (ICC: 0.82; 95% CI 0.79, 0.84; CV: 12%). The software-based LSN score is a quantitative CT imaging biomarker with excellent repeatability, reproducibility, inter-observer agreement, and test-retest repeatability.
Peña-Casanova, Jordi; Quiñones-Ubeda, Sonia; Gramunt-Fombuena, Nina; Aguilar, Miquel; Casas, Laura; Molinuevo, José Luis; Robles, Alfredo; Rodríguez, Dolores; Barquero, María Sagrario; Antúnez, Carmen; Martínez-Parra, Carlos; Frank-García, Anna; Fernández, Manuel; Molano, Ana; Alfonso, Verónica; Sol, Josep M; Blesa, Rafael
2009-06-01
As part of the Spanish Multicenter Normative Studies (NEURONORMA project), we provide age- and education-adjusted norms for the Boston naming test and Token test. The sample consists of 340 and 348 participants, respectively, who are cognitively normal, community-dwelling, and ranging in age from 50 to 94 years. Tables are provided to convert raw scores to age-adjusted scaled scores. These were further converted into education-adjusted scaled scores by applying regression-based adjustments. Age and education affected scores on both tests, but sex was found to be unrelated to naming and verbal comprehension efficiency. Our norms should provide clinically useful data for evaluating elderly Spaniards. The normative data presented here were obtained from the same study sample as all the other NEURONORMA norms and the same statistical procedures for data analyses were applied. These co-normed data allow clinicians to compare scores from one test with all tests.
Jones, Nathaniel S; Walter, Kevin D; Caplinger, Roger; Wright, Daniel; Raasch, William G; Young, Craig
2014-07-01
The purpose of the present study was to investigate the possible effects of sociocultural influences, specifically pertaining to language and education, on baseline neuropsychological concussion testing as obtained via immediate postconcussion assessment and cognitive testing (ImPACT) of players from a professional baseball team. A retrospective chart review. Baseline testing of a professional baseball organization. Four hundred five professional baseball players. Age, languages spoken, hometown country location (United States/Canada vs overseas), and years of education. The 5 ImPACT composite scores (verbal memory, visual memory, visual motor speed, reaction time, impulse control) and ImPACT total symptom score from the initial baseline testing. The result of t tests revealed significant differences (P < 0.05) when comparing native English to native Spanish speakers in many scores. Even when corrected for education, the significant differences (P < 0.05) remained in some scores. Sociocultural differences may result in differences in computer-based neuropsychological testing scores.
ERIC Educational Resources Information Center
Liu, Liqun; Neilson, William S.
2011-01-01
In this paper college admissions are based on test scores and students can exert two types of effort: real learning and exam preparation. The former improves skills but the latter is more effective in raising test scores. In this setting the students with the lowest skills are no longer the ones with the lowest aptitude, but instead are the ones…
Li, Leah
2012-01-01
Summary Studies of cognitive development in children are often based on tests designed for specific ages. Examination of the changes of these scores over time may not be meaningful. This paper investigates the influence of early life factors on cognitive development using maths and reading test scores at ages 7, 11, and 16 years in a British birth cohort born in 1958. The distributions of these test scores differ between ages, for example, 20% participants scored the top mark in the reading test at 7 and the distribution of reading score at 16 is heavily skewed. In this paper, we group participants into 5 ordered categories, approximately 20% in each category according to their test scores at each age. Multilevel models for a repeated ordinal outcome are applied to relate the ordinal scale of maths and reading ability to early life factors. PMID:22661923
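The grouping step described above (about 20% of participants per ordered category at each age) can be sketched as follows. This is a minimal illustration under assumed conventions, not the authors' procedure: the function name `ordered_categories` and the rule for tied scores are hypothetical choices.

```python
from bisect import bisect_left


def ordered_categories(scores, k=5):
    """Assign each score to one of k ordered categories (1 = lowest).

    A score's category depends on how many participants scored strictly
    below it, so tied scores always share a category. With many ties
    (for example, a large share at the top mark) the categories are only
    approximately equal in size.
    """
    n = len(scores)
    ranked = sorted(scores)
    categories = []
    for s in scores:
        below = bisect_left(ranked, s)  # count of scores strictly below s
        categories.append(below * k // n + 1)
    return categories


# Hypothetical test scores for ten participants at one age.
print(ordered_categories([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]))
```

Applying the same function separately to the scores at each age yields a repeated ordinal outcome on a common five-level scale, which is the kind of outcome the multilevel models in the abstract take as input.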
A new computer-based Farnsworth Munsell 100-hue test for evaluation of color vision.
Ghose, Supriyo; Parmar, Twinkle; Dada, Tanuj; Vanathi, Murugesan; Sharma, Sourabh
2014-08-01
To evaluate a computer-based Farnsworth-Munsell (FM) 100-hue test and compare it with a manual FM 100-hue test in normal and congenital color-deficient individuals. Fifty color defective subjects and 200 normal subjects with a best-corrected visual acuity ≥ 6/12 were compared using a standard manual FM 100-hue test and a computer-based FM 100-hue test under standard operating conditions as recommended by the manufacturer after initial trial testing. Parameters evaluated were total error scores (TES), type of defect and testing time. Pearson's correlation coefficient was used to determine the relationship between the test scores. Cohen's kappa was used to assess agreement of color defect classification between the two tests. A receiver operating characteristic curve was used to determine the optimal cut-off score for the computer-based FM 100-hue test. The mean time was 16 ± 1.5 (range 6-20) min for the manual FM 100-hue test and 7.4 ± 1.4 (range 5-13) min for the computer-based FM 100-hue test, thus reducing testing time to <50 % (p < 0.05). For grading color discrimination, Pearson's correlation coefficient for TES between the two tests was 0.91 (p < 0.001). For color defect classification, Cohen's agreement coefficient was 0.98 (p < 0.01). The computer-based FM 100-hue is an effective and rapid method for detecting, classifying and grading color vision anomalies.
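For readers unfamiliar with the agreement statistic used above, here is a minimal sketch of Cohen's kappa for two classifications of the same subjects. The labels and data are hypothetical and only illustrate the formula kappa = (p_o - p_e) / (1 - p_e).

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of category labels."""
    n = len(labels_a)
    # Observed agreement: share of subjects given the same label by both tests.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from the two marginal label distributions.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    # Undefined (division by zero) if both tests put everyone in one category.
    return (p_o - p_e) / (1 - p_e)


# Hypothetical classifications from a manual and a computer-based test.
manual = ["normal", "normal", "protan", "deutan"]
computer = ["normal", "normal", "protan", "deutan"]
print(cohen_kappa(manual, computer))
```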
Rosen, Jules; Mulsant, Benoit H; Marino, Patricia; Groening, Christopher; Young, Robert C; Fox, Debra
2008-10-30
Despite the importance of establishing shared scoring conventions and assessing interrater reliability in clinical trials in psychiatry, these elements are often overlooked. Obstacles to rater training and reliability testing include logistic difficulties in providing live training sessions, or mailing videotapes of patients to multiple sites and collecting the data for analysis. To address some of these obstacles, a web-based interactive video system was developed. It uses actors of diverse ages, gender and race to train raters how to score the Hamilton Depression Rating Scale and to assess interrater reliability. This system was tested with a group of experienced and novice raters within a single site. It was subsequently used to train raters of a federally funded multi-center clinical trial on scoring conventions and to test their interrater reliability. The advantages and limitations of using interactive video technology to improve the quality of clinical trials are discussed.
Effectiveness of Jigsaw learning compared to lecture-based learning in dental education.
Sagsoz, O; Karatas, O; Turel, V; Yildiz, M; Kaya, E
2017-02-01
The objective of this study was to evaluate the success levels of students using the Jigsaw learning method in dental education. Fifty students with similar grade point average (GPA) scores were selected and randomly assigned into one of two groups (n = 25). A pretest concerning 'adhesion and bonding agents in dentistry' was administered to all students before classes. The Jigsaw learning method was applied to the experimental group for 3 weeks. At the same time, the control group was taking classes using the lecture-based learning method. At the end of the 3 weeks, all students were retested (post-test) on the subject. A retention test was administered 3 weeks after the post-test. Mean scores were calculated for each test for the experimental and control groups, and the data obtained were analysed using the independent samples t-test. No significant difference was determined between the Jigsaw and lecture-based methods at pretest or post-test. The highest mean test score was observed in the post-test with the Jigsaw method. In the retention test, success with the Jigsaw method was significantly higher than that with the lecture-based method. The Jigsaw method is as effective as the lecture-based method. © 2015 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Task-based learning versus problem-oriented lecture in neurology continuing medical education.
Vakani, Farhan; Jafri, Wasim; Ahmad, Amina; Sonawalla, Aziz; Sheerani, Mughis
2014-01-01
To determine whether general practitioners learned better with task-based learning or problem-oriented lecture in a Continuing Medical Education (CME) set-up. Quasi-experimental study. The Aga Khan University, Karachi campus, from April to June 2012. Fifty-nine physicians were given a choice to opt for either Task-based Learning (TBL) or Problem Oriented Lecture (PBL) in a continuing medical education set-up about headaches. The TBL group had 30 participants divided into 10 small groups, and were assigned case-based tasks. The lecture group had 29 participants. Both groups were given a pre- and a post-test. Pre/post assessment was done using one-best MCQs. The reliability coefficient of scores for both the groups was estimated through Cronbach's alpha. An item analysis for difficulty and discriminatory indices was calculated for both the groups. Paired t-test was used to determine the difference between pre- and post-test scores of both groups. Independent t-test was used to compare the impact of the two teaching methods in terms of learning through scores produced by MCQ test. Cronbach's alpha was 0.672 for the lecture group and 0.881 for the TBL group. Item analysis for difficulty (p) and discriminatory indexes (d) was obtained for both groups. The results for the lecture group showed pre-test (p) = 42% vs. post-test (p) = 43%; pre-test (d) = 0.60 vs. post-test (d) = 0.40. The TBL group showed pre-test (p) = 48% vs. post-test (p) = 70%; pre-test (d) = 0.69 vs. post-test (d) = 0.73. Lecture group pre-/post-test mean scores were 8.52 ± 2.95 vs. 12.41 ± 2.65 (p < 0.001), whereas the TBL group showed 9.70 ± 3.65 vs. 14 ± 3.99 (p < 0.001). Independent t-test showed a non-significant difference at baseline (lecture 8.52 ± 2.95 vs. TBL 9.70 ± 3.65; p = 0.177). The post-test scores were not statistically different (lecture 12.41 ± 2.65 vs. TBL 14 ± 3.99; p = 0.07). Both delivery methods were found to be equally effective, showing statistically insignificant differences. However, the TBL group's higher post-test mean scores and marked increase in the post-test difficulty index demonstrated improved learning through TBL delivery and call for further exploration through longitudinal studies in the context of CME.
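Two of the item statistics reported above can be sketched in a few lines. The example assumes dichotomously scored MCQ items (1 = correct, 0 = incorrect) and uses sample variances. The function names and the response matrix are hypothetical, and the discrimination index (d) is not covered here.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of per-examinee item-score lists."""
    k = len(responses[0])  # number of items
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def difficulty_index(responses, item):
    """Difficulty index p: proportion of examinees answering the item correctly."""
    return sum(row[item] for row in responses) / len(responses)


# Hypothetical responses: 4 examinees x 3 items.
answers = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(cronbach_alpha(answers), difficulty_index(answers, 0))
```

Because p is the proportion correct, a higher value means an easier item for that group, which is how a rise in post-test p can be read as improved learning.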
Equivalency of Computer-based and Paper-and-pencil Testing.
ERIC Educational Resources Information Center
DeAngelis, Susan
2000-01-01
Dental hygiene students (n=15) took a first examination on computer then paper; 15 others took the paper test first. Computer test scores were higher than paper for the first exam. Student acceptance of the computer format was mixed. Computer exams reduced scoring and grade reporting time. (SK)
Relationships of Declining Test Scores and Grade Inflation.
ERIC Educational Resources Information Center
Bellott, Fred K.
The relationship between declining scores on national standardized tests and grade inflation is explored. Grade inflation refers to the indicated measure of evaluation of student performance having higher placement than is usual based on the performances. Data for this study were taken from the American College Testing (ACT) Program Class Profile…
Mata, Caio Augusto Sterse; Ota, Luiz Hirotoshi; Suzuki, Iunis; Telles, Adriana; Miotto, Andre; Leão, Luiz Eduardo Vilaça
2012-01-01
This study compares the traditional live lecture to a web-based approach in the teaching of bronchoscopy and evaluates the positive and negative aspects of both methods. We developed a web-based bronchoscopy curriculum, which integrates texts, images and animations. It was applied to first-year interns, who were later administered a multiple-choice test. Another group of eight first-year interns received the traditional teaching method and the same test. The two groups were compared using the Student's t-test. The mean scores (± SD) of students who used the website were 14.63 ± 1.41 (range 13-17). The test scores of the other group had the same range, with a mean score of 14.75 ± 1. The Student's t-test showed no difference between the test results. The common positive point noted was the presence of multimedia content. The web group cited as positive the ability to review the pages, and the other one the role of the teacher. Web-based bronchoscopy education showed results similar to the traditional live lecture in effectiveness.
Tests for detecting overdispersion in models with measurement error in covariates.
Yang, Yingsi; Wong, Man Yu
2015-11-30
Measurement error in covariates can affect the accuracy in count data modeling and analysis. In overdispersion identification, the true mean-variance relationship can be obscured under the influence of measurement error in covariates. In this paper, we propose three tests for detecting overdispersion when covariates are measured with error: a modified score test and two score tests based on the proposed approximate likelihood and quasi-likelihood, respectively. The proposed approximate likelihood is derived under the classical measurement error model, and the resulting approximate maximum likelihood estimator is shown to have superior efficiency. Simulation results also show that the score test based on approximate likelihood outperforms the test based on quasi-likelihood and other alternatives in terms of empirical power. By analyzing a real dataset containing the health-related quality-of-life measurements of a particular group of patients, we demonstrate the importance of the proposed methods by showing that the analyses with and without measurement error correction yield significantly different results. Copyright © 2015 John Wiley & Sons, Ltd.
Item response theory scoring and the detection of curvilinear relationships.
Carter, Nathan T; Dalal, Dev K; Guan, Li; LoPilato, Alexander C; Withrow, Scott A
2017-03-01
Psychologists are increasingly positing theories of behavior that suggest psychological constructs are curvilinearly related to outcomes. However, results from empirical tests for such curvilinear relations have been mixed. We propose that correctly identifying the response process underlying responses to measures is important for the accuracy of these tests. Indeed, past research has indicated that item responses to many self-report measures follow an ideal point response process (wherein respondents agree only to items that reflect their own standing on the measured variable) as opposed to a dominance process, wherein stronger agreement, regardless of item content, is always indicative of higher standing on the construct. We test whether item response theory (IRT) scoring appropriate for the underlying response process to self-report measures results in more accurate tests for curvilinearity. In 2 simulation studies, we show that, regardless of the underlying response process used to generate the data, using the traditional sum-score generally results in high Type 1 error rates or low power for detecting curvilinearity, depending on the distribution of item locations. With few exceptions, appropriate power and Type 1 error rates are achieved when dominance-based and ideal point-based IRT scoring are correctly used to score dominance and ideal point response data, respectively. We conclude that (a) researchers should be theory-guided when hypothesizing and testing for curvilinear relations; (b) correctly identifying whether responses follow an ideal point versus dominance process, particularly when items are not extreme, is critical; and (c) IRT model-based scoring is crucial for accurate tests of curvilinearity. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Comparison of Lecture-Based Learning vs Discussion-Based Learning in Undergraduate Medical Students.
Zhao, Beiqun; Potter, Donald D
2016-01-01
To compare lecture-based learning (LBL) and discussion-based learning (DBL) by assessing immediate and long-term knowledge retention and application of practical knowledge in third- and fourth-year medical students. A prospective, randomized control trial was designed to study the effects of DBL. Medical students were randomly assigned to intervention (DBL) or control (LBL) groups. Both the groups were instructed regarding the management of gastroschisis. The control group received a PowerPoint presentation, whereas the intervention group was guided only by an objectives list and a gastroschisis model. Students were evaluated using a multiple-choice pretest (Pre-Test MC) immediately before the teaching session, a posttest (Post-Test MC) following the session, and a follow-up test (Follow-Up MC) at 3 months. A practical examination (PE), which tested simple skills and management decisions, was administered at the end of the clerkship (Initial PE) and at 3 months after clerkship (Follow-Up PE). Students were also given a self-evaluation immediately following the Post-Test MC to gauge satisfaction and comfort level in the management of gastroschisis. University of Iowa Hospitals and Clinics and the Carver College of Medicine, Iowa City, IA. A total of 49 third- and fourth-year medical students who were enrolled in the general surgery clerkship were eligible for this study. Enrollment into the study was completely voluntary. Of the 49 eligible students, 36 students agreed to participate in the study, and 27 completed the study. Mean scores for the Pre-Test MC, Post-Test MC, and Follow-Up MC were similar between the control and intervention groups. In the control group, the Post-Test MC scores were significantly greater than Pre-Test MC scores (8.92 ± 0.79 vs 4.00 ± 1.04, p < 0.0001), whereas the Follow-Up MC scores were significantly lower than Post-Test MC scores (7.17 ± 1.75 vs 8.92 ± 0.79, p = 0.005). 
In the control group, the Follow-Up MC scores were significantly greater than Pre-Test MC scores (7.17 ± 1.75 vs 4.00 ± 1.04, p < 0.0001). Analysis of variance for all control group MC examinations had a p < 0.0001. In the intervention group, the Post-Test MC scores were significantly greater than Pre-Test MC scores (8.33 ± 1.23 vs 4.60 ± 1.55, p < 0.0001), whereas the Follow-Up MC scores were significantly lower than Post-Test MC scores (7.13 ± 1.77 vs 8.33 ± 1.23, p = 0.04). In the intervention group, the Follow-Up MC scores were significantly greater than Pre-Test MC scores (7.13 ± 1.77 vs 4.60 ± 1.55, p = 0.0002). Analysis of variance for all intervention group MC examinations had a p < 0.0001. Mean scores for the Initial PE were significantly higher for the intervention group compared with the control group's score (7.47 ± 1.68 vs 5.25 ± 2.34, p = 0.008). Mean scores for the Follow-Up PE were significantly higher for the intervention group compared with the control group's score (7.87 ± 1.77 vs 5.83 ± 2.04, p = 0.005). A comparison of Initial PE vs Follow-Up PE was not significant in either group. Students in the intervention group were more comfortable in the immediate management of gastroschisis and placement of a silo and felt that the educational experience was more worthwhile than students in the control group did. After a single instructional session, there was a significant difference in the students' scores between the control and the intervention groups on both administrations of the PEs. There were no significant differences between the 2 groups in any administration of the MC examinations. This seems to suggest that DBL may lead to better practical knowledge and potentially improved long-term knowledge retention when compared with LBL. Students in the DBL group also felt more comfortable with the management of gastroschisis and were more satisfied with the educational session. Copyright © 2015 Association of Program Directors in Surgery. 
Published by Elsevier Inc. All rights reserved.
The Effect of Pretest Exercise on Baseline Computerized Neurocognitive Test Scores.
Pawlukiewicz, Alec; Yengo-Kahn, Aaron M; Solomon, Gary
2017-10-01
Baseline neurocognitive assessment plays a critical role in return-to-play decision making following sport-related concussions. Prior studies have assessed the effect of a variety of modifying factors on neurocognitive baseline test scores. However, relatively little investigation has been conducted regarding the effect of pretest exercise on baseline testing. The aim of our investigation was to determine the effect of pretest exercise on baseline Immediate Post-Concussion Assessment and Cognitive Testing (ImPACT) scores in adolescent and young adult athletes. We hypothesized that athletes undergoing self-reported strenuous exercise within 3 hours of baseline testing would perform more poorly on neurocognitive metrics and would report a greater number of symptoms than those who had not completed such exercise. Cross-sectional study; Level of evidence, 3. The ImPACT records of 18,245 adolescent and young adult athletes were retrospectively analyzed. After application of inclusion and exclusion criteria, participants were dichotomized into groups based on a positive (n = 664) or negative (n = 6609) self-reported history of strenuous exercise within 3 hours of the baseline test. Participants with a positive history of exercise were then randomly matched, based on age, sex, education level, concussion history, and hours of sleep prior to testing, on a 1:2 basis with individuals who had reported no pretest exercise. The baseline ImPACT composite scores of the 2 groups were then compared. Significant differences were observed for the ImPACT composite scores of verbal memory, visual memory, reaction time, and impulse control as well as for the total symptom score. No significant between-group difference was detected for the visual motor composite score. Furthermore, pretest exercise was associated with a significant increase in the overall frequency of invalid test results. 
Our results suggest a statistically significant difference in ImPACT composite scores between individuals who report strenuous exercise prior to baseline testing compared with those who do not. Since return-to-play decision making often involves documentation of return to neurocognitive baseline, the baseline test scores must be valid and accurate. As a result, we recommend standardization of baseline testing such that no strenuous exercise takes place 3 hours prior to test administration.
Lee, Yi-Hsuan; von Davier, Alina A
2013-07-01
Maintaining a stable score scale over time is critical for all standardized educational assessments. Traditional quality control tools and approaches for assessing scale drift either require special equating designs, or may be too time-consuming to be considered on a regular basis with an operational test that has a short time window between an administration and its score reporting. Thus, the traditional methods are not sufficient to catch unusual testing outcomes in a timely manner. This paper presents a new approach for score monitoring and assessment of scale drift. It involves quality control charts, model-based approaches, and time series techniques to accommodate the following needs of monitoring scale scores: continuous monitoring, adjustment of customary variations, identification of abrupt shifts, and assessment of autocorrelation. Performance of the methodologies is evaluated using manipulated data based on real responses from 71 administrations of a large-scale high-stakes language assessment.
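The abstract mentions quality control charts as one ingredient of score monitoring. A basic Shewhart-style chart, sketched below, flags administrations whose mean scale score falls outside limits set from earlier administrations. This is only the simplest chart of that family and not the authors' method, which also uses model-based and time series techniques. The function name, the 3-sigma limit, and all values are assumptions.

```python
from statistics import mean, stdev


def flag_unusual(history, new_means, n_sigma=3.0):
    """Return indices of new administrations outside the control limits.

    Limits are the historical mean +/- n_sigma historical standard deviations.
    """
    center = mean(history)
    spread = stdev(history)
    lower = center - n_sigma * spread
    upper = center + n_sigma * spread
    return [i for i, m in enumerate(new_means) if not lower <= m <= upper]


# Hypothetical mean scale scores per administration.
past = [500, 502, 498, 501, 499]
recent = [501, 507, 499, 494]
print(flag_unusual(past, recent))
```

A chart like this treats administrations as independent. Customary seasonal variation and autocorrelation, both named in the abstract, would need to be modeled before limits of this kind are meaningful.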
Arneja, Jugpal S; Narasimhan, Kailash; Bouwman, David; Bridge, Patrick D
2009-12-01
In-training evaluations in graduate medical education have typically been challenging. Although the majority of standardized examination delivery methods have become computer-based, in-training examinations generally remain pencil-paper-based, if they are performed at all. Audience response systems present a novel way to stimulate and evaluate the resident-learner. The purpose of this study was to assess the outcomes of audience response systems testing as compared with traditional testing in a plastic surgery residency program. A prospective 1-year pilot study of 10 plastic surgery residents was performed using audience response systems-delivered testing for the first half of the academic year and traditional pencil-paper testing for the second half. Examination content was based on monthly "Core Quest" curriculum conferences. Quantitative outcome measures included comparison of pretest and posttest and cumulative test scores of both formats. Qualitative outcomes from the individual participants were obtained by questionnaire. When using the audience response systems format, pretest and posttest mean scores were 67.5 and 82.5 percent, respectively; using traditional pencil-paper format, scores were 56.5 percent and 79.5 percent. A comparison of the cumulative mean audience response systems score (85.0 percent) and traditional pencil-paper score (75.0 percent) revealed statistically significantly higher scores with audience response systems (p = 0.01). Qualitative outcomes revealed increased conference enthusiasm, greater enjoyment of testing, and no user difficulties with the audience response systems technology. The audience response systems modality of in-training evaluation captures participant interest and reinforces material more effectively than traditional pencil-paper testing does. 
The advantages include a more interactive learning environment, stimulation of class participation, immediate feedback to residents, and immediate tabulation of results for the educator. Disadvantages include start-up costs and lead-time preparation.
ERIC Educational Resources Information Center
Valant, Jon; Newark, Daniel A.
2016-01-01
For decades, researchers have documented large differences in average test scores between minority and White students and between poor and wealthy students. These gaps are a focal point of reformers' and policymakers' efforts to address educational inequities. However, the U.S. public's views on achievement gaps have received little attention from…
An Analysis of a Digital Variant of the Trail Making Test Using Machine Learning Techniques
Dahmen, Jessamyn; Cook, Diane; Fellows, Robert; Schmitter-Edgecombe, Maureen
2017-01-01
BACKGROUND The goal of this work is to develop a digital version of a standard cognitive assessment, the Trail Making Test (TMT), and assess its utility. OBJECTIVE This paper introduces a novel digital version of the TMT and introduces a machine learning based approach to assess its capabilities. METHODS Using digital Trail Making Test (dTMT) data collected from (N=54) older adult participants as feature sets, we use machine learning techniques to analyze the utility of the dTMT and evaluate the insights provided by the digital features. RESULTS Predicted TMT scores correlate well with clinical digital test scores (r=0.98) and paper time to completion scores (r=0.65). Predicted TICS exhibited a small correlation with clinically-derived TICS scores (r=0.12 Part A, r=0.10 Part B). Predicted FAB scores exhibited a small correlation with clinically-derived FAB scores (r=0.13 Part A, r=0.29 for Part B). Digitally-derived features were also used to predict diagnosis (AUC of 0.65). CONCLUSION Our findings indicate that the dTMT is capable of measuring the same aspects of cognition as the paper-based TMT. Furthermore, the dTMT’s additional data may be able to help monitor other cognitive processes not captured by the paper-based TMT alone. PMID:27886019
Fundamental Use of Surgical Energy (FUSE) certification: validation and predictors of success.
Robinson, Thomas N; Olasky, Jaisa; Young, Patricia; Feldman, Liane S; Fuchshuber, Pascal R; Jones, Stephanie B; Madani, Amin; Brunt, Michael; Mikami, Dean; Jackson, Gretchen P; Mischna, Jessica; Schwaitzberg, Steven; Jones, Daniel B
2016-03-01
The Fundamental Use of Surgical Energy (FUSE) program includes a Web-based didactic curriculum and a high-stakes multiple-choice question examination with the goal of certifying knowledge of the safe use of surgical energy-based devices. The purpose of this study was (1) to set a passing score through a psychometrically sound process and (2) to determine what pretest factors predicted passing the FUSE examination. Beta-testing of multiple-choice questions on 62 topics of importance to the safe use of surgical energy-based devices was performed. Eligible test takers were physicians with a minimum of 1 year of surgical training who were recruited by FUSE task force members. A pretest survey collected baseline information. A total of 227 individuals completed the FUSE beta-test, and 208 completed the pretest survey. The passing/cut score for the first test form of the FUSE multiple-choice examination was determined using the modified Angoff methodology and for the second test form was determined using a linear equating methodology. The overall passing rate across the two examination forms was 81.5%. Self-reported time studying the FUSE Web-based curriculum for more than 2 h was associated with a passing examination score (p < 0.001). Performance was not different based on increased years of surgical practice (p = 0.363), self-reported expertise on one or more types of energy-based devices (p = 0.683), participation in the FUSE postgraduate course (p = 0.426), or having reviewed the FUSE manual (p = 0.428). Logistic regression found that studying the FUSE didactics for >2 h predicted a passing score (OR 3.61; 95% CI 1.44-9.05; p = 0.006) independent of the other baseline characteristics recorded. The development of the FUSE examination, including the passing score, followed a psychometrically sound process. 
Self-reported time studying the FUSE curriculum predicted a passing score independent of other pretest characteristics such as years in practice and self-reported expertise.
ERIC Educational Resources Information Center
Zapata-Rivera, Diego, Ed.; Zwick, Rebecca, Ed.
2011-01-01
This volume includes 3 papers based on presentations at a workshop on communicating assessment information to particular audiences, held at Educational Testing Service (ETS) on November 4th, 2010, to explore some issues that influence score reports and new advances that contribute to the effectiveness of these reports. Jessica Hullman, Rebecca…
Li, Yang; Yang, Jianyi
2017-04-24
The prediction of protein-ligand binding affinity has recently been improved remarkably by machine-learning-based scoring functions. For example, using a set of simple descriptors representing the atomic distance counts, the RF-Score improves the Pearson correlation coefficient to about 0.8 on the core set of the PDBbind 2007 database, which is significantly higher than the performance of any conventional scoring function on the same benchmark. A few studies have been made to discuss the performance of machine-learning-based methods, but the reason for this improvement remains unclear. In this study, by systemically controlling the structural and sequence similarity between the training and test proteins of the PDBbind benchmark, we demonstrate that protein structural and sequence similarity makes a significant impact on machine-learning-based methods. After removal of training proteins that are highly similar to the test proteins identified by structure alignment and sequence alignment, machine-learning-based methods trained on the new training sets do not outperform the conventional scoring functions any more. On the contrary, the performance of conventional functions like X-Score is relatively stable no matter what training data are used to fit the weights of its energy terms.
A Simple Symptom Score for Acute HIV Infection in a San Diego Community Based Screening Program.
Lin, Timothy C; Gianella, Sara; Tenenbaum, Tara; Little, Susan J; Hoenigl, Martin
2017-12-25
Treatment of acute HIV infection (AHI) decreases transmission and preserves immune function, but AHI diagnosis remains resource-intensive. Risk-based scores predictive for AHI have been described for high-risk groups; however, symptom-based scores could be more generalizable across populations. Adults who tested either positive for AHI (antibody-negative, HIV nucleic acid test [NAT]-positive) or HIV NAT-negative with the community-based Early Test HIV screening program in San Diego were retrospectively randomized 2:1 into a derivation and validation set. In the former, symptoms significant for AHI in a multivariate logistic regression model were assigned a score value (the odds ratio rounded to the nearest integer). The score was assessed in the validation set using receiver operating characteristics and areas under the curve (AUC). An optimal cut-off score was found using Youden's index. Of 998 participants (including 737 men who have sex with men (MSM), 149 non-MSM men, 109 ciswomen and 3 trans women), 113 had AHI (including 109 MSM). Compared to HIV-negative cases, AHI cases reported more symptoms (median 4 vs 0, p<0.01). Fever, myalgia and weight loss were significantly associated with AHI in the multivariate model and corresponded to 11, 8 and 4 score points, respectively. The summed score yielded an AUC of 0.85 (95% CI 0.77-0.93). A score of ≥11 was 72% sensitive and 96% specific, with a diagnostic odds ratio of 70.27 (95% CI 28.14-175.93). A 3-symptom score accurately predicted AHI in a community-based screening program and may inform allocation of resources in settings that do not routinely screen for AHI. © The Author(s) 2017. Published by Oxford University Press for the Infectious Diseases Society of America.
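The scoring rule described in this abstract is simple enough to sketch. The point values (11, 8, 4), the cut-off of 11, and the 72%/96% sensitivity and specificity come from the abstract; the function and key names are illustrative assumptions, and the Youden's J shown here is recomputed from the rounded percentages rather than taken from the study.

```python
# Points per symptom as reported in the abstract (odds ratios rounded to integers).
SYMPTOM_POINTS = {"fever": 11, "myalgia": 8, "weight_loss": 4}
CUTOFF = 11  # a summed score >= 11 is treated as screen-positive

def symptom_score(symptoms):
    """Sum the points for the reported symptoms; unknown symptoms score 0."""
    return sum(SYMPTOM_POINTS.get(s, 0) for s in symptoms)

def screens_positive(symptoms):
    """Apply the cut-off to the summed score."""
    return symptom_score(symptoms) >= CUTOFF

def youden_j(sensitivity, specificity):
    """Youden's index J = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0

print(symptom_score(["myalgia", "weight_loss"]))
print(screens_positive(["weight_loss"]))
print(round(youden_j(0.72, 0.96), 2))
```

Under these point values, fever alone reaches the cut-off, as does myalgia combined with weight loss, while either of the latter two on its own does not.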
Zeng, Rui; Xiang, Lian-rui; Zeng, Jing; Zuo, Chuan
2017-01-01
Background We aimed to introduce team-based learning (TBL) as one of the teaching methods for diagnostics and to compare its teaching effectiveness with that of the traditional teaching methods. Methods We conducted a randomized controlled trial on diagnostics teaching involving 111 third-year medical undergraduates, using TBL as the experimental intervention, compared with lecture-based learning as the control, for teaching the two topics of symptomatology. Individual Readiness Assurance Test (IRAT)-baseline and Group Readiness Assurance Test (GRAT) were performed in members of each TBL subgroup. The scores in Individual Terminal Test 1 (ITT1) immediately after class and Individual Terminal Test 2 (ITT2) 1 week later were compared between the two groups. The questionnaire and interview were also implemented to survey the attitude of students and teachers toward TBL. Results There was no significant difference between the two groups in ITT1 (19.85±4.20 vs 19.70±4.61), while the score of the TBL group was significantly higher than that of the control group in ITT2 (19.15±3.93 vs 17.46±4.65). In the TBL group, the scores of the two terminal tests after the teaching intervention were significantly higher than the baseline test score of individuals. IRAT-baseline, ITT1, and ITT2 scores of students at different academic levels in the TBL teaching exhibited significant differences, but the ITT1-IRAT-baseline and ITT2-IRAT-baseline indicated no significant differences among the three subgroups. Conclusion Our TBL in symptomatology approach was highly accepted by students in the improvement of interest and self-directed learning and resulted in an increase in knowledge acquirements, which significantly improved short-term test scores compared with lecture-based learning. TBL is regarded as an effective teaching method worthy of promoting. PMID:28331383
Taggart, Tamara; Taboada, Arianna; Stein, Judith A; Milburn, Norweeta G; Gere, David; Lightfoot, Alexandra F
2016-07-01
AMP! (Arts-based, Multiple component, Peer-education) is an HIV intervention developed for high school adolescents. AMP! uses interactive theater-based scenarios developed by trained college undergraduates to deliver messages addressing HIV/STI prevention strategies, healthy relationships, and stigma reduction towards people living with HIV/AIDS. We used a pre-test/post-test, control group study design to simultaneously assess intervention effect on ninth grade students in an urban county in California (N = 159) and a suburban county in North Carolina (N = 317). In each location, the control group received standard health education curricula delivered by teachers; the intervention group received AMP! in addition to standard health education curricula. Structural equation modeling was used to determine intervention effects. The post-test sample was 46 % male, 90 % self-identified as heterosexual, 32 % reported receiving free or reduced lunch, and 49 % White. Structural models indicated that participation in AMP! predicted higher scores on HIV knowledge (p = 0.05), HIV awareness (p = 0.01), and HIV attitudes (p = 0.05) at the post-test. Latent means comparison analyses revealed post-test scores were significantly higher than pre-test scores on HIV knowledge (p = 0.001), HIV awareness (p = 0.001), and HIV attitudes (p = 0.001). Further analyses indicated that scores rose for both groups, but the post-test scores of intervention participants were significantly higher than controls (HIV knowledge (p = 0.01), HIV awareness (p = 0.01), and HIV attitudes (p = 0.05)). Thus, AMP!'s theater-based approach shows promise for addressing multiple adolescent risk factors and attitudes concerning HIV in school settings.
Li, Hongjian; Peng, Jiangjun; Leung, Yee; Leung, Kwong-Sak; Wong, Man-Hon; Lu, Gang; Ballester, Pedro J.
2018-01-01
It has recently been claimed that the outstanding performance of machine-learning scoring functions (SFs) is exclusively due to the presence of training complexes with highly similar proteins to those in the test set. Here, we revisit this question using 24 similarity-based training sets, a widely used test set, and four SFs. Three of these SFs employ machine learning instead of the classical linear regression approach of the fourth SF (X-Score which has the best test set performance out of 16 classical SFs). We have found that random forest (RF)-based RF-Score-v3 outperforms X-Score even when 68% of the most similar proteins are removed from the training set. In addition, unlike X-Score, RF-Score-v3 is able to keep learning with an increasing training set size, becoming substantially more predictive than X-Score when the full 1105 complexes are used for training. These results show that machine-learning SFs owe a substantial part of their performance to training on complexes with dissimilar proteins to those in the test set, against what has been previously concluded using the same data. Given that a growing amount of structural and interaction data will be available from academic and industrial sources, this performance gap between machine-learning SFs and classical SFs is expected to enlarge in the future. PMID:29538331
Velickaite, V; Ferreira, D; Cavallin, L; Lind, L; Ahlström, H; Kilander, L; Westman, E; Larsson, E-M
2018-04-01
To find cut-off values for different medial temporal lobe atrophy (MTA) measures (right, left, average, and highest), accounting for gender and education, to investigate the association with cognitive performance, and to compare with decline of cognitive function over 5 years in a large population-based cohort. Three hundred and ninety 75-year-old individuals were examined with magnetic resonance imaging of the brain and cognitive testing. The Scheltens scale was used to visually assess MTA scores (0-4) in all subjects. Cognitive tests were repeated in 278 of them after 5 years. Normal MTA cut-off values were calculated based on the 10th percentile. Most 75-year-old individuals had an MTA score ≤2. Men had significantly higher MTA scores than women. Scores for left and average MTA were significantly higher in highly educated individuals. Abnormal MTA was associated with worse results on cognitive tests, and individuals with abnormal right MTA had faster cognitive decline. At age 75, gender and education are confounders for MTA grading. A score of ≥2 is abnormal for low-educated women and a score of ≥2.5 is abnormal for men and high-educated women. Subjects with abnormal right MTA but normal MMSE scores had developed worse MMSE scores 5 years later. • Gender and education are confounders for MTA grading. • We suggest cut-off values for 75-year-olds, taking gender and education into account. • Men have higher MTA scores than women. • Higher MTA scores are associated with worse cognitive performance.
ERIC Educational Resources Information Center
Koedel, Cory; Betts, Julian
2009-01-01
Value-added measures of teacher quality may be sensitive to the quantitative properties of the student tests upon which they are based. This paper focuses on the sensitivity of value-added to test-score-ceiling effects. Test-score ceilings are increasingly common in testing instruments across the country as education policy continues to emphasize…
Loanwords and Vocabulary Size Test Scores: A Case of Different Estimates for Different L1 Learners
ERIC Educational Resources Information Center
Laufer, Batia; McLean, Stuart
2016-01-01
The article investigated how the inclusion of loanwords in vocabulary size tests affected the test scores of two L1 groups of EFL learners: Hebrew and Japanese. New BNC- and COCA-based vocabulary size tests were constructed in three modalities: word form recall, word form recognition, and word meaning recall. Depending on the test modality, the…
2017-01-01
Background Nonadherence produces considerable health consequences and economic burden to patients and payers. One approach to improve medication nonadherence that has gained interest in recent years is the use of smartphone adherence apps. The development of smartphone adherence apps has increased rapidly since 2012; however, literature evaluating the clinical app and effectiveness of smartphone adherence apps to improve medication adherence is generally lacking. Objective The aims of this study were to (1) provide an updated evaluation and comparison of medication adherence apps in the marketplace by assessing the features, functionality, and health literacy (HL) of the highest-ranking adherence apps and (2) indirectly measure the validity of our rating methodology by determining the relationship between our app evaluations and Web-based consumer ratings. Methods Two independent reviewers assessed the features and functionality using a 4-domain rating tool of all adherence apps identified based on developer claims. The same reviewers downloaded and tested the 100 highest-ranking apps including an additional domain for assessment of HL. Pearson product-moment correlations were estimated between the consumer ratings and our domain and total scores. Results A total of 824 adherence apps were identified; of these, 645 unique apps were evaluated after applying exclusion criteria. The median initial score based on descriptions was 14 (max of 68; range 0-60). As a result, 100 of the highest-scoring unique apps underwent user testing. The median overall user-tested score was 31.5 (max of 73; range 0-60). The majority of the adherence apps that underwent user testing reported a consumer rating score in their respective online marketplace. The mean consumer rating was 3.93 (SD 0.84). The total user-tested score was positively correlated with consumer ratings (r=.1969, P=.04). 
Conclusions More adherence apps are available in the Web-based marketplace, and the quality of these apps varies considerably. Consumer ratings are positively but weakly correlated with user-testing scores suggesting that our rating tool has some validity but that consumers and clinicians may assess adherence app quality differently. PMID:28428169
Reddy, Yogesh N V; Carter, Rickey E; Obokata, Masaru; Redfield, Margaret M; Borlaug, Barry A
2018-05-23
Background Diagnosis of heart failure with preserved ejection fraction (HFpEF) is challenging in euvolemic patients with dyspnea, and no evidence-based criteria are available. We sought to develop and then validate non-invasive diagnostic criteria that could be used to estimate the likelihood that HFpEF is present among patients with unexplained dyspnea in order to guide further testing. Methods Consecutive patients with unexplained dyspnea referred for invasive hemodynamic exercise testing were retrospectively evaluated. Diagnosis of HFpEF (case) or non-cardiac dyspnea (control) was ascertained by invasive hemodynamic exercise testing. Logistic regression was performed to evaluate the ability of clinical findings to discriminate cases from controls. A scoring system was developed and then validated in a separate test cohort. Results The derivation cohort included 414 consecutive patients (267 HFpEF and 147 controls, HFpEF prevalence 64%). The test cohort included 100 consecutive patients (61 HFpEF, prevalence 61%). Obesity, atrial fibrillation, age >60 years, treatment with 2 or more antihypertensives, echocardiographic E/e' ratio >9 and echocardiographic pulmonary artery systolic pressure >35 mmHg were selected as the final set of predictive variables. A weighted score based on these six variables was used to create a composite score (H2FPEF score) ranging from 0-9. The odds of HFpEF doubled for each 1 unit score increase [OR 1.98 [1.74-2.30], p<0.0001], with an AUC of 0.841 (p<0.0001). The H2FPEF score was superior to a currently-used algorithm based upon expert consensus (increase in AUC of +0.169 [+0.120 to +0.217], p<0.0001). Performance in the independent test cohort was maintained [AUC 0.886, p<0.0001]. 
Conclusions The H2FPEF score, which relies upon simple clinical characteristics and echocardiography, enables discrimination of HFpEF from non-cardiac causes of dyspnea, and can assist in determination of the need for further diagnostic testing in the evaluation of patients with unexplained exertional dyspnea.
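To make the "odds doubled for each 1 unit score increase" statement concrete, the sketch below compounds the reported per-point odds ratio (1.98) across a score difference. It assumes the usual logistic-regression interpretation in which the odds ratio is constant per point; the function name is hypothetical, and because the abstract gives no intercept, only relative odds (not absolute probabilities) can be derived.

```python
OR_PER_POINT = 1.98  # odds ratio per 1-point increase, as reported in the abstract

def relative_odds(score_difference, or_per_point=OR_PER_POINT):
    """Odds of HFpEF at the higher score relative to the lower score,
    assuming a constant odds ratio per point (logistic model)."""
    return or_per_point ** score_difference

# A patient scoring 3 points higher than another has roughly 7.8 times the odds.
print(round(relative_odds(3), 2))
```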
Does Matching Quality Matter in Mode Comparison Studies?
ERIC Educational Resources Information Center
Zeng, Ji; Yin, Ping; Shedden, Kerby A.
2015-01-01
This article provides a brief overview and comparison of three matching approaches in forming comparable groups for a study comparing test administration modes (i.e., computer-based tests [CBT] and paper-and-pencil tests [PPT]): (a) a propensity score matching approach proposed in this article, (b) the propensity score matching approach used by…
Monitoring the Performance of Human and Automated Scores for Spoken Responses
ERIC Educational Resources Information Center
Wang, Zhen; Zechner, Klaus; Sun, Yu
2018-01-01
As automated scoring systems for spoken responses are increasingly used in language assessments, testing organizations need to analyze their performance, as compared to human raters, across several dimensions, for example, on individual items or based on subgroups of test takers. In addition, there is a need in testing organizations to establish…
Qualitative Dimensions in Scoring the Rey Visual Memory Test of Malingering.
ERIC Educational Resources Information Center
Griffin, G. A. Elmer; And Others
1996-01-01
A new qualitative scoring system for the Rey Visual Memory Test was tested for its ability to distinguish between malingerers and nonmalingerers. The new system, based on the types of errors made, was able to distinguish between 53 psychiatrically disabled and 64 normal nonmalingerers, and between nonmalingerers and 91 possible malingerers. (SLD)
Rajan, Shobana; Khanna, Ashish; Argalious, Maged; Kimatian, Stephen J; Mascha, Edward J; Makarova, Natalya; Nada, Eman M; Elsharkawy, Hesham; Firoozbakhsh, Farhad; Avitsian, Rafi
2016-02-01
Simulation-based learning is emerging as an alternative educational tool in this era of a relative shortfall of teaching anesthesiologists. The objective of the study is to assess whether screen-based (interactive computer simulated) case scenarios are more effective than problem-based learning discussions (PBLDs) in improving test scores 4 and 8 weeks after these interventions in anesthesia residents during their first neuroanesthesia rotation. Prospective, nonblinded quasi-crossover study. Cleveland Clinic. Anesthesiology residents. Two case scenarios were delivered from the Anesoft software as screen-based sessions, and parallel scripts were developed for 2 PBLDs. Each resident underwent both types of training sessions, starting with the PBLD session, and the 2 cases were alternated each month (ie, in 1 month, the screen-based intervention used case 1 and the PBLD used case 2, and vice versa for the next month). Test scores before the rotation (baseline), immediately after the rotation (4 weeks after the start of the rotation), and 8 weeks after the start of rotation were collected on each topic from each resident. The effect of training method on improvement in test scores was assessed using a linear mixed-effects model. Compared to the departmental standard of PBLD, the simulation method did not improve either the 4- or 8-week mean test scores (P = .41 and P = .40 for training method effect on 4- and 8-week scores, respectively). Resident satisfaction with the simulation module on a 5-point Likert scale showed subjective evidence of a positive impact on resident education. Screen-based simulators were not more effective than PBLD for education during the neuroanesthesia rotation in anesthesia residency. Copyright © 2016 Elsevier Inc. All rights reserved.
Paap, Kenneth R; Sawi, Oliver
2016-12-01
Studies testing for individual or group differences in executive functioning can be compromised by unknown test-retest reliability. Test-retest reliabilities across an interval of about one week were obtained from performance in the antisaccade, flanker, Simon, and color-shape switching tasks. There is a general trade-off between the greater reliability of single mean RT measures, and the greater process purity of measures based on contrasts between mean RTs in two conditions. The individual differences in RT model recently developed by Miller and Ulrich was used to evaluate the trade-off. Test-retest reliability was statistically significant for 11 of the 12 measures, but was of moderate size, at best, for the difference scores. The test-retest reliabilities for the Simon and flanker interference scores were lower than those for switching costs. Standard practice evaluates the reliability of executive-functioning measures using split-half methods based on data obtained in a single day. Our test-retest measures of reliability are lower, especially for difference scores. These reliability measures must also take into account possible day effects that classical test theory assumes do not occur. Measures based on single mean RTs tend to have acceptable levels of reliability and convergent validity, but are "impure" measures of specific executive functions. The individual differences in RT model shows that the impurity problem is worse than typically assumed. However, the "purer" measures based on difference scores have low convergent validity that is partly caused by deficiencies in test-retest reliability. Copyright © 2016 Elsevier B.V. All rights reserved.
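The point that difference scores tend to be less reliable than the single mean RTs they are built from follows from a standard classical test theory result. The sketch below uses the textbook formula for the reliability of a difference score under equal component variances; it is not the Miller and Ulrich model cited in the abstract, and the input values are hypothetical.

```python
def difference_score_reliability(r_xx, r_yy, r_xy):
    """Reliability of D = X - Y under classical test theory, assuming X and Y
    have equal variances: ((r_xx + r_yy) / 2 - r_xy) / (1 - r_xy)."""
    if r_xy >= 1.0:
        raise ValueError("r_xy must be below 1 for the formula to be defined")
    return ((r_xx + r_yy) / 2.0 - r_xy) / (1.0 - r_xy)

# Hypothetical values: two condition means, each with reliability 0.80,
# correlated 0.70 with each other.
print(round(difference_score_reliability(0.80, 0.80, 0.70), 3))
```

With these illustrative inputs the difference score's reliability drops to about one third, even though each component is reasonably reliable, because the two conditions share most of their true-score variance.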
The Usefulness of the Bock Model for Scoring with Information from Incorrect Responses.
ERIC Educational Resources Information Center
Huynh, Huynh; Casteel, Jim
1987-01-01
In the context of pass/fail decisions, using the Bock multi-nominal latent trait model for moderate-length tests does not produce decisions that differ substantially from those based on the raw scores. The Bock decisions appear to relate less strongly to outside criteria than those based on the raw scores. (Author/JAZ)
Resiliency in the Face of Adversity: A Short Longitudinal Test of the Trait Hypothesis.
Karaırmak, Özlem; Figley, Charles
2017-01-01
Resilience represents coping with adversity and is in line with a more positive paradigm for viewing responses to adversity. Most research has focused on resilience as coping, a state-based response to adversity. However, a competing hypothesis views resilience or resiliency as a trait that exists across time and types of adversity. We tested undergraduates enrolled in social work classes at a large southern university at two time periods during a single semester using measures of adversity, positive and negative affect, and trait-based resiliency. Consistent with trait-based resiliency, and in contrast to state-based resilience, resiliency scores were not strongly correlated with adversity at either testing point but were correlated with positive affect, and resiliency scores remained the same over time despite adversity variations. There was no gender or ethnic group difference in resilience scores. Black/African Americans reported significantly less negative affect and more positive affect than White/Caucasians.
Reliability Generalization of the Alcohol Use Disorder Identification Test.
ERIC Educational Resources Information Center
Shields, Alan L.; Caruso, John C.
2002-01-01
Evaluated the reliability of scores from the Alcohol Use Disorders Identification Test (AUDIT; J. Saunders and others, 1993) in a reliability generalization study based on 17 empirical journal articles. Results show AUDIT scores to be generally reliable for basic assessment. (SLD)
[The Visual Association Test to study episodic memory in clinical geriatric psychology].
Diesfeldt, Han; Prins, Marleen; Lauret, Gijs
2018-04-01
The Visual Association Test (VAT) is a brief learning task that consists of six line drawings of pairs of interacting objects (association cards). Subjects are asked to name or identify each object and later are presented with one object from the pair (the cue) and asked to name the other (the target). The VAT was administered in a consecutive sample of 174 psychogeriatric day care participants with mild to major neurocognitive disorder. Comparison of test performance with normative data from non-demented subjects revealed that 69% scored within the range of a major deficit (0-8 over two recall trials), 14% a minor, and 17% no deficit (9-10, and ≥10 respectively).VAT-scores correlated with another test of memory function, the Cognitive Screening Test (CST), based on the Short Portable Mental Status Questionnaire (r = 0.53). Tests of executive functioning (Expanded Mental Control Test, Category Fluency, Clock Drawing) did not add significantly to the explanation of variance in VAT-scores.Fifty-five participants (31.6%) were faced with initial problems in naming or identifying one or more objects on the cue cards or association cards. If necessary, naming was aided by the investigator. Initial difficulties in identifying cue objects were associated with lower VAT-scores, but this did not hold for difficulties in identifying target objects.A hierarchical multiple regression analysis was used to examine whether linear or quadratic trends best fitted VAT performance across the range of CST scores. The regression model revealed a linear but not a quadratic trend. The best fitting linear model implied that VAT scores differentiated between CST scores in the lower, as well as in the upper range, indicating the absence of floor and ceiling effects, respectively. 
Moreover, the VAT compares favourably to word list-learning tasks, being more attractive in its presentation of interacting visual objects and cued recall based on incidental learning of the association between cues and targets. For practical purposes and based on documented sensitivity and specificity, Bayesian probability tables give predictive power of age-specific VAT cutoff scores for the presence or absence of a major neurocognitive disorder across a range of a priori probabilities or base rates.
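The Bayesian probability tables mentioned in this abstract can be illustrated with a minimal sketch. It applies Bayes' theorem to turn a cutoff's sensitivity and specificity into positive and negative predictive values across several base rates. The sensitivity and specificity values below are hypothetical placeholders, not figures from the study, and the function name is an assumption made for illustration.

```python
def predictive_values(sensitivity, specificity, base_rate):
    """Return (PPV, NPV) for a test cutoff given the prior probability of the disorder."""
    true_pos = sensitivity * base_rate
    false_pos = (1.0 - specificity) * (1.0 - base_rate)
    true_neg = specificity * (1.0 - base_rate)
    false_neg = (1.0 - sensitivity) * base_rate
    ppv = true_pos / (true_pos + false_pos)  # P(disorder | score below cutoff)
    npv = true_neg / (true_neg + false_neg)  # P(no disorder | score above cutoff)
    return ppv, npv


# Hypothetical accuracy values, chosen only to show the shape of such a table.
SENSITIVITY = 0.80
SPECIFICITY = 0.90

for base_rate in (0.10, 0.30, 0.50, 0.70):
    ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, base_rate)
    print(f"base rate {base_rate:.2f}: PPV {ppv:.2f}, NPV {npv:.2f}")
```

The same cutoff yields very different predictive values as the base rate changes, which is presumably why such tables are reported across a range of a priori probabilities.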
ERIC Educational Resources Information Center
Brooks, Aarti P.
2009-01-01
Cooperative learning allows individuals with varying abilities to work alongside their peers. Students are placed into achievement levels based on placement test scores. The Regular College Preparatory (RCP) level is a score of 59% or lower and Academic College Preparatory (ACP) level is a score of 60-92% on the placement test. The purpose of this…
Predicting preference-based SF-6D index scores from the SF-8 health survey.
Wang, P; Fu, A Z; Wee, H L; Lee, J; Tai, E S; Thumboo, J; Luo, N
2013-09-01
To develop and test functions for predicting the preference-based SF-6D index scores from the SF-8 health survey. This study was a secondary analysis of data collected in a population health survey in which respondents (n = 7,529) completed both the SF-36 and the SF-8 questionnaires. We examined seven ordinary least-square estimators for their performance in predicting SF-6D scores from the SF-8 at both the individual and the group levels. In general, all functions performed similarly well in predicting SF-6D scores, and the predictions at the group level were better than predictions at the individual level. At the individual level, 42.5-51.5% of prediction errors were smaller than the minimally important difference (MID) of the SF-6D scores, depending on the function specifications, while almost all prediction errors of the tested functions were smaller than the MID of SF-6D at the group level. At both individual and group levels, the tested functions predicted lower than actual scores at the higher end of the SF-6D scale. Our study developed functions to generate preference-based SF-6D index scores from the SF-8 health survey, the first of its kind. Further research is needed to evaluate the performance and validity of the prediction functions.
Li, Liwei; Wang, Bo; Meroueh, Samy O
2011-09-26
The community structure-activity resource (CSAR) data sets are used to develop and test a support vector machine-based scoring function in regression mode (SVR). Two scoring functions (SVR-KB and SVR-EP) are derived with the objective of reproducing the trend of the experimental binding affinities provided within the two CSAR data sets. The features used to train SVR-KB are knowledge-based pairwise potentials, while SVR-EP is based on physicochemical properties. SVR-KB and SVR-EP were compared to seven other widely used scoring functions, including Glide, X-score, GoldScore, ChemScore, Vina, Dock, and PMF. Results showed that SVR-KB trained with features obtained from three-dimensional complexes of the PDBbind data set outperformed all other scoring functions, including best performing X-score, by nearly 0.1 using three correlation coefficients, namely Pearson, Spearman, and Kendall. It was interesting that higher performance in rank ordering did not translate into greater enrichment in virtual screening assessed using the 40 targets of the Directory of Useful Decoys (DUD). To remedy this situation, a variant of SVR-KB (SVR-KBD) was developed by following a target-specific tailoring strategy that we had previously employed to derive SVM-SP. SVR-KBD showed a much higher enrichment, outperforming all other scoring functions tested, and was comparable in performance to our previously derived scoring function SVM-SP.
Busch, Robyn M; Lineweaver, Tara T; Ferguson, Lisa; Haut, Jennifer S
2015-06-01
Reliable change indices (RCIs) and standardized regression-based (SRB) change score norms permit evaluation of meaningful changes in test scores following treatment interventions, like epilepsy surgery, while accounting for test-retest reliability, practice effects, score fluctuations due to error, and relevant clinical and demographic factors. Although these methods are frequently used to assess cognitive change after epilepsy surgery in adults, they have not been widely applied to examine cognitive change in children with epilepsy. The goal of the current study was to develop RCIs and SRB change score norms for use in children with epilepsy. Sixty-three children with epilepsy (age range: 6-16; M=10.19, SD=2.58) underwent comprehensive neuropsychological evaluations at two time points an average of 12 months apart. Practice effect-adjusted RCIs and SRB change score norms were calculated for all cognitive measures in the battery. Practice effects were quite variable across the neuropsychological measures, with the greatest differences observed among older children, particularly on the Children's Memory Scale and Wisconsin Card Sorting Test. There was also notable variability in test-retest reliabilities across measures in the battery, with coefficients ranging from 0.14 to 0.92. Reliable change indices and SRB change score norms for use in assessing meaningful cognitive change in children following epilepsy surgery are provided for measures with reliability coefficients above 0.50. This is the first study to provide RCIs and SRB change score norms for a comprehensive neuropsychological battery based on a large sample of children with epilepsy. Tables to aid in evaluating cognitive changes in children who have undergone epilepsy surgery are provided for clinical use. An Excel sheet to perform all relevant calculations is also available to interested clinicians or researchers. Copyright © 2015 Elsevier Inc. All rights reserved.
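As a rough illustration of a practice effect-adjusted reliable change index, the sketch below uses one widely used formulation: the observed change minus the mean practice effect, divided by the standard error of the difference derived from the test-retest reliability. The abstract does not state which variant the authors used, so the formula, the function name, and the example numbers are all assumptions.

```python
import math


def practice_adjusted_rci(baseline, retest, sd_baseline,
                          retest_reliability, mean_practice_effect):
    """Practice-adjusted RCI under an assumed, commonly used formulation."""
    # Standard error of measurement from the baseline SD and test-retest reliability.
    sem = sd_baseline * math.sqrt(1.0 - retest_reliability)
    # Standard error of the difference between two scores with equal SEM.
    se_diff = math.sqrt(2.0) * sem
    return ((retest - baseline) - mean_practice_effect) / se_diff


# Hypothetical child: score drops from 100 to 90 on a measure with SD 15,
# reliability 0.80, and an expected practice gain of 2 points.
rci = practice_adjusted_rci(100, 90, 15, 0.80, 2)
print(f"RCI = {rci:.2f}")  # values beyond about +/-1.645 are often treated as reliable change
```

Low reliability inflates the standard error of the difference, which fits the abstract's decision to provide norms only for measures with reliability coefficients above 0.50.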
Aslam, Tariq M; Tahir, Humza J; Parry, Neil R A; Murray, Ian J; Kwak, Kun; Heyes, Richard; Salleh, Mahani M; Czanner, Gabriela; Ashworth, Jane
2016-10-01
To report on the utility of a computer tablet-based method for automated testing of visual acuity in children based on the principles of game design. We describe the testing procedure and present repeatability as well as agreement of the score with accepted visual acuity measures. Reliability and validity study. Setting: Manchester Royal Eye Hospital Pediatric Ophthalmology Outpatients Department. Total of 112 sequentially recruited patients. For each patient 1 eye was tested with the Mobile Assessment of Vision by intERactIve Computer for Children (MAVERIC-C) system, consisting of a software application running on a computer tablet, housed in a bespoke viewing chamber. The application elicited touch screen responses using a game design to encourage compliance and automatically acquire visual acuity scores of participating patients. Acuity was then assessed by an examiner with a standard chart-based near ETDRS acuity test before the MAVERIC-C assessment was repeated. Reliability of MAVERIC-C near visual acuity score and agreement of MAVERIC-C score with near ETDRS chart for visual acuity. Altogether, 106 children (95%) completed the MAVERIC-C system without assistance. The vision scores demonstrated satisfactory reliability, with test-retest VA scores having a mean difference of 0.001 (SD ±0.136) and limits of agreement of 2 SD (LOA) of ±0.267. Comparison with the near ETDRS chart showed agreement with a mean difference of -0.0879 (±0.106) with LOA of ±0.208. This study demonstrates promising utility for software using a game design to enable automated testing of acuity in children with ophthalmic disease in an objective and accurate manner. Copyright © 2016 Elsevier Inc. All rights reserved.
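The limits of agreement reported above can be reproduced with a short sketch. The abstract describes them as 2 SD, and the reported half-widths match a multiplier of 1.96 applied to the stated standard deviations. The function name and the choice of 1.96 as the default multiplier are assumptions made for illustration.

```python
def limits_of_agreement(mean_difference, sd_difference, multiplier=1.96):
    """Return (lower limit, upper limit, half-width) around the mean difference."""
    half_width = multiplier * sd_difference
    return mean_difference - half_width, mean_difference + half_width, half_width


# Test-retest values from the abstract: mean difference 0.001, SD 0.136.
low, high, half = limits_of_agreement(0.001, 0.136)
print(f"test-retest: +/-{half:.3f} (from {low:.3f} to {high:.3f})")

# Agreement with the near ETDRS chart: mean difference -0.0879, SD 0.106.
low, high, half = limits_of_agreement(-0.0879, 0.106)
print(f"vs ETDRS:    +/-{half:.3f} (from {low:.3f} to {high:.3f})")
```

The half-widths come out at 0.267 and 0.208 after rounding, matching the values in the abstract.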
McKeough, D Michael; Mattern-Baxter, Katrin; Barakatt, Edward
2010-01-01
The purpose of this study was to determine if a computer-aided instruction learning module improves students' knowledge of the neuroanatomy/physiology and clinical examination of the dorsal column-medial lemniscal (DCML) system. Sixty-one physical therapy students enrolled in a clinical neurology course in entry-level PT educational programs at two universities participated in the study. Students from University-1 (U1) had not had a previous neuroanatomy course, while students from University-2 (U2) had taken a neuroanatomy course in the previous semester. Before and after working with the learning module, students took a paper-and-pencil test on the neuroanatomy/physiology and clinical examination of the DCML system. Kruskal-Wallis one-way ANOVA and Mann-Whitney tests were used to determine if differences existed between neuroanatomy/physiology examination scores and clinical examination scores before and after taking the learning module, and between student groups based on university attended. For students from U1, neuroanatomy/physiology post-test scores improved significantly over pre-test scores (p < 0.001), while post-test scores of students from U2 did not (p = 0.60). Neuroanatomy/physiology pre-test scores from U2 were significantly better than those from U1 (p < 0.001); there was no significant difference in post-test scores (p = 0.062). Clinical examination pre-test and post-test scores from U2 were significantly better than those from U1 (p < 0.001). Clinical examination post-test scores improved significantly from the pre-test scores for both U1 (p < 0.001) and U2 (p < 0.001).
Diederich, Emily; Thomas, Laura; Mahnken, Jonathan; Lineberry, Matthew
2018-06-01
Within simulation-based mastery learning (SBML) courses, there is inconsistent inclusion of learner pretesting, which requires considerable resources and is contrary to popular instructional frameworks. However, it may have several benefits, including its direct benefit as a form of deliberate practice and its facilitation of more learner-specific subsequent deliberate practice. We consider an unexplored potential benefit of pretesting: its ability to predict variable long-term learner performance. Twenty-seven residents completed an SBML course in central line insertion. Residents were tested on simulated central line insertion precourse, immediately postcourse, and after between 64 and 82 weeks. We analyzed pretest scores' prediction of delayed test scores, above and beyond prediction by program year, line insertion experiences in the interim, and immediate posttest scores. Pretest scores related strongly to delayed test scores (r = 0.59, P = 0.01; disattenuated ρ = 0.75). The number of independent central lines inserted also related to year-delayed test scores (r = 0.44, P = 0.02); other predictors did not discernibly relate. In a regression model jointly predicting delayed test scores, pretest was a significant predictor (β = 0.487, P = 0.011); number of independent insertions was not (β = 0.234, P = 0.198). This study suggests that pretests can play a major role in predicting learner variance in learning gains from SBML courses, thus facilitating more targeted refresher training. It also exposes a risk in SBML courses that learners who meet immediate mastery standards may be incorrectly assumed to have equal long-term learning gains.
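The disattenuated correlation quoted above (observed r = 0.59, disattenuated ρ = 0.75) follows the standard correction for attenuation, which divides the observed correlation by the square root of the product of the two measures' reliabilities. The abstract does not report those reliabilities, so the sketch below only back-calculates the product they imply. The function names and the example reliabilities of 0.80 are hypothetical.

```python
import math


def disattenuate(r_observed, reliability_x, reliability_y):
    """Correct an observed correlation for unreliability in both measures."""
    return r_observed / math.sqrt(reliability_x * reliability_y)


def implied_reliability_product(r_observed, r_disattenuated):
    """Back out the product of reliabilities implied by a reported correction."""
    return (r_observed / r_disattenuated) ** 2


# Product of reliabilities implied by the values reported in the abstract.
product = implied_reliability_product(0.59, 0.75)
print(f"implied reliability product: {product:.3f}")

# Hypothetical example: both measures with reliability 0.80.
print(f"corrected r: {disattenuate(0.59, 0.80, 0.80):.4f}")
```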
Østergaard, Mia L; Nielsen, Kristina R; Albrecht-Beste, Elisabeth; Konge, Lars; Nielsen, Michael B
2018-01-01
This study aimed to develop a test with validity evidence for abdominal diagnostic ultrasound with a pass/fail-standard to facilitate mastery learning. The simulator had 150 real-life patient abdominal scans of which 15 cases with 44 findings were selected, representing level 1 from The European Federation of Societies for Ultrasound in Medicine and Biology. Four groups of experience levels were constructed: Novices (medical students), trainees (first-year radiology residents), intermediates (third- to fourth-year radiology residents) and advanced (physicians with ultrasound fellowship). Participants were tested in a standardized setup and scored by two blinded reviewers prior to an item analysis. The item analysis excluded 14 diagnoses. Both internal consistency (Cronbach's alpha 0.96) and inter-rater reliability (0.99) were good and there were statistically significant differences (p < 0.001) between all four groups, except the intermediate and advanced groups (p = 1.0). There was a statistically significant correlation between experience and test scores (Pearson's r = 0.82, p < 0.001). The pass/fail-standard failed all novices (no false positives) and passed all advanced (no false negatives). All intermediate participants and six out of 14 trainees passed. We developed a test for diagnostic abdominal ultrasound with solid validity evidence and a pass/fail-standard without any false-positive or false-negative scores. • Ultrasound training can benefit from competency-based education based on reliable tests. • This simulation-based test can differentiate between competency levels of ultrasound examiners. • This test is suitable for competency-based education, e.g. mastery learning. • We provide a pass/fail standard without false-negative or false-positive scores.
Stein, Marjorie W; Frank, Susan J; Roberts, Jeffrey H; Finkelstein, Malka; Heo, Moonseong
2016-05-01
The aim of this study was to determine whether group-based or didactic teaching is more effective to teach ACR Appropriateness Criteria to medical students. An identical pretest, posttest, and delayed multiple-choice test was used to evaluate the efficacy of the two teaching methods. Descriptive statistics comparing test scores were obtained. On the posttest, the didactic group gained 12.5 points (P < .0001), and the group-based learning students gained 16.3 points (P < .0001). On the delayed test, the didactic group gained 14.4 points (P < .0001), and the group-based learning students gained 11.8 points (P < .001). The gains in scores on both tests were statistically significant for both groups. However, the differences in scores were not statistically significant comparing the two educational methods. Compared with didactic lectures, group-based learning is more enjoyable, time efficient, and equally efficacious. The choice of educational method can be individualized for each institution on the basis of group size, time constraints, and faculty availability. Copyright © 2016 American College of Radiology. Published by Elsevier Inc. All rights reserved.
Lee, Shu-Ping; Su, Hui-Kai; Lee, Shin-Da
2012-06-01
This study investigated the effects of immediate feedback on computer-based foreign language listening comprehension tests and on intrapersonal test-associated anxiety in 72 English major college students at a Taiwanese University. Foreign language listening comprehension of computer-based tests designed by MOODLE, a dynamic e-learning environment, with or without immediate feedback together with the state-trait anxiety inventory (STAI) were tested and repeated after one week. The analysis indicated that immediate feedback during testing caused significantly higher anxiety and resulted in significantly higher listening scores than in the control group, which had no feedback. However, repeated feedback did not affect the test anxiety and listening scores. Computer-based immediate feedback did not lower debilitating effects of anxiety but enhanced students' intrapersonal eustress-like anxiety and probably improved their attention during listening tests. Computer-based tests with immediate feedback might help foreign language learners to increase attention in foreign language listening comprehension.
On the Performance of the Marginal Homogeneity Test to Detect Rater Drift.
Sgammato, Adrienne; Donoghue, John R
2018-06-01
When constructed response items are administered repeatedly, "trend scoring" can be used to test for rater drift. In trend scoring, raters rescore responses from the previous administration. Two simulation studies evaluated the utility of Stuart's Q measure of marginal homogeneity as a way of evaluating rater drift when monitoring trend scoring. In the first study, data were generated based on trend scoring tables obtained from an operational assessment. The second study tightly controlled table margins to disentangle certain features present in the empirical data. In addition to Q, the paired t test was included as a comparison, because of its widespread use in monitoring trend scoring. Sample size, number of score categories, interrater agreement, and symmetry/asymmetry of the margins were manipulated. For identical margins, both statistics had good Type I error control. For a unidirectional shift in margins, both statistics had good power. As expected, when shifts in the margins were balanced across categories, the t test had little power. Q demonstrated good power for all conditions and identified almost all items identified by the t test. Q shows substantial promise for monitoring of trend scoring.
Using screen-based simulation to improve performance during pediatric resuscitation.
Biese, Kevin J; Moro-Sutherland, Donna; Furberg, Robert D; Downing, Brian; Glickman, Larry; Murphy, Alison; Jackson, Cheryl L; Snyder, Graham; Hobgood, Cherri
2009-12-01
To assess the ability of a screen-based simulation-training program to improve emergency medicine and pediatric resident performance in critical pediatric resuscitation knowledge, confidence, and skills. A pre-post, interventional design was used. Three measures of performance were created and assessed before and after intervention: a written pre-course knowledge examination, a self-efficacy confidence score, and a skills-based high-fidelity simulation code scenario. For the high-fidelity skills assessment, independent physician raters recorded and reviewed subject performance. The intervention consisted of eight screen-based pediatric resuscitation scenarios that subjects had 4 weeks to complete. Upon completion of the scenarios, all three measures were repeated. For the confidence assessment, summary pre- and post-test confidence scores were compared using a t-test, and for the skills assessment, pre-scores were compared with post-test measures for each individual using McNemar's chi-square test for paired samples. Twenty-six of 35 (71.3%) enrolled subjects completed the institutional review board-approved study. Increases were observed in written test scores, confidence, and some critical interventions in high-fidelity simulation. The mean improvement in cumulative confidence scores for all residents was 10.1 (SD +/-4.9; range 0-19; p < 0.001), with no resident feeling less confident after the intervention. Although overall performance in simulated codes did not change significantly, with average scores of 6.65 (+/-1.76) to 7.04 (+/-1.37) out of 9 possible points (p = 0.58), improvement was seen in the administering of appropriate amounts of IV fluids (59-89%, p = 0.03). In this study, improvements in resident knowledge, confidence, and performance of certain skills in simulated pediatric cardiac arrest scenarios suggest that screen-based simulations may be an effective way to enhance resuscitation skills of pediatric providers.
These results should be confirmed using a randomized design with an appropriate control group. (c) 2009 by the Society for Academic Emergency Medicine.
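McNemar's chi-square test for paired samples, named in the abstract above, depends only on the discordant pairs: subjects who performed a skill correctly on one occasion but not the other. The sketch below is a minimal version without continuity correction. The counts are hypothetical, since the abstract does not report the paired table, and the function name is an assumption.

```python
import math


def mcnemar_test(improved_only, worsened_only):
    """McNemar chi-square statistic (1 df, no continuity correction) and its p-value."""
    discordant = improved_only + worsened_only
    if discordant == 0:
        raise ValueError("no discordant pairs; the statistic is undefined")
    statistic = (improved_only - worsened_only) ** 2 / discordant
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical counts: 8 residents performed the skill only after training,
# 1 performed it only before training.
statistic, p_value = mcnemar_test(8, 1)
print(f"chi-square = {statistic:.2f}, p = {p_value:.3f}")
```

With so few discordant pairs, an exact binomial version of the test would usually be preferred over the chi-square approximation.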
Measuring Decision-Making During Thyroidectomy: Validity Evidence for a Web-Based Assessment Tool.
Madani, Amin; Gornitsky, Jordan; Watanabe, Yusuke; Benay, Cassandre; Altieri, Maria S; Pucher, Philip H; Tabah, Roger; Mitmaker, Elliot J
2018-02-01
Errors in judgment during thyroidectomy can lead to recurrent laryngeal nerve injury and other complications. Despite the strong link between patient outcomes and intraoperative decision-making, methods to evaluate these complex skills are lacking. The purpose of this study was to develop objective metrics to evaluate advanced cognitive skills during thyroidectomy and to obtain validity evidence for them. An interactive online learning platform was developed (www.thinklikeasurgeon.com). Trainees and surgeons from four institutions completed a 33-item assessment, developed based on a cognitive task analysis and expert Delphi consensus. Sixteen items required subjects to make annotations on still frames of thyroidectomy videos, and accuracy scores were calculated based on an algorithm derived from experts' responses ("visual concordance test," VCT). Seven items were short answer (SA), requiring users to type their answers, and scores were automatically calculated based on their similarity to a pre-populated repertoire of correct responses. Test-retest reliability, internal consistency, and correlation of scores with self-reported experience and training level (novice, intermediate, expert) were calculated. Twenty-eight subjects (10 endocrine surgeons and otolaryngologists, 18 trainees) participated. There was high test-retest reliability (intraclass correlation coefficient = 0.96; n = 10) and internal consistency (Cronbach's α = 0.93). The assessment demonstrated significant differences between novices, intermediates, and experts in total score (p < 0.01), VCT score (p < 0.01) and SA score (p < 0.01). There was high correlation between total case number and total score (ρ = 0.95, p < 0.01), between total case number and VCT score (ρ = 0.93, p < 0.01), and between total case number and SA score (ρ = 0.83, p < 0.01).
This study describes the development of novel metrics and provides validity evidence for an interactive Web-based platform to objectively assess decision-making during thyroidectomy.
Medical students perception of test anxiety triggered by different assessment modalities.
Guraya, Salman Y; Guraya, Shaista S; Habib, Fawzia; AlQuiliti, Khalid W; Khoshhal, Khalid I
2018-05-06
Test anxiety is well known among medical students. However, little is known about the test anxiety produced by each component of an exam individually. This study aimed to stratify varying levels of test anxiety provoked by each exam modality and to explore the students' perceptions about confounding factors. A self-administered questionnaire was administered to medical students. The instrument contained four main themes: lifestyle, psychological and specific factors of information needs, learning styles, and perceived difficulty level of each assessment tool. A highest test anxiety score of 5 was ranked for "not scheduling available time" and "insufficient exercise" by 28.8 and 28.3% students, respectively. For "irrational thoughts about exam" and "fear to fail", a highest test anxiety score of 5 was scored by 28.8 and 25.7% students, respectively. The highest total anxiety score of 1255 was recorded for long case exam, followed by 975 for examiner-based objective structured clinical examination. Excessive course load and course not well covered by faculty were thought to be the main confounding factors. The examiner-based assessment modalities induced high test anxiety. Faculty is urged to cover core contents within stipulated time and to rigorously reform and update existing curricula to prepare relevant course material.
NASA Astrophysics Data System (ADS)
Harris, Michael W.
This study examined the effectiveness of a specific instructional strategy employed to improve performance on the end-of-the-year Criterion-Referenced Competency Test (CRCT) as mandated by the No Child Left Behind (NCLB) Act of 2001. A growing body of evidence suggests that the perceived pressure to produce adequate aggregated scores on the CRCT causes teachers to neglect other relevant aspects of teaching and attend less to individualized instruction. Rooted in constructivist theory, inquiry-based programs provide a developmental plan of instruction that affords the opportunity for each student to understand their academic needs and strengths. However, the utility of inquiry-based instruction is largely unknown due to the lack of evaluation studies. To address this problem, this quantitative evaluation measured the impact of the Audet and Jordan inquiry-based instructional model on CRCT test scores of 102 students in a sixth-grade science classroom in one north Georgia school. A series of binomial tests of proportions tested differences between CRCT scores of the program participants and those of a matched control sample selected from other district schools that did not adopt the program. The study found no significant differences on CRCT test scores between the treatment and control groups. The study also found no significant performance differences among genders in the sample using inquiry instruction. This implies that the utility of inquiry education might exist outside the domain of test scores. This study can contribute to social change by informing a reevaluation of the instructional strategies that ideally will serve NCLB high-stakes assessment mandates, while also affording students the individual-level skills needed to become productive members of society.
Development and Validation of a Mobile Device-based External Ventricular Drain Simulator.
Morone, Peter J; Bekelis, Kimon; Root, Brandon K; Singer, Robert J
2017-10-01
Multiple external ventricular drain (EVD) simulators have been created, yet their cost, bulky size, and nonreusable components limit their accessibility to residency programs. To create and validate an animated EVD simulator that is accessible on a mobile device. We developed a mobile-based EVD simulator that is compatible with iOS (Apple Inc., Cupertino, California) and Android-based devices (Google, Mountain View, California) and can be downloaded from the Apple App and Google Play Store. Our simulator consists of a learn mode, which teaches users the procedure, and a test mode, which assesses users' procedural knowledge. Twenty-eight participants, who were divided into expert and novice categories, completed the simulator in test mode and answered a postmodule survey. This was graded using a 5-point Likert scale, with 5 representing the highest score. Using the survey results, we assessed the module's face and content validity, whereas construct validity was evaluated by comparing the expert and novice test scores. Participants rated individual survey questions pertaining to face and content validity a median score of 4 out of 5. When comparing test scores, generated by the participants completing the test mode, the experts scored higher than the novices (mean, 71.5; 95% confidence interval, 69.2 to 73.8 vs mean, 48; 95% confidence interval, 44.2 to 51.6; P < .001). We created a mobile-based EVD simulator that is inexpensive, reusable, and accessible. Our results demonstrate that this simulator is face, content, and construct valid. Copyright © 2017 by the Congress of Neurological Surgeons
Standard Error Estimation of 3PL IRT True Score Equating with an MCMC Method
ERIC Educational Resources Information Center
Liu, Yuming; Schulz, E. Matthew; Yu, Lei
2008-01-01
A Markov chain Monte Carlo (MCMC) method and a bootstrap method were compared in the estimation of standard errors of item response theory (IRT) true score equating. Three test form relationships were examined: parallel, tau-equivalent, and congeneric. Data were simulated based on Reading Comprehension and Vocabulary tests of the Iowa Tests of…
Adapting Educational Measurement to the Demands of Test-Based Accountability
ERIC Educational Resources Information Center
Koretz, Daniel
2015-01-01
Accountability has become a primary function of large-scale testing in the United States. The pressure on educators to raise scores is vastly greater than it was several decades ago. Research has shown that high-stakes testing can generate behavioral responses that inflate scores, often severely. I argue that because of these responses, using…
Al-Dahir, Sara; Bryant, Kendrea; Kennedy, Kathleen B; Robinson, Donna S
2014-05-15
To evaluate the efficacy of faculty-led problem-based learning (PBL) vs online simulated-patient case in fourth-year (P4) pharmacy students. Fourth-year pharmacy students were randomly assigned to participate in either online branched-case learning using a virtual simulation platform or a small-group discussion. Preexperience and postexperience student assessments and a survey instrument were completed. While there were no significant differences in the preexperience test scores between the groups, there was a significant increase in scores in both the virtual-patient group and the PBL group between the preexperience and postexperience tests. The PBL group had higher postexperience test scores (74.8±11.7) than did the virtual-patient group (66.5±13.6) (p=0.001). The PBL method demonstrated significantly greater improvement in postexperience test scores than did the virtual-patient method. Both were successful learning methods, suggesting that a diverse approach to simulated patient cases may reach more student learning styles.
Applications of "Integrated Data Viewer" (IDV) in the classroom
NASA Astrophysics Data System (ADS)
Nogueira, R.; Cutrim, E. M.
2006-06-01
Conventionally, weather products utilized in synoptic meteorology reduce phenomena occurring in four dimensions to a 2-dimensional form. This constitutes a road-block for non-atmospheric-science majors who need to take meteorology as a non-mathematical and complementary course to their major programs. This research examines the use of Integrated Data Viewer-IDV as a teaching tool, as it allows a 4-dimensional representation of weather products. IDV was tested in the teaching of synoptic meteorology, weather analysis, and weather map interpretation to non-science students in the laboratory sessions of an introductory meteorology class at Western Michigan University. Comparison of student exam scores according to the laboratory teaching techniques, i.e., traditional lab manual and IDV was performed for short- and long-term learning. Results of the statistical analysis show that the Fall 2004 students in the IDV-based lab session retained learning. However, in the Spring 2005 the exam scores did not reflect retention in learning when compared with IDV-based and MANUAL-based lab scores (short term learning, i.e., exam taken one week after the lab exercise). Testing the long-term learning, seven weeks between the two exams in the Spring 2005, show no statistically significant difference between IDV-based group scores and MANUAL-based group scores. However, the IDV group obtained exam score average slightly higher than the MANUAL group. Statistical testing of the principal hypothesis in this study, leads to the conclusion that the IDV-based method did not prove to be a better teaching tool than the traditional paper-based method. Future studies could potentially find significant differences in the effectiveness of both manual and IDV methods if the conditions had been more controlled. That is, students in the control group should not be exposed to the weather analysis using IDV during lecture.
ERIC Educational Resources Information Center
Basom, Margaret; And Others
1994-01-01
Researchers examined relationships between the SRI Gallup Pre-Professional Teacher Interview and performance-based student teaching evaluations and between SRI Interview and California Student Achievement Test (CAT) scores. A relationship between SRI Interview scores and performance-based student teaching evaluations surfaces. CAT scores did not…
Lizunov, A Y; Gonchar, A L; Zaitseva, N I; Zosimov, V V
2015-10-26
We analyzed the frequency with which intraligand contacts occurred in a set of 1300 protein-ligand complexes [Plewczynski et al. J. Comput. Chem. 2011, 32, 742-755.]. Our analysis showed that flexible ligands often form intraligand hydrophobic contacts, while intraligand hydrogen bonds are rare. The test set was also thoroughly investigated and classified. We suggest a universal method for enhancement of a scoring function based on a potential of mean force (PMF-based score) by adding a term accounting for intraligand interactions. The method was implemented via in-house developed program, utilizing an Algo_score scoring function [Ramensky et al. Proteins: Struct., Funct., Genet. 2007, 69, 349-357.] based on the Tarasov-Muryshev PMF [Muryshev et al. J. Comput.-Aided Mol. Des. 2003, 17, 597-605.]. The enhancement of the scoring function was shown to significantly improve the docking and scoring quality for flexible ligands in the test set of 1300 protein-ligand complexes [Plewczynski et al. J. Comput. Chem. 2011, 32, 742-755.]. We then investigated the correlation of the docking results with two parameters of intraligand interactions estimation. These parameters are the weight of intraligand interactions and the minimum number of bonds between the ligand atoms required to take their interaction into account.
Moradi, Elaheh; Hallikainen, Ilona; Hänninen, Tuomo; Tohka, Jussi
2017-01-01
Rey's Auditory Verbal Learning Test (RAVLT) is a powerful neuropsychological tool for testing episodic memory and is widely used for cognitive assessment in dementia and pre-dementia conditions. Several studies have shown that an impairment in RAVLT scores reflects well the underlying pathology caused by Alzheimer's disease (AD), making RAVLT an effective early marker for detecting AD in persons with memory complaints. We investigated the association between RAVLT scores (RAVLT Immediate and RAVLT Percent Forgetting) and the structural brain atrophy caused by AD. The aim was to study comprehensively to what extent RAVLT scores are predictable from structural magnetic resonance imaging (MRI) data using machine learning approaches, and to find the most important brain regions for the estimation of RAVLT scores. For this, we built a predictive model that estimates RAVLT scores from gray matter density via an elastic net penalized linear regression model. The proposed approach provided highly significant cross-validated correlations between the estimated and observed RAVLT Immediate (R = 0.50) and RAVLT Percent Forgetting (R = 0.43) scores in a dataset consisting of 806 AD, mild cognitive impairment (MCI), or healthy subjects. In addition, the selected machine learning method provided more accurate estimates of RAVLT scores than the relevance vector regression used earlier for the estimation of RAVLT based on MRI data. The top predictors were medial temporal lobe structures and amygdala for the estimation of RAVLT Immediate, and angular gyrus, hippocampus, and amygdala for the estimation of RAVLT Percent Forgetting. Further, the conversion of MCI subjects to AD within 3 years could be predicted from either observed or estimated RAVLT scores with an accuracy comparable to MRI-based biomarkers.
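For orientation, elastic net penalized linear regression is commonly written as the optimization problem below. This is a generic textbook form, not the authors' exact specification; the symbols are assumptions introduced here for illustration: y_i is the RAVLT score of subject i, x_i the vector of gray matter density features, n the number of subjects, \lambda the overall penalty strength, and \alpha the mixing parameter between the L1 and L2 penalties.

```latex
\hat{\beta}_0, \hat{\beta}
  = \arg\min_{\beta_0,\, \beta}\;
    \frac{1}{2n} \sum_{i=1}^{n} \left( y_i - \beta_0 - x_i^{\top} \beta \right)^2
    + \lambda \left[ \alpha \lVert \beta \rVert_1
    + \frac{1 - \alpha}{2} \lVert \beta \rVert_2^2 \right],
  \qquad 0 \le \alpha \le 1 .
```

With \alpha = 1 this reduces to the lasso and with \alpha = 0 to ridge regression; the L1 part is what drives many coefficients to exactly zero, which is how a model of this kind can point to a small set of predictive brain regions.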
NASA Astrophysics Data System (ADS)
Young, Jerry Wayne
The purpose of this study was to determine the effects of four instructional methods (direct instruction, computer-aided instruction, video observation, and microcomputer-based lab activities), gender, and time of testing (pretest, immediate posttest for determining the immediate effect of instruction, and a delayed posttest two weeks later to determine the retained effect of the instruction) on the achievement of sixth graders who were learning to interpret graphs of displacement and velocity. The dependent variable of achievement was reflected in the scores earned by students on a testing instrument of established validity and reliability. The 107 students participating in the study were divided by gender and were then randomly assigned to the four treatment groups, each taught by a different teacher. Each group had approximately equal numbers of males and females. The students were pretested and then involved in two class periods of the instructional method which was unique to their group. Immediately following treatment they were posttested and two weeks later they were posttested again. The data in the form of test scores were analyzed with a two-way split-plot analysis of variance to determine if there was significant interaction among technique, gender, and time of testing. When significant interaction was indicated, the Tukey HSD test was used to determine specific mean differences. The results of the analysis indicated no gender effect. Only students in the direct instruction group and the microcomputer-based laboratory group had significantly higher posttest-1 scores than pretest scores. They also had significantly higher posttest-2 scores than pretest scores. This suggests that the learning was retained. The other groups experienced no significant differences among pretest, posttest-1, and posttest-2 scores. 
Recommendations are that direct instruction and microcomputer-based laboratory activities should be considered as effective stand-alone methods for teaching sixth grade students to interpret graphs of displacement and velocity. However, video and computer instruction may serve as supplemental activities.
Simulation-based assessment in anesthesiology: requirements for practical implementation.
Boulet, John R; Murray, David J
2010-04-01
Simulations have taken a central role in the education and assessment of medical students, residents, and practicing physicians. The introduction of simulation-based assessments in anesthesiology, especially those used to establish various competencies, has demanded fairly rigorous studies concerning the psychometric properties of the scores. Most important, major efforts have been directed at identifying, and addressing, potential threats to the validity of simulation-based assessment scores. As a result, organizations that wish to incorporate simulation-based assessments into their evaluation practices can access information regarding effective test development practices, the selection of appropriate metrics, the minimization of measurement errors, and test score validation processes. The purpose of this article is to provide a broad overview of the use of simulation for measuring physician skills and competencies. For simulations used in anesthesiology, studies that describe advances in scenario development, the development of scoring rubrics, and the validation of assessment results are synthesized. Based on the summary of relevant research, psychometric requirements for practical implementation of simulation-based assessments in anesthesiology are forwarded. As technology expands, and simulation-based education and evaluation takes on a larger role in patient safety initiatives, the groundbreaking work conducted to date can serve as a model for those individuals and organizations that are responsible for developing, scoring, or validating simulation-based education and assessment programs in anesthesiology.
GalaxyDock BP2 score: a hybrid scoring function for accurate protein-ligand docking
NASA Astrophysics Data System (ADS)
Baek, Minkyung; Shin, Woong-Hee; Chung, Hwan Won; Seok, Chaok
2017-07-01
Protein-ligand docking is a useful tool for providing atomic-level understanding of protein functions in nature and design principles for artificial ligands or proteins with desired properties. The ability to identify the true binding pose of a ligand to a target protein among numerous possible candidate poses is an essential requirement for successful protein-ligand docking. Many previously developed docking scoring functions were trained to reproduce experimental binding affinities and were also used for scoring binding poses. However, in this study, we developed a new docking scoring function, called GalaxyDock BP2 Score, by directly training the scoring power of binding poses. This function is a hybrid of physics-based, empirical, and knowledge-based score terms that are balanced to strengthen the advantages of each component. The performance of the new scoring function exhibits significant improvement over existing scoring functions in decoy pose discrimination tests. In addition, when the score is used with the GalaxyDock2 protein-ligand docking program, it outperformed other state-of-the-art docking programs in docking tests on the Astex diverse set, the Cross2009 benchmark set, and the Astex non-native set. GalaxyDock BP2 Score and GalaxyDock2 with this score are freely available at http://galaxy.seoklab.org/softwares/galaxydock.html.
ERIC Educational Resources Information Center
Liu, Jinghua; Zu, Jiyun; Curley, Edward; Carey, Jill
2014-01-01
The purpose of this study is to investigate the impact of discrete anchor items versus passage-based anchor items on observed score equating using empirical data. This study compares an "SAT"® critical reading anchor that contains more discrete items proportionally, compared to the total tests to be equated, to another anchor that…
Evaluating the efficacy of a chemistry video game
NASA Astrophysics Data System (ADS)
Shapiro, Marina
A quasi-experimental pre-test/post-test intervention study using a within-group analysis was conducted with 45 undergraduate college chemistry students. The study investigated the effect of implementing a game-based learning environment in an undergraduate college chemistry course, in order to learn whether serious educational games (SEGs) can be used to achieve knowledge gains in complex chemistry concepts and to increase students' positive attitude toward chemistry. To evaluate whether students learn chemistry concepts by participating in a chemistry game-based learning environment, a one-way repeated measures analysis of variance (ANOVA) was conducted across three time points (pre-test, post-test, and delayed post-test, all of which were chemistry content exams). Results showed that exam scores increased over time, and the ANOVA indicated a statistically significant time effect. To evaluate whether students' attitude toward chemistry improved as a result of participating in the game-based learning environment, a paired-samples t-test was conducted using a chemistry attitudinal survey by Mahdi (2014) as the pre- and post-test. The paired-samples t-test indicated no significant difference between pre-attitudinal and post-attitudinal scores.
Socio-demographic and academic correlates of clinical reasoning in a dental school in South Africa.
Postma, T C; White, J G
2017-02-01
There are no empirical studies that describe factors that may influence the development of integrated clinical reasoning skills in dental education. Hence, this study examines the association between outcomes of clinical reasoning and differences in instructional design and student factors. Progress test scores, including diagnostic and treatment planning scores, of fourth and fifth year dental students (2009-2011) at the University of Pretoria, South Africa served as the outcome measures in stepwise linear regression analyses. These scores were correlated with the instructional design (lecture-based teaching and learning (LBTL = 0) or case-based teaching and learning (CBTL = 1)), students' grades in Oral Biology, indicators of socio-economic status (SES) and gender. CBTL showed an independent association with progress test scores. Oral Biology scores correlated with diagnostic component scores. Diagnostic component scores correlated with treatment planning scores in the fourth year of study but not in the fifth year of study. 'SES' correlated with progress test scores in year five only, while gender showed no correlation. The empirical evidence gathered in this study provides support for scaffolded inductive teaching and learning methods to develop clinical reasoning skills. Knowledge in Oral Biology and reading skills may be important attributes to develop to ensure that students are able to reason accurately in a clinical setting. © 2015 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Video as an Effective Method to Deliver Pre-Test Information for Rapid HIV Testing
Clark, Melissa A.; Mayer, Kenneth H.; Seage, George R.; DeGruttola, Victor G.; Becker, Bruce M.
2008-01-01
Objectives: Video-based delivery of HIV pre-test information might assist in streamlining HIV screening and testing efforts in the emergency department (ED). The objectives of this study were to determine if the video “Do you know about rapid HIV testing?” is an acceptable alternative to an in-person information session on rapid HIV pre-test information, with regard to comprehension of rapid HIV pre-test fundamentals; and to identify patients who might have difficulties in comprehending pre-test information. Methods: This was a non-inferiority trial of 574 participants in an ED opt-in rapid HIV screening program who were randomly assigned to receive identical pre-test information from either an animated and live-action 9.5-minute video, or an in-person information session. Pre-test information comprehension was assessed using a questionnaire. The video would be accepted as not inferior to the in-person information session if the 95% confidence interval (CI) of the difference (Δ) in mean scores on the questionnaire between the two information groups was less than a 10% decrease in the in-person information session arm's mean score. Linear regression models were constructed to identify patients with lower mean scores based upon study arm assignment, demographic characteristics, and history of prior HIV testing. Results: The questionnaire mean scores were 20.1 (95% CI = 19.7 to 20.5) for the video arm and 20.8 (95% CI = 20.4 to 21.2) for the in-person information session arm. The difference in mean scores compared to the mean score for the in-person information session met the non-inferiority criterion for this investigation (Δ = 0.68; 95% CI = 0.18 to 1.26). In a multivariable linear regression model, Blacks/African Americans, Hispanics, and those with Medicare and Medicaid insurance exhibited slightly lower mean scores, regardless of the pre-test information delivery format. There was a strong relationship between fewer years of formal education and lower mean scores on the questionnaire. Age, gender, type of insurance, partner/marital status, and history of prior HIV testing were not predictive of scores on the questionnaire. Conclusions: In terms of patient comprehension of rapid HIV pre-test information fundamentals, the video was an acceptable substitute for pre-test information delivered by an HIV test counselor. Both the video and in-person information session were less effective in providing pre-test information for patients with fewer years of formal education. PMID: 19120050
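The non-inferiority criterion stated in this abstract can be checked with simple arithmetic. The following is a minimal sketch, assuming that Δ is the in-person mean minus the video mean and that the margin is 10% of the in-person arm's mean score; the function name and its parameters are illustrative and do not come from the study.

```python
def is_non_inferior(ci_upper, reference_mean, margin_fraction=0.10):
    """Return True if the upper CI bound of the score deficit is below the margin.

    ci_upper: upper bound of the 95% CI for (reference mean - new-format mean).
    reference_mean: mean score in the reference (in-person) arm.
    margin_fraction: allowed relative decrease (assumed to be 10%, as in the abstract).
    """
    margin = margin_fraction * reference_mean
    return ci_upper < margin


# Values reported in the abstract: in-person mean 20.8, upper CI bound of the difference 1.26.
margin = 0.10 * 20.8  # allowed decrease, about 2.08 questionnaire points
result = is_non_inferior(1.26, 20.8)
```

Under these assumptions the upper bound of 1.26 points lies below the margin of about 2.08 points, which is consistent with the abstract's statement that the criterion was met.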
Li, Jiangeng; Su, Lei; Pang, Zenan
2015-12-01
Feature selection techniques have been widely applied to tumor gene expression data analysis in recent years. A filter feature selection method named marginal Fisher analysis score (MFA score) which is based on graph embedding has been proposed, and it has been widely used mainly because it is superior to Fisher score. Considering the heavy redundancy in gene expression data, we proposed a new filter feature selection technique in this paper. It is named MFA score+ and is based on MFA score and redundancy excluding. We applied it to an artificial dataset and eight tumor gene expression datasets to select important features and then used support vector machine as the classifier to classify the samples. Compared with MFA score, t test and Fisher score, it achieved higher classification accuracy.
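As background for the baselines named in this abstract, the sketch below computes the classical Fisher score of a single feature: the between-class scatter divided by the within-class scatter. It is a generic textbook form and an assumption about what the authors compared against; it is not the MFA score or the MFA score+ method, whose details the abstract does not give. The function name and the toy data are hypothetical.

```python
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Fisher score of one feature: between-class over within-class scatter.

    values: feature value for each sample.
    labels: class label for each sample (same length as values).
    Raises ZeroDivisionError if every class has zero variance.
    """
    overall = mean(values)
    between = 0.0
    within = 0.0
    for c in set(labels):
        group = [v for v, lab in zip(values, labels) if lab == c]
        between += len(group) * (mean(group) - overall) ** 2
        within += len(group) * pvariance(group)
    return between / within


# Toy data (hypothetical): feature_a separates the two classes, feature_b does not.
labels = [0, 0, 0, 1, 1, 1]
feature_a = [1, 2, 3, 7, 8, 9]
feature_b = [1, 8, 3, 7, 2, 9]
score_a = fisher_score(feature_a, labels)
score_b = fisher_score(feature_b, labels)
```

A filter method ranks features by such a score and keeps the top ones before training a classifier. Because each feature is scored independently, two highly correlated genes can both rank high, which is the redundancy problem the abstract says MFA score+ is meant to address.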
Mathieu, Sylvain; Couderc, Marion; Glace, Baptiste; Tournadre, Anne; Malochet-Guinamand, Sandrine; Pereira, Bruno; Dubost, Jean-Jacques; Soubrier, Martin
2013-12-13
The script concordance test (SCT) is a method for assessing clinical reasoning of medical students by placing them in a context of uncertainty such as they will encounter in their future daily practice. Script concordance testing is going to be included as part of the computer-based national ranking examination (iNRE). This study was designed to create a script concordance test in rheumatology and use it for DCEM3 (fifth year) medical students administered via the online platform of the Clermont-Ferrand medical school. Our SCT for rheumatology teaching was constructed by a panel of 19 experts in rheumatology (6 hospital-based and 13 community-based). One hundred seventy-nine DCEM3 (fifth year) medical students were invited to take the test. Scores were computed using the scoring key available on the University of Montreal website. Reliability of the test was estimated by the Cronbach alpha coefficient for internal consistency. The test comprised 60 questions. Among the 26 students who took the test (26/179: 14.5%), 15 completed it in its entirety. The reference panel of rheumatologists obtained a mean score of 76.6 and the 15 students had a mean score of 61.5 (p = 0.001). The Cronbach alpha value was 0.82. An online SCT can be used as an assessment tool for medical students in rheumatology. This study also highlights the active participation of community-based rheumatologists, who accounted for the majority of the 19 experts in the reference panel. A script concordance test in rheumatology for 5th year medical students.
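The Cronbach alpha coefficient mentioned in this abstract can be computed from a respondents-by-items score table. The sketch below uses the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores); the function name and the toy scores are hypothetical and are not the study's data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for internal consistency.

    scores: list of respondents, each a list of k item scores (k >= 2).
    Uses sample variances for both the items and the total scores.
    """
    k = len(scores[0])
    items = list(zip(*scores))  # one tuple of scores per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Toy data (hypothetical): 4 respondents answering 3 items.
toy_scores = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [3, 3, 4],
]
alpha = cronbach_alpha(toy_scores)
```

Values closer to 1 indicate that the items vary together across respondents; the abstract's reported value of 0.82 would conventionally be read as good internal consistency.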
Local Linear Observed-Score Equating
ERIC Educational Resources Information Center
Wiberg, Marie; van der Linden, Wim J.
2011-01-01
Two methods of local linear observed-score equating for use with anchor-test and single-group designs are introduced. In an empirical study, the two methods were compared with the current traditional linear methods for observed-score equating. As a criterion, the bias in the equated scores relative to true equating based on Lord's (1980)…
An Empirical Investigation of Change in MCAT Scores upon Retest.
ERIC Educational Resources Information Center
Hynes, Kevin; Givner, Nathaniel
1980-01-01
An investigation of Medical College Admission Test (MCAT) retest scores indicates that limited retest improvement may result when initial scores are fairly low or below what might be predicted based on grade point averages. However, when initial scores approach the national, standardized MCAT mean, or are above what might be predicted, significant…
Web-based education in systems-based practice: a randomized trial.
Kerfoot, B Price; Conlin, Paul R; Travison, Thomas; McMahon, Graham T
2007-02-26
All accredited US residency programs are expected to offer curricula and evaluate their residents in 6 general competencies. Medical schools are now adopting similar competency frameworks. We investigated whether a Web-based program could effectively teach and assess elements of systems-based practice. We enrolled 276 medical students and 417 residents in the fields of surgery, medicine, obstetrics-gynecology, and emergency medicine in a 9-week randomized, controlled, crossover educational trial. Participants were asked to sequentially complete validated Web-based modules on patient safety and the US health care system. The primary outcome measure was performance on a 26-item validated online test administered before, between, and after the participants completed the modules. Six hundred forty (92.4%) of the 693 enrollees participated in the study; 512 (80.0%) of the participants completed all 3 tests. Participants' test scores improved significantly after completion of the first module (P<.001). Overall learning from the 9-week Web-based program, as measured by the increase in scores (posttest scores minus pretest scores), was 16 percentage points (95% confidence interval, 14-17 percentage points; P<.001) in patient safety topics and 22 percentage points (95% confidence interval, 20-23 percentage points; P<.001) in US health care system topics. A Web-based educational program on systems-based practice competencies generated significant and durable learning across a broad range of medical students and residents.
Does the MCAT predict medical school and PGY-1 performance?
Saguil, Aaron; Dong, Ting; Gingerich, Robert J; Swygert, Kimberly; LaRochelle, Jeffrey S; Artino, Anthony R; Cruess, David F; Durning, Steven J
2015-04-01
The Medical College Admission Test (MCAT) is a high-stakes test required for entry to most U.S. medical schools; admissions committees use this test to predict future accomplishment. Although there is evidence that the MCAT predicts success on multiple choice-based assessments, there is little information on whether the MCAT predicts clinical-based assessments of undergraduate and graduate medical education performance. This study looked at associations between the MCAT and medical school grade point average (GPA), Medical Licensing Examination (USMLE) scores, observed patient care encounters, and residency performance assessments. This study used data collected as part of the Long-Term Career Outcome Study to determine associations between MCAT scores, USMLE Step 1, Step 2 clinical knowledge and clinical skill, and Step 3 scores, Objective Structured Clinical Examination performance, medical school GPA, and PGY-1 program director (PD) assessment of physician performance for students graduating in 2010 and 2011. MCAT data were available for all students, and the PGY PD evaluation response rate was 86.2% (N = 340). All permutations of MCAT scores (first, last, highest, average) were weakly associated with GPA, Step 2 clinical knowledge scores, and Step 3 scores. MCAT scores were weakly to moderately associated with Step 1 scores. MCAT scores were not significantly associated with Step 2 clinical skills Integrated Clinical Encounter and Communication and Interpersonal Skills subscores, Objective Structured Clinical Examination performance, or PGY-1 PD evaluations. MCAT scores were weakly to moderately associated with assessments that rely on multiple choice testing. The association is somewhat stronger for assessments occurring earlier in medical school, such as USMLE Step 1. The MCAT was not able to predict assessments relying on direct clinical observation, nor was it able to predict PD assessment of PGY-1 performance.
Reprint & Copyright © 2015 Association of Military Surgeons of the U.S.
ERIC Educational Resources Information Center
Cornish, Greg; Wines, Robin
The Number Test of the ACER Mathematics Profile Series contains 30 items for each of three suggested grade levels: 7-8, 8-9, and 9-10. Raw scores on all tests in the ACER Mathematics Profile Series (Number, Operations, Space and Measurement) are converted to a common scale called MAPS, a major feature of the Series. Based on the Rasch Model,…
Examining the Feasibility and Effect of Transitioning GED Tests to Computer
ERIC Educational Resources Information Center
Higgins, Jennifer; Patterson, Margaret Becker; Bozman, Martha; Katz, Michael
2010-01-01
This study examined the feasibility of administering GED Tests using a computer based testing system with embedded accessibility tools and the impact on test scores and test-taker experience when GED Tests are transitioned from paper to computer. Nineteen test centers across five states successfully installed the computer based testing program,…
Do School-Based Tutoring Programs Significantly Improve Student Performance on Standardized Tests?
ERIC Educational Resources Information Center
Rothman, Terri; Henderson, Mary
2011-01-01
This study used a pre-post, nonequivalent control group design to examine the impact of an in-district, after-school tutoring program on eighth grade students' standardized test scores in language arts and mathematics. Students who had scored in the near-passing range on either the language arts or mathematics aspect of a standardized test at the…
Naro, Daniel; Rummel, Christian; Schindler, Kaspar; Andrzejak, Ralph G
2014-09-01
The rank-based nonlinear predictability score was recently introduced as a test for determinism in point processes. We here adapt this measure to time series sampled from time-continuous flows. We use noisy Lorenz signals to compare this approach against a classical amplitude-based nonlinear prediction error. Both measures show an almost identical robustness against Gaussian white noise. In contrast, when the amplitude distribution of the noise has a narrower central peak and heavier tails than the normal distribution, the rank-based nonlinear predictability score outperforms the amplitude-based nonlinear prediction error. For this type of noise, the nonlinear predictability score has a higher sensitivity for deterministic structure in noisy signals. It also yields a higher statistical power in a surrogate test of the null hypothesis of linear stochastic correlated signals. We show the high relevance of this improved performance in an application to electroencephalographic (EEG) recordings from epilepsy patients. Here the nonlinear predictability score again appears of higher sensitivity to nonrandomness. Importantly, it yields an improved contrast between signals recorded from brain areas where the first ictal EEG signal changes were detected (focal EEG signals) versus signals recorded from brain areas that were not involved at seizure onset (nonfocal EEG signals).
Testing a computer-based ostomy care training resource for staff nurses.
Bales, Isabel
2010-05-01
Fragmented teaching and ostomy care provided by nonspecialized clinicians unfamiliar with state-of-the-art care and products have been identified as problems in teaching ostomy care to the new ostomate. After conducting a literature review of theories and concepts related to the impact of nurse behaviors and confidence on ostomy care, the author developed a computer-based learning resource and assessed its effect on staff nurse confidence. Of 189 staff nurses with a minimum of 1 year of acute-care experience employed in the acute care, emergency, and rehabilitation departments of an acute care facility in the Midwestern US, 103 agreed to participate and returned completed pre- and post-tests, each comprising the same eight statements about providing ostomy care. F and P values were computed for differences between pre- and post-test scores. On a scale where 1 = totally disagree and 5 = totally agree with the statement, baseline confidence and perceived mean knowledge scores averaged 3.8, and after viewing the resource program, post-test mean scores averaged 4.51, a statistically significant improvement (P = 0.000). The largest difference between pre- and post-test scores involved feeling confident in having the resources to learn ostomy skills independently. The availability of an electronic ostomy care resource was rated highly in both pre- and post-testing. Studies to assess the effects of increased confidence and knowledge on the quality and provision of care are warranted.
Cooper, William B; Tobey, Emily; Loizou, Philipos C
2008-08-01
The purpose of this study was to explore the utility/possibility of using the Montreal Battery for Evaluation of Amusia (MBEA) test (Peretz, et al., Ann N Y Acad Sci, 999, 58-75) to assess the music perception abilities of cochlear implant (CI) users. The MBEA was used to measure six different aspects of music perception (Scale, Contour, Interval, Rhythm, Meter, and Melody Memory) by CI users and normal-hearing (NH) listeners presented with stimuli processed via CI simulations. The spectral resolution (number of channels) was varied in the CI simulations to determine: (a) the number of channels (4, 6, 8, 12, and 16) needed to achieve the highest levels of music perception and (b) the number of channels needed to produce levels of music perception performance comparable with that of CI users. CI users and NH listeners performed higher on temporal-based tests (Rhythm and Meter) than on pitch-based tests (Scale, Contour, and Interval)--a finding that is consistent with previous research studies. The CI users' scores on pitch-based tests were near chance. The CI users' (but not NH listeners') scores for the Memory test, a test that incorporates an integration of both temporal-based and pitch-based aspects of music, were significantly higher than the scores obtained for the pitch-based Scale test and significantly lower than the temporal-based Rhythm and Meter tests. The data from NH listeners indicated that 16 channels of stimulation did not provide the highest music perception scores and performance was as good as that obtained with 12 channels. This outcome is consistent with other studies showing that NH listeners listening to vocoded speech are not able to use effectively F0 cues present in the envelopes, even when the stimuli are processed with a large number (16) of channels. The CI user data seem to most closely match with the 4- and 6-channel NH listener conditions for the pitch-based tasks. 
Consistent with previous studies, both CI users and NH listeners showed the typical pattern of music perception in which scores are higher on tests measuring the perception of temporal aspects of music (Rhythm and Meter) than spectral (pitch) aspects of music (Scale, Contour, and Interval). In that regard, the pattern of results from this study indicates that the MBEA is a suitable test for measuring various aspects of music perception by CI users.
ERIC Educational Resources Information Center
Berryhill, Katie J.; Slater, Timothy F.
2017-01-01
As discipline-based astronomy education researchers become more interested in experimentally testing innovative teaching strategies to enhance learning in undergraduate introductory astronomy survey courses ("ASTRO 101"), scholars are placing increased attention toward better understanding factors impacting student gain scores on the…
Using Reading Rate and Comprehension CBM to Predict High-Stakes Achievement
ERIC Educational Resources Information Center
Miller, Kelli Caldwell; Bell, Sherry Mee; McCallum, R. Steve
2015-01-01
Because of the increased emphasis on standardized testing results, scores from a high-stakes, end-of-year test (Tennessee Comprehensive Assessment Program [TCAP] Reading Composite) were used as the standard against which scores from a group-administered, curriculum-based measure (CBM), Monitoring Instructional Responsiveness: Reading (MIR:R), were…
Standards and Criteria. Paper #10 in Occasional Paper Series.
ERIC Educational Resources Information Center
Glass, Gene V.
The logical and psychological bases for setting cutting scores for criterion-referenced tests are examined; they are found to be intrinsically arbitrary and are often examples of misdirected precision and axiomatization. The term, criterion referenced, originally referred to a technique for making test scores meaningful by controlling the test…
Validity Semantics in Educational and Psychological Assessment
ERIC Educational Resources Information Center
Hathcoat, John D.
2013-01-01
The semantics, or meaning, of validity is a fluid concept in educational and psychological testing. Contemporary controversies surrounding this concept appear to stem from the proper location of validity. Under one view, validity is a property of score-based inferences and entailed uses of test scores. This view is challenged by the…
How Have State Level Standards-Based Tests Related to Norm-Referenced Tests in Alaska?.
ERIC Educational Resources Information Center
Fenton, Ray
This overview of the Alaska system for test development, scoring, and reporting explored differences and similarities between norm-referenced and standards-based tests. The current Alaska testing program is based on legislation passed in 1997 and 1998, and is designed to meet the requirements of the federal No Child Left Behind Legislation. In…
Effects of training students to identify the semantic base of prose materials
Glover, John A.; Zimmer, John W.; Filbeck, Robert W.; Plake, Barbara S.
1980-01-01
Feedback and feedback plus points toward a course grade were applied to the attentional behaviors (defined as the ability to identify the semantic base of text passages) of 30 undergraduate students participating in a reading comprehension development program. Correct underlining was increased, extraneous underlining was decreased, and postreading comprehension test scores improved as a result of the procedures. Scores on a standardized test of reading comprehension also increased significantly. PMID:16795637
Oren, Carmel; Kennet-Cohen, Tamar; Turvall, Elliot; Allalouf, Avi
2014-01-01
The Psychometric Entrance Test (PET), used for admission to higher education in Israel together with the Matriculation (Bagrut), had in the past one general (total) score in which the weights for its domains (Verbal, Quantitative and English) were 2:2:1, respectively. In 2011, two additional total scores were introduced, with different weights for the Verbal and the Quantitative domains. This study compares the predictive validity of the three general scores of PET, and demonstrates validity in terms of utility. The sample comprised 100,863 freshman students of all Israeli universities over the classes of 2005-2009. Regression weights and correlations of the predictors with FYGPA were computed. Simulations based on these results supplied the utility estimates. On average, PET is slightly more predictive than the Bagrut; using them both yields a better tool than either of them alone. Assigning differential weights to the components in the respective schools further improves the validity. The introduction of the new general scores of PET is validated by gathering and analyzing evidence based on relations of test scores to other variables. The utility of using the test can be demonstrated in ways different from correlations.
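The 2:2:1 weighting mentioned in this abstract can be illustrated with a minimal sketch of a weighted composite. The function name, the example domain scores, and the alternative weights are hypothetical; the abstract does not describe how PET actually scales its scores or what the two newer weightings are.

```python
def weighted_composite(verbal, quantitative, english, weights=(2, 2, 1)):
    """Weighted mean of three domain scores.

    The default weights follow the 2:2:1 ratio quoted in the abstract.
    Everything else here is an illustrative assumption, not PET's procedure.
    """
    w_verbal, w_quant, w_english = weights
    total_weight = w_verbal + w_quant + w_english
    weighted_sum = w_verbal * verbal + w_quant * quantitative + w_english * english
    return weighted_sum / total_weight


# Hypothetical domain scores for one examinee.
general = weighted_composite(100, 120, 110)

# Hypothetical verbal-heavy weighting, chosen only to show the mechanism.
verbal_heavy = weighted_composite(100, 120, 110, weights=(3, 1, 1))
```

Changing the weights shifts the composite toward the emphasised domain, which is the mechanism the study evaluates for its effect on predictive validity.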
Inter-rater Agreement on Final Competency Testing Utilizing Standardized Patients.
Bowman, Dixie H; Ferber, Kyle L; Sima, Adam P
2016-01-01
The purpose of this study was to determine whether licensed physical therapists (n=8) serving as standardized patients (SPs) for practical examinations evaluate physical therapy students (n=51) equivalently to the physical therapy course instructor (n=1). The SPs completed the same assessment based on the evaluation criteria as did the instructor. The scores for the practical examination, answers to three questions, and the documentation note were summarized separately for the SP and the instructor by means and standard deviations. A paired t-test and an intraclass correlation coefficient (ICC) for each aspect of the score were calculated. ICC(1,1) values were reported along with corresponding 95% confidence intervals. The instructor had significantly higher scores for the practical exam and the overall score compared to the ratings from the SPs. No differences were observed between the instructor and SP scores on the three answers to the questions and documentation note scores. Based on the ICC values identified in this study, a physical therapist serving as an SP may not be an adequate replacement for an instructor when it comes to grading physical therapy students on all aspects of their competency tests.
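For readers unfamiliar with ICC(1,1), the following is a minimal sketch of the standard one-way random-effects, single-rating formula, ICC(1,1) = (MSB - MSW) / (MSB + (k - 1) MSW). The function name and the sample ratings are hypothetical, and the abstract does not say which software the authors used.

```python
def icc_1_1(ratings):
    """ICC(1,1) for a list of per-subject rating lists.

    Each inner list holds the k ratings given to one subject
    (for example, one SP score and one instructor score per student).
    """
    n = len(ratings)       # number of subjects
    k = len(ratings[0])    # ratings per subject
    subject_means = [sum(row) / k for row in ratings]
    grand_mean = sum(subject_means) / n

    # Between-subjects mean square.
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)

    # Within-subjects mean square.
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, subject_means) for x in row
    ) / (n * (k - 1))

    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical (SP, instructor) score pairs for four students.
example_ratings = [[8, 9], [6, 7], [9, 9], [5, 6]]
agreement = icc_1_1(example_ratings)
```

Values near 1 indicate that most score variance lies between students rather than between raters. The function raises `ZeroDivisionError` if every rating is identical.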
Dayer, Lindsey E; Shilling, Rebecca; Van Valkenburg, Madalyn; Martin, Bradley C; Gubbins, Paul O; Hadden, Kristie; Heldenbrand, Seth
2017-04-19
Nonadherence produces considerable health consequences and economic burden to patients and payers. One approach to addressing medication nonadherence that has gained interest in recent years is the use of smartphone adherence apps. The development of smartphone adherence apps has increased rapidly since 2012; however, literature evaluating the clinical application and effectiveness of smartphone adherence apps to improve medication adherence is generally lacking. The aims of this study were to (1) provide an updated evaluation and comparison of medication adherence apps in the marketplace by assessing the features, functionality, and health literacy (HL) of the highest-ranking adherence apps and (2) indirectly measure the validity of our rating methodology by determining the relationship between our app evaluations and Web-based consumer ratings. Two independent reviewers assessed the features and functionality using a 4-domain rating tool of all adherence apps identified based on developer claims. The same reviewers downloaded and tested the 100 highest-ranking apps including an additional domain for assessment of HL. Pearson product-moment correlations were estimated between the consumer ratings and our domain and total scores. A total of 824 adherence apps were identified; of these, 645 unique apps were evaluated after applying exclusion criteria. The median initial score based on descriptions was 14 (max of 68; range 0-60). As a result, 100 of the highest-scoring unique apps underwent user testing. The median overall user-tested score was 31.5 (max of 73; range 0-60). The majority of the adherence apps that underwent user testing had a consumer rating score in their respective online marketplace. The mean consumer rating was 3.93 (SD 0.84). The total user-tested score was positively correlated with consumer ratings (r=.1969, P=.04). More adherence apps are available in the Web-based marketplace, and the quality of these apps varies considerably.
Consumer ratings are positively but weakly correlated with user-testing scores suggesting that our rating tool has some validity but that consumers and clinicians may assess adherence app quality differently. ©Lindsey E Dayer, Rebecca Shilling, Madalyn Van Valkenburg, Bradley C Martin, Paul O Gubbins, Kristie Hadden, Seth Heldenbrand. Originally published in JMIR Mhealth and Uhealth (http://mhealth.jmir.org), 19.04.2017.
Predicting clinical concussion measures at baseline based on motivation and academic profile.
Trinidad, Katrina J; Schmidt, Julianne D; Register-Mihalik, Johna K; Groff, Diane; Goto, Shiho; Guskiewicz, Kevin M
2013-11-01
The purpose of this study was to predict baseline neurocognitive and postural control performance using a measure of motivation, high school grade point average (hsGPA), and Scholastic Aptitude Test (SAT) score. Cross-sectional. Clinical research center. Eighty-eight National Collegiate Athletic Association Division I incoming student-athletes (freshman and transfers). Participants completed baseline clinical concussion measures, including a neurocognitive test battery (CNS Vital Signs), a balance assessment [Sensory Organization Test (SOT)], and motivation testing (Rey Dot Counting). Participants granted permission to access hsGPA and SAT total score. Standard scores for each CNS Vital Signs domain and SOT composite score. Baseline motivation, hsGPA, and SAT explained a small percentage of the variance of complex attention (11%), processing speed (12%), and composite SOT score (20%). Motivation, hsGPA, and total SAT score do not explain a significant amount of the variance in neurocognitive and postural control measures but may still be valuable to consider when interpreting neurocognitive and postural control measures.
Monneret, Denis
2017-01-01
The relationship between nonalcoholic fatty liver disease (NAFLD) and obstructive sleep apnea (OSA) has been well demonstrated, but remains to be evidenced in chronic obstructive pulmonary disease (COPD). Recently, Viglino et al. (Eur Respir J, 2017) attempted to determine the prevalence of liver fibrosis, steatosis and nonalcoholic steatohepatitis (NASH) in COPD patients, some of whom had OSA, basing the NAFLD diagnostic on three circulating biomarker-based liver scores: the FibroTest, SteatoTest and NashTest, from the Fibromax® panel. Among the main findings, the absence of OSA treatment emerged as independently associated with liver fibrosis and steatosis, when compared to effective treatment. However, besides the low number of treated patients, no polysomnographic respiratory data was provided, making it difficult to differentiate the impact of OSA from that of COPD in NAFLD prevalence. Furthermore, NAFLD diagnosis relied exclusively on circulating biomarker-based liver scores, without histological, imagery or other liver exploratory methods. Therefore, in this article, some methodological points are reminded and discussed, including the choice of OSA measurements, and the significance of ActiTest and AshTest scores from Fibromax® in this pathophysiological context. PMID:29225775
The effects of calculator-based laboratories on standardized test scores
NASA Astrophysics Data System (ADS)
Stevens, Charlotte Bethany Rains
Nationwide, the goal of providing a productive science and math education to our youth in today's educational institutions is centering itself around the technology being utilized in these classrooms. In this age of digital technology, educational software and calculator-based laboratories (CBL) have become significant devices in the teaching of science and math for many states across the United States. The Texas Instruments graphing calculator and Vernier Labpro interface are among the calculator-based laboratories becoming increasingly popular among middle and high school science and math teachers in many school districts across this country. In Tennessee, however, it is reported that this type of technology is not regularly utilized at the student level in most high school science classrooms, especially in the area of Physical Science (Vernier, 2006). This research explored the effect of calculator-based laboratory instruction on standardized test scores. The purpose of this study was to determine the effect of traditional teaching methods versus graphing calculator teaching methods on the state-mandated End-of-Course (EOC) Physical Science exam based on ability, gender, and ethnicity. The sample included 187 total tenth and eleventh grade physical science students, 101 of whom belonged to a control group and 87 of whom belonged to the experimental group. Physical Science End-of-Course scores obtained from the Tennessee Department of Education during the spring of 2005 and the spring of 2006 were used to examine the hypotheses. The findings of this research study suggested the type of teaching method, traditional or calculator based, did not have an effect on standardized test scores. However, the students' ability level, as demonstrated on the End-of-Course test, had a significant effect on End-of-Course test scores.
This study focused on a limited population of high school physical science students in the middle Tennessee Putnam County area. The study should be reproduced in various school districts in the state of Tennessee to compare the findings.
Isaac, Barney Thomas Jesudason; Thangakunam, Balamugesh; Cherian, Rekha A; Christopher, Devasahayam Jesudas
2015-01-01
For the follow-up of patients with idiopathic interstitial pneumonias (IIP), it is unclear which parameters of pulmonary function tests (PFT) and exercise testing would correlate best with high-resolution computed tomography (HRCT). To find out the correlation of symptom scores, PFTs and exercise testing with HRCT scoring in patients diagnosed with idiopathic interstitial pneumonia. Cross-sectional study done in pulmonary medicine outpatients department of a tertiary care hospital in South India. Consecutive patients who were diagnosed with IIP by a standard algorithm were included in the study. Cough and dyspnea were graded for severity and duration. Pulmonary function tests and exercise testing parameters were noted. HRCT was scored based on an alveolar score, an interstitial score and a total score. The HRCT was correlated with each of the clinical and physiologic parameters. Pearson's/Spearman's correlation coefficient was used for the correlation of symptoms and parameters of ABG, PFT and 6MWT with the HRCT scores. A total of 94 patients were included in the study. Cough and dyspnea severity (r = 0.336 and 0.299), FVC (r = -0.48), TLC (r = -0.439) and DLCO and distance saturation product (DSP) (r = -0.368) and lowest saturation (r = -0.324) had significant correlation with total HRCT score. Among these, DLCO, particularly DLCO corrected % of predicted, correlated best with HRCT score (r = -0.721). Symptoms, PFT and exercise testing had good correlation with HRCT. DLCO corrected % of predicted correlated best with HRCT.
Alluri, Ram Kiran; Tsing, Pamela; Lee, Edward; Napolitano, Jason
2016-01-01
The purpose of this study was to compare the efficacy of simulation versus lecture-based education among preclinical medical students. Twenty medical students participated in this randomized, controlled crossover study. Students were randomized to four groups. Each group received two simulations and two lectures covering four different topics. Students were administered a pre-test, post-test and delayed post-test. The mean percentage of questions answered correctly on each test was calculated. The mean of each student's change in score across the three tests was used to compare simulation- versus lecture-based education. Students in both the simulation and lecture groups demonstrated improvement between the pre-test and post-test (p < 0.05). Students in the simulation group demonstrated improvement between the immediate post-test and delayed post-test (p < 0.05), while students in the lecture group did not demonstrate improvement (p > 0.05). When comparing interventions, the change in score between the pre-test and post-test was similar among both the groups (p > 0.05). The change in score between the post-test and delayed post-test was greater in the simulation group (p < 0.05). High-fidelity simulation may serve as a viable didactic platform for preclinical medical education. Our study demonstrated equivalent immediate knowledge gain and superior long-term knowledge retention in comparison to lectures.
Shenker, Bennett S
2014-02-01
To validate a scoring system that evaluates the ability of Internet search engines to correctly predict diagnoses when symptoms are used as search terms. We developed a five point scoring system to evaluate the diagnostic accuracy of Internet search engines. We identified twenty diagnoses common to a primary care setting to validate the scoring system. One investigator entered the symptoms for each diagnosis into three Internet search engines (Google, Bing, and Ask) and saved the first five webpages from each search. Other investigators reviewed the webpages and assigned a diagnostic accuracy score. They rescored a random sample of webpages two weeks later. To validate the five point scoring system, we calculated convergent validity and test-retest reliability using Kendall's W and Spearman's rho, respectively. We used the Kruskal-Wallis test to look for differences in accuracy scores for the three Internet search engines. A total of 600 webpages were reviewed. Kendall's W for the raters was 0.71 (p<0.0001). Spearman's rho for test-retest reliability was 0.72 (p<0.0001). There was no difference in scores based on Internet search engine. We found a significant difference in scores based on the webpage's order on the Internet search engine webpage (p=0.007). Pairwise comparisons revealed higher scores in the first webpages vs. the fourth (corr p=0.009) and fifth (corr p=0.017). However, this significance was lost when creating composite scores. The five point scoring system to assess diagnostic accuracy of Internet search engines is a valid and reliable instrument. The scoring system may be used in future Internet research. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Andrade-Souza, Yuri M; Zadeh, Gelareh; Ramani, Meera; Scora, Daryl; Tsao, May N; Schwartz, Michael L
2005-10-01
The aim of this study was to validate the radiosurgery-based arteriovenous malformation (AVM) score and the modified Spetzler-Martin grading system to predict radiosurgical outcome. One hundred thirty-six patients with brain AVMs were randomly selected. These patients had undergone a linear accelerator radiosurgical procedure at a single center between 1989 and 2000. Patients were divided into four groups according to an AVM score, which was calculated from the lesion volume, lesion location, and patient age (Group 1, AVM score <1; Group 2, AVM score 1-1.49; Group 3, AVM score 1.5-2; and Group 4, AVM score >2). Patients with a Spetzler-Martin Grade III AVM were divided into Grades IIIA (lesion >3 cm) and IIIB (lesion <3 cm). Sixty-two female (45.6%) and 74 male (54.4%) patients with a median age of 37.5 years (mean 37.5 years, range 5-77 years) were followed up for a median of 40 months. The median tumor margin dose was 15 Gy (mean 17.23 Gy, range 15-25 Gy). The proportions of excellent outcomes according to the AVM score were as follows: 91.7% for Group 1, 74.1% for Group 2, 60% for Group 3, and 33.3% for Group 4 (chi-square test, degrees of freedom (df) = 3, p < 0.001). Based on the modified Spetzler-Martin system, Grade I lesions had 88.9% excellent results; Grade II, 69.6%; Grade IIIB, 61.5%; and Grades IIIA and IV, 44.8% (chi-square test, df = 3, p = 0.047). The radiosurgery-based AVM score can be used accurately to predict excellent results following a single radiosurgical treatment for AVM. The modified Spetzler-Martin system can also predict radiosurgical results for AVMs, thus making it possible to use this system while deciding between surgery and radiosurgery.
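The four-group stratification described in this abstract can be sketched as a simple lookup. The function name is hypothetical, and the handling of scores between 1.49 and 1.5 (assigned here to Group 2) is an assumption, since the abstract leaves that gap unspecified. The abstract also does not give the formula for the AVM score itself, so the sketch takes the score as an input.

```python
# Proportion of excellent outcomes per group, as reported in the abstract (%).
REPORTED_EXCELLENT_OUTCOME_PCT = {1: 91.7, 2: 74.1, 3: 60.0, 4: 33.3}


def avm_score_group(score):
    """Map an already-computed AVM score to the study's Group 1-4.

    Cut-offs follow the abstract: <1, 1-1.49, 1.5-2, >2.
    Assumption: scores in the unstated gap (1.49, 1.5) fall into Group 2.
    """
    if score < 1.0:
        return 1
    if score < 1.5:
        return 2
    if score <= 2.0:
        return 3
    return 4


# Hypothetical patient with an AVM score of 1.7.
group = avm_score_group(1.7)
reported_pct = REPORTED_EXCELLENT_OUTCOME_PCT[group]
```

The lookup only restates the group-level proportions from the abstract; it is not an individual-level prediction model.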
NASA Astrophysics Data System (ADS)
Mulkerrin, Elizabeth A.
The purpose of this study was to determine the effect of an 11th-grade and 12th-grade zoo-based academic high school experiential science program compared to a same school-district school-based academic high school experiential science program on students' pretest and posttest science, math, and reading achievement, and student perceptions of program relevance, rigor, and relationships. Science coursework delivery site served as the study's independent variable for the two naturally formed groups representing students (n = 18) who completed a zoo-based experiential academic high school science program and students (n = 18) who completed a school-based experiential academic high school science program. Students in the first group, a zoo-based experiential academic high school science program, completed real world, hands-on projects at the zoo while students in the second group, those students who completed a school-based experiential academic high school science program, completed real world, simulated projects in the classroom. These groups comprised the two research arms of the study. Both groups of students were selected from the same school district. The study's two dependent variables were achievement and school climate. Achievement was analyzed using norm-referenced 11th-grade pretest PLAN and 12th-grade posttest ACT test composite scores. Null hypotheses were rejected in the direction of improved test scores for both science program groups: students who completed the zoo-based experiential academic high school science program (p < .001) and students who completed the school-based experiential academic high school science program (p < .001). The posttest-posttest ACT test composite score comparison was not statistically different (p = .93), indicating program equipoise for students enrolled in both science programs.
No overall weighted grade point average score improvement was observed for students in either science group; however, null hypotheses were rejected in the direction of improved science grade point average scores for 11th-grade (p < .01) and 12th-grade (p = .01) students who completed the zoo-based experiential academic high school science program. Null hypotheses were not rejected for between-group posttest science grade point average scores and school district criterion-referenced math and reading test scores. Finally, students who completed the zoo-based experiential academic high school science program had statistically improved pretest-posttest perceptions of program relationship scores (p < .05) and, compared to students who completed the school-based experiential academic high school science program, had statistically greater posttest perceptions of program relevance (p < .001), perceptions of program rigor (p < .001), and perceptions of program relationships (p < .001).
A Validity-Based Approach to Quality Control and Assurance of Automated Scoring
ERIC Educational Resources Information Center
Bejar, Isaac I.
2011-01-01
Automated scoring of constructed responses is already operational in several testing programmes. However, as the methodology matures and the demand for the utilisation of constructed responses increases, the volume of automated scoring is likely to increase at a fast pace. Quality assurance and control of the scoring process will likely be more…
ERIC Educational Resources Information Center
Chen, Haiwen
2012-01-01
In this article, linear item response theory (IRT) observed-score equating is compared under a generalized kernel equating framework with Levine observed-score equating for nonequivalent groups with anchor test design. Interestingly, these two equating methods are closely related despite being based on different methodologies. Specifically, when…
Information Technology and Literacy Assessment.
ERIC Educational Resources Information Center
Balajthy, Ernest
2002-01-01
Compares technology predictions from around 1989 with the technology of 2002. Discusses the place of computer-based assessment today, computer-scored testing, computer-administered formal assessment, Internet-based formal assessment, computerized adaptive tests, placement tests, informal assessment, electronic portfolios, information management,…
Implementation of an Improved Adaptive Testing Theory
ERIC Educational Resources Information Center
Al-A'ali, Mansoor
2007-01-01
Computer adaptive testing is the study of scoring tests and questions based on assumptions concerning the mathematical relationship between examinees' ability and the examinees' responses. Adaptive student tests, which are based on item response theory (IRT), have many advantages over conventional tests. We use the least squares method, a…
QUASAR--scoring and ranking of sequence-structure alignments.
Birzele, Fabian; Gewehr, Jan E; Zimmer, Ralf
2005-12-15
Sequence-structure alignments are a common means for protein structure prediction in the fields of fold recognition and homology modeling, and there is a broad variety of programs that provide such alignments based on sequence similarity, secondary structure or contact potentials. Nevertheless, finding the best sequence-structure alignment in a pool of alignments remains a difficult problem. QUASAR (quality of sequence-structure alignments ranking) provides a unifying framework for scoring sequence-structure alignments that aids finding well-performing combinations of well-known and custom-made scoring schemes. Those scoring functions can be benchmarked against widely accepted quality scores like MaxSub, TMScore, Touch and APDB, thus enabling users to test their own alignment scores against 'standard-of-truth' structure-based scores. Furthermore, individual score combinations can be optimized with respect to benchmark sets based on known structural relationships using QUASAR's in-built optimization routines.
Moore, Tyler M.; Reise, Steven P.; Roalf, David R.; Satterthwaite, Theodore D.; Davatzikos, Christos; Bilker, Warren B.; Port, Allison M.; Jackson, Chad T.; Ruparel, Kosha; Savitt, Adam P.; Baron, Robert B.; Gur, Raquel E.; Gur, Ruben C.
2016-01-01
Traditional “paper-and-pencil” testing is imprecise in measuring speed and hence limited in assessing performance efficiency, but computerized testing permits precision in measuring itemwise response time. We present a method of scoring performance efficiency (combining information from accuracy and speed) at the item level. Using a community sample of 9,498 youths age 8-21, we calculated item-level efficiency scores on four neurocognitive tests, and compared the concurrent, convergent, discriminant, and predictive validity of these scores to simple averaging of standardized speed and accuracy-summed scores. Concurrent validity was measured by the scores' abilities to distinguish men from women and their correlations with age; convergent and discriminant validity were measured by correlations with other scores inside and outside of their neurocognitive domains; predictive validity was measured by correlations with brain volume in regions associated with the specific neurocognitive abilities. Results provide support for the ability of itemwise efficiency scoring to detect signals as strong as those detected by standard efficiency scoring methods. We find no evidence of superior validity of the itemwise scores over traditional scores, but point out several advantages of the former. The itemwise efficiency scoring method shows promise as an alternative to standard efficiency scoring methods, with overall moderate support from tests of four different types of validity. This method allows the use of existing item analysis methods and provides the convenient ability to adjust the overall emphasis of accuracy versus speed in the efficiency score, thus adjusting the scoring to the real-world demands the test is aiming to fulfill. PMID:26866796
ERIC Educational Resources Information Center
Attali, Yigal; Powers, Don; Freedman, Marshall; Harrison, Marissa; Obetz, Susan
2008-01-01
This report describes the development, administration, and scoring of open-ended variants of GRE® Subject Test items in biology and psychology. These questions were administered in a Web-based experiment to registered examinees of the respective Subject Tests. The questions required a short answer of 1-3 sentences, and responses were automatically…
Imperfect practice makes perfect: error management training improves transfer of learning.
Dyre, Liv; Tabor, Ann; Ringsted, Charlotte; Tolsgaard, Martin G
2017-02-01
Traditionally, trainees are instructed to practise with as few errors as possible during simulation-based training. However, transfer of learning may improve if trainees are encouraged to commit errors. The aim of this study was to assess the effects of error management instructions compared with error avoidance instructions during simulation-based ultrasound training. Medical students (n = 60) with no prior ultrasound experience were randomised to error management training (EMT) (n = 32) or error avoidance training (EAT) (n = 28). The EMT group was instructed to deliberately make errors during training. The EAT group was instructed to follow the simulator instructions and to commit as few errors as possible. Training consisted of 3 hours of simulation-based ultrasound training focusing on fetal weight estimation. Simulation-based tests were administered before and after training. Transfer tests were performed on real patients 7-10 days after the completion of training. Primary outcomes were transfer test performance scores and diagnostic accuracy. Secondary outcomes included performance scores and diagnostic accuracy during the simulation-based pre- and post-tests. A total of 56 participants completed the study. On the transfer test, EMT group participants attained higher performance scores (mean score: 67.7%, 95% confidence interval [CI]: 62.4-72.9%) than EAT group members (mean score: 51.7%, 95% CI: 45.8-57.6%) (p < 0.001; Cohen's d = 1.1, 95% CI: 0.5-1.7). There was a moderate improvement in diagnostic accuracy in the EMT group compared with the EAT group (16.7%, 95% CI: 10.2-23.3% weight deviation versus 26.6%, 95% CI: 16.5-36.7% weight deviation [p = 0.082; Cohen's d = 0.46, 95% CI: -0.06 to 1.0]). No significant interaction effects between group and performance improvements between the pre- and post-tests were found in either performance scores (p = 0.25) or diagnostic accuracy (p = 0.09). 
The provision of error management instructions during simulation-based training improves the transfer of learning to the clinical setting compared with error avoidance instructions. Rather than teaching to avoid errors, the use of errors for learning should be explored further in medical education theory and practice. © 2016 John Wiley & Sons Ltd and The Association for the Study of Medical Education.
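The abstract reports effect sizes as Cohen's d. As a minimal sketch, the usual pooled-standard-deviation form of d for two independent groups can be computed as below. The abstract does not state which variant the authors used, and the function name and sample data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled sample SD."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_variance)


# Hypothetical transfer-test scores (%) for two small training groups.
emt_scores = [70, 65, 72, 61]
eat_scores = [52, 55, 48, 50]
effect_size = cohens_d(emt_scores, eat_scores)
```

By common convention, d of about 0.8 or more is described as large, which is consistent with how the d = 1.1 difference in performance scores is presented above.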
Development and evaluation of a school-based asthma educational program.
Al Aloola, Noha Abdullah; Saba, Maya; Nissen, Lisa; Alewairdhi, Huda Abdullaziz; Alaloola, Alhnouf; Saini, Bandana
2017-05-01
To develop, implement, and evaluate the effects of a school-based asthma educational program on Saudi primary school teachers' asthma awareness and competence in delivering asthma-related first aid interventions. An asthma educational intervention program entitled "School Asthma Action Program" (SAAP) was designed based on pedagogical principles and implemented among teachers randomly selected from girls' primary schools in Riyadh, Saudi Arabia. This pilot study employed a pre-test/post-test experimental design. A previously tested asthma awareness questionnaire and a custom-designed asthma competence score sheet were used to evaluate the effects of the educational intervention program on teacher's asthma awareness and competence in providing asthma-related first aid interventions at schools. Forty-seven teachers from five different primary schools participated in the program. Of the 47 teachers, 39 completed both the pre- and post-program questionnaires. The SAAP improved teachers' awareness of asthma (teachers' median pre-program score was 11 (range 5-18) and their post-program score was 15 (range 7-18), p < 0.001) and their attitudes toward asthma management at schools (teachers' median pre-program score was 74 (range 15-75) and their post-program score was 75 (range 15-75), p = 0.043). Further, it improved teachers' competence in providing asthma-related first aid interventions (teachers' mean pre-program score was 1.4 ± 2.3 and their mean post-program score was 9.8 ± 0.5, p < 0.001). After completing the SAAP, a high proportion of teachers reported increased confidence in providing care to children with asthma at school. School-based asthma educational programs can significantly improve teachers' knowledge of asthma and their competence in providing asthma-related first aid interventions during emergencies.
Jeong, Jae Yoon; Jun, Dae Won; Bai, Daiseg; Kim, Ji Yean; Sohn, Joo Hyun; Ahn, Sang Bong; Kim, Sang Gyune; Kim, Tae Yeob; Kim, Hyoung Su; Jeong, Soung Won; Cho, Yong Kyun; Song, Do Seon; Kim, Hee Yeon; Jung, Young Kul; Yoon, Eileen L
2017-09-01
The aim of this study was to validate a new paper and pencil test battery to diagnose minimal hepatic encephalopathy (MHE) in Korea. A new paper and pencil test battery was composed of number connection test-A (NCT-A), number connection test-B (NCT-B), digit span test (DST), and symbol digit modality test (SDMT). The norm of the new test was based on 315 healthy individuals between the ages of 20 and 70 years old. Another 63 participants, comprising healthy subjects (n = 31) and cirrhosis patients (n = 32), were included as a validation cohort. All participants completed the new paper and pencil test, a critical flicker frequency (CFF) test and computerized cognitive function test (visual continuous performance test [CPT]). The scores on the NCT-A and NCT-B increased but those of DST and SDMT decreased according to age. Twelve of the cirrhotic patients (37.5%) were diagnosed with MHE based on the new paper and pencil test battery. The total score of the paper and pencil test battery showed good positive correlation with the CFF (r = 0.551, P < 0.001) and computerized cognitive function test. Also, this score was lower in patients with MHE compared to those without MHE (P < 0.001). Scores on the CFF (32.0 vs. 28.7 Hz, P = 0.028) and the computer-based cognitive test decreased significantly in patients with MHE compared to those without MHE. Test-retest reliability was comparable. In conclusion, the new paper and pencil test battery including NCT-A, NCT-B, DST, and SDMT showed good correlation with neuropsychological tests. This new paper and pencil test battery could help to discriminate patients with impaired cognitive function in cirrhosis (registered at Clinical Research Information Service [CRIS], https://cris.nih.go.kr/cris, KCT0000955). © 2017 The Korean Academy of Medical Sciences.
Weiser, Mark; Zarka, Salman; Werbeloff, Nomi; Kravitz, Efrat; Lubin, Gad
2010-02-01
Although previous studies indicate that people with lower intelligence quotient (IQ) scores are more likely to become cigarette smokers, IQ scores of siblings discordant for smoking and of adolescents who began smoking between ages 18-21 years have not been studied systematically. Each year a random sample of Israeli military recruits complete a smoking questionnaire. Cognitive functioning is assessed by the military using standardized tests equivalent to IQ. Of 20 221 18-year-old males, 28.5% reported smoking at least one cigarette a day (smokers). An unadjusted comparison found that smokers scored 0.41 effect sizes (ES, P < 0.001) lower than non-smokers; adjusted analyses remained significant (adjusted ES = 0.27, P < 0.001). Adolescents smoking one to five, six to 10, 11-20 and 21+ cigarettes/day had cognitive test scores 0.14, 0.22, 0.33 and 0.5 adjusted ES poorer than those of non-smokers (P < 0.001). Adolescents who did not smoke by age 18, and then began to smoke between ages 18-21 had lower cognitive test scores compared to never-smokers (adjusted ES = 0.14, P < 0.001). An analysis of brothers discordant for smoking found that smoking brothers had lower cognitive scores than non-smoking brothers (adjusted ES = 0.27; P = 0.014). Controlled analyses from this large population-based cohort of male adolescents indicate that IQ scores are lower in male adolescents who smoke compared to non-smokers and in brothers who smoke compared to their non-smoking brothers. The IQs of adolescents who began smoking between ages 18-21 are lower than those of non-smokers. Adolescents with poorer IQ scores might be targeted for programmes designed to prevent smoking.
A weighted generalized score statistic for comparison of predictive values of diagnostic tests
Kosinski, Andrzej S.
2013-01-01
Positive and negative predictive values are important measures of a medical diagnostic test's performance. We consider testing equality of two positive or two negative predictive values within a paired design in which all patients receive two diagnostic tests. The existing statistical tests for equality of predictive values are either Wald tests based on the multinomial distribution or the empirical Wald and generalized score tests within the generalized estimating equations (GEE) framework. As presented in the literature, these test statistics have considerably complex formulas without clear intuitive insight. We propose re-formulations that are mathematically equivalent but algebraically simple and intuitive. As is clearly seen with the new re-formulation we present, the generalized score statistic does not always reduce to the commonly used score statistic in the independent samples case. To alleviate this, we introduce a weighted generalized score (WGS) test statistic which incorporates the empirical covariance matrix with newly proposed weights. This statistic is simple to compute, it always reduces to the score statistic in the independent samples situation, and it preserves type I error better than the other statistics, as demonstrated by simulations. Thus, we believe the proposed WGS statistic is the preferred statistic for testing equality of two predictive values and for corresponding sample size computations. The new formulas of the Wald statistics may be useful for easy computation of confidence intervals for the difference of predictive values. The introduced concepts have the potential to lead to development of the weighted generalized score test statistic in a general GEE setting. PMID:22912343
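As a point of reference for the quantities being compared above, the following minimal sketch computes positive and negative predictive values from a 2x2 table. It does not implement the WGS statistic or any of the Wald tests from the abstract; the function name `predictive_values` and all counts are hypothetical and chosen only for illustration.

```python
def predictive_values(tp, fp, fn, tn):
    """Return (PPV, NPV) from the four cells of a 2x2 diagnostic table."""
    ppv = tp / (tp + fp)  # P(disease | positive test)
    npv = tn / (tn + fn)  # P(no disease | negative test)
    return ppv, npv


# Hypothetical counts for two tests applied to the same 200 patients
# (a paired design); these are not data from the paper.
ppv_a, npv_a = predictive_values(tp=40, fp=10, fn=10, tn=140)
ppv_b, npv_b = predictive_values(tp=45, fp=25, fn=5, tn=125)

print(f"Test A: PPV={ppv_a:.3f}, NPV={npv_a:.3f}")
print(f"Test B: PPV={ppv_b:.3f}, NPV={npv_b:.3f}")
```

Because both tests are given to the same patients, the two PPVs (and the two NPVs) are correlated, which is why a simple independent-samples comparison of these numbers would not be appropriate and a paired method is needed.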
An Evaluation of Three Approximate Item Response Theory Models for Equating Test Scores.
ERIC Educational Resources Information Center
Marco, Gary L.; And Others
Three item response models were evaluated for estimating item parameters and equating test scores. The models, which approximated the traditional three-parameter model, included: (1) the Rasch one-parameter model, operationalized in the BICAL computer program; (2) an approximate three-parameter logistic model based on coarse group data divided…
Growth Models and Teacher Evaluation: What Teachers Need to Know and Do
ERIC Educational Resources Information Center
Katz, Daniel S.
2016-01-01
Including growth models based on student test scores in teacher evaluations effectively holds teachers individually accountable for students improving their test scores. While an attractive policy for state administrators and advocates of education reform, value-added measures have been fraught with problems, and their use in teacher evaluation is…
The Michigan Alcoholism Screening Test (MAST): A Statistical Validation Analysis
ERIC Educational Resources Information Center
Laux, John M.; Newman, Isadore; Brown, Russ
2004-01-01
This study extends the Michigan Alcoholism Screening Test (MAST; M. L. Selzer, 1971) literature base by examining 4 issues related to the validity of the MAST scores. Specifically, the authors examine the validity of the MAST scores in light of the presence of impression management, participant demographic variables, and item endorsement…
Relationship of Self Esteem of the Disadvantaged to School Success.
ERIC Educational Resources Information Center
Frerichs, Allen H.
This study shows that there is a positive correlation between self esteem and academic achievement for inner city black children. Seventy-eight grade 6 black students were divided into the following categories: upper one-third and lower third based on intelligence test scores, standardized reading test scores, and grade point average (GPA) from…
Commentary on "Validating the Interpretations and Uses of Test Scores"
ERIC Educational Resources Information Center
Brennan, Robert L.
2013-01-01
Kane's paper "Validating the Interpretations and Uses of Test Scores" is the most complete and clearest discussion yet available of the argument-based approach to validation. At its most basic level, validation as formulated by Kane is fundamentally a simply-stated two-step enterprise: (1) specify the claims inherent in a particular interpretation…
Update of the U.S. Army Research Institute’s Longitudinal Research Data Base of Enlisted Personnel
1992-08-01
accession data elements, including Composite score data from the Army Classification Battery Test (ACB), were captured for each individual. For each...individual, Skill Qualifying Test (SQT) scores were included beginning in 1980 and additional data were included from the Enlisted Master File (EMF). The...entry into active duty for current tour, pay grade, Composite and SQT scores, and military occupation specialty (MOS). The EPRDB is designed to play an
Niemeijer, Anuschka S; van Waelvelde, Hilde; Smits-Engelsman, Bouwien C M
2015-02-01
The Movement Assessment Battery for Children has been revised as the Movement ABC-2 (Henderson, Sugden, & Barnett, 2007). In Europe, the 15th percentile score on this test is recommended for one of the DSM-IV diagnostic criteria for Developmental Coordination Disorder (DCD). A representative sample of Dutch and Flemish children was tested to cross-validate the UK standard scores, including the 15th percentile score. First, the mean, SD and percentile scores of Dutch children were compared to those of UK normative samples. Item standard scores of Dutch-speaking children deviated from the UK reference values, suggesting that adjustments were necessary. Except for very young children, the Dutch-speaking samples performed better. Second, based on the mean and SD and clinically relevant cut-off scores (5th and 15th percentile), norms were adjusted for the Dutch population. For diagnostic use, researchers and clinicians should use the reference norms that are valid for the group of children they are testing. The results indicate that there may be an effect of testing procedure in other countries that validated the UK norms and/or a cultural influence on the age norms of the Movement ABC-2. It is suggested to formulate criterion-based norms for age groups in addition to statistical norms. Copyright © 2014 Elsevier B.V. All rights reserved.
O'Connell, Megan E; Tuokko, Holly; Voll, Stacey; Simard, Martine; Griffith, Lauren E; Taler, Vanessa; Wolfson, Christina; Kirkland, Susan; Raina, Parminder
We detail a new approach to the creation of normative data for neuropsychological tests. The traditional approach to normative data creation is to make demographic adjustments based on observations of correlations between single neuropsychological tests and selected demographic variables. We argue, however, that this does not describe the implications for clinical practice, such as increased likelihood of misclassification of cognitive impairment, nor does it elucidate the impact on decision-making with a neuropsychological battery. We propose base rate analyses; specifically, differential base rates of impaired scores between theoretical and actual base rates as the basis for decisions to create demographic adjustments within normative data. Differential base rates empirically describe the potential clinical implications of failing to create an appropriate normative group. We demonstrate this approach with data from a short telephone-administered neuropsychological battery given to a large, neurologically healthy sample aged 45-85 years. We explored whether adjustments for age and medical conditions were warranted based on differential base rates of spuriously impaired scores. Theoretical base rates underestimated the frequency of impaired scores in older adults and overestimated the frequency of impaired scores in younger adults, providing an evidence base for the creation of age-corrected normative data. In contrast, the number of medical conditions (numerous cardiovascular, hormonal, and metabolic conditions) was not related to differential base rates of impaired scores. Despite a small correlation between number of medical conditions and each neuropsychological variable, normative adjustments for number of medical conditions do not appear warranted. Implications for creation of normative data are discussed.
van Rosendael, Alexander R; Maliakal, Gabriel; Kolli, Kranthi K; Beecy, Ashley; Al'Aref, Subhi J; Dwivedi, Aeshita; Singh, Gurpreet; Panday, Mohit; Kumar, Amit; Ma, Xiaoyue; Achenbach, Stephan; Al-Mallah, Mouaz H; Andreini, Daniele; Bax, Jeroen J; Berman, Daniel S; Budoff, Matthew J; Cademartiri, Filippo; Callister, Tracy Q; Chang, Hyuk-Jae; Chinnaiyan, Kavitha; Chow, Benjamin J W; Cury, Ricardo C; DeLago, Augustin; Feuchtner, Gudrun; Hadamitzky, Martin; Hausleiter, Joerg; Kaufmann, Philipp A; Kim, Yong-Jin; Leipsic, Jonathon A; Maffei, Erica; Marques, Hugo; Pontone, Gianluca; Raff, Gilbert L; Rubinshtein, Ronen; Shaw, Leslee J; Villines, Todd C; Gransar, Heidi; Lu, Yao; Jones, Erica C; Peña, Jessica M; Lin, Fay Y; Min, James K
Machine learning (ML) is a field in computer science that has been shown to integrate clinical and imaging data effectively for the creation of prognostic scores. The current study investigated whether an ML score, incorporating only the 16-segment coronary tree information derived from coronary computed tomography angiography (CCTA), provides enhanced risk stratification compared with current CCTA-based risk scores. From the multi-center CONFIRM registry, patients were included with complete CCTA risk score information and ≥3 year follow-up for myocardial infarction and death (primary endpoint). Patients with prior coronary artery disease were excluded. Conventional CCTA risk scores (conventional CCTA approach, segment involvement score, Duke prognostic index, segment stenosis score, and the Leaman risk score) and a score created using ML were compared for the area under the receiver operating characteristic curve (AUC). Only 16-segment-based coronary stenosis (0%, 1-24%, 25-49%, 50-69%, 70-99% and 100%) and composition (calcified, mixed and non-calcified plaque) were provided to the ML model. A boosted ensemble algorithm (extreme gradient boosting; XGBoost) was used and the entire dataset was randomly split into a training set (80%) and testing set (20%). First, tuned hyperparameters were used to generate a trained model from the training data set (80% of data). Second, the performance of this trained model was independently tested on the unseen test set (20% of data). In total, 8844 patients (mean age 58.0 ± 11.5 years, 57.7% male) were included. During a mean follow-up time of 4.6 ± 1.5 years, 609 events occurred (6.9%). No CAD was observed in 48.7% (3.5% event), non-obstructive CAD in 31.8% (6.8% event), and obstructive CAD in 19.5% (15.6% event). Discrimination of events as expressed by AUC was significantly better for the ML-based approach (0.771) vs the other scores (ranging from 0.685 to 0.701), P < 0.001.
Net reclassification improvement analysis showed that the improved risk stratification was the result of down-classification of risk among patients that did not experience events (non-events). A risk score created by a ML based algorithm, that utilizes standard 16 coronary segment stenosis and composition information derived from detailed CCTA reading, has greater prognostic accuracy than current CCTA integrated risk scores. These findings indicate that a ML based algorithm can improve the integration of CCTA derived plaque information to improve risk stratification. Published by Elsevier Inc.
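The evaluation design described above (a random 80/20 split, followed by AUC on the held-out portion) can be sketched in plain Python. This is only an illustration under stated assumptions: it does not train XGBoost or use registry data, the helper names `split_80_20` and `auc` are hypothetical, and the risk values and event labels are made up.

```python
import random


def split_80_20(items, seed=0):
    """Randomly split items into an 80% training and a 20% test portion."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def auc(scores, labels):
    """Rank-based AUC: the probability that a randomly chosen event case
    receives a higher score than a randomly chosen non-event case
    (ties count as 0.5). Requires at least one case of each class."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predicted risks and observed events for six held-out patients.
risk = [0.9, 0.8, 0.7, 0.6, 0.2, 0.1]
event = [1, 1, 0, 1, 0, 0]
print(f"AUC = {auc(risk, event):.3f}")

train_ids, test_ids = split_80_20(range(10))
print(len(train_ids), len(test_ids))
```

The pairwise definition is quadratic in the number of cases, which is fine for a sketch; a registry-scale analysis would normally rely on an established library routine instead.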
Khan, Asaduzzaman; Chien, Chi-Wen; Bagraith, Karl S
2015-04-01
To investigate whether using a parametric statistic in comparing groups leads to different conclusions when using summative scores from rating scales compared with using their corresponding Rasch-based measures. A Monte Carlo simulation study was designed to examine between-group differences in the change scores derived from summative scores from rating scales, and those derived from their corresponding Rasch-based measures, using 1-way analysis of variance. The degree of inconsistency between the 2 scoring approaches (i.e. summative and Rasch-based) was examined, using varying sample sizes, scale difficulties and person ability conditions. This simulation study revealed scaling artefacts that could arise from using summative scores rather than Rasch-based measures for determining the changes between groups. The group differences in the change scores were statistically significant for summative scores under all test conditions and sample size scenarios. However, none of the group differences in the change scores were significant when using the corresponding Rasch-based measures. This study raises questions about the validity of the inference on group differences of summative score changes in parametric analyses. Moreover, it provides a rationale for the use of Rasch-based measures, which can allow valid parametric analyses of rating scale data.
Rahm, Stefan; Wieser, Karl; Bauer, David E; Waibel, Felix Wa; Meyer, Dominik C; Gerber, Christian; Fucentese, Sandro F
2018-05-16
Most studies have demonstrated that training on a virtual reality-based arthroscopy simulator leads to an improvement of technical skills in orthopaedic surgery. However, how long and what kind of training is optimal for young residents is unknown. In this study we tested the efficacy of a standardized, competency-based training protocol on a validated virtual reality-based knee and shoulder arthroscopy simulator. Twenty residents and five experts in arthroscopy were included. All participants performed a test including knee and shoulder arthroscopy tasks on a virtual reality knee and shoulder arthroscopy simulator. The residents had to complete a competency-based training program. Thereafter, the previously completed test was retaken. We evaluated the metric data of the simulator using a z-score and the Arthroscopic Surgery Skill Evaluation Tool (ASSET) to assess training effects in residents and performance levels in experts. The residents significantly improved from pre- to post-training in the overall z-score: -9.82 (range, -20.35 to -1.64) to -2.61 (range, -6.25 to 1.5); p < 0.001. The overall ASSET score improved from 55 (27 to 84) percent to 75 (48 to 92) percent; p < 0.001. The experts, however, achieved a significantly higher z-score in the shoulder tasks (p < 0.001) and a higher z-score in the knee tasks that was not statistically significant (p = 0.921). The experts' mean overall ASSET score (knee and shoulder) was significantly higher in the therapeutic tasks (p < 0.001) compared to the residents' post-training result. The use of competency-based simulator training with this specific device for 3-5 h is an effective tool to advance basic arthroscopic skills of residents in training from 0 to 5 years, based on simulator measures and simulator-based ASSET testing. Therefore, we conclude that this sort of training method appears useful for learning the handling of the camera, basic anatomy and triangulation with instruments.
Thrall, Grace C; Coverdale, John H; Benjamin, Sophiya; Wiggins, Anna; Lane, Christianne Joy; Pato, Michele T
2016-10-01
The goal of this study was to evaluate the efficacy of team-based learning (TBL) on knowledge retention compared to traditional lectures with small break-out group discussion (teaching as usual (TAU)) using a randomized controlled trial. This randomized controlled trial was conducted during a daylong conference for psychiatric educators on attention-deficit hyperactivity disorder and the research literacy topic of efficacy versus effectiveness trials. Learners (n = 115) were randomized with concealed allocation to either TBL or TAU. Knowledge was measured prior to the intervention, immediately afterward, and 2 months later via multiple-choice tests. Participants were necessarily unblinded. Data enterers, data analysts, and investigators were blinded to group assignment in data analysis. Per-protocol analyses of test scores were performed using change in knowledge from baseline. The primary endpoint was test scores at 2 months. At baseline, there were no statistically significant differences between groups in pre-test knowledge. At immediate post-test, both TBL and TAU groups showed improved knowledge scores compared with their baseline scores. The TBL group performed statistically better on the immediate post-test than the TAU group (Cohen's d = 0.73; p < 0.001), although the differences in knowledge scores were not educationally meaningful, averaging just one additional test question correct (out of 15). On the 2-month remote post-test, there were no group differences in knowledge retention among the 42% of participants who returned the 2-month test. Both TBL and TAU learners acquired new knowledge at the end of the intervention and retained knowledge over 2 months. At the end of the intervention day and after 2 months, knowledge test scores were not meaningfully different between TBL and TAU completers. In conclusion, this study failed to demonstrate the superiority of TBL over TAU on the primary outcome of knowledge retention at 2 months post-intervention.
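The effect size reported above, Cohen's d, is a standardized mean difference. The sketch below uses the common pooled-standard-deviation form; the abstract does not state which variant the authors used, so that choice is an assumption, and the scores are invented for illustration.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d with a pooled standard deviation (sample variances)."""
    na, nb = len(group_a), len(group_b)
    pooled_var = (
        (na - 1) * variance(group_a) + (nb - 1) * variance(group_b)
    ) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical post-test scores out of 15; not data from the trial.
tbl_scores = [11, 12, 13, 14, 15]
tau_scores = [10, 11, 12, 13, 14]
d = cohens_d(tbl_scores, tau_scores)
print(f"Cohen's d = {d:.3f}")
```

In this toy example the groups differ by exactly one question on average, which shows how a one-question gap can still translate into a moderate-to-large standardized effect when score variability is small.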
Management of heart failure in the new era: the role of scores.
Mantegazza, Valentina; Badagliacca, Roberto; Nodari, Savina; Parati, Gianfranco; Lombardi, Carolina; Di Somma, Salvatore; Carluccio, Erberto; Dini, Frank Lloyd; Correale, Michele; Magrì, Damiano; Agostoni, Piergiuseppe
2016-08-01
Heart failure is a widespread syndrome involving several organs, still characterized by high mortality and morbidity, and whose clinical course is heterogeneous and hardly predictable. In this scenario, the assessment of heart failure prognosis represents a fundamental step in clinical practice. A single parameter is always unable to provide a very precise prognosis. Therefore, risk scores based on multiple parameters have been introduced, but their clinical utility is still modest. In this review, we evaluated several prognostic models for acute, right, chronic, and end-stage heart failure based on multiple parameters. In particular, for chronic heart failure we considered risk scores essentially based on clinical evaluation, comorbidities analysis, baroreflex sensitivity, heart rate variability, sleep disorders, laboratory tests, echocardiographic imaging, and cardiopulmonary exercise test parameters. What is at present established is that a single parameter is not sufficient for an accurate prediction of prognosis in heart failure because of the complex nature of the disease. However, none of the scoring systems available is widely used, being in some cases complex, not user-friendly, or based on expensive or not easily available parameters. We believe that multiparametric scores for risk assessment in heart failure are promising but their widespread use needs to be experienced.
NASA Astrophysics Data System (ADS)
Powell, P. E.
Educators have recently come to consider inquiry-based instruction a more effective method of instruction than didactic instruction. Experience-based learning theory suggests that student performance is linked to teaching method. However, research is limited on inquiry teaching and its effectiveness in preparing students to perform well on standardized tests. The purpose of the study was to investigate whether one of these two teaching methodologies was more effective in increasing student performance on standardized science tests. The quasi-experimental quantitative study comprised two stages. Stage 1 used a survey to identify teaching methods of a convenience sample of 57 teacher participants and determined the level of inquiry used in instruction to place participants into instructional groups (the independent variable). Stage 2 used analysis of covariance (ANCOVA) to compare posttest scores on a standardized exam by teaching method. Additional analyses were conducted to examine the differences in science achievement by ethnicity, gender, and socioeconomic status by teaching methodology. Results demonstrated a statistically significant gain in test scores when students were taught using inquiry-based instruction. Subpopulation analyses indicated all groups showed improved mean standardized test scores except African American students. The findings benefit teachers and students by presenting data supporting a method of content delivery that increases teacher efficacy and produces students with a greater cognition of science content that meets the school's mission and goals.
Infant polysomnography: reliability and validity of infant arousal assessment.
Crowell, David H; Kulp, Thomas D; Kapuniai, Linda E; Hunt, Carl E; Brooks, Lee J; Weese-Mayer, Debra E; Silvestri, Jean; Ward, Sally Davidson; Corwin, Michael; Tinsley, Larry; Peucker, Mark
2002-10-01
Infant arousal scoring based on the Atlas Task Force definition of transient EEG arousal was evaluated to determine (1) whether transient arousals can be identified and assessed reliably in infants and (2) whether arousal and no-arousal epochs scored previously by trained raters can be validated reliably by independent sleep experts. Phase I for inter- and intrarater reliability scoring was based on two datasets of sleep epochs selected randomly from nocturnal polysomnograms of healthy full-term, preterm, idiopathic apparent life-threatening event cases, and siblings of Sudden Infant Death Syndrome infants of 35 to 64 weeks postconceptional age. After training, test set 1 reliability was assessed and discrepancies identified. After retraining, test set 2 was scored by the same raters to determine interrater reliability. Later, three raters from the trained group rescored test set 2 to assess inter- and intrarater reliabilities. Interrater and intrarater reliability kappas, with 95% confidence intervals, ranged from substantial to almost perfect levels of agreement. Interrater reliabilities for spontaneous arousals were initially moderate and then substantial. During the validation phase, 315 previously scored epochs were presented to four sleep experts to rate as containing arousal or no-arousal events. Interrater expert agreements were diverse and considered as noninterpretable. Concordance in sleep experts' agreements, based on identification of the previously sampled arousal and no-arousal epochs, was used as a secondary evaluative technique. Results showed agreement by two or more experts on 86% of the Collaborative Home Infant Monitoring Evaluation Study arousal scored events. Conversely, only 1% of the Collaborative Home Infant Monitoring Evaluation Study-scored no-arousal epochs were rated as an arousal.
In summary, this study presents an empirically tested model with procedures and criteria for attaining improved reliability in transient EEG arousal assessments in infants using the modified Atlas Task Force standards. With training based on specific criteria, substantial inter- and intrarater agreement in identifying infant arousals was demonstrated. Corroborative validation results were too disparate for meaningful interpretation. Alternate evaluation based on concordance agreements supports reliance on infant EEG criteria for assessment. Results mandate additional confirmatory validation studies with specific training on infant EEG arousal assessment criteria.
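The reliability kappas mentioned in this abstract are chance-corrected agreement coefficients. As a minimal sketch, the two-rater form (Cohen's kappa) can be computed as follows; the abstract does not specify which kappa variant or software was used, so this form is an assumption, and the epoch ratings are invented.

```python
def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters labelling the same items.

    Undefined (division by zero) when expected agreement equals 1,
    i.e. when both raters use a single identical category throughout.
    """
    n = len(rater1)
    categories = set(rater1) | set(rater2)
    # Observed proportion of items on which the raters agree.
    p_obs = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    p_exp = sum(
        (rater1.count(c) / n) * (rater2.count(c) / n) for c in categories
    )
    return (p_obs - p_exp) / (1 - p_exp)


# Hypothetical ratings of ten epochs: 1 = arousal, 0 = no arousal.
rater_a = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
rater_b = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]
kappa = cohens_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.2f}")
```

Here the raters agree on 8 of 10 epochs, but because half of that agreement would be expected by chance, the coefficient is noticeably lower than the raw 80% agreement.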
Doshi, Neena Piyush
2017-01-01
Team-based learning (TBL) combines small and large group learning by incorporating multiple small groups in a large group setting. It is a teacher-directed method that encourages student-student interaction. This study compares student learning and teaching satisfaction between conventional lecture and TBL in the subject of pathology. The present study aimed to assess the effectiveness of the TBL method of teaching over the conventional lecture. The study was conducted in the Department of Pathology, GMERS Medical College and General Hospital, Gotri, Vadodara, Gujarat. The study population comprised 126 students of second-year MBBS, in their third semester of the academic year 2015-2016. "Hemodynamic disorders" were taught by the conventional method and "transfusion medicine" by the TBL method. Effectiveness of both methods was assessed. A multiple-choice posttest was conducted at the end of "hemodynamic disorders." Assessment of TBL was based on individual score, team score, and each member's contribution to the success of the team. The individual score and overall score were compared with the posttest score on "hemodynamic disorders." Feedback was taken from the students regarding their experience with TBL. Tukey's multiple comparisons test and ANOVA summary were used to find the significance of scores between didactic and TBL methods. Student feedback was taken using a "Student Satisfaction Scale" based on the Likert scoring method. The mean of student scores by didactic, Individual Readiness Assurance Test (score "A"), and overall (score "D") was 49.8% (standard deviation [SD] = 14.8), 65.6% (SD = 10.9), and 65.6% (SD = 13.8), respectively. The study showed positive educational outcomes in terms of knowledge acquisition, participation and engagement, and team performance with TBL.
ERIC Educational Resources Information Center
Goldberg, Gail Lynn; Roswell, Barbara Sherr
Teachers' reactions to the administration and scoring of the Maryland School Performance Assessment Program tests (MSPAP) were studied, focusing on their direct and indirect exposure to tasks and evaluative criteria through the experience of scoring the MSPAP. Since its inception in 1991, the MSPAP has been scored in-state by certified teachers…
ERIC Educational Resources Information Center
Woodruff, David; Traynor, Anne; Cui, Zhongmin; Fang, Yu
2013-01-01
Professional standards for educational testing recommend that both the overall standard error of measurement and the conditional standard error of measurement (CSEM) be computed on the score scale used to report scores to examinees. Several methods have been developed to compute scale score CSEMs. This paper compares three methods, based on…
Jenkinson, Toni-Marie; Muncer, Steven; Wheeler, Miranda; Brechin, Don; Evans, Stephen
2018-06-01
Neuropsychological assessment requires accurate estimation of an individual's premorbid cognitive abilities. Oral word reading tests, such as the test of premorbid functioning (TOPF), and demographic variables, such as age, sex, and level of education, provide a reasonable indication of premorbid intelligence, but their ability to predict other related cognitive abilities is less well understood. This study aimed to develop regression equations, based on the TOPF and demographic variables, to predict scores on tests of verbal fluency and naming ability. A sample of 119 healthy adults provided demographic information and were tested using the TOPF, FAS, animal naming test (ANT), and graded naming test (GNT). Multiple regression analyses, using the TOPF and demographics as predictor variables, were used to estimate verbal fluency and naming ability test scores. Change scores and cases of significant impairment were calculated for two clinical samples with diagnosed neurological conditions (TBI and meningioma) using the method in Knight, McMahon, Green, and Skeaff (). Demographic variables provided a significant contribution to the prediction of all verbal fluency and naming ability test scores; however, adding TOPF score to the equation considerably improved prediction beyond that afforded by demographic variables alone. The percentage of variance accounted for by demographic variables and/or TOPF score was 19 per cent (FAS), 28 per cent (ANT), and 41 per cent (GNT). Change scores revealed significant differences in performance in the clinical groups, particularly the TBI group. Demographic variables, particularly education level, and scores on the TOPF should be taken into consideration when interpreting performance on tests of verbal fluency and naming ability. © 2017 The British Psychological Society.
Methodological Approaches to Online Scoring of Essays.
ERIC Educational Resources Information Center
Chung, Gregory K. W. K.; O'Neil, Harold F., Jr.
This report examines the feasibility of scoring essays using computer-based techniques. Essays have been incorporated into many of the standardized testing programs. Issues of validity and reliability must be addressed to deploy automated approaches to scoring fully. Two approaches that have been used to classify documents, surface- and word-based…
Assessing students' conceptual knowledge of electricity and magnetism
NASA Astrophysics Data System (ADS)
McColgan, Michele W.; Finn, Rose A.; Broder, Darren L.; Hassel, George E.
2017-12-01
We present the Electricity and Magnetism Conceptual Assessment (EMCA), a new assessment aligned with second-semester introductory physics courses. Topics covered include electrostatics, electric fields, circuits, magnetism, and induction. We have two motives for writing a new assessment. First, we find other assessments such as the Brief Electricity and Magnetism Assessment and the Conceptual Survey on Electricity and Magnetism not well aligned with the topics and content depth of our courses. We want to test introductory physics content at a level appropriate for our students. Second, we want the assessment to yield scores and gains comparable to the widely used Force Concept Inventory (FCI). After five testing and revision cycles, the assessment was finalized in early 2015 and is available online. We present performance results for a cohort of 225 students at Siena College who were enrolled in our algebra- and calculus-based physics courses during the spring 2015 and 2016 semesters. We provide pretest, post-test, and gain analyses, as well as individual question and whole test statistics to quantify difficulty and reliability. In addition, we compare EMCA and FCI scores and gains, and we find that students' FCI scores are strongly correlated with their performance on the EMCA. Finally, the assessment was piloted in an algebra-based physics course at George Washington University (GWU). We present performance results for a cohort of 130 GWU students and we find that their EMCA scores are comparable to the scores of students in our calculus-based physics course.
ERIC Educational Resources Information Center
Kesan, Cenk; Ozkalkan, Zuhal; Iric, Hamdullah; Kaya, Deniz
2012-01-01
This study sought to determine whether students' scores on exams covering limits and derivatives differed according to the type of music listened to, compared with an environment without music. For this purpose, an achievement test covering limits and derivatives, whose Cronbach's alpha reliability coefficient is…
Developmental assessment of preterm infants: Chronological or corrected age?
Harel-Gadassi, Ayelet; Friedlander, Edwa; Yaari, Maya; Bar-Oz, Benjamin; Eventov-Friedman, Smadar; Mankuta, David; Yirmiya, Nurit
2018-06-12
The aim of this study is to examine the effect of age correction on the developmental assessment scores of preterm infants, using, for the first time, the Mullen Scales of Early Learning (MSEL) test. Participants included 110 preterm infants (born at a gestational age of ≤ 34 weeks) at ages 1, 4, 8, 12, 18, 24 and 36 months. The corrected age-based MSEL composite score and each of the five MSEL scale scores were significantly higher than chronological age-based scores at all ages. These corrected scores were significantly higher than the chronological scores regardless of gestational age and of whether weight was adequate or small for gestational age. Larger differences between corrected and chronological age-based scores significantly correlated with earlier gestational age and with lower birth weight between 1 and 24 months but not at 36 months. Using chronological age-based scores yielded significantly more infants identified with developmental delays than using corrected age-based scores. The findings indicate that clinicians and researchers, as well as family members, should be aware of and acknowledge the distinction between corrected and chronological ages when evaluating preterm infants in research and clinical practices. Copyright © 2018. Published by Elsevier Ltd.
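The distinction between chronological and corrected age can be made concrete with a short sketch. It assumes the common convention of subtracting the weeks of prematurity relative to a 40-week term. The abstract does not state the exact rule the authors applied, and the function name and example infant are hypothetical.

```python
def corrected_age_weeks(chronological_age_weeks: float,
                        gestational_age_at_birth_weeks: float,
                        term_weeks: float = 40.0) -> float:
    """Chronological age minus the number of weeks the infant was born early.

    Assumes a 40-week term by default; infants born at or after term
    receive no correction.
    """
    weeks_early = max(term_weeks - gestational_age_at_birth_weeks, 0.0)
    return chronological_age_weeks - weeks_early


# Hypothetical infant born at 32 weeks' gestation, assessed 26 weeks after birth:
# 8 weeks early, so the corrected age is 18 weeks.
print(corrected_age_weeks(26, 32))  # 18.0
```

Because norm-referenced scores are looked up by age, scoring the same raw performance against the younger corrected age yields a higher standard score, which is consistent with the direction of the difference reported above.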
Carrillo-Larco, Rodrigo M; Miranda, J Jaime; Gilman, Robert H; Medina-Lezama, Josefina; Chirinos-Pacheco, Julio A; Muñoz-Retamozo, Paola V; Smeeth, Liam; Checkley, William; Bernabe-Ortiz, Antonio
2017-11-29
Chronic Kidney Disease (CKD) represents a great burden for the patient and the health system, particularly if diagnosed at late stages. Consequently, tools to identify patients at high risk of having CKD are needed, particularly in limited-resources settings where laboratory facilities are scarce. This study aimed to develop a risk score for prevalent undiagnosed CKD using data from four settings in Peru: a complete risk score including all associated risk factors and another excluding laboratory-based variables. Cross-sectional study. We used two population-based studies: one for developing and internal validation (CRONICAS), and another (PREVENCION) for external validation. Risk factors included clinical- and laboratory-based variables, among others: sex, age, hypertension and obesity; and lipid profile, anemia and glucose metabolism. The outcome was undiagnosed CKD: eGFR < 60 ml/min/1.73 m². We tested the performance of the risk scores using the area under the receiver operating characteristic (ROC) curve, sensitivity, specificity, positive/negative predictive values and positive/negative likelihood ratios. Participants in both studies averaged 57.7 years old, and over 50% were females. Age, hypertension and anemia were strongly associated with undiagnosed CKD. In the external validation, at a cut-off point of 2, the complete and laboratory-free risk scores performed similarly well with a ROC area of 76.2% and 76.0%, respectively (P = 0.784). The best assessment parameter of these risk scores was their negative predictive value: 99.1% and 99.0% for the complete and laboratory-free, respectively. The developed risk scores showed a moderate performance as a screening test. People with a score of ≥ 2 points should undergo further testing to rule out CKD. Using the laboratory-free risk score is a practical approach in developing countries where laboratories are not readily available and undiagnosed CKD has significant morbidity and mortality.
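The screening measures named in this abstract all derive from a 2x2 table of test result against true status. The sketch below shows the standard definitions. The counts are invented for illustration and are not the study's data. They were chosen only to show how a low-prevalence condition can give a very high negative predictive value alongside moderate discrimination.

```python
def screening_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard diagnostic accuracy measures from 2x2 table counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),            # positive predictive value
        "npv": tn / (tn + fn),            # negative predictive value
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }


# Hypothetical cohort: 1000 people screened, 30 of whom have undiagnosed CKD.
metrics = screening_metrics(tp=24, fp=291, fn=6, tn=679)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these hypothetical counts the negative predictive value exceeds 99% even though the positive predictive value is below 10%, which illustrates why a score of this kind is better suited to ruling out disease than to confirming it.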
Black-white achievement gap and family wealth.
Yeung, W Jean; Conley, Dalton
2008-01-01
This article examines the extent to which family wealth affects the Black-White test score gap for young children based on data from the Panel Study of Income Dynamics (aged 3-12). This study found little evidence that wealth mediated the Black-White test scores gaps, which were eliminated when child and family demographic covariates were held constant. However, family wealth had a stronger association with cognitive achievement of school-aged children than that of preschoolers and a stronger association with school-aged children's math than on their reading scores. Liquid assets, particularly holdings in stocks or mutual funds, were positively associated with school-aged children's test scores. Family wealth was associated with a higher quality home environment, better parenting behavior, and children's private school attendance.
Pantazes, Robert J; Saraf, Manish C; Maranas, Costas D
2007-08-01
In this paper, we introduce and test two new sequence-based protein scoring systems (i.e. S1, S2) for assessing the likelihood that a given protein hybrid will be functional. By binning together amino acids with similar properties (i.e. volume, hydrophobicity and charge) the scoring systems S1 and S2 allow for the quantification of the severity of mismatched interactions in the hybrids. The S2 scoring system is found to be able to significantly functionally enrich a cytochrome P450 library over other scoring methods. Given this scoring base, we subsequently constructed two separate optimization formulations (i.e. OPTCOMB and OPTOLIGO) for optimally designing protein combinatorial libraries involving recombination or mutations, respectively. Notably, two separate versions of OPTCOMB are generated (i.e. model M1, M2) with the latter allowing for position-dependent parental fragment skipping. Computational benchmarking results demonstrate the efficacy of models OPTCOMB and OPTOLIGO to generate high scoring libraries of a prespecified size.
1997-02-01
application with a strong resemblance to a video game, concern has been raised that prior video game experience might have a moderating effect on scores. Much...such as spatial ability. The effects of computer or video game experience on work sample scores have not been systematically investigated. The purpose...of this study was to evaluate the incremental validity of prior video game experience over that of general aptitude as a predictor of work sample test
ERIC Educational Resources Information Center
Xi, Xiaoming; Mollaun, Pam
2009-01-01
This study investigated the scoring of the Test of English as a Foreign Language[TM] Internet-based Test (TOEFL iBT[TM]) Speaking section by bilingual or multilingual speakers of English and 1 or more Indian languages. We explored the extent to which raters from India, after being trained and certified, were able to score the Speaking section for…
ERIC Educational Resources Information Center
McLean, James E.; Kaufman, Alan S.
1995-01-01
The six Holland-based Interest Scale scores yielded by the Harrington-O'Shea Career Decision-Making System (CDM) (T. Harrington and A. O'Shea, 1982) were related to sex, race, and performance on the Kaufman Adolescent and Adult Intelligence Test for 254 adolescents and young adults. CDM scores did not relate to most of the variables studied, and…
Performance-Based Testing and Success in Naval Advanced Flight Training.
1992-11-01
dual-task ADHT reached significance as measured by the increase in R2 . [4] At this point, the number of variables had been pared to 15. We subjected...these 15 variables to further analysis. First, we desired to construct a composite score based on the eight ADHT variables. One feasible composite...score was arrived at by utilizing only the dual-task ADHT test in the following manner: ADHTCS =.20 * ZADHT6 - .50 * ZADHT5 - .10 * ZADHT7 - .20
A Bad Idea: National Standards Based on Test Scores
ERIC Educational Resources Information Center
Baker, Keith
2010-01-01
The justification for national standards is that test scores predict a nation's future economic success. There is no evidence that supports this assumption. There is evidence that it is wrong. For more than half a century, reformers have been trying to fix our schools with little success. The obvious conclusion is that something that can't be…
Can a Two-Question Test Be Reliable and Valid for Predicting Academic Outcomes?
ERIC Educational Resources Information Center
Bridgeman, Brent
2016-01-01
Scores on essay-based assessments that are part of standardized admissions tests are typically given relatively little weight in admissions decisions compared to the weight given to scores from multiple-choice assessments. Evidence is presented to suggest that more weight should be given to these assessments. The reliability of the writing scores…
ERIC Educational Resources Information Center
Rhea, David M.
2017-01-01
Many honors programs make admissions decisions based on student high school GPA and a standardized test score. However, McKay argued that standardized test scores can be a barrier to honors program participation, particularly for minority students. Minority students, particularly Hispanic and African American students, are apt to have lower…
ERIC Educational Resources Information Center
Demir, Metin
2015-01-01
This study predicts the number of correct answers given by pre-service classroom teachers in Civil Servant Recruitment Examination's (CSRE) educational sciences test based on their high school grade point averages, university entrance scores, and grades (mid-term and final exams) from their undergraduate educational courses. This study was…
Black-White Achievement Gap and Family Wealth
ERIC Educational Resources Information Center
Yeung, W. Jean; Conley, Dalton
2008-01-01
This article examines the extent to which family wealth affects the Black-White test score gap for young children based on data from the Panel Study of Income Dynamics (aged 3-12). This study found little evidence that wealth mediated the Black-White test scores gaps, which were eliminated when child and family demographic covariates were held…
Item Purification Does Not Always Improve DIF Detection: A Counterexample with Angoff's Delta Plot
ERIC Educational Resources Information Center
Magis, David; Facon, Bruno
2013-01-01
Item purification is an iterative process that is often advocated as improving the identification of items affected by differential item functioning (DIF). With test-score-based DIF detection methods, item purification iteratively removes the items currently flagged as DIF from the test scores to get purified sets of items, unaffected by DIF. The…
Can Tracking Raise the Test Scores of High-Ability Minority Students?
ERIC Educational Resources Information Center
Card, David; Giuliano, Laura
2016-01-01
We evaluate a tracking program in a large urban district where schools with at least one gifted fourth grader create a separate "gifted/high achiever" classroom. Most seats are filled by non-gifted high achievers, ranked by previous-year test scores. We study the program's effects on the high achievers using (1) a rank-based regression…
ERIC Educational Resources Information Center
Rizzo, Monica Ellen
2012-01-01
Most American colleges and universities require standardized entrance exams when making admissions decisions. Scores on these exams help determine if, when and where students will be allowed to pursue higher education. These scores are also used to determine eligibility for merit based financial aid. This testing persists even though half of the…
Developing Local Oral Reading Fluency Cut Scores for Predicting High-Stakes Test Performance
ERIC Educational Resources Information Center
Grapin, Sally L.; Kranzler, John H.; Waldron, Nancy; Joyce-Beaulieu, Diana; Algina, James
2017-01-01
This study evaluated the classification accuracy of a second grade oral reading fluency curriculum-based measure (R-CBM) in predicting third grade state test performance. It also compared the long-term classification accuracy of local and publisher-recommended R-CBM cut scores. Participants were 266 students who were divided into a calibration…
Moriyama, Yasushi; Yoshino, Aihide; Muramatsu, Taro; Mimura, Masaru
2017-05-01
The supermarket task, which is included in the Japanese version of the Rapid Dementia Screening Test, requires the quick (1 min) generation of words for things that can be bought in a supermarket. Cluster size and switches are investigated during this task. We investigated how the severity of dementia related to cluster size and switches on the supermarket task in patients with Alzheimer's disease. We administered the Japanese version of the Rapid Dementia Screening Test to 250 patients with very mild to severe Alzheimer's disease and to 49 healthy volunteers. Patients had Mini-Mental State Examination scores from 12 to 26 and Clinical Dementia Rating scale scores from 0.5 to 3. Patients were divided into four groups based on their Clinical Dementia Rating score (0.5, 1, 2, 3). We performed statistical analyses between the four groups and control subjects based on cluster size and switch scores on the supermarket task. The score for cluster size and switches deteriorated according to the severity of dementia. Moreover, for subjects with a Clinical Dementia Rating score of 0.5, cluster size was impaired, but switches were intact. Our findings indicate that the scores for cluster size and switches on the supermarket task may be useful for detecting the severity of symptoms of dementia in patients with Alzheimer's disease. © 2016 The Authors. Psychogeriatrics © 2016 Japanese Psychogeriatric Society.
Training improves laparoscopic tasks performance and decreases operator workload.
Hu, Jesse S L; Lu, Jirong; Tan, Wee Boon; Lomanto, Davide
2016-05-01
It has been postulated that increased operator workload during task performance may increase fatigue and surgical errors. The National Aeronautics and Space Administration-Task Load Index (NASA-TLX) is a validated tool for self-assessment for workload. Our study aims to assess the relationship of workload and performance of novices in simulated laparoscopic tasks of different complexity levels before and after training. Forty-seven novices without prior laparoscopic experience were recruited in a trial to investigate whether training improves task performance as well as mental workload. The participants were tested on three standard tasks (ring transfer, precision cutting and intracorporeal suturing) in increasing complexity based on the Fundamentals of Laparoscopic Surgery (FLS) curriculum. Following a period of training and rest, participants were tested again. Test scores were computed from time taken and time penalties for precision errors. Test scores and NASA-TLX scores were recorded pre- and post-training and analysed using paired t tests. One-way repeated measures ANOVA was used to analyse differences in NASA-TLX scores between the three tasks. NASA-TLX score was lowest with ring transfer and highest with intracorporeal suturing. This was statistically significant in both pre-training (p < 0.001) and post-training (p < 0.001). NASA-TLX scores mirror the changes in test scores for the three tasks. Workload scores decreased significantly after training for all three tasks (ring transfer = 2.93, p < 0.001, precision cutting = 3.74, p < 0.001, intracorporeal suturing = 2.98, p < 0.001). NASA-TLX score is an accurate reflection of the complexity of simulated laparoscopic tasks in the FLS curriculum. This also correlates with the relationship of test scores between the three tasks. Simulation training improves both performance score and workload score across the tasks.
Test-Based Accountability: The Promise and the Perils
ERIC Educational Resources Information Center
Loveless, Tom
2005-01-01
In the early 1990s, states began establishing standards in academic subjects backed by test-based accountability systems to see that the standards were met. Incentives were implemented for schools and students based on pupil test scores. These early accountability systems paved the way for passage of landmark federal legislation, the No Child Left…
Student Moon Observations and Spatial-Scientific Reasoning
NASA Astrophysics Data System (ADS)
Cole, Merryn; Wilhelm, Jennifer; Yang, Hongwei
2015-07-01
Relationships between sixth grade students' moon journaling and students' spatial-scientific reasoning after implementation of an Earth/Space unit were examined. Teachers used the project-based Realistic Explorations in Astronomical Learning curriculum. We used a regression model to analyze the relationship between the students' Lunar Phases Concept Inventory (LPCI) post-test score variables and several predictors, including moon journal score, number of moon journal entries, student gender, teacher experience, and pre-test score. The model shows that students who performed better on moon journals, both in terms of overall score and number of entries, tended to score higher on the LPCI. For every 1 point increase in the overall moon journal score, participants scored 0.18 points (out of 20), or nearly 1 percentage point, higher on the LPCI post-test when holding constant the effects of the other two predictors. Similarly, students who increased their scores by 1 point in the overall moon journal score scored approximately 1% higher in the Periodic Patterns (PP) and Geometric Spatial Visualization (GSV) domains of the LPCI. Also, student gender and teacher experience were shown to be significant predictors of post-GSV scores on the LPCI in addition to the pre-test scores, overall moon journal score, and number of entries that were also significant predictors on the LPCI overall score and the PP domain. This study is unique in the purposeful link created between student moon observations and spatial skills. The use of moon journals distinguishes this study further by fostering scientific observation along with skills from across science, technology, engineering, and mathematics disciplines.
NASA Astrophysics Data System (ADS)
Chamrat, Suthida
2018-01-01
The standard evaluation of Thai education relies excessively on the Ordinary National Educational Test, widely known as O-NET. However, a focus on O-NET results can lead to unsatisfactory teaching practices, especially in science subjects. Among the negative consequences is that schools frequently engage in "cramming" practices in order to elevate their O-NET scores. Higher education, which is committed to generating and applying knowledge by socially engaged scholars, needs to take account of this situation. This research article portrays the collaboration between the faculty of education at Chiang Mai University and an educational service area to develop a science camp model. The activities designed for the Science Camp Model were based on the Tinkering and Maker Movement. Specifically, the Science Camp Model was designed to enhance the conceptualization of electricity for middle school students in order to meet the standard evaluation of the Ordinary National Educational Test. The hands-on activities consisted of 5 modules: simple electrical circuits, paper circuits, electrical measurement roleplay, motor art robots, and force from motor. The data were collected with an 11-item Electricity Socratic-based Test adapted from cumulative published O-NET tests focused on the concept of electricity. Qualitative data were also collected virtually via Flinga.com. The results indicated that, after participating in the 5 modules of the science camp based on the Maker Movement and tinkering activity, students' average percentage test scores rose from 33.64 to 65.45. Gain score analysis using a dependent t-test compared pretest and posttest mean scores. The difference was statistically significant (p less than 0.001). The posttest had a considerably higher mean score compared with the pretest.
Qualitative data also indicated that students could explain the main concepts of electrical circuits, and the transformation of electrical energy to mechanical energy. The schools were satisfied, and expressed greater confidence in the Science Camp Model as an alternative way to improve Standard Evaluation of Ordinary National Educational Test.
Embedded measures of performance validity using verbal fluency tests in a clinical sample.
Sugarman, Michael A; Axelrod, Bradley N
2015-01-01
The objective of this study was to determine to what extent verbal fluency measures can be used as performance validity indicators during neuropsychological evaluation. Participants were clinically referred for neuropsychological evaluation in an urban-based Veterans Affairs hospital. Participants were placed into 2 groups based on their objectively evaluated effort on performance validity tests (PVTs). Individuals who exhibited credible performance (n = 431) failed 0 PVTs, and those with poor effort (n = 192) failed 2 or more PVTs. All participants completed the Controlled Oral Word Association Test (COWAT) and Animals verbal fluency measures. We evaluated how well verbal fluency scores could discriminate between the 2 groups. Raw scores and T scores for Animals discriminated between the credible performance and poor-effort groups with 90% specificity and greater than 40% sensitivity. COWAT scores had lower sensitivity for detecting poor effort. A combination of FAS and Animals scores into logistic regression models yielded acceptable group classification, with 90% specificity and greater than 44% sensitivity. Verbal fluency measures can yield adequate detection of poor effort during neuropsychological evaluation. We provide suggested cut points and logistic regression models for predicting the probability of poor effort in our clinical setting and offer suggested cutoff scores to optimize sensitivity and specificity.
Hellemann, G S; Green, M F; Kern, R S; Sitarenios, G; Nuechterlein, K H
2017-10-01
Measures of social cognition are increasingly being applied to psychopathology, including studies of schizophrenia and other psychotic disorders. Tests of social cognition present unique challenges for international adaptations. The Mayer-Salovey-Caruso Emotional Intelligence Test, Managing Emotions Branch (MSCEIT-ME) is a commonly-used social cognition test that involves the evaluation of social scenarios presented in vignettes. This paper presents evaluations of translations of this test in six different languages based on representative samples from the relevant countries. The goal was to identify items from the MSCEIT-ME that show different response patterns across countries using indices of discrepancy and content validity criteria. An international version of the MSCEIT-ME scoring was developed that excludes items that showed undesirable properties across countries. We then confirmed that this new version had better performance (i.e. less discrepancy across regions) in international samples than the version based on the original norms. Additionally, it provides scores that are comparable to ratings based on local norms. This paper shows that it is possible to adapt complex social cognitive tasks so they can provide valid data across different cultural contexts.
Active-learning versus teacher-centered instruction for learning acids and bases
NASA Astrophysics Data System (ADS)
Acar Sesen, Burcin; Tarhan, Leman
2011-07-01
Background and purpose: Active learning, as a student-centered learning process, has attracted growing interest as a way of constructing scientific knowledge. For this reason, this study aimed to investigate the effectiveness of active-learning implementation on high-school students' understanding of 'acids and bases'. Sample: The sample of this study was 45 high-school students (average age 17 years) from two different classes, which were randomly assigned to the experimental (n = 21) and control groups (n = 25), in a high school in Turkey. Design and methods: A pre-test consisting of 25 items was applied to both experimental and control groups before the treatment in order to identify students' prerequisite knowledge for learning 'acids and bases'. A one-way analysis of variance (ANOVA) was conducted to compare the pre-test scores for groups and no significant difference was found between experimental (ME = 40.14) and control groups (MC = 41.92) in terms of mean scores (F 1,43 = 2.66, p > 0.05). The experimental group was taught using an active-learning curriculum developed by the authors and the control group was taught using traditional course content based on teacher-centered instruction. After the implementation, 'Acids and Bases Achievement Test' scores were collected for both groups. Results: ANOVA results showed that students' 'Acids and Bases Achievement Test' post-test scores differed significantly in terms of groups (F 1,43 = 102.53; p < 0.05). Additionally, in this study 54 misconceptions, 14 of them not reported in the literature before, were observed in the following terms: 'acid and base theories'; 'metal and non-metal oxides'; 'acid and base strengths'; 'neutralization'; 'pH and pOH'; 'hydrolysis'; 'acid-base equilibrium'; 'buffers'; 'indicators'; and 'titration'.
Based on the achievement test and individual interview results, it was found that high-school students in the experimental group had fewer misconceptions and understood the concepts more meaningfully than students in the control group. Conclusion: The study revealed that active-learning implementation is more effective at improving students' learning achievement and preventing misconceptions.
Development and Validation of a Bilingual Stroke Preparedness Assessment Instrument.
Skolarus, Lesli E; Mazor, Kathleen M; Sánchez, Brisa N; Dome, Mackenzie; Biller, José; Morgenstern, Lewis B
2017-04-01
Stroke preparedness interventions are limited by the lack of psychometrically sound intermediate end points. We sought to develop and assess the reliability and validity of the video-Stroke Action Test (video-STAT) an English and a Spanish video-based test to assess people's ability to recognize and react to stroke signs. Video-STAT development and testing was divided into 4 phases: (1) video development and community-generated response options, (2) pilot testing in community health centers, (3) administration in a national sample, bilingual sample, and neurologist sample, and (4) administration before and after a stroke preparedness intervention. The final version of the video-STAT included 8 videos: 4 acute stroke/emergency, 2 prior stroke/nonemergency, 1 nonstroke/emergency, and 1 nonstroke/nonemergency. Acute stroke recognition and action response were queried after each vignette. Video-STAT scoring was based on the acute stroke vignettes only (score range 0-12 best). The national sample consisted of 598 participants, 438 who took the video-STAT in English and 160 who took the video-STAT in Spanish. There was adequate internal consistency (Cronbach α=0.72). The average video-STAT score was 5.6 (SD=3.6), whereas the average neurologist score was 11.4 (SD=1.3). There was no difference in video-STAT scores between the 116 bilingual video-STAT participants who took the video-STAT in English or Spanish. Compared with baseline scores, the video-STAT scores increased after a stroke preparedness intervention (6.2 versus 8.9, P <0.01) among a sample of 101 black adults and youth. The video-STAT yields reliable scores that seem to be valid measures of stroke preparedness. © 2017 American Heart Association, Inc.
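The internal-consistency figure quoted above (Cronbach α) is computed from the item variances and the variance of the total score. The following is a minimal sketch of the standard formula using only the Python standard library. The response matrix is hypothetical (five respondents, four vignettes scored 0 to 3) and is not taken from the video-STAT data.

```python
from statistics import variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    k = len(scores[0])  # number of items
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: rows are respondents, columns are vignettes.
responses = [
    [3, 2, 3, 2],
    [1, 1, 0, 1],
    [2, 2, 1, 2],
    [0, 1, 0, 0],
    [3, 3, 2, 3],
]
print(f"alpha = {cronbach_alpha(responses):.2f}")
```

Values near 0.7 are conventionally described as adequate, which matches the wording used in the abstract. The small hypothetical matrix here is far more consistent than real survey data would typically be.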
Development and Validation of a Bilingual Stroke Preparedness Assessment Instrument
Skolarus, Lesli E.; Mazor, Kathleen M.; Sánchez, Brisa N.; Dome, Mackenzie; Biller, José; Morgenstern, Lewis B.
2017-01-01
Background and Purpose: Stroke preparedness interventions are limited by the lack of psychometrically sound intermediate endpoints. We sought to develop and assess the reliability and validity of the video-Stroke Action Test, video-STAT, an English and Spanish video-based test to assess people’s ability to recognize and react to stroke signs. Methods: Video-STAT development and testing was divided into four phases: 1) video development and community-generated response options; 2) pilot testing in community health centers; 3) administration in a national sample, bilingual sample and neurologist sample; and 4) administration before and after a stroke preparedness intervention. Results: The final version of the video-STAT included 8 videos: 4 acute stroke/emergency, 2 prior stroke/non-emergency, 1 non-stroke/emergency, 1 non-stroke/non-emergency. Acute stroke recognition and action response were queried after each vignette. Video-STAT scoring was based on the acute stroke vignettes only (score range 0–12 best). The national sample consisted of 598 participants, 438 who took the video-STAT in English and 160 who took the video-STAT in Spanish. There was adequate internal consistency (Cronbach’s alpha=0.72). The average video-STAT score was 5.6 (sd=3.6) while the average neurologist score was 11.4 (sd=1.3). There was no difference in video-STAT scores between the 116 bilingual video-STAT participants who took the video-STAT in English or Spanish. Compared to baseline scores, the video-STAT scores increased following a stroke preparedness intervention (6.2 vs. 8.9, p<0.01) among a sample of 101 African American adults and youth. Conclusion: The video-STAT yields reliable scores that appear to be valid measures of stroke preparedness. PMID:28250199
Crockford, Christopher; Newton, Judith; Lonergan, Katie; Madden, Caoifa; Mays, Iain; O'Sullivan, Meabhdh; Costello, Emmet; Pinto-Grau, Marta; Vajda, Alice; Heverin, Mark; Pender, Niall; Al-Chalabi, Ammar; Hardiman, Orla; Abrahams, Sharon
2018-02-01
Cognitive impairment affects approximately 50% of people with amyotrophic lateral sclerosis (ALS). Research has indicated that impairment may worsen with disease progression. The Edinburgh Cognitive and Behavioural ALS Screen (ECAS) was designed to measure neuropsychological functioning in ALS, with its alternate forms (ECAS-A, B, and C) allowing for serial assessment over time. The aim of the present study was to establish reliable change scores for the alternate forms of the ECAS, and to explore practice effects and test-retest reliability of the ECAS's alternate forms. Eighty healthy participants were recruited, with 57 completing two and 51 completing three assessments. Participants were administered alternate versions of the ECAS serially (A-B-C) at four-month intervals. Intra-class correlation analysis was employed to explore test-retest reliability, while analysis of variance was used to examine the presence of practice effects. Reliable change indices (RCI) and regression-based methods were utilized to establish change scores for the ECAS alternate forms. Test-retest reliability was excellent for ALS Specific, ALS Non-Specific, and ECAS Total scores of the combined ECAS A, B, and C (all > .90). No significant practice effects were observed over the three testing sessions. RCI and regression-based methods produced similar change scores. The alternate forms of the ECAS possess excellent test-retest reliability in a healthy control sample, with no significant practice effects. The use of conservative RCI scores is recommended. Therefore, a change of ≥8, ≥4, and ≥9 for ALS Specific, ALS Non-Specific, and ECAS Total score is required for reliable change.
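A reliable change index of the kind mentioned above is often computed with the Jacobson and Truax formulation, in which the standard error of the difference is derived from the baseline standard deviation and the test-retest reliability. The abstract does not say which variant the authors used, so the sketch below is an assumption. The standard deviation and reliability values are hypothetical and are not ECAS statistics.

```python
import math


def standard_error_of_difference(sd_baseline: float, test_retest_r: float) -> float:
    """S_diff = sqrt(2) * SEM, where SEM = SD * sqrt(1 - r)."""
    sem = sd_baseline * math.sqrt(1 - test_retest_r)
    return math.sqrt(2) * sem


def reliable_change_index(score_1: float, score_2: float,
                          sd_baseline: float, test_retest_r: float) -> float:
    """Observed change divided by the standard error of the difference."""
    return (score_2 - score_1) / standard_error_of_difference(sd_baseline, test_retest_r)


def reliable_change_threshold(sd_baseline: float, test_retest_r: float,
                              z: float = 1.96) -> float:
    """Smallest raw change that exceeds measurement error at the chosen z."""
    return z * standard_error_of_difference(sd_baseline, test_retest_r)


# Hypothetical scale: SD = 10 points, test-retest reliability = 0.91.
print(f"threshold = {reliable_change_threshold(10, 0.91):.2f} points")
print(f"RCI for a drop from 100 to 92 = {reliable_change_index(100, 92, 10, 0.91):.2f}")
```

Under these hypothetical values an 8-point drop falls just short of the 1.96 criterion, which shows how a threshold expressed in raw points (as in the change scores reported above) follows from the scale's variability and reliability.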
Lee, Sung-Jae; Brooks, Ronald; Bolan, Robert K.; Flynn, Risa
2013-01-01
Men who have sex with men (MSM) in the United States represent a vulnerable population with lower rates of HIV testing. There are various specific attributes of HIV testing that may impact willingness to test (WTT) for HIV. Identifying specific attributes influencing patients’ decisions around WTT for HIV is critical to ensure improved HIV testing uptake. This study examined WTT for HIV by using conjoint analysis, an innovative method for systematically estimating consumer preferences across discrete attributes. WTT for HIV was assessed across eight hypothetical HIV testing scenarios varying across seven dichotomous attributes: location (home vs. clinic), price (free vs. $50), sample collection (finger prick vs. blood), timeliness of results (immediate vs. 1–2 weeks), privacy (anonymous vs. confidential), results given (by phone vs. in-person), and type of counseling (brochure vs. in-person). Seventy-five MSM were recruited from a community-based organization providing HIV testing services in Los Angeles to participate in conjoint analysis. The WTT for HIV score was based on a 100-point scale. Scores ranged from 32.2 to 80.3 for the eight hypothetical HIV testing scenarios. Price of HIV testing (free vs. $50) had the highest impact on WTT (impact score=31.4, SD=29.2, p<.0001), followed by timeliness of results (immediate vs. 1–2 weeks) (impact score=13.9, SD=19.9, p<.0001) and testing location (home vs. clinic) (impact score=10.3, SD=22.8, p=.0002). Impacts of other HIV testing attributes were not significant. The conjoint analysis method enabled direct assessment of HIV testing preferences and identified specific attributes that significantly impact WTT for HIV among MSM. This method provided empirical evidence to support the potential uptake of the newly FDA-approved over-the-counter HIV home-test kit with immediate results, with a cautionary note on the cost of the kit. PMID:23651439
NASA Astrophysics Data System (ADS)
Zaidah, A.; Sukarmin; Sunarno, W.
2018-04-01
This study aimed to determine the influence of physics-based scientific learning on students' critical thinking skills. The research was quantitative, with conclusions drawn through statistical analysis. It was carried out at MA (Senior High School) Mu'allimat NW Pancor in the second semester of the 2016/2017 academic year and involved all class XI students. Purposive sampling was used, with class XI 6 selected as the sample. Descriptive analysis gave an average pre-test score of 49.17 and an average post-test score of 82.43, and the average gain score was 0.67, which falls in the medium category. Inferential analysis gave t = 22.559, while the critical value from the t table at the 5% significance level was 2.04. Since t exceeded the table value, Ha was accepted: pre-test and post-test scores differed significantly when students used scientific-based learning. The results indicate that physics-based scientific learning increased students' critical thinking skills.
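The gain score and its "medium" label can be illustrated with a short sketch. It assumes the gain is Hake's normalized gain on a 100-point scale with the conventional category bands; the abstract names neither. Computed from the two class means, the gain is about 0.65, slightly below the reported 0.67, which may instead be an average of per-student gains (also an assumption).

```python
def normalized_gain(pre, post, max_score=100.0):
    """Hake-style normalized gain: fraction of the possible improvement achieved."""
    if pre >= max_score:
        raise ValueError("pre-test score must be below the maximum score")
    return (post - pre) / (max_score - pre)


def gain_category(g):
    """Conventional bands (assumed): high >= 0.7, medium 0.3 to < 0.7, low < 0.3."""
    if g >= 0.7:
        return "high"
    if g >= 0.3:
        return "medium"
    return "low"


# Class means reported in the abstract; the 100-point maximum is an assumption.
g = normalized_gain(pre=49.17, post=82.43)
category = gain_category(g)
```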
Simulation-based training in brain death determination.
MacDougall, Benjamin J; Robinson, Jennifer D; Kappus, Liana; Sudikoff, Stephanie N; Greer, David M
2014-12-01
Despite straightforward guidelines on brain death determination by the American Academy of Neurology (AAN), substantial practice variability exists internationally, between states, and among institutions. We created a simulation-based training course on proper determination based on the AAN practice parameters to address and assess knowledge and practice gaps at our institution. Our intervention consisted of a didactic course and a simulation exercise, and was bookended by before and after multiple-choice tests. The 40-min didactic course, including a video demonstration, covered all aspects of the brain death examination. Simulation sessions utilized a SimMan 3G manikin and involved a complete examination, including an apnea test. Possible confounders and signs incompatible with brain death were embedded throughout. Facilitators evaluated performance with a 26-point checklist based on the most recent AAN guidelines. A senior neurologist conducted all aspects of the course, including the didactic session, simulation, and debriefing session. Ninety physicians from multiple specialties have participated in the didactic session, 38 of whom have completed the simulation. Pre-test scores were poor (41.4 %), with attendings scoring higher than residents (46.6 vs. 40.4 %, p = 0.07), and neurologists and neurosurgeons significantly outperforming other specialists (53.9 vs. 38.9 %, p = 0.003). Post-test scores (73.3 %) were notably higher than pre-test scores (45.4 %). Participant feedback has been uniformly positive. Baseline knowledge of brain death determination among providers was low but improved greatly after the course. Our intervention represents an effective model that can be replicated at other institutions to train clinicians in the determination of brain death according to evidence-based guidelines.
Validation of the tablet-administered Brief Assessment of Cognition (BAC App).
Atkins, Alexandra S; Tseng, Tina; Vaughan, Adam; Twamley, Elizabeth W; Harvey, Philip; Patterson, Thomas; Narasimhan, Meera; Keefe, Richard S E
2017-03-01
Computerized tests benefit from automated scoring procedures and standardized administration instructions. These methods can reduce the potential for rater error. However, especially in patients with severe mental illnesses, the equivalency of traditional and tablet-based tests cannot be assumed. The Brief Assessment of Cognition in Schizophrenia (BACS) is a pen-and-paper cognitive assessment tool that has been used in hundreds of research studies and clinical trials, and has normative data available for generating age- and gender-corrected standardized scores. A tablet-based version of the BACS called the BAC App has been developed. This study compared performance on the BACS and the BAC App in patients with schizophrenia and healthy controls. Test equivalency was assessed, and the applicability of paper-based normative data was evaluated. Results demonstrated the distributions of standardized composite scores for the tablet-based BAC App and the pen-and-paper BACS were indistinguishable, and the between-methods mean differences were not statistically significant. The discrimination between patients and controls was similarly robust. The between-methods correlations for individual measures in patients were r>0.70 for most subtests. When data from the Token Motor Test was omitted, the between-methods correlation of composite scores was r=0.88 (df=48; p<0.001) in healthy controls and r=0.89 (df=46; p<0.001) in patients, consistent with the test-retest reliability of each measure. Taken together, results indicate that the tablet-based BAC App generates results consistent with the traditional pen-and-paper BACS, and support the notion that the BAC App is appropriate for use in clinical trials and clinical practice. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
Value Added Based on Educational Positions in Dutch Secondary Education
ERIC Educational Resources Information Center
Timmermans, Anneke C.; Bosker, Roel J.; de Wolf, Inge F.; Doolaard, Simone; van der Werf, Margaretha P. C.
2014-01-01
Estimating added value as an indicator of school effectiveness in the context of educational accountability often occurs using test or examination scores of students. This study investigates the possibilities for using scores of educational positions as an alternative indicator. A number of advantages of a value added indicator based on…
Comparison of Reliability Measures under Factor Analysis and Item Response Theory
ERIC Educational Resources Information Center
Cheng, Ying; Yuan, Ke-Hai; Liu, Cheng
2012-01-01
Reliability of test scores is one of the most pervasive psychometric concepts in measurement. Reliability coefficients based on a unifactor model for continuous indicators include maximal reliability rho and an unweighted sum score-based omega, among many others. With increasing popularity of item response theory, a parallel reliability measure pi…
Qu, Long; Guennel, Tobias; Marshall, Scott L
2013-12-01
Following the rapid development of genome-scale genotyping technologies, genetic association mapping has become a popular tool to detect genomic regions responsible for certain (disease) phenotypes, especially in early-phase pharmacogenomic studies with limited sample size. In response to such applications, a good association test needs to be (1) applicable to a wide range of possible genetic models, including, but not limited to, the presence of gene-by-environment or gene-by-gene interactions and non-linearity of a group of marker effects, (2) accurate in small samples, fast to compute on the genomic scale, and amenable to large scale multiple testing corrections, and (3) reasonably powerful to locate causal genomic regions. The kernel machine method represented in linear mixed models provides a viable solution by transforming the problem into testing the nullity of variance components. In this study, we consider score-based tests by choosing a statistic linear in the score function. When the model under the null hypothesis has only one error variance parameter, our test is exact in finite samples. When the null model has more than one variance parameter, we develop a new moment-based approximation that performs well in simulations. Through simulations and analysis of real data, we demonstrate that the new test possesses most of the aforementioned characteristics, especially when compared to existing quadratic score tests or restricted likelihood ratio tests. © 2013, The International Biometric Society.
Structure refinement of membrane proteins via molecular dynamics simulations.
Dutagaci, Bercem; Heo, Lim; Feig, Michael
2018-07-01
A refinement protocol based on physics-based techniques established for water-soluble proteins is tested for membrane protein structures. Initial structures were generated by homology modeling and sampled via molecular dynamics simulations in explicit lipid bilayer and aqueous solvent systems. Snapshots from the simulations were selected based on scoring with either knowledge-based or implicit membrane-based scoring functions and averaged to obtain refined models. The protocol resulted in consistent and significant refinement of the membrane protein structures, similar to the performance of refinement methods for soluble proteins. Refinement success was similar between sampling in the presence of lipid bilayers and aqueous solvent, but the presence of lipid bilayers may benefit the improvement of lipid-facing residues. Scoring with knowledge-based functions (DFIRE and RWplus) was found to be as good as scoring using implicit membrane-based scoring functions, suggesting that differences in internal packing are more important than orientations relative to the membrane during the refinement of membrane protein homology models. © 2018 Wiley Periodicals, Inc.
Prediction of true test scores from observed item scores and ancillary data.
Haberman, Shelby J; Yao, Lili; Sinharay, Sandip
2015-05-01
In many educational tests which involve constructed responses, a traditional test score is obtained by adding together item scores obtained through holistic scoring by trained human raters. For example, this practice was used until 2008 in the case of GRE® General Analytical Writing and until 2009 in the case of TOEFL® iBT Writing. With use of natural language processing, it is possible to obtain additional information concerning item responses from computer programs such as e-rater®. In addition, available information relevant to examinee performance may include scores on related tests. We suggest application of standard results from classical test theory to the available data to obtain best linear predictors of true traditional test scores. In performing such analysis, we require estimation of variances and covariances of measurement errors, a task which can be quite difficult in the case of tests with limited numbers of items and with multiple measurements per item. As a consequence, a new estimation method is suggested based on samples of examinees who have taken an assessment more than once. Such samples are typically not random samples of the general population of examinees, so that we apply statistical adjustment methods to obtain the needed estimated variances and covariances of measurement errors. To examine practical implications of the suggested methods of analysis, applications are made to GRE General Analytical Writing and TOEFL iBT Writing. Results obtained indicate that substantial improvements are possible both in terms of reliability of scoring and in terms of assessment reliability. © 2015 The British Psychological Society.
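The simplest special case of a best linear predictor of a true score in classical test theory is Kelley's formula, which uses a single observed score. The sketch below shows only that one-variable case; the predictors described in the abstract combine several item scores and ancillary data, so this is not the authors' method, and the numbers are hypothetical.

```python
def kelley_true_score(observed, group_mean, reliability):
    """Kelley's formula: shrink the observed score toward the group mean.

    With reliability 1 the observed score is returned unchanged;
    with reliability 0 the estimate collapses to the group mean.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    return group_mean + reliability * (observed - group_mean)


# Hypothetical values on an essay-style score scale, for illustration only.
estimate = kelley_true_score(observed=5.0, group_mean=4.0, reliability=0.8)
```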
ERIC Educational Resources Information Center
Lin, Yu-Shih; Chang, Yi-Chun; Liew, Keng-Hou; Chu, Chih-Ping
2016-01-01
Computerised testing and diagnostics are critical challenges within an e-learning environment, where the learners can assess their learning performance through tests. However, a test result based on only a single score is insufficient information to provide a full picture of learning performance. In addition, because test results implicitly…
Using needs-based frameworks for evaluating new technologies: an application to genetic tests.
Rogowski, Wolf H; Schleidgen, Sebastian
2015-02-01
Given the multitude of newly available genetic tests in the face of limited healthcare budgets, the European Society of Human Genetics assessed how genetic services can be prioritized fairly. Using (health) benefit maximizing frameworks for this purpose has been criticized on the grounds that rather than maximization, fairness requires meeting claims (e.g. based on medical need) equitably. This study develops a prioritization score for genetic tests to facilitate equitable allocation based on need-based claims. It includes attributes representing health need associated with hereditary conditions (severity and progression), a genetic service's suitability to alleviate need (evidence of benefit and likelihood of positive result) and costs to meet the needs. A case study for measuring the attributes is provided and a suggestion is made how need-based claims can be quantified in a priority function. Attribute weights can be informed by data from discrete-choice experiments. Further work is needed to measure the attributes across the multitude of genetic tests and to determine appropriate weights. The priority score is most likely to be considered acceptable if developed within a decision process which meets criteria of procedural fairness and if the priority score is interpreted as "strength of recommendation" rather than a fixed cut-off value. Copyright © 2014. Published by Elsevier Ireland Ltd.
Examining Exam Reviews: A Comparison of Exam Scores and Attitudes
ERIC Educational Resources Information Center
Hackathorn, Jana; Cornell, Kathryn; Garczynski, Amy M.; Solomon, Erin D.; Blankmeyer, Katheryn E.; Tennial, Rachel E.
2012-01-01
Instructors commonly use exam reviews to help students prepare for exams and to increase student success. The current study compared the effects of traditional, trivia, and practice test-based exam reviews on actual exam scores, as well as students' attitudes toward each review. Findings suggested that students' exam scores were significantly…
An Evaluation of the IntelliMetric[SM] Essay Scoring System
ERIC Educational Resources Information Center
Rudner, Lawrence M.; Garcia, Veronica; Welch, Catherine
2006-01-01
This report provides a two-part evaluation of the IntelliMetric[SM] automated essay scoring system based on its performance scoring essays from the Analytic Writing Assessment of the Graduate Management Admission Test[TM] (GMAT[TM]). The IntelliMetric system performance is first compared to that of individual human raters, a Bayesian system…
Impact of Measurement Error on Statistical Power: Review of an Old Paradox.
ERIC Educational Resources Information Center
Williams, Richard H.; And Others
1995-01-01
The paradox that a Student t-test based on pretest-posttest differences can attain its greatest power when the difference score reliability is zero was explained by demonstrating that power is not a mathematical function of reliability unless either true score variance or error score variance is constant. (SLD)
Team-Based Learning in a Community Health Nursing Course: Improving Academic Outcomes.
Miles, Jane M; Larson, Kim L; Swanson, Melvin
2017-07-01
Population health concepts, such as upstream thinking, present challenging ideas to undergraduate nursing students grounded in an acute care orientation. The purpose of this study was to describe how team-based learning (TBL) influenced academic outcomes in a community health nursing course. A descriptive correlational design examined the relationship among student scores on individual readiness assurance tests (iRATs), team readiness assurance tests (tRATs), and the final examination. The sample included 221 nursing students who had completed the course. A large positive correlation was found between iRAT and final examination scores. For all students, the mean tRAT score was higher than the mean iRAT score. A moderate positive correlation existed between tRAT and final examination scores. The study contributes to understanding the effects of TBL pedagogy on student academic outcomes in nursing education. TBL is a valuable teaching method in a course requiring the application of challenging concepts. [J Nurs Educ. 2017;56(7):425-429.]. Copyright 2017, SLACK Incorporated.
NASA Astrophysics Data System (ADS)
Noble, Clifford Elliott, II
2002-09-01
The problem. The purpose of this study was to investigate the ability of three single-task instruments---(a) the Test of English as a Foreign Language, (b) the Aviation Test of Spoken English, and (c) the Single Manual-Tracking Test---and three dual-task instruments---(a) the Concurrent Manual-Tracking and Communication Test, (b) the Certified Flight Instructor's Test, and (c) the Simulation-Based English Test---to predict the language performance of 10 Chinese student pilots speaking English as a second language when operating single-engine and multiengine aircraft within American airspace. Method. This research implemented a correlational design to investigate the ability of the six described instruments to predict the mean score of the criterion evaluation, which was the Examiner's Test. This test assessed the oral communication skill of student pilots on the flight portion of the terminal checkride in the Piper Cadet, Piper Seminole, and Beechcraft King Air airplanes. Results. Data from the Single Manual-Tracking Test, as well as the Concurrent Manual-Tracking and Communication Test, were discarded due to performance ceiling effects. Hypothesis 1, which stated that the average correlation between the mean scores of the dual-task evaluations and that of the Examiner's Test would predict the mean score of the criterion evaluation with a greater degree of accuracy than that of single-task evaluations, was not supported. Hypothesis 2, which stated that the correlation between the mean scores of the participants on the Simulation-Based English Test and the Examiner's Test would predict the mean score of the criterion evaluation with a greater degree of accuracy than that of all single- and dual-task evaluations, was also not supported. The findings suggest that single- and dual-task assessments administered after initial flight training are equivalent predictors of language performance when piloting single-engine and multiengine aircraft.
Park, Hyung-Ran; Kim, Chun-Ja; Park, Jee-Won; Park, Eunyoung
2015-01-01
The purpose of this study was to examine the effectiveness of team-based learning (a well-recognized learning and teaching strategy), applied in a health assessment subject, on nursing students' perceived teamwork (team-efficacy and team skills) and academic performance (individual and team readiness assurance tests, and examination scores). A prospective, one-group, pre- and post-test design enrolled a convenience sample of 74 second-year nursing students at a university in Suwon, Korea. Team-based learning was applied in a 2-credit health assessment subject over a 16-week semester. All students received written material one week before each class for readiness preparation. After administering individual and team readiness assurance tests consecutively, the subject instructor gave immediate feedback and delivered a mini-lecture to the students. Finally, students carried out skill-based application exercises. The findings showed significant improvements in the mean scores of students' perceived teamwork after the introduction of team-based learning. In addition, team-efficacy was associated with team-adaptability skills and team-interpersonal skills. Regarding academic performance, team readiness assurance test scores were significantly higher than individual readiness assurance test scores over time. Individual readiness assurance test scores were significantly related to examination scores, while team readiness assurance test scores were correlated with team-efficacy and team-interpersonal skills. The application of team-based learning in a health assessment subject can enhance students' perceived teamwork and academic performance. This finding suggests that team-based learning may be an effective learning and teaching strategy for improving the teamwork of nursing students, who need to collaborate and effectively communicate with health care providers to improve patients' health.
Schoenberg, Mike R; Rum, Ruba S
2017-11-01
Rapid, clear and efficient communication of neuropsychological results is essential to benefit patient care. Errors in communication are a leading cause of medical errors; nevertheless, there remains a lack of consistency in how neuropsychological scores are communicated. A major limitation in the communication of neuropsychological results is the inconsistent use of qualitative descriptors for standardized test scores and the use of vague terminology. A PubMed search from 1 Jan 2007 to 1 Aug 2016 was conducted to identify guidelines or consensus statements for the description and reporting of qualitative terms to communicate neuropsychological test scores. The review found the use of confusing and overlapping terms to describe various ranges of percentile standardized test scores. In response, we propose a simplified set of qualitative descriptors for normalized test scores (Q-Simple) as a means to reduce errors in communicating test results. The Q-Simple qualitative terms are: 'very superior', 'superior', 'high average', 'average', 'low average', 'borderline' and 'abnormal/impaired'. A case example illustrates the proposed Q-Simple qualitative classification system to communicate neuropsychological results for neurosurgical planning. The Q-Simple qualitative descriptor system is intended as a means to improve and standardize communication of standardized neuropsychological test scores. Research is needed to further evaluate neuropsychological communication errors. Conveying the clinical implications of neuropsychological results in a manner that minimizes risk for communication errors is a quintessential component of evidence-based practice. Copyright © 2017 Elsevier B.V. All rights reserved.
Development of a Multi-Biomarker Disease Activity Test for Rheumatoid Arthritis
Shen, Yijing; Ramanujan, Saroja; Knowlton, Nicholas; Swan, Kathryn A.; Turner, Mary; Sutton, Chris; Smith, Dustin R.; Haney, Douglas J.; Chernoff, David; Hesterberg, Lyndal K.; Carulli, John P.; Taylor, Peter C.; Shadick, Nancy A.; Weinblatt, Michael E.; Curtis, Jeffrey R.
2013-01-01
Background Disease activity measurement is a key component of rheumatoid arthritis (RA) management. Biomarkers that capture the complex and heterogeneous biology of RA have the potential to complement clinical disease activity assessment. Objectives To develop a multi-biomarker disease activity (MBDA) test for rheumatoid arthritis. Methods Candidate serum protein biomarkers were selected from extensive literature screens, bioinformatics databases, mRNA expression and protein microarray data. Quantitative assays were identified and optimized for measuring candidate biomarkers in RA patient sera. Biomarkers with qualifying assays were prioritized in a series of studies based on their correlations to RA clinical disease activity (e.g. the Disease Activity Score 28-C-Reactive Protein [DAS28-CRP], a validated metric commonly used in clinical trials) and their contributions to multivariate models. Prioritized biomarkers were used to train an algorithm to measure disease activity, assessed by correlation to DAS and area under the receiver operating characteristic curve for classification of low vs. moderate/high disease activity. The effect of comorbidities on the MBDA score was evaluated using linear models with adjustment for multiple hypothesis testing. Results 130 candidate biomarkers were tested in feasibility studies and 25 were selected for algorithm training. Multi-biomarker statistical models outperformed individual biomarkers at estimating disease activity. Biomarker-based scores were significantly correlated with DAS28-CRP and could discriminate patients with low vs. moderate/high clinical disease activity. Such scores were also able to track changes in DAS28-CRP and were significantly associated with both joint inflammation measured by ultrasound and damage progression measured by radiography. The final MBDA algorithm uses 12 biomarkers to generate an MBDA score between 1 and 100. No significant effects on the MBDA score were found for common comorbidities. 
Conclusion We followed a stepwise approach to develop a quantitative serum-based measure of RA disease activity, based on 12-biomarkers, which was consistently associated with clinical disease activity levels. PMID:23585841
Development of a multi-biomarker disease activity test for rheumatoid arthritis.
Centola, Michael; Cavet, Guy; Shen, Yijing; Ramanujan, Saroja; Knowlton, Nicholas; Swan, Kathryn A; Turner, Mary; Sutton, Chris; Smith, Dustin R; Haney, Douglas J; Chernoff, David; Hesterberg, Lyndal K; Carulli, John P; Taylor, Peter C; Shadick, Nancy A; Weinblatt, Michael E; Curtis, Jeffrey R
2013-01-01
Disease activity measurement is a key component of rheumatoid arthritis (RA) management. Biomarkers that capture the complex and heterogeneous biology of RA have the potential to complement clinical disease activity assessment. To develop a multi-biomarker disease activity (MBDA) test for rheumatoid arthritis. Candidate serum protein biomarkers were selected from extensive literature screens, bioinformatics databases, mRNA expression and protein microarray data. Quantitative assays were identified and optimized for measuring candidate biomarkers in RA patient sera. Biomarkers with qualifying assays were prioritized in a series of studies based on their correlations to RA clinical disease activity (e.g. the Disease Activity Score 28-C-Reactive Protein [DAS28-CRP], a validated metric commonly used in clinical trials) and their contributions to multivariate models. Prioritized biomarkers were used to train an algorithm to measure disease activity, assessed by correlation to DAS and area under the receiver operating characteristic curve for classification of low vs. moderate/high disease activity. The effect of comorbidities on the MBDA score was evaluated using linear models with adjustment for multiple hypothesis testing. 130 candidate biomarkers were tested in feasibility studies and 25 were selected for algorithm training. Multi-biomarker statistical models outperformed individual biomarkers at estimating disease activity. Biomarker-based scores were significantly correlated with DAS28-CRP and could discriminate patients with low vs. moderate/high clinical disease activity. Such scores were also able to track changes in DAS28-CRP and were significantly associated with both joint inflammation measured by ultrasound and damage progression measured by radiography. The final MBDA algorithm uses 12 biomarkers to generate an MBDA score between 1 and 100. No significant effects on the MBDA score were found for common comorbidities. 
We followed a stepwise approach to develop a quantitative serum-based measure of RA disease activity, based on 12-biomarkers, which was consistently associated with clinical disease activity levels.
NASA Astrophysics Data System (ADS)
Prejean-Harris, Rose M.
Over the last decade, accountability has been the driving force for many changes in education in the United States. One major educational reform effort is the standards-based movement, with a focus on combining a number of processes that involve aligning curriculum, instruction, assessment and feedback to specific standards that are measurable and indicative of student achievement. The purpose of this study is to determine if the type of report card is a possible predictor of third grade student achievement on standardized tests in mathematics and science for the 2012 Criterion-Referenced Competency Test (CRCT). The results of this study concluded that the difference in test scores in mathematics and science for students in the traditional report card group was not statistically significant when compared to the scores of students in the standards-based report card group when controlling for poverty level, school locale, and school district. However, students in the traditional report card group scored an average of 1.01 points higher in mathematics and 2.27 points higher in science than students in the standards-based report card group.
Stefanidis, Dimitrios; Korndorffer, James R; Black, F William; Dunne, J Bruce; Sierra, Rafael; Touchard, Cheri L; Rice, David A; Markert, Ronald J; Kastl, Peter R; Scott, Daniel J
2006-08-01
Laparoscopic simulator training translates into improved operative performance. Proficiency-based curricula maximize efficiency by tailoring training to meet the needs of each individual; however, because rates of skill acquisition vary widely, such curricula may be difficult to implement. We hypothesized that psychomotor testing would predict baseline performance and training duration in a proficiency-based laparoscopic simulator curriculum. Residents (R1, n = 20) were enrolled in an IRB-approved prospective study at the beginning of the academic year. All completed the following: a background information survey, a battery of 12 innate ability measures (5 motor, and 7 visual-spatial), and baseline testing on 3 validated simulators (5 videotrainer [VT] tasks, 12 virtual reality [minimally invasive surgical trainer-virtual reality, MIST-VR] tasks, and 2 laparoscopic camera navigation [LCN] tasks). Participants trained to proficiency, and training duration and number of repetitions were recorded. Baseline test scores were correlated to skill acquisition rate. Cutoff scores for each predictive test were calculated based on a receiver operator curve, and their sensitivity and specificity were determined in identifying slow learners. Only the Cards Rotation test correlated with baseline simulator ability on VT and LCN. Curriculum implementation required 347 man-hours (6-person team) and 795,000 dollars of capital equipment. With an attendance rate of 75%, 19 of 20 residents (95%) completed the curriculum by the end of the academic year. To complete training, a median of 12 hours (range, 5.5-21), and 325 repetitions (range, 171-782) were required. Simulator score improvement was 50%. Training duration and repetitions correlated with prior video game and billiard exposure, grooved pegboard, finger tap, map planning, Rey Figure Immediate Recall score, and baseline performance on VT and LCN. The map planning cutoff score proved most specific in identifying slow learners. 
Proficiency-based laparoscopic simulator training provides improvement in performance and can be effectively implemented as a routine part of resident education, but may require significant resources. Although psychomotor testing may be of limited value in the prediction of baseline laparoscopic performance, its importance may lie in the prediction of the rapidity of skill acquisition. These tests may be useful in optimizing curricular design by allowing the tailoring of training to individual needs.
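How a cutoff score yields sensitivity and specificity for flagging slow learners can be sketched as follows. The direction of the rule (a score at or below the cutoff predicts a slow learner), the function name, and the data are all assumptions for illustration; the abstract reports neither the raw scores nor the cutoffs.

```python
def sensitivity_specificity(scores, is_slow, cutoff):
    """Classify score <= cutoff as 'slow learner' and compare with the truth."""
    tp = fn = tn = fp = 0
    for score, slow in zip(scores, is_slow):
        predicted_slow = score <= cutoff
        if slow and predicted_slow:
            tp += 1
        elif slow and not predicted_slow:
            fn += 1
        elif not slow and not predicted_slow:
            tn += 1
        else:
            fp += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one slow and one non-slow learner")
    sensitivity = tp / (tp + fn)  # share of slow learners correctly flagged
    specificity = tn / (tn + fp)  # share of other learners correctly cleared
    return sensitivity, specificity


# Hypothetical test scores and outcomes, not data from the study.
scores = [3, 5, 6, 8, 9, 10]
is_slow = [True, True, False, True, False, False]
sens, spec = sensitivity_specificity(scores, is_slow, cutoff=5)
```

Sweeping the cutoff over all observed scores and recording each pair traces out the ROC curve from which a cutoff would be chosen.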
Kim, Ji-Hoon; Kim, Young-Min; Park, Seong Heui; Ju, Eun A; Choi, Se Min; Hong, Tai Yong
2017-06-01
The aim of the study was to compare the educational impact of two postsimulation debriefing methods, focused and corrective feedback (FCF) versus Structured and Supported Debriefing (SSD), on team dynamics in simulation-based cardiac arrest team training. This was a pilot randomized controlled study conducted at a simulation center. Fourth-year medical students were randomly assigned to the FCF or SSD group, with each team composed of six students and a confederate. Each team participated in two simulations and the assigned debriefing (FCF or SSD) sessions and then underwent a test simulation. Two trained raters blindly assessed all of the recorded simulations using checklists. The primary outcome was the improvement in team dynamics scores between baseline and test simulation. The secondary outcomes were improvements before and after training in team clinical performance scores, self-assessed comprehension of and confidence in cardiac arrest management and team dynamics, as well as evaluations of the postsimulation debriefing intervention. In total, 95 students participated [FCF (8 teams, n = 47) and SSD (8 teams, n = 48)]. The SSD team dynamics score during the test simulation was higher than at baseline [baseline: 74.5 (65.9-80.9), test: 85.0 (71.9-87.6), P = 0.035]. However, there were no differences in the improvement in the team dynamics or team clinical performance scores between the two groups (P = 0.328, respectively). There was no significant difference in improvement in team dynamics scores during the test simulation compared with baseline between the SSD and FCF groups in a simulation-based cardiac arrest team training in fourth-year Korean medical students.
Terwee, Caroline B; Mokkink, Lidwine B; Knol, Dirk L; Ostelo, Raymond W J G; Bouter, Lex M; de Vet, Henrica C W
2012-05-01
The COSMIN checklist is a standardized tool for assessing the methodological quality of studies on measurement properties. It contains 9 boxes, each dealing with one measurement property, with 5-18 items per box about design aspects and statistical methods. Our aim was to develop a scoring system for the COSMIN checklist to calculate quality scores per measurement property when using the checklist in systematic reviews of measurement properties. The scoring system was developed based on discussions among experts and testing of the scoring system on 46 articles from a systematic review. Four response options were defined for each COSMIN item (excellent, good, fair, and poor). A quality score per measurement property is obtained by taking the lowest rating of any item in a box ("worst score counts"). Specific criteria for excellent, good, fair, and poor quality for each COSMIN item are described. In defining the criteria, the "worst score counts" algorithm was taken into consideration. This means that only fatal flaws were defined as poor quality. The scores of the 46 articles show how the scoring system can be used to provide an overview of the methodological quality of studies included in a systematic review of measurement properties. Based on experience in testing this scoring system on 46 articles, the COSMIN checklist with the proposed scoring system seems to be a useful tool for assessing the methodological quality of studies included in systematic reviews of measurement properties.
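The "worst score counts" rule described above can be sketched in a few lines. This is a minimal illustration, not the COSMIN authors' software; the function name `box_quality` and the list-of-strings input format are assumptions made for the example.

```python
# Ratings ordered from worst to best, as named in the abstract.
RATING_ORDER = ["poor", "fair", "good", "excellent"]


def box_quality(item_ratings):
    """Return the lowest rating among the items of one COSMIN box.

    `item_ratings` is a hypothetical input format: one rating string per item.
    """
    if not item_ratings:
        raise ValueError("a box needs at least one rated item")
    # "Worst score counts": the box takes the rating of its weakest item.
    return min(item_ratings, key=RATING_ORDER.index)


# A single "fair" item caps the whole box at "fair".
print(box_quality(["excellent", "good", "fair", "excellent"]))
```

Because one low rating determines the result, the abstract's point that only fatal flaws should be rated "poor" follows directly from the rule.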
A 2-year study of Gram stain competency assessment in 40 clinical laboratories.
Goodyear, Nancy; Kim, Sara; Reeves, Mary; Astion, Michael L
2006-01-01
We used a computer-based competency assessment tool for Gram stain interpretation to assess the performance of 278 laboratory staff from 40 laboratories on 40 multiple-choice questions. We report test reliability, mean scores, median, item difficulty, discrimination, and analysis of the highest- and lowest-scoring questions. The questions were reliable (KR-20 coefficient, 0.80). Overall mean score was 88% (range, 63%-98%). When categorized by cell type, the means were host cells, 93%; other cells (eg, yeast), 92%; gram-positive, 90%; and gram-negative, 88%. When categorized by type of interpretation, the means were other (eg, underdecolorization), 92%; identify by structure (eg, bacterial morphologic features), 91%; and identify by name (eg, genus and species), 87%. Of the 6 highest-scoring questions (mean scores, ≥99%), 5 were identify by structure and 1 was identify by name. Of the 6 lowest-scoring questions (mean scores, <75%), 5 were gram-negative and 1 was host cells. By type of interpretation, 2 were identify by structure and 4 were identify by name. Computer-based Gram stain competency assessment examinations are reliable. Our analysis helps laboratories identify areas for continuing education in Gram stain interpretation and will direct future revisions of the tests.
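For readers unfamiliar with the KR-20 coefficient reported above, the standard formula is KR-20 = k/(k-1) * (1 - sum(p_j * q_j) / variance of total scores), where k is the number of items and p_j is the proportion answering item j correctly. The sketch below applies it to a tiny made-up response matrix; the data and the choice of population variance are assumptions, and nothing here reproduces the study's own computation.

```python
from statistics import pvariance


def kr20(responses):
    """KR-20 reliability for a matrix of 0/1 item scores (rows = examinees)."""
    n_items = len(responses[0])
    n_people = len(responses)
    totals = [sum(row) for row in responses]
    total_var = pvariance(totals)  # population variance; an assumption of this sketch
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    # Sum of p*q across items, where p is the proportion correct on an item.
    pq_sum = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_people
        pq_sum += p * (1 - p)
    return n_items / (n_items - 1) * (1 - pq_sum / total_var)


# Hypothetical data: 4 examinees, 3 items.
toy = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(kr20(toy))
```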
A weighted generalized score statistic for comparison of predictive values of diagnostic tests.
Kosinski, Andrzej S
2013-03-15
Positive and negative predictive values are important measures of a medical diagnostic test performance. We consider testing equality of two positive or two negative predictive values within a paired design in which all patients receive two diagnostic tests. The existing statistical tests for testing equality of predictive values are either Wald tests based on the multinomial distribution or the empirical Wald and generalized score tests within the generalized estimating equations (GEE) framework. As presented in the literature, these test statistics have considerably complex formulas without clear intuitive insight. We propose their re-formulations that are mathematically equivalent but algebraically simple and intuitive. As is clearly seen with a new re-formulation we presented, the generalized score statistic does not always reduce to the commonly used score statistic in the independent samples case. To alleviate this, we introduce a weighted generalized score (WGS) test statistic that incorporates empirical covariance matrix with newly proposed weights. This statistic is simple to compute, always reduces to the score statistic in the independent samples situation, and preserves type I error better than the other statistics as demonstrated by simulations. Thus, we believe that the proposed WGS statistic is the preferred statistic for testing equality of two predictive values and for corresponding sample size computations. The new formulas of the Wald statistics may be useful for easy computation of confidence intervals for difference of predictive values. The introduced concepts have potential to lead to development of the WGS test statistic in a general GEE setting. Copyright © 2012 John Wiley & Sons, Ltd.
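As background to the quantities being compared above, positive and negative predictive values follow from the counts of a 2x2 table of test result against true disease status. The sketch below computes only these two values from hypothetical counts; it does not implement the Wald, generalized score, or WGS statistics, whose formulas are not given in the abstract.

```python
def predictive_values(tp, fp, fn, tn):
    """Return (PPV, NPV) from the four cells of a 2x2 diagnostic table.

    tp/fp/fn/tn are counts of true positives, false positives,
    false negatives and true negatives.
    """
    if tp + fp == 0 or tn + fn == 0:
        raise ValueError("need at least one positive and one negative test result")
    ppv = tp / (tp + fp)  # P(disease | test positive)
    npv = tn / (tn + fn)  # P(no disease | test negative)
    return ppv, npv


# Hypothetical counts for one diagnostic test.
print(predictive_values(tp=90, fp=10, fn=5, tn=95))
```

In a paired design each patient contributes to the tables of both tests, which is why the comparison needs the correlated-data methods the abstract discusses rather than two independent proportions.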
Hulett, Judie L; Weiss, Robert E; Bwibo, Nimrod O; Galal, Osman M; Drorbaugh, Natalie; Neumann, Charlotte G
2014-03-14
Micronutrient deficiencies and suboptimal energy intake are widespread in rural Kenya, with detrimental effects on child growth and development. Sporadic school feeding programmes rarely include animal source foods (ASF). In the present study, a cluster-randomised feeding trial was undertaken to determine the impact of snacks containing ASF on district-wide, end-term standardised school test scores and nutrient intake. A total of twelve primary schools were randomly assigned to one of three isoenergetic feeding groups (a local plant-based stew (githeri) with meat, githeri plus whole milk or githeri with added oil) or a control group receiving no intervention feeding. After the initial term that served as baseline, children were fed at school for five consecutive terms over two school years from 1999 to 2001. Longitudinal analysis was used controlling for average energy intake, school attendance, and baseline socio-economic status, age, sex and maternal literacy. Children in the Meat group showed significantly greater improvements in test scores than those in all the other groups, and the Milk group showed significantly greater improvements in test scores than the Plain Githeri (githeri+oil) and Control groups. Compared with the Control group, the Meat group showed significant improvements in test scores in Arithmetic, English, Kiembu, Kiswahili and Geography. The Milk group showed significant improvements compared with the Control group in test scores in English, Kiswahili, Geography and Science. Folate, Fe, available Fe, energy per body weight, vitamin B₁₂, Zn and riboflavin intake were significant contributors to the change in test scores. The greater improvements in test scores of children receiving ASF indicate improved academic performance, which can result in greater academic achievement.
Merkel, C; Morabito, A; Sacerdoti, D; Bolognesi, M; Angeli, P; Gatta, A
1998-06-01
The determination of aminopyrine breath test on entry into the study was recently shown to improve the accuracy of prediction of death based on the Child-Pugh classification, but the possible usefulness of serial determinations of both parameters has not been assessed. In the present study, we aimed at evaluating whether serial determinations of aminopyrine breath test and Child-Pugh score improve prognostic accuracy in patients with cirrhosis, compared with determinations obtained only on admission. In 74 patients with liver cirrhosis aminopyrine breath test and Child-Pugh score were obtained upon entry into the study. Patients were followed with sequential aminopyrine breath tests and assessments of the Child-Pugh score every 4-6 months. A total number of 232 determinations were obtained. During follow-up 45 patients died, on average after 12 months of follow-up. Child-Pugh score improved in the beginning of follow-up, and then remained fairly constant; aminopyrine breath test showed no improvement in the beginning of follow-up, but rather a slowly progressive decline. In patients who died, both the Child-Pugh score and the metabolism of aminopyrine were significantly more impaired in the last year preceding death (p < 0.05). Applying Cox's regression model with time-dependent covariates, Child-Pugh score and aminopyrine breath test were independent significant predictors of survival. The model with time-dependent covariates explained the observed survival much better than the model with time-fixed covariates (chi-sq. explained by regression = 31.45 vs 11.97; d.f. = 2; p = 0.0000001 vs 0.003). These data suggest that serial determinations of Child-Pugh score and aminopyrine breath test can be used to efficiently update prognosis of cirrhosis.
Bazelet, Corinna S.; Thompson, Aileen C.; Naskrecki, Piotr
2016-01-01
The use of endemism and vascular plants only for biodiversity hotspot delineation has long been contested. Few studies have focused on the efficacy of global biodiversity hotspots for the conservation of insects, an important, abundant, and often ignored component of biodiversity. We aimed to test five alternative diversity measures for hotspot delineation and examine the efficacy of biodiversity hotspots for conserving a non-typical target organism, South African katydids. Using a 1° fishnet grid, we delineated katydid hotspots in two ways: (1) count-based: grid cells in the top 10% of total, endemic, threatened and/or sensitive species richness; vs. (2) score-based: grid cells with a mean value in the top 10% on a scoring system which scored each species on the basis of its IUCN Red List threat status, distribution, mobility and trophic level. We then compared katydid hotspots with each other and with recognized biodiversity hotspots. Grid cells within biodiversity hotspots had significantly higher count-based and score-based diversity than non-hotspot grid cells. There was a significant association between the three types of hotspots. Of the count-based measures, endemic species richness was the best surrogate for the others. However, the score-based measure out-performed all count-based diversity measures. Species richness was the least successful surrogate of all. The strong performance of the score-based method for hotspot prediction emphasizes the importance of including species’ natural history information for conservation decision-making, and is easily adaptable to other organisms. Furthermore, these results add empirical support for the efficacy of biodiversity hotspots in conserving non-target organisms. PMID:27631131
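The "top 10% of grid cells" selection used for both the count-based and score-based hotspots can be illustrated as follows. This is a sketch under assumptions: the dictionary input, the function name `top_fraction_cells`, and the handling of rounding and ties are all choices made for the example, since the abstract does not specify them.

```python
import math


def top_fraction_cells(cell_values, fraction=0.10):
    """Return the ids of grid cells whose value ranks in the top `fraction`.

    `cell_values` maps a cell id to a richness count or a mean species score.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n_keep = math.ceil(len(cell_values) * fraction)
    # Rank cell ids from highest to lowest value; ties keep insertion order.
    ranked = sorted(cell_values, key=cell_values.get, reverse=True)
    return set(ranked[:n_keep])


# Ten hypothetical cells with values 0..9: only the richest cell is a hotspot.
cells = {f"c{i}": i for i in range(10)}
print(top_fraction_cells(cells))
```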
Robust joint score tests in the application of DNA methylation data analysis.
Li, Xuan; Fu, Yuejiao; Wang, Xiaogang; Qiu, Weiliang
2018-05-18
Recently, differential variability has been shown to be valuable in evaluating the association of DNA methylation with the risks of complex human diseases. Statistical tests based on both differential methylation level and differential variability can be more powerful than those based only on differential methylation level. Anh and Wang (2013) proposed a joint score test (AW) to simultaneously detect differential methylation and differential variability. However, AW's method seems to be quite conservative and has not been fully compared with existing joint tests. We proposed three improved joint score tests, namely iAW.Lev, iAW.BF, and iAW.TM, and have made extensive comparisons with the joint likelihood ratio test (jointLRT), the Kolmogorov-Smirnov (KS) test, and the AW test. Systematic simulation studies showed that: 1) the three improved tests performed better (i.e., having larger power, while keeping nominal Type I error rates) than the other three tests for data with outliers and having different variances between cases and controls; 2) for data from normal distributions, the three improved tests had slightly lower power than jointLRT and AW. The analyses of two Illumina HumanMethylation27 data sets GSE37020 and GSE20080 and one Illumina Infinium MethylationEPIC data set GSE107080 demonstrated that the three improved tests had higher true validation rates than jointLRT, KS, and AW. The three proposed joint score tests are robust against violation of the normality assumption and the presence of outlying observations in comparison with the other three existing tests. Among the three proposed tests, iAW.BF seems to be the most robust and effective one for all simulated scenarios and also in real data analyses.
ERIC Educational Resources Information Center
Gagnon, Robert; Lubarsky, Stuart; Lambert, Carole; Charlin, Bernard
2011-01-01
The Script Concordance Test (SCT) uses a panel-based, aggregate scoring method that aims to capture the variability of responses of experienced practitioners to particular clinical situations. The use of this type of scoring method is a key determinant of the tool's discriminatory power, but deviant answers could potentially diminish the…
ERIC Educational Resources Information Center
Cai, Li
2013-01-01
Lord and Wingersky's (1984) recursive algorithm for creating summed score based likelihoods and posteriors has a proven track record in unidimensional item response theory (IRT) applications. Extending the recursive algorithm to handle multidimensionality is relatively simple, especially with fixed quadrature because the recursions can be defined…
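The unidimensional recursion that the abstract builds on can be sketched for dichotomous items at a single ability (quadrature) point. This is the textbook form of the Lord-Wingersky recursion, not the multidimensional extension the paper develops; the function name and the list of per-item probabilities are assumptions of the example.

```python
def summed_score_likelihoods(p_correct):
    """Distribution of the summed score at one ability point.

    `p_correct[i]` is the probability of a correct response to item i at that
    point. Returns a list whose entry s is P(summed score = s).
    """
    probs = [1.0]  # with zero items, the summed score is 0 with certainty
    for p in p_correct:
        nxt = [0.0] * (len(probs) + 1)
        for score, prob in enumerate(probs):
            nxt[score] += prob * (1 - p)   # item answered incorrectly
            nxt[score + 1] += prob * p     # item answered correctly
        probs = nxt
    return probs


# Two items, each with a 0.5 chance of a correct response.
print(summed_score_likelihoods([0.5, 0.5]))
```

Repeating the recursion at every quadrature point and weighting by the prior gives summed-score posteriors; each added item costs one pass over the current score distribution.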
ERIC Educational Resources Information Center
Hsiao, Yu-Yu; Kwok, Oi-Man; Lai, Mark H. C.
2018-01-01
Path models with observed composites based on multiple items (e.g., mean or sum score of the items) are commonly used to test interaction effects. Under this practice, researchers generally assume that the observed composites are measured without errors. In this study, we reviewed and evaluated two alternative methods within the structural…
Does the Test Work? Evaluating a Web-Based Language Placement Test
ERIC Educational Resources Information Center
Long, Avizia Y.; Shin, Sun-Young; Geeslin, Kimberly; Willis, Erik W.
2018-01-01
In response to the need for examples of test validation from which everyday language programs can benefit, this paper reports on a study that used Bachman's (2005) assessment use argument (AUA) framework to examine evidence to support claims made about the intended interpretations and uses of scores based on a new web-based Spanish language…
Method for automatic measurement of second language speaking proficiency
NASA Astrophysics Data System (ADS)
Bernstein, Jared; Balogh, Jennifer
2005-04-01
Spoken language proficiency is intuitively related to effective and efficient communication in spoken interactions. However, it is difficult to derive a reliable estimate of spoken language proficiency by situated elicitation and evaluation of a person's communicative behavior. This paper describes the task structure and scoring logic of a group of fully automatic spoken language proficiency tests (for English, Spanish and Dutch) that are delivered via telephone or Internet. Test items are presented in spoken form and require a spoken response. Each test is automatically-scored and primarily based on short, decontextualized tasks that elicit integrated listening and speaking performances. The tests present several types of tasks to candidates, including sentence repetition, question answering, sentence construction, and story retelling. The spoken responses are scored according to the lexical content of the response and a set of acoustic base measures on segments, words and phrases, which are scaled with IRT methods or parametrically combined to optimize fit to human listener judgments. Most responses are isolated spoken phrases and sentences that are scored according to their linguistic content, their latency, and their fluency and pronunciation. The item development procedures and item norming are described.
Accounting for estimated IQ in neuropsychological test performance with regression-based techniques.
Testa, S Marc; Winicki, Jessica M; Pearlson, Godfrey D; Gordon, Barry; Schretlen, David J
2009-11-01
Regression-based normative techniques account for variability in test performance associated with multiple predictor variables and generate expected scores based on algebraic equations. Using this approach, we show that estimated IQ, based on oral word reading, accounts for 1-9% of the variability beyond that explained by individual differences in age, sex, race, and years of education for most cognitive measures. These results confirm that adding estimated "premorbid" IQ to demographic predictors in multiple regression models can incrementally improve the accuracy with which regression-based norms (RBNs) benchmark expected neuropsychological test performance in healthy adults. It remains to be seen whether the incremental variance in test performance explained by estimated "premorbid" IQ translates to improved diagnostic accuracy in patient samples. We describe these methods, and illustrate the step-by-step application of RBNs with two cases. We also discuss the rationale, assumptions, and caveats of this approach. More broadly, we note that adjusting test scores for age and other characteristics might actually decrease the accuracy with which test performance predicts absolute criteria, such as the ability to drive or live independently.
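The step-by-step logic of a regression-based norm can be sketched as: predict the expected score from the examinee's characteristics, then express the observed score as a standardized residual. All coefficients, predictor names, and the standard error of estimate below are invented for illustration; the paper's actual equations are not reproduced in the abstract.

```python
def rbn_z_score(observed, predictors, coefficients, intercept, see):
    """Standardized residual of an observed score against a regression-based norm.

    `coefficients` and `predictors` are dicts keyed by predictor name;
    `see` is the standard error of estimate of the normative regression.
    """
    expected = intercept + sum(coefficients[name] * predictors[name]
                               for name in coefficients)
    return (observed - expected) / see


# Hypothetical normative equation: expected = 50 - 0.2*age + 1.0*education.
z = rbn_z_score(
    observed=40,
    predictors={"age": 60, "education": 12},
    coefficients={"age": -0.2, "education": 1.0},
    intercept=50,
    see=10,
)
print(z)
```

Adding estimated premorbid IQ, as the abstract proposes, would amount to one more entry in each dictionary.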
Price, Larry R; Raju, Nambury; Lurie, Anna; Wilkins, Charles; Zhu, Jianjun
2006-02-01
A specific recommendation of the 1999 Standards for Educational and Psychological Testing by the American Educational Research Association, the American Psychological Association, and the National Council on Measurement in Education is that test publishers report estimates of the conditional standard error of measurement (SEM). Procedures for calculating the conditional (score-level) SEM based on raw scores are well documented; however, few procedures have been developed for estimating the conditional SEM of subtest or composite scale scores resulting from a nonlinear transformation. Item response theory provided the psychometric foundation to derive the conditional standard errors of measurement and confidence intervals for composite scores on the Wechsler Preschool and Primary Scale of Intelligence-Third Edition.
Student peer assessment in evidence-based medicine (EBM) searching skills training: an experiment
Eldredge, Jonathan D.; Bear, David G.; Wayne, Sharon J.; Perea, Paul P.
2013-01-01
Background: Student peer assessment (SPA) has been used intermittently in medical education for more than four decades, particularly in connection with skills training. SPA generally has not been rigorously tested, so medical educators have limited evidence about SPA effectiveness. Methods: Experimental design: Seventy-one first-year medical students were stratified by previous test scores into problem-based learning tutorial groups, and then these assigned groups were randomized further into intervention and control groups. All students received evidence-based medicine (EBM) training. Only the intervention group members received SPA training, practice with assessment rubrics, and then application of anonymous SPA to assignments submitted by other members of the intervention group. Results: Students in the intervention group had higher mean scores on the formative test with a potential maximum score of 49 points than did students in the control group, 45.7 and 43.5, respectively (P = 0.06). Conclusions: SPA training and the application of these skills by the intervention group resulted in higher scores on formative tests compared to those in the control group, a difference approaching statistical significance. The extra effort expended by librarians, other personnel, and medical students must be factored into the decision to use SPA in any specific educational context. Implications: SPA has not been rigorously tested, particularly in medical education. Future, similarly rigorous studies could further validate use of SPA so that librarians can optimally make use of limited contact time for information skills training in medical school curricula. PMID:24163593
Use of the binomial distribution to predict impairment: application in a nonclinical sample.
Axelrod, Bradley N; Wall, Jacqueline R; Estes, Bradley W
2008-01-01
A mathematical model based on the binomial theory was developed to illustrate when abnormal score variations occur by chance in a multitest battery (Ingraham & Aiken, 1996). It has been successfully used as a comparison for obtained test scores in clinical samples, but not in nonclinical samples. In the current study, this model has been applied to demographically corrected scores on the Halstead-Reitan Neuropsychological Test Battery, obtained from a sample of 94 nonclinical college students. Results showed that 15% of the sample had impairments suggested by the Halstead Impairment Index, using criteria established by Reitan and Wolfson (1993). In addition, one-half of the sample obtained impaired scores on one or two tests. These results were compared with those predicted by the binomial model and found to be consistent with them. The model therefore serves as a useful resource for clinicians considering the probability of impaired test performance.
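The binomial reasoning can be illustrated with a short calculation of the chance of seeing at least k abnormal scores across a battery. The sketch assumes independent tests with a common per-test probability of an abnormal score, which is a simplification; the battery size and probability in the usage line are hypothetical, not values from the study.

```python
from math import comb


def prob_at_least(k, n_tests, p_abnormal):
    """P(at least k of n_tests scores are abnormal) under a binomial model."""
    return sum(
        comb(n_tests, j) * p_abnormal**j * (1 - p_abnormal) ** (n_tests - j)
        for j in range(k, n_tests + 1)
    )


# Hypothetical battery: 7 tests, each with a 10% chance of an abnormal score.
print(prob_at_least(1, 7, 0.10))
```

Even a modest per-test rate makes one or more abnormal scores likely across a whole battery, which is the pattern the abstract reports in its nonclinical sample.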
NASA Astrophysics Data System (ADS)
Marulcu, Ismail; Barnett, Michael
2016-01-01
Background: Elementary science education is struggling with multiple challenges. National and state test results confirm the need for deeper understanding in elementary science education. Moreover, national policy statements and researchers call for increased exposure to engineering and technology in elementary science education. The basic motivation of this study is to suggest a solution to both improving elementary science education and increasing exposure to engineering and technology in it. Purpose/Hypothesis: This mixed-method study examined the impact of an engineering design-based curriculum compared to an inquiry-based curriculum on fifth graders' content learning of simple machines. We hypothesize that the LEGO-engineering design unit is as successful as the inquiry-based unit in terms of students' science content learning of simple machines. Design/Method: We used a mixed-methods approach to investigate our research questions; we compared the control and the experimental groups' scores from the tests and interviews by using Analysis of Covariance (ANCOVA) and compared each group's pre- and post-scores by using paired t-tests. Results: Our findings from the paired t-tests show that both the experimental and comparison groups significantly improved their scores from the pre-test to post-test on the multiple-choice, open-ended, and interview items. Moreover, ANCOVA results show that students in the experimental group, who learned simple machines with the design-based unit, performed significantly better on the interview questions. Conclusions: Our analyses revealed that the design-based unit, Design a people mover: Simple machines, was at least as successful as the inquiry-based FOSS Levers and pulleys unit in terms of students' science content learning.
Arday, D R; Brundage, J F; Gardner, L I; Goldenbaum, M; Wann, F; Wright, S
1991-06-15
The authors conducted a population-based study to attempt to estimate the effect of human immunodeficiency virus type 1 (HIV-1) seropositivity on Armed Services Vocational Aptitude Battery test scores in otherwise healthy individuals with early HIV-1 infection. The Armed Services Vocational Aptitude Battery is a 10-test written multiple aptitude battery administered to all civilian applicants for military enlistment prior to serologic screening for HIV-1 antibodies. A total of 975,489 induction testing records containing both Armed Services Vocational Aptitude Battery and HIV-1 results from October 1985 through March 1987 were examined. An analysis data set (n = 7,698) was constructed by choosing five controls for each of the 1,283 HIV-1-positive cases, matched on five-digit ZIP code, and a multiple linear regression analysis was performed to control for demographic and other factors that might influence test scores. Years of education was the strongest predictor of test scores, raising an applicant's score on a composite test nearly 0.16 standard deviation per year. The HIV-1-positive effect on the composite score was -0.09 standard deviation (99% confidence interval -0.17 to -0.02). Separate regressions on each component test within the battery showed HIV-1 effects between -0.39 and +0.06 standard deviation. The two Armed Services Vocational Aptitude Battery component tests felt a priori to be the most sensitive to HIV-1-positive status showed the least decrease with seropositivity. Much of the variability in test scores was not predicted by either HIV-1 serostatus or the demographic and other factors included in the model. There appeared to be little evidence of a strong HIV-1 effect.
The King-Devick test as a determinant of head trauma and concussion in boxers and MMA fighters.
Galetta, K M; Barrett, J; Allen, M; Madda, F; Delicata, D; Tennant, A T; Branas, C C; Maguire, M G; Messner, L V; Devick, S; Galetta, S L; Balcer, L J
2011-04-26
Sports-related concussion has received increasing attention as a cause of short- and long-term neurologic symptoms among athletes. The King-Devick (K-D) test is based on measurement of the speed of rapid number naming (reading aloud single-digit numbers from 3 test cards), and captures impairment of eye movements, attention, language, and other correlates of suboptimal brain function. We investigated the K-D test as a potential rapid sideline screening for concussion in a cohort of boxers and mixed martial arts fighters. The K-D test was administered prefight and postfight. The Military Acute Concussion Evaluation (MACE) was administered as a more comprehensive but longer test for concussion. Differences in postfight K-D scores and changes in scores from prefight to postfight were compared for athletes with head trauma during the fight vs those without. Postfight K-D scores (n = 39 participants) were significantly higher (worse) for those with head trauma during the match (59.1 ± 7.4 vs 41.0 ± 6.7 seconds, p < 0.0001, Wilcoxon rank sum test). Those with loss of consciousness showed the greatest worsening from prefight to postfight. Worse postfight K-D scores (r(s) = -0.79, p = 0.0001) and greater worsening of scores (r(s) = 0.90, p < 0.0001) correlated well with postfight MACE scores. Worsening of K-D scores by ≥5 seconds was a distinguishing characteristic noted only among participants with head trauma. High levels of test-retest reliability were observed (intraclass correlation coefficient 0.97 [95% confidence interval 0.90-1.0]). The K-D test is an accurate and reliable method for identifying athletes with head trauma, and is a strong candidate rapid sideline screening test for concussion. PMID:21288984
Lantelme, Pierre; Eltchaninoff, Hélène; Rabilloud, Muriel; Souteyrand, Géraud; Dupré, Marion; Spaziano, Marco; Bonnet, Marc; Becle, Clément; Riche, Benjamin; Durand, Eric; Bouvier, Erik; Dacher, Jean-Nicolas; Courand, Pierre-Yves; Cassagnes, Lucie; Dávila Serrano, Eduardo E; Motreff, Pascal; Boussel, Loic; Lefèvre, Thierry; Harbaoui, Brahim
2018-05-11
The aim of this study was to develop a new scoring system based on thoracic aortic calcification (TAC) to predict 1-year cardiovascular and all-cause mortality. A calcified aorta is often associated with poor prognosis after transcatheter aortic valve replacement (TAVR). A risk score encompassing aortic calcification may be valuable in identifying poor TAVR responders. The C4CAPRI (4 Cities for Assessing CAlcification PRognostic Impact) multicenter study included a training cohort (1,425 patients treated using TAVR between 2010 and 2014) and a contemporary test cohort (311 patients treated in 2015). TAC was measured by computed tomography pre-TAVR. CAPRI risk scores were based on the linear predictors of Cox models including TAC in addition to comorbidities and demographic, atherosclerotic disease and cardiac function factors. CAPRI scores were constructed and tested in 2 independent cohorts. Cardiovascular and all-cause mortality at 1 year was 13.0% and 17.9%, respectively, in the training cohort and 8.2% and 11.8% in the test cohort. The inclusion of TAC in the model improved prediction: 1-cm³ increase in TAC was associated with a 6% increase in cardiovascular mortality and a 4% increase in all-cause mortality. The predicted and observed survival probabilities were highly correlated (slopes >0.9 for both cardiovascular and all-cause mortality). The model's predictive power was fair (AUC 68% [95% confidence interval [CI]: 64-72]) for both cardiovascular and all-cause mortality. The model performed similarly in the training and test cohorts. The CAPRI score, which combines the TAC variable with classical prognostic factors, is predictive of 1-year cardiovascular and all-cause mortality. Its predictive performance was confirmed in an independent contemporary cohort. CAPRI scores are highly relevant to current practice and strengthen the evidence base for decision making in valvular interventions. Its routine use may help prevent futile procedures.
Copyright © 2018 American College of Cardiology Foundation. Published by Elsevier Inc. All rights reserved.
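The per-unit TAC effect reported in this abstract can be scaled to other TAC differences. The following is a minimal Python sketch, assuming that the reported 6% increase per cm³ corresponds to a Cox hazard ratio of 1.06 and that the effect is log-linear in TAC; the function name is illustrative and is not taken from the study.

```python
import math


def relative_hazard(hr_per_unit: float, delta: float) -> float:
    """Relative hazard for a covariate difference `delta` under a Cox model.

    Assumption: the effect is log-linear, so beta = ln(HR per unit) and the
    relative hazard for a difference `delta` is exp(beta * delta).
    """
    beta = math.log(hr_per_unit)
    return math.exp(beta * delta)


# Assumed hazard ratio of 1.06 per 1 cm^3 of TAC (cardiovascular mortality).
for delta_cm3 in (1.0, 5.0):
    print(delta_cm3, round(relative_hazard(1.06, delta_cm3), 3))
```

Under these assumptions the effect compounds multiplicatively, so a 5 cm³ difference corresponds to 1.06 raised to the fifth power rather than to a 30% increase.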
Spreckelsen, C; Juenger, J
2017-09-26
Adequate estimation and communication of risks is a critical competence of physicians. Due to an evident lack of these competences, effective training addressing risk competence during medical education is needed. Test-enhanced learning has been shown to produce marked effects on achievements. This study aimed to investigate the effect of repeated tests implemented on top of a blended learning program for risk competence. We introduced a blended-learning curriculum for risk estimation and risk communication based on a set of operationalized learning objectives, which was integrated into a mandatory course "Evidence-based Medicine" for third-year students. A randomized controlled trial addressed the effect of repeated testing on achievement as measured by the students' pre- and post-training score (nine multiple-choice items). Basic numeracy and statistical literacy were assessed at baseline. Analysis relied on descriptive statistics (histograms, box plots, scatter plots, and summary of descriptive measures), bootstrapped confidence intervals, analysis of covariance (ANCOVA), and effect sizes (Cohen's d, r) based on adjusted means and standard deviations. All of the 114 students enrolled in the course consented to take part in the study and were assigned to either the intervention or control group (both: n = 57) by balanced randomization. Five participants dropped out due to non-compliance (control: 4, intervention: 1). Both groups profited considerably from the program in general (Cohen's d for overall pre vs. post scores: 2.61). Repeated testing yielded an additional positive effect: while the covariate (baseline score) exhibits no relation to the post-intervention score, F(1, 106) = 2.88, p > .05, there was a significant effect of the intervention (repeated tests scenario) on learning achievement, F(1, 106) = 12.72, p < .05, d = .94, r = .42 (95% CI: [.26, .57]). 
However, in the subgroup of participants with a high initial numeracy score no similar effect could be observed. Dedicated training can improve relevant components of risk competence of medical students. An already promising overall effect of the blended learning approach can be improved significantly by implementing a test-enhanced learning design, namely repeated testing. As students with a high initial numeracy score did not profit equally from repeated testing, target-group specific opt-out may be offered.
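The abstract reports the intervention effect both as Cohen's d and as r. The sketch below shows one standard conversion between the two for a two-group comparison. It is an illustration only: the abstract does not state which conversion the authors used, and the group sizes (53 and 56) are inferred here from the reported dropouts.

```python
import math


def d_to_r(d: float, n1: int, n2: int) -> float:
    """Convert Cohen's d to a point-biserial r for two groups.

    Uses r = d / sqrt(d^2 + a) with a = (n1 + n2)^2 / (n1 * n2);
    a equals 4 when the groups are the same size.
    """
    a = (n1 + n2) ** 2 / (n1 * n2)
    return d / math.sqrt(d * d + a)


# d from the abstract; group sizes assumed as 57 - 4 and 57 - 1.
print(round(d_to_r(0.94, 53, 56), 3))
```

With these inputs the conversion gives a value near the reported r = .42; any small gap could reflect the adjusted means and standard deviations the authors mention.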
Thakur, Sahil; Ichhpujani, Parul; Kumar, Suresh; Kaur, Ravneet; Sood, Sunandan
2018-05-14
This study was designed to assess the efficacy, reliability and repeatability of SPARCS (Spaeth Richman Contrast Sensitivity Test) as compared to the conventional Pelli Robson Chart Test for the assessment of contrast sensitivity in patients with glaucoma. We evaluated 135 eyes of 135 patients who were age and sex matched into three groups (controls, disc suspects and glaucoma) of 45 patients each. The glaucoma subgroup was further divided into subgroups of mild, moderate and severe based on the visual field damage. There was a strong positive correlation between Pelli Robson scores and SPARCS scores (S = 0.807, P < 0.001). Intraclass correlation coefficient (ICC) for Pelli Robson Test was 0.952 and 0.988 for SPARCS. The coefficient of repeatability (COR) for mean SPARCS was 5.65%, while COR of Pelli Robson Test was 12.44%. SPARCS was found to have better repeatability than Pelli Robson Test based on COR values. Pelli Robson score had a sensitivity of 80% and a specificity of 65.6% for detecting glaucoma patients as compared to 84.4% and 70%, respectively, for SPARCS scores. SPARCS is a better alternative to conventional Pelli Robson Chart Test for assessment of contrast sensitivity in patients with glaucoma. Being independent of the effects of literacy and educational status, it offers a universal way to measure contrast sensitivity. It can also be reliably used in patients with varying severity of glaucoma.
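Sensitivity and specificity figures like those in this abstract come from a two-by-two classification table. Here is a minimal Python sketch; the counts are hypothetical, chosen only to be consistent with the reported SPARCS percentages under the assumption that the 45 glaucoma patients were compared against the other 90 participants.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Proportion of diseased cases the test correctly flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Proportion of non-diseased cases the test correctly clears."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts (not taken from the paper): 45 glaucoma eyes,
# 90 non-glaucoma eyes (controls and disc suspects pooled).
print(f"sensitivity = {sensitivity(38, 7):.1%}")
print(f"specificity = {specificity(63, 27):.1%}")
```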
External validation of the HIT Expert Probability (HEP) score.
Joseph, Lee; Gomes, Marcelo P V; Al Solaiman, Firas; St John, Julie; Ozaki, Asuka; Raju, Manjunath; Dhariwal, Manoj; Kim, Esther S H
2015-03-01
The diagnosis of heparin-induced thrombocytopenia (HIT) can be challenging. The HIT Expert Probability (HEP) Score has recently been proposed to aid in the diagnosis of HIT. We sought to externally and prospectively validate the HEP score. We prospectively assessed pre-test probability of HIT for 51 consecutive patients referred to our Consultative Service for evaluation of possible HIT between August 1, 2012 and February 1, 2013. Two Vascular Medicine fellows independently applied the 4T and HEP scores for each patient. Two independent HIT expert adjudicators rendered a diagnosis of HIT likely or unlikely. The median (interquartile range) of 4T and HEP scores were 4.5 (3.0, 6.0) and 5 (3.0, 8.5), respectively. There were no significant differences between area under receiver-operating characteristic curves of 4T and HEP scores against the gold standard, confirmed HIT [defined as positive serotonin release assay and positive anti-PF4/heparin ELISA] (0.74 vs 0.73, p = 0.97). HEP score ≥ 2 was 100% sensitive and 16% specific for determining the presence of confirmed HIT, while a 4T score > 3 was 93% sensitive and 35% specific. In conclusion, the HEP and 4T scores are excellent screening pre-test probability models for HIT; however, in this prospective validation study, test characteristics for the diagnosis of HIT based on confirmatory laboratory testing and expert opinion are similar. Given the complexity of the HEP scoring model compared to that of the 4T score, further validation of the HEP score is warranted prior to widespread clinical acceptance.
Wu, Xiangxiang; Zeng, Huahui; Zhu, Xin; Ma, Qiujuan; Hou, Yimin; Wu, Xuefen
2013-11-20
A series of pyrrolopyridinone derivatives as specific inhibitors towards the cell division cycle 7 (Cdc7) was taken into account, and the efficacy of these compounds was analyzed by QSAR and docking approaches to gain deeper insights into the interaction mechanism and ligands selectivity for Cdc7. By regression analysis the prediction models based on Grid score and Zou-GB/SA score were found, respectively with good quality of fits (r(2)=0.748, 0.951; r(cv)(2)=0.712, 0.839). The accuracy of the models was validated by test set and the deviation of the predicted values in validation set using Zou-GB/SA score was smaller than that using Grid score, suggesting that the model based on Zou-GB/SA score provides a more effective method for predicting potencies of Cdc7 inhibitors. Copyright © 2013 Elsevier B.V. All rights reserved.
Effectiveness of the training material in drug-dose calculation skills.
Basak, Tulay; Aslan, Ozlem; Unver, Vesile; Yildiz, Dilek
2016-07-01
The aim of this study was to evaluate the effectiveness of the training material based on low-level environmental fidelity simulation in drug-dose calculation skills in senior nursing students. A quasi-experimental design with one group was used. The sample included senior nursing students attending a nursing school in Turkey in the period December 2012-January 2013. Eighty-two senior nursing students were included in the sample. Data were obtained using a data collection form which was developed by the researchers. A paired-sample t-test was used to compare the pretest and post-test scores. The difference between the mean pretest score and the mean post-test score was statistically significant (P < 0.05). This study revealed that the training material based on low-level environmental fidelity simulation positively impacted accurate drug-dose calculation skills in senior nursing students. © 2016 Japan Academy of Nursing Science.
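The paired-sample t-test named in this abstract compares each student's pretest score with that same student's post-test score. A minimal Python sketch of the test statistic follows, using only the standard library; the scores are made up for illustration and are not data from the study.

```python
import math
import statistics


def paired_t(pre: list[float], post: list[float]) -> tuple[float, int]:
    """Return the paired t statistic and its degrees of freedom.

    t = mean(d) / (sd(d) / sqrt(n)), where d are the per-subject
    differences (post - pre) and sd is the sample standard deviation.
    """
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1


# Hypothetical pretest and post-test scores for three students.
t_stat, dof = paired_t([10, 12, 14], [11, 14, 17])
print(f"t = {t_stat:.3f}, df = {dof}")
```

The p-value would then be read from a t distribution with the returned degrees of freedom, which needs a statistics library or table beyond this sketch.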
NASA Astrophysics Data System (ADS)
Dhitareka, P. H.; Firman, H.; Rusyati, L.
2018-05-01
This research compares a science virtual test and a paper-based test in measuring grade 7 students’ critical thinking based on Multiple Intelligences and gender. A quasi-experimental method with a within-subjects design was used to obtain the data. The population was all seventh-grade students in ten classes of one public secondary school in Bandung. The sample comprised 71 students from two randomly selected classes. The data were obtained through 28 questions on the topic of living things and environmental sustainability, constructed based on the eight critical thinking elements proposed by Inch and delivered as both a science virtual test and a paper-based test. The data were analysed using the paired-samples t test when the data were parametric and the Wilcoxon signed-ranks test when they were non-parametric. In the overall comparison, the p-value for the difference between the science virtual and paper-based test scores was 0.506, indicating no significant difference between the two tests based on scores. The results are further supported by the students’ attitude score of 3.15 on a scale from 1 to 4, indicating positive attitudes towards the Science Virtual Test.
ERIC Educational Resources Information Center
Lee, Chung-Ping; Lou, Shi-Jer; Shih, Ru-Chu; Tseng, Kuo-Hung
2011-01-01
This study uses the analytical hierarchy process (AHP) to quantify important knowledge management behaviors and to analyze the weight scores of elementary school students' behaviors in knowledge transfer, sharing, and creation. Based on the analysis of Expert Choice and tests for validity and reliability, this study identified the weight scores of…
A General Approach to Measuring Test-Taking Effort on Computer-Based Tests
ERIC Educational Resources Information Center
Wise, Steven L.; Gao, Lingyun
2017-01-01
There has been an increased interest in the impact of unmotivated test taking on test performance and score validity. This has led to the development of new ways of measuring test-taking effort based on item response time. In particular, Response Time Effort (RTE) has been shown to provide an assessment of effort down to the level of individual…
ERIC Educational Resources Information Center
Barnette, J. Jackson
2005-01-01
An Excel program developed to assist researchers in the determination and presentation of confidence intervals around commonly used score reliability coefficients is described. The software includes programs to determine confidence intervals for Cronbach's alpha, Pearson r-based coefficients such as those used in test-retest and alternate forms…
Report: States See Test-Score Gains
ERIC Educational Resources Information Center
Viadero, Debra
2004-01-01
This article discusses a report from Education Trust, a Washington-based research and advocacy group. The report says almost half the states have seen rising math scores on their state exams for elementary school pupils since the federal No Child Left Behind law was enacted. It also states that reading scores have improved among 4th and 5th…
Development of website for studying modern physics
NASA Astrophysics Data System (ADS)
Saehana, S.; Wahyono, U.; Darmadi, I. W.; Kendek, Y.; Widyawati, W.
2018-03-01
The purpose of this study is to produce a website for modern physics courses in order to increase student interest in physics learning. To determine the feasibility of the learning media, a feasibility test was carried out on the product. The feasibility test was divided into three parts: a material feasibility test, a media feasibility test, and a student response test. In the material test, the product obtained an average score of 3.72 and was categorized as very good. In the media test, it obtained an average score of 3.25 and was categorized as good. The analysis of responses from twenty students of class A (fifth semester) of the physics education program, FKIP Universitas Tadulako, gave an average score of 3.16, in the good category. The results showed that the developed website can be used as one of the learning media that support students' learning process.
ERIC Educational Resources Information Center
Bartik, Timothy J.
2013-01-01
This paper uses a regression discontinuity model to examine the effects on kindergarten entrance assessments of the Kalamazoo County Ready 4s (KC Ready 4s) program, a half-day pre-K program for four-year-olds in Kalamazoo County, Michigan. The results are based on test scores and other characteristics of up to 220 children participating in KC…
Construction of an Exome-Wide Risk Score for Schizophrenia Based on a Weighted Burden Test.
Curtis, David
2018-01-01
Polygenic risk scores obtained as a weighted sum of associated variants can be used to explore association in additional data sets and to assign risk scores to individuals. The methods used to derive polygenic risk scores from common SNPs are not suitable for variants detected in whole exome sequencing studies. Rare variants, which may have major effects, are seen too infrequently to judge whether they are associated and may not be shared between training and test subjects. A method is proposed whereby variants are weighted according to their frequency, their annotations and the genes they affect. A weighted sum across all variants provides an individual risk score. Scores constructed in this way are used in a weighted burden test and are shown to be significantly different between schizophrenia cases and controls using a five-way cross-validation procedure. This approach represents a first attempt to summarise exome sequence variation into a summary risk score, which could be combined with risk scores from common variants and from environmental factors. It is hoped that the method could be developed further. © 2017 John Wiley & Sons Ltd/University College London.
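The weighted sum described in this abstract can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the author's method: the abstract does not say how the frequency, annotation and gene weights are combined, so the sketch multiplies them, and all field names and values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    allele_count: int         # copies of the variant carried by the subject
    frequency_weight: float   # assumed larger for rarer variants
    annotation_weight: float  # assumed larger for more damaging annotations
    gene_weight: float        # assumed per-gene weight


def risk_score(variants: list[Variant]) -> float:
    """Weighted sum across all variants carried by one subject.

    Assumption: the three weights combine multiplicatively.
    """
    return sum(
        v.allele_count * v.frequency_weight * v.annotation_weight * v.gene_weight
        for v in variants
    )


# Hypothetical subject carrying two variants.
subject = [Variant(1, 2.0, 5.0, 1.0), Variant(2, 1.0, 1.0, 0.5)]
print(risk_score(subject))
```

Scores computed this way for cases and controls could then be compared in a burden test, which is the step the abstract evaluates with cross-validation.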
Liaw, Sok Ying; Chan, Sally Wai-Chi; Chen, Fun-Gee; Hooi, Shing Chuan; Siau, Chiang
2014-09-17
Virtual patient simulation has grown substantially in health care education. A virtual patient simulation was developed as a refresher training course to reinforce nursing clinical performance in assessing and managing deteriorating patients. The objective of this study was to describe the development of the virtual patient simulation and evaluate its efficacy, by comparing with a conventional mannequin-based simulation, for improving the nursing students' performances in assessing and managing patients with clinical deterioration. A randomized controlled study was conducted with 57 third-year nursing students who were recruited through email. After a baseline evaluation of all participants' clinical performance in a simulated environment, the experimental group received a 2-hour fully automated virtual patient simulation while the control group received 2-hour facilitator-led mannequin-based simulation training. All participants were then re-tested one day (first posttest) and 2.5 months (second posttest) after the intervention. The participants from the experimental group completed a survey to evaluate their learning experiences with the newly developed virtual patient simulation. Compared to their baseline scores, both experimental and control groups demonstrated significant improvements (P<.001) in first and second post-test scores. While the experimental group had significantly lower (P<.05) second post-test scores compared with the first post-test scores, no significant difference (P=.94) was found between these two scores for the control group. The scores between groups did not differ significantly over time (P=.17). The virtual patient simulation was rated positively. A virtual patient simulation for a refreshing training course on assessing and managing clinical deterioration was developed. 
Although the randomized controlled study did not show that the virtual patient simulation was superior to mannequin-based simulation, both simulations have been demonstrated to be effective refresher learning strategies for improving nursing students' clinical performance. Given the greater resource requirements of mannequin-based simulation, the virtual patient simulation provides a more promising alternative learning strategy to mitigate the decay of clinical performance over time.
Preterm birth, social disadvantage, and cognitive competence in Swedish 18- to 19-year-old men.
Ekeus, Cecilia; Lindström, Karolina; Lindblad, Frank; Rasmussen, Finn; Hjern, Anders
2010-01-01
The aim was to study the impact of a range of gestational ages (GAs) on cognitive competence in late adolescence and how this effect is modified by contextual social adversity in childhood. This was a register study based on a national cohort of 119664 men born in Sweden from 1973 to 1976. Data on GA and other perinatal factors were obtained from the Medical Birth Register, and information on cognitive test scores was extracted from military conscription at the ages of 18 to 19 years. Test scores were analyzed as z scores on a 9-point stanine scale, whereby each unit is equivalent to 0.5 SD. Socioeconomic indicators of the childhood household were obtained from the Population and Housing Census of 1990. The data were analyzed by multivariate linear regression. The mean cognitive test scores decreased in a stepwise manner with GA. In unadjusted analysis, the test scores were 0.63 stanine unit lower in men who were born after 24 to 32 gestational weeks than in those who were born at term. The difference in global scores between the lowest and highest category of socioeconomic status was 1.57. Adjusting the analysis for the childhood socioeconomic indicators decreased the effect of GA on cognitive test scores by 26% to 33%. There was also a multiplicative interaction effect of social adversity and moderately preterm birth on cognitive test scores. This study confirms previous claims of an incremental association of cognitive competence with GA. Socioeconomic indicators in childhood modified this effect at all levels of preterm birth.
Kyle Harrold, G; Hasanaj, Lisena; Moehringer, Nicholas; Zhang, Isis; Nolan, Rachel; Serrano, Liliana; Raynowska, Jenelle; Rucker, Janet C; Flanagan, Steven R; Cardone, Dennis; Galetta, Steven L; Balcer, Laura J
2017-08-15
This study investigated the utility of sideline concussion tests, including components of the Sports Concussion Assessment Tool, 3rd Edition (SCAT3) and the King-Devick (K-D), a vision-based test of rapid number naming, in an outpatient, multidisciplinary concussion center treating patients with both sports-related and non-sports related concussions. The ability of these tests to predict clinical outcomes based on the scores at the initial visit was evaluated. Scores for components of the SCAT3 and the K-D were fit into regression models accounting for age, gender, and sport/non-sport etiology in order to predict clinical outcome measures including total number of visits to the concussion center, whether the patient reached a SCAT3 symptom severity score≤7, and the total types of referrals each patient received over their course. Patient characteristics, differences between those with sport and non-sport etiologies, and correlations between the tests were also analyzed. Among 426 patients with concussion, SCAT3 total symptom score and symptom severity score at the initial visit predicted each of the clinical outcome variables. K-D score at the initial visit predicted the total number of visits and the total number of referrals. Those with sports-related concussions were younger, had less severely-affected test scores, had fewer visits and types of referrals, and were more likely to have clinical resolution of their concussion and to reach a symptom severity score≤7. This large-scale study of concussion patients supports the use of sideline concussion tests as part of outpatient concussion assessment, especially the total symptom and symptom severity score portions of the SCAT3 and the K-D. Women in this cohort had higher total symptom and symptom severity scores compared to men. 
Our data also suggest that those with non-sports-related concussions have longer lasting symptoms than those with sports-related concussions, and that these two groups should perhaps be regarded separately when assessing outcomes and needs in a multidisciplinary setting. Copyright © 2017 Elsevier B.V. All rights reserved.
Accelerometry-enabled measurement of walking performance with a robotic exoskeleton: a pilot study.
Lonini, Luca; Shawen, Nicholas; Scanlan, Kathleen; Rymer, William Z; Kording, Konrad P; Jayaraman, Arun
2016-03-31
Clinical scores for evaluating walking skills with lower limb exoskeletons are often based on a single variable, such as distance walked or speed, even in cases where a host of features are measured. We investigated how to combine multiple features such that the resulting score has high discriminatory power, in particular with few patients. A new score is introduced that allows quantifying the walking ability of patients with spinal cord injury when using a powered exoskeleton. Four spinal cord injury patients were trained to walk over ground with the ReWalk™ exoskeleton. Body accelerations during use of the device were recorded by a wearable accelerometer and 4 features to evaluate walking skills were computed. The new score is the Gaussian naïve Bayes surprise, which evaluates patients relative to the features' distribution measured in 7 expert users of the ReWalk™. We compared our score based on all the features with a standard outcome measure, which is based on number of steps only. All 4 patients improved over the course of training, as their scores trended towards the expert users' scores. The combined score (Gaussian naïve surprise) was considerably more discriminative than the one using only walked distance (steps). At the end of training, 3 out of 4 patients were significantly different from the experts, according to the combined score (p < .001, Wilcoxon Signed-Rank Test). In contrast, all but one patient were scored as experts when number of steps was the only feature. Integrating multiple features could provide a more robust metric to measure patients' skills while they learn to walk with a robotic exoskeleton. Testing this approach with other features and more subjects remains as future work.
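The combined score in this abstract rates a patient by how unlikely their features are under the distribution seen in expert users. One way to read "Gaussian naïve Bayes surprise" is as a negative log-likelihood under independent per-feature Gaussians; the Python sketch below follows that reading. The paper's exact definition may differ, and the feature values are invented.

```python
import math
import statistics


def fit_gaussians(expert_features: list[list[float]]) -> list[tuple[float, float]]:
    """Fit one Gaussian (mean, sample std) per feature from expert sessions."""
    columns = zip(*expert_features)
    return [(statistics.mean(col), statistics.stdev(col)) for col in columns]


def surprise(x: list[float], params: list[tuple[float, float]]) -> float:
    """Negative log-likelihood of x, treating features as independent."""
    total = 0.0
    for value, (mu, sigma) in zip(x, params):
        log_pdf = (
            -0.5 * math.log(2 * math.pi * sigma ** 2)
            - (value - mu) ** 2 / (2 * sigma ** 2)
        )
        total -= log_pdf
    return total


# Hypothetical expert sessions with two features each.
experts = [[1.0, 10.0], [2.0, 12.0], [3.0, 14.0]]
params = fit_gaussians(experts)
print(surprise([2.0, 12.0], params))  # patient close to the experts
print(surprise([6.0, 20.0], params))  # patient far from the experts
```

A lower surprise means the patient's walking features look more like an expert's, which matches the abstract's description of scores trending towards the expert users' scores during training.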
Semler, Elisa; Anderl-Straub, Sarah; Uttner, Ingo; Diehl-Schmid, Janine; Danek, Adrian; Einsiedler, Beate; Fassbender, Klaus; Fliessbach, Klaus; Huppertz, Hans-Jürgen; Jahn, Holger; Kornhuber, Johannes; Landwehrmeyer, Bernhard; Lauer, Martin; Muche, Rainer; Prudlo, Johannes; Schneider, Anja; Schroeter, Matthias L; Ludolph, Albert C; Otto, Markus
2018-04-25
With upcoming therapeutic interventions for patients with primary progressive aphasia (PPA), instruments for the follow-up of patients are needed to describe disease progression and to evaluate potential therapeutic effects. So far, volumetric brain changes have been proposed as clinical endpoints in the literature, but cognitive scores are still lacking. This study followed disease progression predominantly in language-based performance within 1 year and defined a PPA sum score which can be used in therapeutic interventions. We assessed 28 patients with nonfluent variant PPA, 17 with semantic variant PPA, 13 with logopenic variant PPA, and 28 healthy controls in detail for 1 year. The most informative neuropsychological assessments were combined to a sum score, and associations between brain atrophy were investigated followed by a sample size calculation for clinical trials. Significant absolute changes up to 20% in cognitive tests were found after 1 year. Semantic and phonemic word fluency, Boston Naming Test, Digit Span, Token Test, AAT Written language, and Cookie Test were identified as the best markers for disease progression. These tasks provide the basis of a new PPA sum score. Assuming a therapeutic effect of 50% reduction in cognitive decline for sample size calculations, a number of 56 cases is needed to find a significant treatment effect. Correlations between cognitive decline and atrophy showed a correlation up to r = 0.7 between the sum score and frontal structures, namely the superior and inferior frontal gyrus, as well as with left-sided subcortical structures. Our findings support the high performance of the proposed sum score in the follow-up of PPA and recommend it as an outcome measure in intervention studies.
Test/score/report: Simulation techniques for automating the test process
NASA Technical Reports Server (NTRS)
Hageman, Barbara H.; Sigman, Clayton B.; Koslosky, John T.
1994-01-01
A Test/Score/Report capability is currently being developed for the Transportable Payload Operations Control Center (TPOCC) Advanced Spacecraft Simulator (TASS) system which will automate testing of the Goddard Space Flight Center (GSFC) Payload Operations Control Center (POCC) and Mission Operations Center (MOC) software in three areas: telemetry decommutation, spacecraft command processing, and spacecraft memory load and dump processing. Automated computer control of the acceptance test process is one of the primary goals of a test team. With the proper simulation tools and user interface, the task of acceptance testing, regression testing, and repeatability of specific test procedures of a ground data system can be a simpler task. Ideally, the goal for complete automation would be to plug the operational deliverable into the simulator, press the start button, execute the test procedure, accumulate and analyze the data, score the results, and report the results to the test team along with a go/no recommendation to the test team. In practice, this may not be possible because of inadequate test tools, pressures of schedules, limited resources, etc. Most tests are accomplished using a certain degree of automation and test procedures that are labor intensive. This paper discusses some simulation techniques that can improve the automation of the test process. The TASS system tests the POCC/MOC software and provides a score based on the test results. The TASS system displays statistics on the success of the POCC/MOC system processing in each of the three areas as well as event messages pertaining to the Test/Score/Report processing. The TASS system also provides formatted reports documenting each step performed during the tests and the results of each step. A prototype of the Test/Score/Report capability is available and currently being used to test some POCC/MOC software deliveries. 
When this capability is fully operational it should greatly reduce the time necessary to test a POCC/MOC software delivery, as well as improve the quality of the test process.
Development and evaluation of learning module on clinical decision-making in Prosthodontics.
Deshpande, Saee; Lambade, Dipti; Chahande, Jayashree
2015-01-01
Best practice strategies for helping students learn the reasoning skills of problem solving and critical thinking (CT) remain a source of conjecture, particularly with regard to CT. The dental education literature is fundamentally devoid of research on the cognitive components of clinical decision-making. This study aimed to develop and evaluate the impact of a blended learning module on the clinical decision-making skills of dental graduates for planning prosthodontic rehabilitation. An interactive teaching module consisting of didactic lectures on clinical decision-making and computer-assisted case-based treatment planning software was developed. Its impact on cognitive knowledge gain in clinical decision-making was evaluated using an assessment involving problem-based multiple choice questions and paper-based case scenarios. Mean test scores were: Pretest (17 ± 1), posttest 1 (21 ± 2) and posttest 2 (43 ± 3). Comparison of mean scores was done with a one-way ANOVA test. There was an overall significant difference between mean scores at all three points (P < 0.001). A pair-wise comparison of mean scores was done with the Bonferroni test. The mean difference is significant at the 0.05 level. The pair-wise comparison shows that the posttest 2 score is significantly higher than posttest 1 and posttest 1 is significantly higher than pretest, that is, posttest 2 > posttest 1 > pretest. Blended teaching methods employing didactic lectures on clinical decision-making as well as computer-assisted case-based learning can be used to improve the quality of clinical decision-making in prosthodontic rehabilitation for dental graduates.
ERIC Educational Resources Information Center
Burk, Anne
An ex post facto study examined third grade students' achievement test scores both before and after the adoption of a literature-based basal reading text. The experimental groups consisted of five third grade classes at Terre Town Elementary School (Indiana) for each of the years 1988 through 1993. Mean scores were plotted and data were visually…
ERIC Educational Resources Information Center
Pullin, Diana
2013-01-01
A growing number of states and local schools across the country have adopted educator evaluation and accountability programs based on the use of student test scores and value-added models (VAM). A wide array of potential legal issues could arise from the implementation of these programs. This article uses legal analysis and social science evidence…
ERIC Educational Resources Information Center
Iadevaia, David G.
A study was conducted at Pima Community College to determine the relationship between the final grade received by students in an introductory, algebra-based physics course (PHY 121) and their scores on the reading, writing, and mathematics portions of the college's nonmandatory assessment test. Between 1983 and 1988, 639 students obtained a final…
ERIC Educational Resources Information Center
Guarino, Cassandra; Reckase, Mark D.; Wooldridge, Jeffrey M.
2013-01-01
The push for accountability in public schooling has extended to the measurement of teacher performance, accelerated by federal efforts through Race to the Top. Currently, a large number of states and districts across the country are computing measures of teacher performance based on the standardized test scores of their students and using them to…
ERIC Educational Resources Information Center
Stiefel, Leanna; Schwartz, Amy Ellen; Portas, Carole; Kim, Dae Yeop
2003-01-01
Analyzes the impact of Performance Driven Budgeting (PDB), a school-based budgeting initiative, on student test scores in the fourth and fifth grades and on spending patterns in selected New York City schools. Finds that PDB has a positive effect on some student test scores and leads to a change in the mix of spending, but not its level. (Contains…
NASA Astrophysics Data System (ADS)
Supriyanti, F. M. T.; Halimatul, H. S.
2018-05-01
This study aims to enhance chemistry students’ creative thinking skills using material from local resources in a protein qualitative test experiment (LMBE). In this study, a quasi-experimental method using a one-group pretest-posttest non-equivalent control group design was carried out on the effectiveness of the local material-based experiment approach. The data were collected using a test consisting of five assay tests and a student worksheet (LKM). The effectiveness of the local material-based experiment was tested by means of percentage of normalized gain
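The normalized gain named in this abstract is commonly computed as the improvement achieved divided by the improvement that was possible. A minimal Python sketch of that common definition follows; the abstract is cut off before it gives its own formula or thresholds, so the definition and the sample scores here are assumptions.

```python
def normalized_gain(pre: float, post: float, max_score: float = 100.0) -> float:
    """Normalized gain g = (post - pre) / (max_score - pre).

    Raises ValueError when the pretest score is already at the maximum,
    because no gain is possible and the ratio is undefined.
    """
    if pre >= max_score:
        raise ValueError("pretest score must be below the maximum score")
    return (post - pre) / (max_score - pre)


# Hypothetical class means on a 0-100 scale.
g = normalized_gain(pre=40.0, post=70.0)
print(f"normalized gain = {g:.0%}")
```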
Haralambos, K; Whatley, S D; Edwards, R; Gingell, R; Townsend, D; Ashfield-Watt, P; Lansberg, P; Datta, D B N; McDowell, I F W
2015-05-01
Familial Hypercholesterolaemia (FH) is caused by mutations in genes of the Low Density Lipoprotein (LDL) receptor pathway. A definitive diagnosis of FH can be made by the demonstration of a pathogenic mutation. The Wales FH service has developed scoring criteria to guide selection of patients for DNA testing, for those referred to clinics with hypercholesterolaemia. The criteria are based on a modification of the Dutch Lipid Clinic scoring criteria and utilise a combination of lipid values, physical signs, personal and family history of premature cardiovascular disease. They are intended to provide clinical guidance and enable resources to be targeted in a cost effective manner. 623 patients who presented to lipid clinics across Wales had DNA testing following application of these criteria. The proportion of patients with a pathogenic mutation ranged from 4% in those scoring 5 or less up to 85% in those scoring 15 or more. LDL-cholesterol was the strongest discriminatory factor. Scores gained from physical signs, family history, coronary heart disease, and triglycerides also showed a gradient in mutation pick-up rate according to the score. These criteria provide a useful tool to guide selection of patients for DNA testing when applied by health professionals who have clinical experience of FH. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
E-Beam Capture Aid Drawing Based Modelling on Cell Biology
NASA Astrophysics Data System (ADS)
Hidayat, T.; Rahmat, A.; Redjeki, S.; Rahman, T.
2017-09-01
The objective of this research is to determine how far the Drawing-based Modeling approach assisted with E-Beam Capture could support students’ scientific reasoning skills. The research used a pre-test and post-test design. Data on scientific reasoning skills were collected by giving multiple-choice questions before and after the lesson and were analyzed using a scientific reasoning assessment rubric. The results show an improvement in students’ scientific reasoning for every indicator: the number of students achieving high scores rose by 2 in generativity, 3 in elaboration reasoning, 4 in justification, 3 in explanation, 3 in logic coherency, and 2 in synthesis. Explanation reasoning had the highest number of students with high scores (20 students in the pre-test and 23 in the post-test), and synthesis reasoning had the lowest (1 student in the pre-test and 3 in the post-test). The study concludes that the Drawing-based Modeling approach assisted with E-Beam Capture could not yet support students’ scientific reasoning skills comprehensively.
A Milestone-Based Evaluation System-The Cure for Grade Inflation?
Kuo, Lindsay E; Hoffman, Rebecca L; Morris, Jon B; Williams, Noel N; Malachesky, Mark; Huth, Laura E; Kelz, Rachel R
2015-01-01
Controversy exists over the optimal use of the Milestones in the process of resident evaluation and feedback. We sought to evaluate the performance of a Milestones-based feedback system in comparison to a traditional model. The traditional evaluation system (TES) consisted of a generic 16-item survey using a 5-point Likert scale ranging from 1 to 5, and a free-text comments section. The Milestones-based evaluation system (MBES) was launched in July 2014, ranging from 0 to 4. Individual milestones were mapped to rotations based on resident educational goals by postgraduate year (PGY). The MBES consisted of a survey with a maximum of 7 items, followed by a free-text comment section. Within each evaluation system, an overall composite score was calculated for each categorical general surgical resident. To scale the 2 systems for comparison, TES scores were adjusted downward by 1 point. Descriptive statistics were performed. Univariate analysis was performed with the Wilcoxon signed-rank test. A test for trend across PGY was used for the MBES only. In the traditional system, the median score was 3.66 (range: 3.2-4.0). There was no meaningful difference in the median score by PGY. In the new system, the median score was 2.69 (range: 1.5-3.7, p < 0.01). The median score differed across PGY and increased by PGY of training (p < 0.01). There was an increase in differences between median scores by PGY. On using the milestones to facilitate faculty evaluation of resident knowledge and skill, there was a trend in increasing score by PGY of training. In the MBES, scores could be used to better discriminate resident skill and knowledge levels and resulted in improved differentiation in scoring by PGY. The use of the milestones as a basis for evaluation enabled the program to provide more meaningful feedback to residents and represents an improvement in surgical education. Copyright © 2015 Association of Program Directors in Surgery. Published by Elsevier Inc. 
All rights reserved.
Aiyenigba, Bolatito; Ojo, Abiodun; Aisiri, Adolor; Uzim, Justus; Adeusi, Oluwole; Mwenesi, Halima
2017-01-01
Rapid and precise diagnosis of malaria is an essential element in effective case management and control of malaria. Malaria microscopy is used as the gold standard for malaria diagnosis, however results remain poor as positivity rate in Nigeria is consistently over 90%. The United States President's Malaria Initiative (PMI) through the Malaria Action Program for States (MAPS) supported selected states in Nigeria to build capacity for malaria microscopy. This study demonstrates the effectiveness of in-service training on malaria microscopy amongst medical laboratory scientists. The training was based on the World Health Organization (WHO) basic microscopy training manual. The 10-day training utilized a series of didactic lectures and examination of teaching slides using a CX 21 Olympus binocular microscope. All 108 medical laboratory scientists trained from 2012 to 2015 across five states in Nigeria supported by PMI were included in the study. Evaluation of the training using a pre-and post-test method was based on written test questions; reading photographic slide images of malaria parasites; and prepared slides. There was a significant improvement in the mean written pre-and post-tests scores from 37.9% (95% CI 36.2-39.6%) to 70.7% (95% CI 68.4-73.1%) ( p < 0.001). The mean counting post-test score improved significantly from 4.2% (95% CI 2.6-5.7%) to 27.9% (95% CI 25.3-30.5%) ( p < 0.001). Mean post-test score for computer-based picture speciation test (63.0%) and picture detection test (89.2%) were significantly higher than the mean post-test score for slide reading speciation test (38.3%) and slide reading detection test (70.7%), p < 0.001 in both cases. Parasite detection and speciation using enhanced visual imaging was significantly improved compared with using direct microscopy. Regular in-service training and provision of functional and high resolution microscopes are needed to ensure quality routine malaria microscopy.
Outcomes of a pilates-based intervention for individuals with lateral epicondylosis: A pilot study.
Dale, Lucinda M; Mikuski, Connie; Miller, Jacqueline
2015-01-01
Core stability and flexibility, features of Pilates exercise, can reduce loads to the upper extremities. Reducing loads is essential to improve symptoms for individuals with lateral epicondylosis. Although Pilates exercise has gained popularity in healthy populations, it has not been studied for individuals with lateral epicondylosis. The purpose of this study was to determine if adding Pilates-based intervention to standard occupational therapy intervention improved outcomes as measured by the Patient-Rated Tennis Elbow Evaluation (PRTEE) more than standard intervention for individuals with lateral epicondylosis. Participants (N= 17) were randomized to the standard intervention group or Pilates-based intervention group. All participants received standard intervention. The Pilates-based intervention group additionally completed abdominal strengthening, postural correction, and flexibility. For both groups, paired t-tests showed significantly improved PRTEE scores, 38.1 for the Pilates-based intervention group, and 22.9 for the standard intervention group. Paired t-test showed significantly improved provocative grip strength and pain for both groups. Independent t-tests showed no significant difference between groups in improved scores of PRTEE, pain, and provocative grip. Although the Pilates-based intervention group showed greater improvement in PRTEE outcome, provocative grip, and pain, scores were not significantly better than those of the standard intervention group, warranting further research.
Description, measurement and evaluation of tertiary-education food environments.
Roy, R; Hebden, L; Kelly, B; De Gois, T; Ferrone, E M; Samrout, M; Vermont, S; Allman-Farinelli, M
2016-05-01
Obesity in young adults is an increasing health problem in Australia and many other countries. Evidence-based information is needed to guide interventions that reduce the obesity-promoting elements in tertiary-education environments. In a food environmental audit survey, 252 outlets were audited across seven institutions: three universities and four technical and further education institutions campuses. A scoring instrument called the food environment-quality index was developed and used to assess all food outlets on these campuses. Information was collated on the availability, accessibility and promotion of foods and beverages and a composite score (maximum score=148; higher score indicates healthier outlets) was calculated. Each outlet and the overall campus were ranked into tertiles based on their 'healthiness'. Differences in median scores for each outcome measure were compared between institutions and outlet types using one-way ANOVA with post hoc Scheffe's testing, χ 2 tests, Kruskal-Wallis H test and the Mann-Whitney U test. Binomial logistic regressions were used to compare the proportion of healthy v. unhealthy food categories across different types of outlets. Overall, the most frequently available items were sugar-sweetened beverages (20 % of all food/drink items) followed by chocolates (12 %), high-energy (>600 kJ/serve) foods (10 %), chips (10 %) and confectionery (10 %). Healthy food and beverages were observed to be less available, accessible and promoted than unhealthy options. The median score across all outlets was 72 (interquartile range=7). Tertiary-education food environments are dominated by high-energy, nutrient-poor foods and beverages. Interventions to decrease availability, accessibility and promotion of unhealthy foods are needed.
[Lack of correlation between performances in a simulator and in reality].
Konge, Lars; Bitsch, Mikael
2010-12-13
Simulation-based training provides obvious benefits for patients and doctors in education. Frequently, virtual reality simulators are expensive and evidence for their efficacy is poor, particularly as a result of studies with poor methodology and few test participants. In medical simulated training- and evaluation programmes it is always a question of transfer to the real clinical world. To illustrate this problem a study comparing the test performance of persons on a bowling simulator with their performance in a real bowling alley was conducted. Twenty-five test subjects played two rounds of bowling on a Nintendo Wii and 25 days later on a real bowling alley. Correlations of the scores in the first and second round (test-retest-reliability) and of the scores on the simulator and in reality (criterion validation) were studied and there was tested for any difference between female and male performance. The intraclass correlation coefficient equalled 0.76, i.e. the simulator fairly accurately measured participant performance. In contrast to this there was absolutely no correlation between participants' real bowling abilities and their scores on the simulator (Pearson's r = 0.06). There was no significant difference between female and male abilities. Simulation-based testing and training must be based on evidence. More studies are needed to include an adequate number of subjects. Bowling competence should not be based on Nintendo Wii measurements. Simulated training- and evaluation programmes should be validated before introduction, to ensure consistency with the real world.
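The criterion-validity check described in the abstract above rests on Pearson's r between simulator scores and real-world scores. As a minimal sketch only, the coefficient can be computed as follows; the function name and the five pairs of scores are hypothetical illustrations, not data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares around the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores for five players (NOT data from the study).
simulator_scores = [150, 180, 120, 200, 160]
alley_scores = [110, 95, 130, 105, 90]
r = pearson_r(simulator_scores, alley_scores)
print(f"Pearson r = {r:.2f}")
```

A value near 0, such as the r = 0.06 reported above, would indicate that simulator scores carry almost no linear information about real performance.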
Effects of Nintendo Wii-Fit® video games on balance in children with mild cerebral palsy.
Tarakci, Devrim; Ersoz Huseyinsinoglu, Burcu; Tarakci, Ela; Razak Ozdincler, Arzu
2016-10-01
This study compared the effects of Nintendo Wii-Fit ® balance-based video games and conventional balance training in children with mild cerebral palsy (CP). This randomized controlled trial involved 30 ambulatory pediatric patients (aged 5-18 years) with CP. Participants were randomized to either conventional balance training (control group) or to Wii-Fit balance-based video games training (Wii group). Both group received neuro-developmental treatment (NDT) during 24 sessions. In addition, while the control group received conventional balance training in each session, the Wii group played Nintendo Wii Fit games such as ski slalom, tightrope walk and soccer heading on balance board. Primary outcomes were Functional Reach Test (forward and sideways), Sit-to-Stand Test and Timed Get up and Go Test. Nintendo Wii Fit balance, age and game scores, 10 m walk test, 10-step climbing test and Wee-Functional Independence Measure (Wee FIM) were secondary outcomes. After the treatment, changes in balance scores and independence level in activities of daily living were significant (P < 0.05) in both groups. Statistically significant improvements were found in the Wii-based game group compared with the control group in all balance tests and total Wee FIM score (P < 0.05). Wii-fit balance-based video games are better at improving both static and performance-related balance parameters when combined with NDT treatment in children with mild CP. © 2016 Japan Pediatric Society.
1985-04-01
AFHRL-TR-84-64. Air Force. Equipercentile Test Equating: The Effects of Presmoothing and...combined or compound presmoother and a presmoothing method based on a particular model of test scores. Of the seven methods of presmoothing the score...unsmoothed distributions, the smoothing of that sequence of differences by the same compound method, and, finally, adding the smoothed differences back
Chowriappa, Ashirwad J; Shi, Yi; Raza, Syed Johar; Ahmed, Kamran; Stegemann, Andrew; Wilding, Gregory; Kaouk, Jihad; Peabody, James O; Menon, Mani; Hassett, James M; Kesavadas, Thenkurussi; Guru, Khurshid A
2013-12-01
A standardized scoring system does not exist in virtual reality-based assessment metrics to describe safe and crucial surgical skills in robot-assisted surgery. This study aims to develop an assessment score along with its construct validation. All subjects performed key tasks on previously validated Fundamental Skills of Robotic Surgery curriculum, which were recorded, and metrics were stored. After an expert consensus for the purpose of content validation (Delphi), critical safety determining procedural steps were identified from the Fundamental Skills of Robotic Surgery curriculum and a hierarchical task decomposition of multiple parameters using a variety of metrics was used to develop Robotic Skills Assessment Score (RSA-Score). Robotic Skills Assessment mainly focuses on safety in operative field, critical error, economy, bimanual dexterity, and time. Subsequently, the RSA-Score was further evaluated for construct validation and feasibility. Spearman correlation tests performed between tasks using the RSA-Scores indicate no cross correlation. Wilcoxon rank sum tests were performed between the two groups. The proposed RSA-Score was evaluated on non-robotic surgeons (n = 15) and on expert-robotic surgeons (n = 12). The expert group demonstrated significantly better performance on all four tasks in comparison to the novice group. Validation of the RSA-Score in this study was carried out on the Robotic Surgical Simulator. The RSA-Score is a valid scoring system that could be incorporated in any virtual reality-based surgical simulator to achieve standardized assessment of fundamental surgical tenets during robot-assisted surgery. Copyright © 2013 Elsevier Inc. All rights reserved.
Personality and Bulimic Symptomatology.
ERIC Educational Resources Information Center
Janzen, B. L.; And Others
1993-01-01
Examined relationship between bulimic symptomatology as measured by scores on Bulimia Test-Revised (BULIT-R) and personality characteristics based on Eysenck Personality Questionnaire-Revised in nonclinical sample of 166 female college students. Obtained relationship between Neuroticism, Addictiveness and scores on BULIT-R. (Author/NB)
Psychometric Properties of IRT Proficiency Estimates
ERIC Educational Resources Information Center
Kolen, Michael J.; Tong, Ye
2010-01-01
Psychometric properties of item response theory proficiency estimates are considered in this paper. Proficiency estimators based on summed scores and pattern scores include non-Bayes maximum likelihood and test characteristic curve estimators and Bayesian estimators. The psychometric properties investigated include reliability, conditional…
Stepniak, Camilla; Wickens, Brandon; Husein, Murad; Paradis, Josee; Ladak, Hanif M; Fung, Kevin; Agrawal, Sumit K
2017-06-01
OtoTrain is a Web-based otoscopy simulator that has previously been shown to have face and content validity. The objective of this study was to evaluate the effectiveness of this Web-based otoscopy simulator in teaching diagnostic otoscopy to novice learners. Study design: prospective, blinded randomized controlled trial. Second-year medical students were invited to participate in the study. A pretest consisted of a series of otoscopy videos followed by an open-answer format assessment pertaining to the characteristics and diagnosis of each video. Participants were then randomly divided into a control group and a simulator group. Following the pretest, both groups attended standard otology lectures, but the simulator group was additionally given unlimited access to OtoTrain for 1 week. A post-test was completed using a separate set of otoscopy videos. Tests were graded based on a comprehensive marking scheme. The pretest and post-test were anonymized, and the three evaluators were blinded to student allotment. A total of 41 medical students were enrolled in the study and randomized to the control group (n = 20) and the simulator group (n = 21). There was no significant difference between the two groups on their pretest scores. With the standard otology lectures, the control group had a 31% improvement in their post-test score (mean ± standard error of the mean, 30.4 ± 1.5) compared with their pretest score (23.3 ± 1.8) (P < .001). The simulator group had the addition of OtoTrain to the otology lectures, and their score improved by 71% on their post-test (37.8 ± 1.6) compared to their pretest (22.1 ± 1.9) (P < .001). Comparing the post-test results, the simulator group had a 24% higher score than the control group (P < .002). Inter-rater reliability between the blinded evaluators was excellent (r = 0.953, P < .001). The use of OtoTrain increased the diagnostic otoscopic performance in novice learners.
OtoTrain may be an effective teaching adjunct for undergraduate medical students. 1b. Laryngoscope, 127:1306-1311, 2017. © 2016 The American Laryngological, Rhinological and Otological Society, Inc.
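The percentage improvements quoted in the OtoTrain abstract above can be approximately reproduced from the reported group means. The sketch below assumes the percentages are simple relative changes; the helper name is hypothetical. Because only rounded means are published, the control-group figure comes out near 30.5% rather than the reported 31%.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, expressed in percent."""
    return 100.0 * (after - before) / before


# Group means as reported in the abstract (rounded to one decimal there).
control_gain = percent_change(23.3, 30.4)      # control: pretest -> post-test
simulator_gain = percent_change(22.1, 37.8)    # simulator: pretest -> post-test
between_groups = percent_change(30.4, 37.8)    # post-test: control -> simulator

print(f"control gain:    {control_gain:.1f}%")
print(f"simulator gain:  {simulator_gain:.1f}%")
print(f"simulator vs control at post-test: {between_groups:.1f}%")
```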
The effect of dyad versus individual simulation-based ultrasound training on skills transfer.
Tolsgaard, Martin G; Madsen, Mette E; Ringsted, Charlotte; Oxlund, Birgitte S; Oldenburg, Anna; Sorensen, Jette L; Ottesen, Bent; Tabor, Ann
2015-03-01
Dyad practice may be as effective as individual practice during clinical skills training, improve students' confidence, and reduce costs of training. However, there is little evidence that dyad training is non-inferior to single-student practice in terms of skills transfer. This study was conducted to compare the effectiveness of simulation-based ultrasound training in pairs (dyad practice) with that of training alone (single-student practice) on skills transfer. In a non-inferiority trial, 30 ultrasound novices were randomised to dyad (n = 16) or single-student (n = 14) practice. All participants completed a 2-hour training programme on a transvaginal ultrasound simulator. Participants in the dyad group practised together and took turns as the active practitioner, whereas participants in the single group practised alone. Performance improvements were evaluated through pre-, post- and transfer tests. The transfer test involved the assessment of a transvaginal ultrasound scan by one of two clinicians using the Objective Structured Assessment of Ultrasound Skills (OSAUS). Thirty participants completed the simulation-based training and 24 of these completed the transfer test. Dyad training was found to be non-inferior to single-student training: transfer test OSAUS scores were significantly higher than the pre-specified non-inferiority margin (delta score 7.8%, 95% confidence interval -3.8-19.6%; p = 0.04). More dyad (71.4%) than single (30.0%) trainees achieved OSAUS scores above a pre-established pass/fail level in the transfer test (p = 0.05). There were significant differences in performance scores before and after training in both groups (pre- versus post-test, p < 0.01) with large effect sizes (Cohen's d = 3.85) and no significant interactions between training type and performance (p = 0.59). The dyad group demonstrated higher training efficiency in terms of simulator score per number of attempts compared with the single-student group (p = 0.03). 
Dyad practice improves the efficiency of simulation-based training and is non-inferior to individual practice in terms of skills transfer. © 2015 John Wiley & Sons Ltd.
A job-related fitness test for the Dutch police.
Strating, M; Bakker, R H; Dijkstra, G J; Lemmink, K A P M; Groothoff, J W
2010-06-01
The variety of tasks that characterize police work highlights the importance of being in good physical condition. The aim was to take a first step toward standardizing the administration of a job-related test to assess a person's ability to perform the physical demands of the core tasks of police work. The principal research questions were: are test scores related to gender, age and function and are test scores related to body mass index (BMI) and the number of hours of physical exercise? Data of 6999 police officers, geographically spread over all parts of The Netherlands, who completed a physical competence test over a 1 year period were analysed. Women performed the test significantly more slowly than men. The mean test score was also related to age; the older a person the longer it took to complete the test. A higher BMI was associated with fewer hours of physical exercise per week and a slower test performance, both in women and men. The differences in individual test scores, based on gender and age, have implications for future strategy within the police force. From a viewpoint of 'same job, same standard' one has to accept that test-score differences may lead to the exclusion of certain staff. However, from a viewpoint of 'diversity as a business issue', one may have to accept that on average, both female and older police officers are physically less tailored to their jobs than their male and younger colleagues.
A web-based simulation of a longitudinal clinic used in a 4-week ambulatory rotation: a cohort study
Wong, Rene WG; Lochnan, Heather A
2009-01-01
Background: Residency training takes place primarily on inpatient wards. In the absence of a resident continuity clinic, internal medicine residents rely on block rotations to learn about continuity of care. Alternate methods to introduce continuity of care are needed. Methods: A web-based tool, Continuity of Care Online Simulations (COCOS), was designed for use in a one-month, postgraduate clinical rotation in endocrinology. It is an interactive tool that simulates the continuing care of any patient with a chronic endocrine disease. Twenty-three residents in internal medicine participated in a study to investigate the effects of using COCOS during a clinical rotation in endocrinology on pre-post knowledge test scores and self-assessment of confidence. Results: Compared to residents who did the rotation alone, residents who used COCOS during the rotation had significantly higher improvements in test scores (% increase in pre-post test scores +21.6 [standard deviation, SD, 8.0] vs. +5.9 [SD 6.8]; p < .001). Test score improvements were most pronounced for less commonly seen conditions. There were no significant differences in changes in confidence. Residents rated COCOS very highly, recommending its use as a standard part of the rotation and throughout residency. Conclusion: A stand-alone web-based tool can be incorporated into an existing clinical rotation to help residents learn about continuity of care. It has the most potential to teach residents about topics that are less commonly seen during a clinical rotation. The adaptable, web-based format allows the creation of cases for most chronic medical conditions. PMID:19187554
ERIC Educational Resources Information Center
Atkinson, Becky M.
2012-01-01
The study reported in this article examines how teachers read and respond to their students' Stanford Achievement Test 10 (SAT 10) scores with the goal of investigating the assumption that data-based teaching practice is more "objective" and less susceptible to divergent teacher interpretation. The study uses reader response theory to…
Gibbons, Theodore R; Mount, Stephen M; Cooper, Endymion D; Delwiche, Charles F
2015-07-10
Clustering protein sequences according to inferred homology is a fundamental step in the analysis of many large data sets. Since the publication of the Markov Clustering (MCL) algorithm in 2002, it has been the centerpiece of several popular applications. Each of these approaches generates an undirected graph that represents sequences as nodes connected to each other by edges weighted with a BLAST-based metric. MCL is then used to infer clusters of homologous proteins by analyzing these graphs. The various approaches differ only by how they weight the edges, yet there has been very little direct examination of the relative performance of alternative edge-weighting metrics. This study compares the performance of four BLAST-based edge-weighting metrics: the bit score, bit score ratio (BSR), bit score over anchored length (BAL), and negative common log of the expectation value (NLE). Performance is tested using the Extended CEGMA KOGs (ECK) database, which we introduce here. All metrics performed similarly when analyzing full-length sequences, but dramatic differences emerged as progressively larger fractions of the test sequences were split into fragments. The BSR and BAL successfully rescued subsets of clusters by strengthening certain types of alignments between fragmented sequences, but also shifted the largest correct scores down near the range of scores generated from spurious alignments. This penalty outweighed the benefits in most test cases, and was greatly exacerbated by increasing the MCL inflation parameter, making these metrics less robust than the bit score or the more popular NLE. Notably, the bit score performed as well or better than the other three metrics in all scenarios. The results provide a strong case for use of the bit score, which appears to offer equivalent or superior performance to the more popular NLE. 
The insight that MCL-based clustering methods can be improved using a more tractable edge-weighting metric will greatly simplify future implementations. We demonstrate this with our own minimalist Python implementation: Porthos, which uses only standard libraries and can process a graph with 25 m + edges connecting the 60 k + KOG sequences in half a minute using less than half a gigabyte of memory.
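To make the edge-weighting idea in the abstract above concrete, here is a minimal sketch that builds an undirected, weighted graph from BLAST-style hits. It is not the authors' Porthos code. Several details are assumptions made for illustration: the hit tuples and protein names are hypothetical, reciprocal hits are merged by keeping the larger weight, E-values of zero are floored at 1e-180 before taking the log, and the bit score ratio is taken as the bit score divided by the smaller of the two self-hit bit scores. The BAL metric is omitted because the abstract does not define it.

```python
import math


def nle(evalue, floor=1e-180):
    """Negative common log of the E-value; `floor` (an assumption) avoids log10(0)."""
    return -math.log10(max(evalue, floor))


def bit_score_ratio(bit_score, self_score_a, self_score_b):
    """Bit score normalized by the smaller self-hit score (assumed definition)."""
    return bit_score / min(self_score_a, self_score_b)


def build_graph(hits, weight):
    """Map each unordered sequence pair to a single edge weight.

    `hits` is an iterable of (query, subject, value) tuples and `weight`
    converts the raw value to an edge weight. Self-hits are skipped and
    reciprocal hits are merged by keeping the larger weight (an assumption).
    """
    graph = {}
    for query, subject, value in hits:
        if query == subject:
            continue
        edge = frozenset((query, subject))
        w = weight(value)
        graph[edge] = max(w, graph.get(edge, w))
    return graph


# Hypothetical (query, subject, bit score) hits, not data from the study.
hits = [
    ("protA", "protB", 250.0),
    ("protB", "protA", 245.0),
    ("protA", "protA", 500.0),
    ("protB", "protC", 60.0),
]
graph = build_graph(hits, weight=lambda bits: bits)  # raw bit score as the weight
for edge, w in graph.items():
    print(sorted(edge), w)
```

Swapping the `weight` callable, for example passing `nle` with E-values in the third tuple position, changes the metric without touching the graph construction, which is the only point at which the compared approaches are said to differ.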
Merchant, Roland C; Clark, Melissa A; Mayer, Kenneth H; Seage III, George R; DeGruttola, Victor G; Becker, Bruce M
2009-02-01
Video-based delivery of human immunodeficiency virus (HIV) pretest information might assist in streamlining HIV screening and testing efforts in the emergency department (ED). The objectives of this study were to determine if the video "Do you know about rapid HIV testing?" is an acceptable alternative to an in-person information session on rapid HIV pretest information, in regard to comprehension of rapid HIV pretest fundamentals, and to identify patients who might have difficulties in comprehending pretest information. This was a noninferiority trial of 574 participants in an ED opt-in rapid HIV screening program who were randomly assigned to receive identical pretest information from either an animated and live-action 9.5-minute video or an in-person information session. Pretest information comprehension was assessed using a questionnaire. The video would be accepted as not inferior to the in-person information session if the 95% confidence interval (CI) of the difference (Delta) in mean scores on the questionnaire between the two information groups was less than a 10% decrease in the in-person information session arm's mean score. Linear regression models were constructed to identify patients with lower mean scores based upon study arm assignment, demographic characteristics, and history of prior HIV testing. The questionnaire mean scores were 20.1 (95% CI = 19.7 to 20.5) for the video arm and 20.8 (95% CI = 20.4 to 21.2) for the in-person information session arm. The difference in mean scores compared to the mean score for the in-person information session met the noninferiority criterion for this investigation (Delta = 0.68; 95% CI = 0.18 to 1.26). In a multivariable linear regression model, Blacks/African Americans, Hispanics, and those with Medicare and Medicaid insurance exhibited slightly lower mean scores, regardless of the pretest information delivery format. 
There was a strong relationship between fewer years of formal education and lower mean scores on the questionnaire. Age, gender, type of insurance, partner/marital status, and history of prior HIV testing were not predictive of scores on the questionnaire. In terms of patient comprehension of rapid HIV pretest information fundamentals, the video was an acceptable substitute to pretest information delivered by an HIV test counselor. Both the video and the in-person information session were less effective in providing pretest information for patients with fewer years of formal education.
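The noninferiority criterion in the abstract above can be illustrated with a short calculation. This sketch assumes the margin is 10% of the in-person arm's mean score and that noninferiority is declared when the upper bound of the 95% CI for the difference (in-person minus video) stays below that margin; the function name is hypothetical and the reading of the criterion is an interpretation of the abstract's wording.

```python
def noninferiority_check(ci_upper, reference_mean, margin_fraction=0.10):
    """Return (is_noninferior, margin) for a difference whose CI upper bound is given.

    The margin is `margin_fraction` of the reference arm's mean score.
    """
    margin = margin_fraction * reference_mean
    return ci_upper < margin, margin


# Values reported in the abstract: in-person mean 20.8, upper CI bound of Delta 1.26.
noninferior, margin = noninferiority_check(ci_upper=1.26, reference_mean=20.8)
print(f"margin = {margin:.2f}, noninferior = {noninferior}")
```

Under these assumptions the margin is about 2.08 questionnaire points, so the whole reported interval (0.18 to 1.26) lies inside it.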
Kuhn, Andrew Warren; Solomon, Gary S
2014-01-01
Computerized neuropsychological testing batteries have provided a time-efficient and cost-efficient way to assess and manage the neurocognitive aspects of patients with sport-related concussion. These tests are straightforward and mostly self-guided, reducing the degree of clinician involvement required by traditional clinical neuropsychological paper-and-pencil tests. To determine if self-reported supervision status affected computerized neurocognitive baseline test performance in high school athletes. Retrospective cohort study. Supervised testing took place in high school computer libraries or sports medicine clinics. Unsupervised testing took place at the participant's home or another location with computer access. From 2007 to 2012, high school athletes across middle Tennessee (n = 3771) completed computerized neurocognitive baseline testing (Immediate Post-Concussion Assessment and Cognitive Testing [ImPACT]). They reported taking the test either supervised by a sports medicine professional or unsupervised. These athletes (n = 2140) were subjected to inclusion and exclusion criteria and then matched based on age, sex, and number of prior concussions. We extracted demographic and performance-based data from each de-identified baseline testing record. Paired t tests were performed between the self-reported supervised and unsupervised groups, comparing the following ImPACT baseline composite scores: verbal memory, visual memory, visual motor (processing) speed, reaction time, impulse control, and total symptom score. For differences that reached P < .05, the Cohen d was calculated to measure the effect size. Lastly, a χ(2) analysis was conducted to compare the rate of invalid baseline testing between the groups. All statistical tests were performed at the 95% confidence interval level. 
Self-reported supervised athletes demonstrated better visual motor (processing) speed (P = .004; 95% confidence interval [0.28, 1.52]; d = 0.12) and faster reaction time (P < .001; 95% confidence interval [-0.026, -0.014]; d = 0.21) composite scores than self-reported unsupervised athletes. Speed-based tasks were most affected by self-reported supervision status, although the effect sizes were relatively small. These data lend credence to the hypothesis that supervision status may be a factor in the evaluation of ImPACT baseline test scores.
Soble, Jason R; Bain, Kathleen M; Bailey, K Chase; Kirton, Joshua W; Marceaux, Janice C; Critchfield, Edan A; McCoy, Karin J M; O'Rourke, Justin J F
2018-01-08
Embedded performance validity tests (PVTs) allow for continuous assessment of invalid performance throughout neuropsychological test batteries. This study evaluated the utility of the Wechsler Memory Scale-Fourth Edition (WMS-IV) Logical Memory (LM) Recognition score as an embedded PVT using the Advanced Clinical Solutions (ACS) for WAIS-IV/WMS-IV Effort System. This mixed clinical sample comprised 97 participants, 71 of whom were classified as valid and 26 as invalid based on three well-validated, freestanding criterion PVTs. Overall, the LM embedded PVT demonstrated poor concordance with the criterion PVTs and unacceptable psychometric properties using ACS validity base rates (42% sensitivity/79% specificity). Moreover, 15-39% of participants obtained an invalid ACS base rate despite having a normatively-intact age-corrected LM Recognition total score. Receiver operating characteristic curve analysis revealed that a Recognition total score cutoff of < 61% correct improved specificity (92%) while sensitivity remained weak (31%). Thus, results indicated the LM Recognition embedded PVT is not appropriate for use from an evidence-based perspective, and that clinicians may be faced with reconciling how a normatively intact cognitive performance on the Recognition subtest could simultaneously reflect invalid performance validity.
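As a rough illustration of how sensitivity and specificity figures like those above relate to classification counts, the following Python sketch computes both from a 2x2 table. The counts are hypothetical: they were chosen only to be consistent with the reported group sizes (26 invalid, 71 valid) and the rounded percentages, and are not taken from the study.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from 2x2 classification counts.

    tp/fn: invalid performers flagged / missed by the embedded PVT.
    tn/fp: valid performers passed / wrongly flagged by the embedded PVT.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical counts (assumption, not study data): 26 invalid and 71 valid cases.
sens, spec = sensitivity_specificity(tp=11, fn=15, tn=56, fp=15)
print(f"sensitivity={sens:.0%}, specificity={spec:.0%}")
```

Raising the cutoff stringency typically trades one quantity for the other, which is the pattern the abstract describes (specificity up, sensitivity still weak).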
Thibodeau, Michel A; Leonard, Rachel C; Abramowitz, Jonathan S; Riemann, Bradley C
2015-12-01
The Dimensional Obsessive-Compulsive Scale (DOCS) is a promising measure of obsessive-compulsive disorder (OCD) symptoms but has received minimal psychometric attention. We evaluated the utility and reliability of DOCS scores. The study included 832 students and 300 patients with OCD. Confirmatory factor analysis supported the originally proposed four-factor structure. DOCS total and subscale scores exhibited good to excellent internal consistency in both samples (α = .82 to α = .96). Patient DOCS total scores reduced substantially during treatment (t = 16.01, d = 1.02). DOCS total scores discriminated between students and patients (sensitivity = 0.76, 1 - specificity = 0.23). The measure did not exhibit gender-based differential item functioning as tested by Mantel-Haenszel chi-square tests. Expected response options for each item were plotted as a function of item response theory and demonstrated that DOCS scores incrementally discriminate OCD symptoms ranging from low to extremely high severity. Incremental differences in DOCS scores appear to represent unbiased and reliable differences in true OCD symptom severity. © The Author(s) 2014.
Cartier, Vanessa; Inan, Cigdem; Zingg, Walter; Delhumeau, Cecile; Walder, Bernard; Savoldelli, Georges L
2016-08-01
Multimodal educational interventions have been shown to improve short-term competency in, and knowledge of central venous catheter (CVC) insertion. To evaluate the effectiveness of simulation-based medical education training in improving short and long-term competency in, and knowledge of CVC insertion. Before and after intervention study. University Geneva Hospital, Geneva, Switzerland, between May 2008 and January 2012. Residents in anaesthesiology aware of the Seldinger technique for vascular puncture. Participants attended a half-day course on CVC insertion. Learning objectives included work organization, aseptic technique and prevention of CVC complications. CVC insertion competency was tested pretraining, posttraining and then more than 2 years after training (sustainability phase). The primary study outcome was competency as measured by a global rating scale of technical skills, a hand hygiene compliance score and a checklist compliance score. Secondary outcome was knowledge as measured by a standardised pretraining and posttraining multiple-choice questionnaire. Statistical analyses were performed using paired Student's t test or Wilcoxon signed-rank test. Thirty-seven residents were included; 18 were tested in the sustainability phase (on average 34 months after training). The average global rating of skills was 23.4 points (±SD 4.08) before training, 32.2 (±4.51) after training (P < 0.001 for comparison with pretraining scores) and 26.5 (±5.34) in the sustainability phase (P = 0.040 for comparison with pretraining scores). The average hand hygiene compliance score was 2.8 (±1.0) points before training, 5.0 (±1.04) after training (P < 0.001 for comparison with pretraining scores) and 3.7 (±1.75) in the sustainability phase (P = 0.038 for comparison with pretraining scores). 
The average checklist compliance was 14.9 points (±2.3) before training, 19.9 (±1.06) after training (P < 0.001 for comparison with pretraining scores) and 17.4 (±1.41) in the sustainability phase (P = 0.002 for comparison with pretraining scores). The percentage of correct answers in the multiple-choice questionnaire increased from 76.0% (±7.9) before training to 87.7% (±4.4) after training (P < 0.001). Simulation-based medical education training was effective in improving short and long-term competency in, and knowledge of CVC insertion.
Program evaluation of Protovation Camp
NASA Astrophysics Data System (ADS)
Healy, Laurel Lynell Martin
The purpose of this program evaluation was to determine the extent to which Protovation Camp utilized the combined resources of multiple institutions to impact student learning in science, technology, engineering, and math. The partnership consisted of multiple institutions: the university, providing graduate students to facilitate inquiry-based lessons; the science center, allowing the use of their facilities and resources; and the elementary school, contributing rising third through fifth grade campers. All of these components were examined. The mixed-methods approach used post hoc quantitative data for campers, which consisted of pre-test and post-test scores on the Test of Science-Related Attitudes (TOSRA), the Draw-A-Scientist Test, and content tests based on the camp activities. Additionally, TOSRA scores and current survey results for the graduate students were used along with qualitative data collected from plusdelta charts to determine the impact of participation in Protovation Camp on teachers and students. Results of the program evaluation indicated that when students were taught inquiry-based lessons that ignite wonder, both their attitudes toward science and their knowledge about science improved. An implication for teacher preparation programs was that practicing inquiry-based lessons on actual students (campers) was an important component for teachers (graduate students) as they prepare to positively impact student learning in their own classrooms. Immediate feedback from the campers in the form of pre-test and post-test scores and from peers on plusdelta charts allowed the graduate students the opportunity to make needed adjustments to improve effectiveness before using the lesson with a new set of campers or later in their own classrooms. Keywords. Teacher preparation, Inquiry-based instruction, STEM instructions, University and museum partnerships
Ferreira, António Miguel; Marques, Hugo; Tralhão, António; Santos, Miguel Borges; Santos, Ana Rita; Cardoso, Gonçalo; Dores, Hélder; Carvalho, Maria Salomé; Madeira, Sérgio; Machado, Francisco Pereira; Cardim, Nuno; de Araújo Gonçalves, Pedro
2016-11-01
Current guidelines recommend the use of the Modified Diamond-Forrester (MDF) method to assess the pre-test likelihood of obstructive coronary artery disease (CAD). We aimed to compare the performance of the MDF method with two contemporary algorithms derived from multicenter trials that additionally incorporate cardiovascular risk factors: the calculator-based 'CAD Consortium 2' method, and the integer-based CONFIRM score. We assessed 1069 consecutive patients without known CAD undergoing coronary CT angiography (CCTA) for stable chest pain. Obstructive CAD was defined as the presence of coronary stenosis ≥50% on 64-slice dual-source CT. The three methods were assessed for calibration, discrimination, net reclassification, and changes in proposed downstream testing based upon calculated pre-test likelihoods. The observed prevalence of obstructive CAD was 13.8% (n=147). Overestimations of the likelihood of obstructive CAD were 140.1%, 9.8%, and 18.8%, respectively, for the MDF, CAD Consortium 2 and CONFIRM methods. The CAD Consortium 2 showed greater discriminative power than the MDF method, with a C-statistic of 0.73 vs. 0.70 (p<0.001), while the CONFIRM score did not (C-statistic 0.71, p=0.492). Reclassification of pre-test likelihood using the 'CAD Consortium 2' or CONFIRM scores resulted in a net reclassification improvement of 0.19 and 0.18, respectively, which would change the diagnostic strategy in approximately half of the patients. Newer risk factor-encompassing models allow for a more precise estimation of pre-test probabilities of obstructive CAD than the guideline-recommended MDF method. Adoption of these scores may improve disease prediction and change the diagnostic pathway in a significant proportion of patients. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Busch, Robyn M.; Lineweaver, Tara T.; Ferguson, Lisa; Haut, Jennifer S.
2015-01-01
Reliable change index scores (RCIs) and standardized regression-based change score norms (SRBs) permit evaluation of meaningful changes in test scores following treatment interventions, like epilepsy surgery, while accounting for test-retest reliability, practice effects, score fluctuations due to error, and relevant clinical and demographic factors. Although these methods are frequently used to assess cognitive change after epilepsy surgery in adults, they have not been widely applied to examine cognitive change in children with epilepsy. The goal of the current study was to develop RCIs and SRBs for use in children with epilepsy. Sixty-three children with epilepsy (age range 6–16; M=10.19, SD=2.58) underwent comprehensive neuropsychological evaluations at two time points an average of 12 months apart. Practice adjusted RCIs and SRBs were calculated for all cognitive measures in the battery. Practice effects were quite variable across the neuropsychological measures, with the greatest differences observed among older children, particularly on the Children’s Memory Scale and Wisconsin Card Sorting Test. There was also notable variability in test-retest reliabilities across measures in the battery, with coefficients ranging from 0.14 to 0.92. RCIs and SRBs for use in assessing meaningful cognitive change in children following epilepsy surgery are provided for measures with reliability coefficients above 0.50. This is the first study to provide RCIs and SRBs for a comprehensive neuropsychological battery based on a large sample of children with epilepsy. Tables to aid in evaluating cognitive changes in children who have undergone epilepsy surgery are provided for clinical use. An excel sheet to perform all relevant calculations is also available to interested clinicians or researchers. PMID:26043163
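For readers unfamiliar with the calculation, a practice-adjusted reliable change index can be sketched as follows. This uses one common formulation (standard error of the difference derived from the baseline standard deviation and the test-retest reliability); it is not necessarily the exact formula the authors used, and all numeric inputs are hypothetical.

```python
import math


def practice_adjusted_rci(baseline, retest, practice_effect, sd_baseline, reliability):
    """Practice-adjusted RCI under one common formulation (an assumption here).

    SEM     = SD * sqrt(1 - r)
    SE_diff = sqrt(2) * SEM
    RCI     = (retest - baseline - mean practice effect) / SE_diff
    """
    sem = sd_baseline * math.sqrt(1.0 - reliability)
    se_diff = math.sqrt(2.0) * sem
    return (retest - baseline - practice_effect) / se_diff


# Hypothetical child: standard score drops from 100 to 92 on a test with
# SD = 15, test-retest r = 0.80 and an average practice gain of 2 points.
rci = practice_adjusted_rci(baseline=100, retest=92, practice_effect=2,
                            sd_baseline=15, reliability=0.80)
reliable_decline = rci < -1.645  # 90% two-tailed criterion, a common convention
print(round(rci, 2), reliable_decline)
```

The sketch also shows why low reliability matters: as r falls, SE_diff grows, so a larger raw change is needed before it counts as reliable.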
Development and validation of a food-based diet quality index for New Zealand adolescents
2013-01-01
Background As there is no population-specific, simple food-based diet index suitable for examination of diet quality in New Zealand (NZ) adolescents, there is a need to develop such a tool. Therefore, this study aimed to develop an adolescent-specific diet quality index based on dietary information sourced from a Food Questionnaire (FQ) and examine its validity relative to a four-day estimated food record (4DFR) obtained from a group of adolescents aged 14 to 18 years. Methods A diet quality index for NZ adolescents (NZDQI-A) was developed based on ‘Adequacy’ and ‘Variety’ of five food groups reflecting the New Zealand Food and Nutrition Guidelines for Healthy Adolescents. The NZDQI-A was scored from zero to 100, with a higher score reflecting a better diet quality. Forty-one adolescents (16 males, 25 females, aged 14–18 years) each completed the FQ and a 4DFR. The test-retest reliability of the FQ-derived NZDQI-A scores over a two-week period and the relative validity of the scores compared to the 4DFR were estimated using Pearson’s correlations. Construct validity was examined by comparing NZDQI-A scores against nutrient intakes obtained from the 4DFR. Results The NZDQI-A derived from the FQ showed good reliability (r = 0.65) and reasonable agreement with 4DFR in ranking participants by scores (r = 0.39). More than half of the participants were classified into the same thirds of scores while 10% were misclassified into the opposite thirds by the two methods. Higher NZDQI-A scores were also associated with lower total fat and saturated fat intakes and higher iron intakes. Conclusions Higher NZDQI-A scores were associated with more desirable fat and iron intakes. The scores derived from either FQ or 4DFR were comparable and reproducible when repeated within two weeks. The NZDQI-A is relatively valid and reliable in ranking diet quality in adolescents at a group level even in a small sample size. 
Further studies are required to test the predictive validity of this food-based diet index in larger samples. PMID:23759064
Lodeiro-Fernández, Leire; Lorenzo-López, Laura; Maseda, Ana; Núñez-Naveira, Laura; Rodríguez-Villamil, José Luis; Millán-Calenti, José Carlos
2015-01-01
Purpose The possible relationship between audiometric hearing thresholds and cognitive performance on language tests was analyzed in a cross-sectional cohort of older adults aged ≥65 years (N=98) with different degrees of cognitive impairment. Materials and methods Participants were distributed into two groups according to Reisberg’s Global Deterioration Scale (GDS): a normal/predementia group (GDS scores 1–3) and a moderate/moderately severe dementia group (GDS scores 4 and 5). Hearing loss (pure-tone audiometry) and receptive and production-based language function (Verbal Fluency Test, Boston Naming Test, and Token Test) were assessed. Results Results showed that the dementia group achieved significantly lower scores than the predementia group in all language tests. A moderate negative correlation between hearing loss and verbal comprehension (r=−0.298; P<0.003) was observed in the predementia group (r=−0.363; P<0.007). However, no significant relationship between hearing loss and verbal fluency and naming scores was observed, regardless of cognitive impairment. Conclusion In the predementia group, reduced hearing level partially explains comprehension performance but not language production. In the dementia group, hearing loss cannot be considered as an explanatory factor of poor receptive and production-based language performance. These results are suggestive of cognitive rather than simply auditory problems to explain the language impairment in the elderly. PMID:25914528
Applications of computerized adaptive testing (CAT) to the assessment of headache impact.
Ware, John E; Kosinski, Mark; Bjorner, Jakob B; Bayliss, Martha S; Batenhorst, Alice; Dahlöf, Carl G H; Tepper, Stewart; Dowson, Andrew
2003-12-01
To evaluate the feasibility of computerized adaptive testing (CAT) and the reliability and validity of CAT-based estimates of headache impact scores in comparison with 'static' surveys. Responses to the 54-item Headache Impact Test (HIT) were re-analyzed for recent headache sufferers (n = 1016) who completed telephone interviews during the National Survey of Headache Impact (NSHI). Item response theory (IRT) calibrations and the computerized dynamic health assessment (DYNHA) software were used to simulate CAT assessments by selecting the most informative items for each person and estimating impact scores according to pre-set precision standards (CAT-HIT). Results were compared with IRT estimates based on all items (total-HIT), computerized 6-item dynamic estimates (CAT-HIT-6), and a developmental version of a 'static' 6-item form (HIT-6-D). Analyses focused on: respondent burden (survey length and administration time), score distributions ('ceiling' and 'floor' effects), reliability and standard errors, and clinical validity (diagnosis, level of severity). A random sample (n = 245) was re-assessed to test responsiveness. A second study (n = 1103) compared actual CAT surveys and an improved 'static' HIT-6 among current headache sufferers sampled on the Internet. Respondents completed measures from the first study and the generic SF-8 Health Survey; some (n = 540) were re-tested on the Internet after 2 weeks. In the first study, simulated CAT-HIT and total-HIT scores were highly correlated (r = 0.92) without 'ceiling' or 'floor' effects and with a substantial reduction (90.8%) in respondent burden. Six of the 54 items accounted for the great majority of item administrations (3603/5028, 77.6%). 
CAT-HIT reliability estimates were very high (0.975-0.992) in the range where 95% of respondents scored, and relative validity (RV) coefficients were high for diagnosis (RV = 0.87) and severity (RV = 0.89); patient-level classifications were accurate 91.3% for a diagnosis of migraine. For all three criteria of change, CAT-HIT scores were more responsive than all other measures. In the second study, estimates of respondent burden, item usage, reliability and clinical validity were replicated. The test-retest reliability of CAT-HIT was 0.79 and alternate forms coefficients ranged from 0.85 to 0.91. All correlations with the generic SF-8 were negative. CAT-based administrations of headache impact items achieved very large reductions in respondent burden without compromising validity for purposes of patient screening or monitoring changes in headache impact over time. IRT models and CAT-based dynamic health assessments warrant testing among patients with other conditions.
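The adaptive procedure described above (pick the most informative remaining item, re-estimate the score, stop at a pre-set precision) can be sketched in a few lines. This is a simplified illustration only: it assumes a dichotomous two-parameter logistic model and a grid-based posterior mean estimate, whereas the study used the DYNHA software with its own IRT calibrations. The item bank, the `answer` callback and the stopping values are all hypothetical.

```python
import math


def p_endorse(theta, a, b):
    """2PL probability of endorsing an item (simplifying assumption)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at theta."""
    p = p_endorse(theta, a, b)
    return a * a * p * (1.0 - p)


def run_cat(items, answer, se_target=0.4, max_items=10):
    """items: list of (a, b); answer(i) -> bool response to item i."""
    grid = [g / 10.0 for g in range(-40, 41)]
    posterior = [math.exp(-0.5 * t * t) for t in grid]  # unnormalised N(0, 1) prior
    theta, se, used = 0.0, float("inf"), []
    while len(used) < min(max_items, len(items)) and se > se_target:
        remaining = [i for i in range(len(items)) if i not in used]
        # Select the unused item that is most informative at the current estimate.
        nxt = max(remaining, key=lambda i: item_information(theta, *items[i]))
        used.append(nxt)
        a, b = items[nxt]
        u = answer(nxt)
        posterior = [w * (p_endorse(t, a, b) if u else 1.0 - p_endorse(t, a, b))
                     for w, t in zip(posterior, grid)]
        total = sum(posterior)
        theta = sum(w * t for w, t in zip(posterior, grid)) / total
        se = math.sqrt(sum(w * (t - theta) ** 2 for w, t in zip(posterior, grid)) / total)
    return theta, se, used


# Hypothetical bank of 21 items and a deterministic respondent located near theta = 1.
bank = [(1.5, -2.0 + 0.2 * i) for i in range(21)]
theta_hat, se_hat, administered = run_cat(bank, answer=lambda i: bank[i][1] < 1.0)
print(round(theta_hat, 2), round(se_hat, 2), len(administered))
```

Because each respondent sees only the items that are informative near their own score, a few items can account for most administrations, which is in line with the reduction in respondent burden the abstract reports.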
Evaluating a hybrid web-based basic genetics course for health professionals.
Wallen, Gwenyth R; Cusack, Georgie; Parada, Suzan; Miller-Davis, Claiborne; Cartledge, Tannia; Yates, Jan
2011-08-01
Health professionals, particularly nurses, continue to struggle with the expanding role of genetics information in the care of their patients. This paper describes an evaluation study of the effectiveness of a hybrid basic genetics course for healthcare professionals combining web-based learning with traditional face-to-face instructional techniques. A multidisciplinary group from the National Institutes of Health (NIH) created "Basic Genetics Education for Healthcare Providers" (BGEHCP). This program combined 7 web-based self-education modules with monthly traditional face-to-face lectures by genetics experts. The course was pilot tested by 186 healthcare providers from various disciplines with 69% (n=129) of the class registrants enrolling in a pre-post evaluation trial. Outcome measures included critical thinking knowledge items and a Web-based Learning Environment Inventory (WEBLEI). Results indicated a significant (p<0.001) change in knowledge scores. WEBLEI scores indicated program effectiveness particularly in the area of convenience, access and the course structure and design. Although significant increases in overall knowledge scores were achieved, scores in content areas surrounding genetic risk identification and ethical issues regarding genetic testing reflected continued gaps in knowledge. Web-based genetics education may help overcome genetics knowledge deficits by providing access for health professionals with diverse schedules in a variety of national and international settings. Published by Elsevier Ltd.
Preliminary assessment of the feasibility of using AB words to assess candidacy in adults.
Vickers, Deborah A; Riley, Alison; Ricaud, Rebecca; Verschuur, Carl; Cooper, Stacey; Nunn, Terry; Webb, Kath; Muff, Joanne; Harris, Frances; Chung, Mark; Humphries, Jane; Langshaw, Alison; Poynter-Smith, Emma; Totten, Catherine; Tapper, Lynne; Ridgwell, Jillian; Mawman, Deborah; de Estibariz, Unai Martinez; O'Driscoll, Martin; George, Nicola; Pinto, Francesca; Hall, Anne; Llewellyn, Carol; Miah, Razun; Al-Malky, Ghada; Kitterick, Pádraig T
2016-04-01
Adult cochlear implant (CI) candidacy is assessed in part by the use of speech perception measures. In the United Kingdom the current cut-off point to fall within the CI candidacy range is a score of less than 50% on the BKB sentences presented in quiet (presented at 70 dBSPL). The specific goal of this article was to review the benefit of adding the AB word test to the assessment test battery for candidacy. The AB word test scores showed good sensitivity and specificity when calculated based on both word and phoneme scores. The word score equivalent for 50% correct on the BKB sentences was 18.5% and it was 34.5% when the phoneme score was calculated; these scores are in line with those used in centres in Wales (15% AB word score). The goal of the British Cochlear Implant Group (BCIG) service evaluation was to determine if the pre-implant assessment measures are appropriate and set at the correct level for determining candidacy. Future analyses will determine whether the speech perception cut-off point for candidacy should be adjusted and whether other more challenging measures should be used in the candidacy evaluation.
Using a genetic/clinical risk score to stop smoking (GeTSS): randomised controlled trial.
Nichols, John A A; Grob, Paul; Kite, Wendy; Williams, Peter; de Lusignan, Simon
2017-10-23
As genetic tests become cheaper, the possibility of their widespread availability must be considered. This study involves a risk score for lung cancer in smokers that is roughly 50% genetic (50% clinical criteria). The risk score has been shown to be effective as a smoking cessation motivator in hospital recruited subjects (not actively seeking cessation services). This was an RCT set in a United Kingdom National Health Service (NHS) smoking cessation clinic. Smokers were identified from medical records. Subjects that wanted to participate were randomised to a test group that was administered a gene-based risk test and given a lung cancer risk score, or a control group where no risk score was performed. Each group had 8 weeks of weekly smoking cessation sessions involving group therapy and advice on smoking cessation pharmacotherapy and follow-up at 6 months. The primary endpoint was smoking cessation at 6 months. Secondary outcomes included ranking of the risk score and other motivators. 67 subjects attended the smoking cessation clinic. The 6-month quit rates were 29.4% (10/34; 95% CI 14.1-44.7%) for the test group and 42.9% (12/28; 95% CI 24.6-61.2%) for the controls. The difference is not significant. However, the quit rate for test group subjects with a "very high" risk score was 89% (8/9; 95% CI 68.4-100%), which was significant when compared with the control group (p = 0.023), and test group subjects with moderate risk scores had a 9.5% quit rate (2/21; 95% CI 2.7-28.9%), which was significantly lower than that for subjects with above moderate risk scores, 61.5% (8/13; 95% CI 35.5-82.3%; p = 0.03). Only the sub-group with the highest risk score showed an increased quit rate. Controls and test group subjects with a moderate risk score were relatively unlikely to have achieved and maintained non-smoker status at 6 months. ClinicalTrials.gov ID NCT01176383 (date of registration: 3 August 2010).
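The confidence intervals quoted for the quit rates can be checked with two standard formulas for a binomial proportion. The sketch below is an independent arithmetic check, not the authors' stated method: the normal-approximation (Wald) interval appears to reproduce the test-group, control and "very high" risk intervals to within rounding, while the Wilson score interval appears to reproduce the moderate and above-moderate subgroup intervals. Function names are illustrative.

```python
import math


def wald_ci(successes, n, z=1.96):
    """Normal-approximation interval for a proportion, clipped to [0, 1]."""
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half), min(1.0, p + half)


def wilson_ci(successes, n, z=1.96):
    """Wilson score interval for a proportion."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return p, centre - half, centre + half


for label, x, n, method in [
    ("test group", 10, 34, wald_ci),
    ("controls", 12, 28, wald_ci),
    ("very high risk", 8, 9, wald_ci),
    ("moderate risk", 2, 21, wilson_ci),
    ("above moderate risk", 8, 13, wilson_ci),
]:
    p, lo, hi = method(x, n)
    print(f"{label}: {p:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```

With groups this small the two methods can differ by several percentage points, so the choice of interval affects how the subgroup comparisons read.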
Story Based Activities Enhance Literacy Skills in Preschool Children
ERIC Educational Resources Information Center
Yazici, Elçin; Bolay, Hayrunnisa
2017-01-01
We investigated the impact of story-based activities on literacy skills in pre-school children. The efficacy of a story-based activities program was tested with a literacy skills survey test. Results showed that the scores of overall literacy skills and all subset skills in the study group (n = 45) were statistically significantly higher than the…
Walker, Bonnie L; Harrington, Susan S
2004-05-01
This study compares the effects of computer-based and instructor-led training on the fire safety knowledge, attitudes, and practices of long-term care staff with a high school education or less. Findings show that both methods of instruction were effective in increasing staff test scores from pre- to posttest. Scores of both groups were lower at follow-up three months later but continued to be higher than at pretest. Staff with a high school education increased scores more than those without a high school diploma.
Arntzen, Kjell Arne; Schirmer, Henrik; Johnsen, Stein Harald; Wilsgaard, Tom; Mathiesen, Ellisiv B
2012-01-01
Carotid artery atherosclerosis is a major risk factor for stroke and subsequent cognitive impairment. Prospective population studies have shown associations between carotid intima-media thickness (IMT) and stenosis and cognitive decline and dementia in elderly stroke-free persons, whereas results in the middle-aged are conflicting. In this prospective population-based study, 4,371 stroke-free middle-aged participants underwent carotid ultrasound examination and assessment of vascular risk factors at baseline and were tested for cognitive function 7 years later. Associations between IMT, number of plaques and total plaque area and cognitive test scores on verbal memory test, digit symbol-coding test and tapping test were assessed in linear regression models. In the multivariable analyses adjusted for sex, age, education, depression and vascular risk factors, the presence of plaques was significantly associated with lower test scores on the verbal memory test (p = 0.01) and on the digit symbol-coding test (p = 0.03). The number of plaques (p = 0.01) and the total plaque area (p = 0.02) were associated with lower scores on the verbal memory test. No significant association was seen between common carotid artery IMT and cognitive test scores. The tapping test was not associated with the carotid ultrasound variables. In this middle-aged general population, subclinical carotid atherosclerosis measured as the presence of plaques, number of plaques and total plaque area were independent long-term predictors of lower cognitive test scores. Copyright © 2012 S. Karger AG, Basel.
The Motivated Strategies for Learning Questionnaire: score validity among medicine residents.
Cook, David A; Thompson, Warren G; Thomas, Kris G
2011-12-01
The Motivated Strategies for Learning Questionnaire (MSLQ) purports to measure motivation using the expectancy-value model. Although it is widely used in other fields, this instrument has received little study in health professions education. The purpose of this study was to evaluate the validity of MSLQ scores. We conducted a validity study evaluating the relationships of MSLQ scores to other variables and their internal structure (reliability and factor analysis). Participants included 210 internal medicine and family medicine residents participating in a web-based course on ambulatory medicine at an academic medical centre. Measurements included pre-course MSLQ scores, pre- and post-module motivation surveys, post-module knowledge test and post-module Instructional Materials Motivation Survey (IMMS) scores. Internal consistency was universally high for all MSLQ items together (Cronbach's α = 0.93) and for each domain (α ≥ 0.67). Total MSLQ scores showed statistically significant positive associations with post-test knowledge scores. For example, a 1-point rise in total MSLQ score was associated with a 4.4% increase in post-test scores (β = 4.4; p < 0.0001). Total MSLQ scores showed moderately strong, statistically significant associations with several other measures of effort, motivation and satisfaction. Scores on MSLQ domains demonstrated associations that generally aligned with our hypotheses. Self-efficacy and control of learning belief scores demonstrated the strongest domain-specific relationships with knowledge scores (β = 2.9 for both). Confirmatory factor analysis showed a borderline model fit. Follow-up exploratory factor analysis revealed the scores of five factors (self-efficacy, intrinsic interest, test anxiety, extrinsic goals, attribution) demonstrated psychometric and predictive properties similar to those of the original scales. Scores on the MSLQ are reliable and predict meaningful outcomes. 
However, the factor structure suggests a simplified model might better fit the empiric data. Future research might consider how assessing and responding to motivation could enhance learning. © Blackwell Publishing Ltd 2011.
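The internal-consistency figures reported above are Cronbach's α values. As a minimal sketch of how such a coefficient is computed from item-level responses, the following uses the standard formula α = k/(k-1) × (1 - Σ item variances / variance of total scores). The response matrix is invented for illustration and has nothing to do with the MSLQ data.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """item_scores: list of respondents, each a list of k item scores."""
    k = len(item_scores[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in item_scores]) for i in range(k)]
    # Sample variance of each respondent's summed score.
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: 5 respondents x 3 Likert-type items.
responses = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 3, 3],
    [1, 2, 2],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

A high α for all items together, as reported for the full questionnaire, does not by itself establish the factor structure, which is why the abstract reports factor analyses separately.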
Friedrich, Orsolya; Hemmerling, Kay; Kuehlmeyer, Katja; Nörtemann, Stefanie; Fischer, Martin; Marckmann, Georg
2017-03-03
Recent findings suggest that medical students' moral competence decreases throughout medical school. This pilot study gives preliminary insights into the effects of two educational interventions in ethics classes on moral competence among medical students in Munich, Germany. Between 2012 and 2013, medical students were tested using Lind's Moral Competence Test (MCT) prior to and after completing different ethics classes. The experimental group (EG, N = 76) participated in principle-based structured case discussions (PBSCDs) and was compared with a control group with theory-based case discussions (TBCDs) (CG, N = 55). The pre/post C-scores were compared using a Wilcoxon Test, ANOVA and effect-size calculation. The C-score improved by around 3.2 C-points in the EG, and by 0.2 C-points in the CG. The mean C-score difference was not statistically significant for the EG (P = 0.14) or between the two groups (P = 0.34). There was no statistical significance for the teachers' influence (P = 0.54) on C-score. In both groups, students with below-average (M = 29.1) C-scores improved and students with above-average C-scores regressed. The increase of the C-Index was greater in the EG than in the CG. The absolute effect-size of the EG compared with the CG was 3.0 C-points, indicating a relevant effect. Teaching ethics with PBSCDs did not provide a statistically significant influence on students' moral competence, compared with TBCDs. Yet, the effect size suggests that PBSCDs may improve moral competence among medical students more effectively. Further research with larger and completely randomized samples is needed to gain definite explanations for the results.
Addiction-Like Mobile Phone Behavior – Validation and Association With Problem Gambling
Fransson, Andreas; Chóliz, Mariano; Håkansson, Anders
2018-01-01
Mobile phone use and its potential addiction has become a point of interest within the research community. The aim of the study was to translate and validate the Test of Mobile Dependence (TMD), and to investigate if there are any associations between mobile phone use and problem gambling. This was a cross-sectional study on a Swedish general population. A questionnaire consisting of a translated version of the TMD, three problem gambling questions (NODS-CLiP) together with two questions concerning previous addiction treatment was published online. Exploratory factor analysis based on polychoric correlations was performed on the TMD. Independent samples T-tests, Mann-Whitney test, logistic regression analyses and ANOVA were performed to examine mean differences between subjects based on TMD test score, gambling and previous addiction treatment. A total of 1,515 people (38.3% men) answered the questionnaire. The TMD showed acceptable internal consistency (Cronbach's alpha: 0.905), and significant correlation with subjective dependence on one's mobile phone. Women scored higher on the TMD and 15-18 year olds had the highest mean test score. The TMD test score was significantly associated with problem gambling, but only when controlling for age and sex. Various separated items related to mobile phone use were associated with problem gambling. The TMD had acceptable internal consistency and correlates with subjective dependence, while future confirmatory factor analysis is recommended. An association between mobile phone use and problem gambling may be possible, but requires further research. PMID:29780345
The impact of stroke on emotional intelligence
2010-01-01
Background: Emotional intelligence (EI) is important for personal, social and career success and has been linked to the frontal anterior cingulate, insula and amygdala regions. Aim: To ascertain which stroke lesion sites impair emotional intelligence and how EI relates to current frontal assessment measures. Methods: One hundred consecutive, non-aphasic, independently functioning post-stroke patients were evaluated with the Bar-On emotional intelligence test, known as the Emotional Quotient Inventory (EQ-i), and with frontal tests that included the Wisconsin Card Sorting Test (WCST) and Frontal Systems Behavioral Inventory (FRSBE) for correlational validity. The results of a bedside screening test for frontal network syndrome (FNS) and of the NIHSS, used to document neurological deficit, were also recorded. Lesion location was determined with the Cerefy digital coaxial brain atlas. Results: After exclusions (n = 8), testing of the remaining patients (n = 92, mean age 50.1 years, CI: 47.3 to 52.9) revealed that EQ-i scores were negatively correlated with all FRSBE T sub-scores (apathy, disinhibition, executive, total), with self-reported scores correlating better than family-reported scores. Regression analysis revealed age and FRSBE total scores as the most influential variables. The WCST error percentage T score did not correlate with the EQ-i scores. Based on ANOVA, there were significant differences among the lesion sites, with the lowest mean EQ-i scores associated with temporal (71.5) and frontal (87.3) lesions, followed by subtentorial (91.7), subcortical gray (92.6) and white (95.2) matter, and the highest scores associated with parieto-occipital lesions (113.1). Conclusions: 1) Stroke impairs EI and is associated with apathy, disinhibition and executive functioning. 2) EI is associated with frontal, temporal, subcortical and subtentorial stroke syndromes. PMID:21029468
New methods for analyzing semantic graph based assessments in science education
NASA Astrophysics Data System (ADS)
Vikaros, Lance Steven
This research investigated how the scoring of semantic graphs (known by many as concept maps) could be improved and automated in order to address issues of inter-rater reliability and scalability. As part of the NSF funded SENSE-IT project to introduce secondary school science students to sensor networks (NSF Grant No. 0833440), semantic graphs illustrating how temperature change affects water ecology were collected from 221 students across 16 schools. The graphing task did not constrain students' use of terms, as is often done with semantic graph based assessment due to coding and scoring concerns. The graphing software used provided real-time feedback to help students learn how to construct graphs, stay on topic and effectively communicate ideas. The collected graphs were scored by human raters using assessment methods expected to boost reliability, which included adaptations of traditional holistic and propositional scoring methods, use of expert raters, topical rubrics, and criterion graphs. High levels of inter-rater reliability were achieved, demonstrating that vocabulary constraints may not be necessary after all. To investigate a new approach to automating the scoring of graphs, thirty-two different graph features characterizing graphs' structure, semantics, configuration and process of construction were then used to predict human raters' scoring of graphs in order to identify feature patterns correlated to raters' evaluations of graphs' topical accuracy and complexity. Results led to the development of a regression model able to predict raters' scoring with 77% accuracy, with 46% accuracy expected when used to score new sets of graphs, as estimated via cross-validation tests. Although such performance is comparable to other graph and essay based scoring systems, cross-context testing of the model and methods used to develop it would be needed before it could be recommended for widespread use. 
Still, the findings suggest techniques for improving the reliability and scalability of semantic graph based assessments without requiring constraint of how ideas are expressed.
Effects of Didactic Instruction and Test-Enhanced Learning in a Nursing Review Course.
Tu, Yu-Ching; Lin, Yi-Jung; Lee, Jonathan W; Fan, Lir-Wan
2017-11-01
Determining the most effective approach for students' successful academic performance and achievement on the national licensure examination for RNs is important to nursing education and practice. A quasi-experimental design was used to compare didactic instruction and test-enhanced learning among nursing students divided into two fundamental nursing review courses in their final semester. Students in each course were subdivided into low-, intermediate-, and high-score groups based on their first examination scores. A mixed model for repeated measures and two-way analysis of variance were applied to evaluate students' academic results and both teaching approaches. Intermediate-scoring students' performances improved more through didactic instruction, whereas low-scoring students' performances improved more through test-enhanced learning. Each method had differing effects on individual subgroups within the different performance level groups of their classes, which points to the importance of considering both the didactic and test-enhanced learning approaches. [J Nurs Educ. 2017;56(11):683-687.]. Copyright 2017, SLACK Incorporated.
Leight, Hayley; Saunders, Cheston; Calkins, Robin; Withers, Michelle
2012-01-01
Collaborative testing has been shown to improve performance but not always content retention. In this study, we investigated whether collaborative testing could improve both performance and content retention in a large, introductory biology course. Students were semirandomly divided into two groups based on their performances on exam 1. Each group contained equal numbers of students scoring in each grade category (“A”–“F”) on exam 1. All students completed each of the four exams of the semester as individuals. For exam 2, one group took the exam a second time in small groups immediately following the individually administered test. The other group followed this same format for exam 3. Individual and group exam scores were compared to determine differences in performance. All but exam 1 contained a subset of cumulative questions from the previous exam. Performances on the cumulative questions for exams 3 and 4 were compared for the two groups to determine whether there were significant differences in content retention. Even though group test scores were significantly higher than individual test scores, students who participated in collaborative testing performed no differently on cumulative questions than students who took the previous exam as individuals. PMID:23222835
ERIC Educational Resources Information Center
Kim, Sooyeon; Livingston, Samuel A.
2017-01-01
The purpose of this simulation study was to assess the accuracy of a classical test theory (CTT)-based procedure for estimating the alternate-forms reliability of scores on a multistage test (MST) having 3 stages. We generated item difficulty and discrimination parameters for 10 parallel, nonoverlapping forms of the complete 3-stage test and…
[Cancer nursing care education programs: the effectiveness of different teaching methods].
Cheng, Yun-Ju; Kao, Yu-Hsiu
2012-10-01
In-service education affects the quality of cancer care directly. Using classroom teaching to deliver in-service education is often ineffective due to participants' large workload and shift requirements. This study evaluated the learning effectiveness of different teaching methods in the dimensions of knowledge, attitude, and learning satisfaction. This study used a quasi-experimental study design. Participants were cancer ward nurses working at one medical center in northern Taiwan. Participants were divided into an experimental group and a control group. The experimental group took an e-learning course and the control group took a standard classroom course using the same basic course material. Researchers evaluated the learning efficacy of each group using a questionnaire based on the quality of cancer nursing care learning effectiveness scale. All participants answered the questionnaire once before and once after completing the course. (1) Post-test "knowledge" scores for both groups were significantly higher than pre-test scores for both groups. Post-test "attitude" scores were significantly higher for the control group, while the experimental group reported no significant change. (2) After a covariance analysis of the pre-test scores for both groups, the post-test score for the experimental group was significantly lower than that of the control group in the knowledge dimension. Post-test scores did not differ significantly from pre-test scores for either group in the attitude dimension. (3) Post-test satisfaction scores between the two groups did not differ significantly with regard to teaching methods. The e-learning method, however, was shown to be more flexible than the classroom teaching method. Study results demonstrate the importance of employing a variety of teaching methods to instruct clinical nursing staff. We suggest that both classroom teaching and e-learning instruction methods be used to enhance the quality of cancer nursing care education programs.
We also encourage that interactivity between student and instructor be incorporated into e-learning course designs to enhance effectiveness.
Dimensionality Analysis of "CBAL"™ Writing Tests. Research Report. ETS RR-13-10
ERIC Educational Resources Information Center
Fu, Jianbin; Chung, Seunghee; Wise, Maxwell
2013-01-01
The Cognitively Based Assessment of, for, and as Learning ("CBAL"™) research initiative is aimed at developing an innovative approach to K-12 assessment based on cognitive competency models. Because the choice of scoring and equating approaches depends on test dimensionality, the dimensional structure of CBAL tests must be understood.…
ERIC Educational Resources Information Center
Opara, Ijeoma M.; Onyekuru, Bruno U.; Njoku, Joyce U.
2015-01-01
The study investigated the predictive power of school based assessment scores on students' achievement in Junior Secondary Certificate Examination (JSCE) in English and Mathematics. Two hypotheses tested at 0.05 level of significance guided the study. The study adopted an ex-post facto research design. A sample of 250 students were randomly drawn…
Biases and power for groups comparison on subjective health measurements.
Hamel, Jean-François; Hardouin, Jean-Benoit; Le Neel, Tanguy; Kubis, Gildas; Roquelaure, Yves; Sébille, Véronique
2012-01-01
Subjective health measurements are increasingly used in clinical research, particularly for comparisons between patient groups. Two main types of analytical strategies can be used for such data: so-called classical test theory (CTT), which relies on observed scores, and models from Item Response Theory (IRT), which rely on a response model relating the item responses to a latent parameter, often called the latent trait. Whether IRT or CTT is the more appropriate method to compare two independent groups of patients on a patient-reported outcome measurement remains unknown and was investigated using simulations. For CTT-based analyses, group comparison was performed using a t-test on the scores. For IRT-based analyses, several methods were compared, according to whether the Rasch model was considered with random effects or with fixed effects, and whether the group effect was included as a covariate or not. Individual latent trait values were estimated using either a deterministic method or stochastic approaches. Latent traits were then compared with a t-test. Finally, a two-step method was performed to compare the latent trait distributions, and a Wald test was performed to test the group effect in the Rasch model including group covariates. The only unbiased IRT-based method was the group covariate Wald test, performed on the random effects Rasch model. This model displayed the highest observed power, which was similar to the power using the score t-test. These results need to be extended to the case frequently encountered in practice where data are missing and possibly informative.
Barisoni, Laura; Troost, Jonathan P; Nast, Cynthia; Bagnasco, Serena; Avila-Casado, Carmen; Hodgin, Jeffrey; Palmer, Matthew; Rosenberg, Avi; Gasim, Adil; Liensziewski, Chrysta; Merlino, Lino; Chien, Hui-Ping; Chang, Anthony; Meehan, Shane M; Gaut, Joseph; Song, Peter; Holzman, Lawrence; Gibson, Debbie; Kretzler, Matthias; Gillespie, Brenda W; Hewitt, Stephen M
2016-07-01
The multicenter Nephrotic Syndrome Study Network (NEPTUNE) digital pathology scoring system employs a novel and comprehensive methodology to document pathologic features from whole-slide images, immunofluorescence and ultrastructural digital images. To estimate inter- and intra-reader concordance of this descriptor-based approach, data from 12 pathologists (eight NEPTUNE and four non-NEPTUNE) with experience from training to 30 years were collected. A descriptor reference manual was generated and a webinar-based protocol for consensus/cross-training implemented. Intra-reader concordance for 51 glomerular descriptors was evaluated on jpeg images by seven NEPTUNE pathologists scoring 131 glomeruli three times (Tests I, II, and III), each test following a consensus webinar review. Inter-reader concordance of glomerular descriptors was evaluated in 315 glomeruli by all pathologists; interstitial fibrosis and tubular atrophy (244 cases, whole-slide images) and four ultrastructural podocyte descriptors (178 cases, jpeg images) were evaluated once by six and five pathologists, respectively. Cohen's kappa for inter-reader concordance for 48/51 glomerular descriptors with sufficient observations was moderate (0.40
Hamstring-and-Lower-Back Flexibility in Male Amateur Soccer Players.
van der Horst, Nick; Priesterbach, Annique; Backx, Frank; Smits, Dirk-Wouter
2017-01-01
This study investigated the hamstring-and-lower-back flexibility (HLBF) of male adult amateur soccer players, using the sit-and-reach test (SRT), with a view to obtaining population-based reference values and to determining whether SRT scores are associated with player characteristics. Cross-sectional cohort study. Teams from high-level Dutch amateur soccer competitions were recruited for participation. Dutch male high-level amateur field soccer players (n = 449) of age 18 to 40 years. Players with a hamstring injury at the moment of SRT-measurement or any other injury that prevented them from following the SRT protocol were excluded. Sit-and-reach test scores were measured and then population-based reference values were calculated as follows: >2SD below mean (defining "very low" HLBF), 1SD-2SD below mean ("low" HLBF), 1SD below mean to 1SD above mean ("normal" HLBF), 1SD-2SD above mean ("high" HLBF), and >2SD above mean ("very high" HLBF). Whether SRT scores were correlated with player characteristics was determined using a Pearson correlation coefficient or Spearman rho. Sit-and-reach test scores ranged from 0 to 43.5 cm (mean 22.0 cm, SD 9.2). The cutoff points for population-based reference values were <3.5 cm for "very low", 3.5 to 13.0 cm for "low", 13.0 to 31.0 cm for "normal", 31.0 to 40.5 cm for "high", and >40.5 cm for "very high". Sit-and-reach test scores were significantly associated with players' height (ρ = -0.132, P = 0.005), body mass index (r = 0.114, P = 0.016), and history of anterior cruciate ligament surgery (P < 0.001). This study is the first to describe the HLBF of amateur soccer players. The SRT reference values with cutoff points may facilitate evidence-based decision making regarding HLBF, and the SRT might be a useful tool to assess injury risk, performance, or for diagnostic purposes.
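The banding rule described above (cutoffs at 1 and 2 SD from the mean) can be sketched in a few lines. This is a minimal illustration, not the authors' procedure: it assumes the reported cutoffs come from rounding mean ± k·SD to the nearest 0.5 cm, and it assumes that scores falling exactly on a boundary are assigned to the band nearer the mean. The function names `srt_cutoffs` and `srt_band` are hypothetical.

```python
def srt_cutoffs(mean=22.0, sd=9.2, step=0.5):
    """Band boundaries at mean -2SD, -1SD, +1SD, +2SD, snapped to `step` cm.

    Snapping to 0.5 cm is an assumption made to reproduce the cutoffs
    quoted in the abstract (3.5, 13.0, 31.0, 40.5).
    """
    def snap(x):
        return round(x / step) * step
    return tuple(snap(mean + k * sd) for k in (-2, -1, 1, 2))

def srt_band(score_cm, cutoffs=None):
    """Classify a sit-and-reach score (cm) into one of five reference bands."""
    very_low, low, high, very_high = cutoffs or srt_cutoffs()
    if score_cm < very_low:
        return "very low"
    if score_cm < low:
        return "low"
    if score_cm <= high:          # boundary values go to the band nearer the mean
        return "normal"
    if score_cm <= very_high:
        return "high"
    return "very high"

print(srt_cutoffs())
for s in (3.0, 10.0, 22.0, 35.0, 43.5):
    print(s, srt_band(s))
```

With the reported mean and SD, the unrounded boundaries are 3.6, 12.8, 31.2 and 40.4 cm, so the half-centimetre snapping is what brings them in line with the published cutoffs.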
Game-based biofeedback for paediatric anxiety and depression
2011-01-01
Twenty-four children and adolescents aged 9–17 who were referred for treatment for anxiety were assigned to either a game-based biofeedback group or a waiting list comparison group. The eight-session biofeedback intervention included psychoeducation, identification of triggers and signs of anxiety, and in vivo practice. The intervention used computer-based gaming technology to teach and practise relaxation. Analyses using ANCOVA revealed significant differences in post-test scores of anxiety and depression measures between the two groups. The intervention group reduced anxiety and depression scores on standardised tests. Findings suggest that biofeedback-assisted relaxation training can be useful in decreasing anxiety and depressive symptoms in anxious youths. PMID:22942901
Evaluation of a novel scoring and grading model for VP-based exams in postgraduate nurse education.
Forsberg, Elenita; Ziegert, Kristina; Hult, Håkan; Fors, Uno
2015-12-01
For Virtual Patient-based exams, several scoring and grading methods have been proposed, but none have yet been validated. The aim of this study was to evaluate a new scoring and grading model for VP-based exams in postgraduate paediatric nurse education. The same student group of 19 students performed a VP-based exam in three consecutive courses. When using the scoring and grading assessment model, which contains a deduction system for unnecessary or unwanted actions, a progression was found in the three courses: 53% of the students passed the first exam, 63% the second and 84% passed the final exam. The most common reason for deduction of points was due to students asking too many interview questions or ordering too many laboratory tests. The results showed that the new scoring model made it possible to judge the students' clinical reasoning process as well as their progress. Copyright © 2015 Elsevier Ltd. All rights reserved.
Testing non-inferiority of a new treatment in three-arm clinical trials with binary endpoints.
Tang, Nian-Sheng; Yu, Bin; Tang, Man-Lai
2014-12-18
A two-arm non-inferiority trial without a placebo is usually adopted to demonstrate that an experimental treatment is not worse than a reference treatment by a small pre-specified non-inferiority margin due to ethical concerns. Selection of the non-inferiority margin and establishment of assay sensitivity are two major issues in the design, analysis and interpretation for two-arm non-inferiority trials. Alternatively, a three-arm non-inferiority clinical trial including a placebo is usually conducted to assess the assay sensitivity and internal validity of a trial. Recently, some large-sample approaches have been developed to assess the non-inferiority of a new treatment based on the three-arm trial design. However, these methods behave badly with small sample sizes in the three arms. This manuscript aims to develop some reliable small-sample methods to test three-arm non-inferiority. Saddlepoint approximation, exact and approximate unconditional, and bootstrap-resampling methods are developed to calculate p-values of the Wald-type, score and likelihood ratio tests. Simulation studies are conducted to evaluate their performance in terms of type I error rate and power. Our empirical results show that the saddlepoint approximation method generally behaves better than the asymptotic method based on the Wald-type test statistic. For small sample sizes, approximate unconditional and bootstrap-resampling methods based on the score test statistic perform better in the sense that their corresponding type I error rates are generally closer to the prespecified nominal level than those of other test procedures. Both approximate unconditional and bootstrap-resampling test procedures based on the score test statistic are generally recommended for three-arm non-inferiority trials with binary outcomes.
Klein, A A; Collier, T; Yeates, J; Miles, L F; Fletcher, S N; Evans, C; Richards, T
2017-09-01
A simple and accurate scoring system to predict risk of transfusion for patients undergoing cardiac surgery is lacking. We identified independent risk factors associated with transfusion by performing univariate analysis, followed by logistic regression. We then simplified the score to an integer-based system and tested it using the area under the receiver operator characteristic (AUC) statistic with a Hosmer-Lemeshow goodness-of-fit test. Finally, the scoring system was applied to the external validation dataset and the same statistical methods applied to test the accuracy of the ACTA-PORT score. Several factors were independently associated with risk of transfusion, including age, sex, body surface area, logistic EuroSCORE, preoperative haemoglobin and creatinine, and type of surgery. In our primary dataset, the score accurately predicted risk of perioperative transfusion in cardiac surgery patients with an AUC of 0.76. The external validation confirmed accuracy of the scoring method with an AUC of 0.84 and good agreement across all scores, with a minor tendency to under-estimate transfusion risk in very high-risk patients. The ACTA-PORT score is a reliable, validated tool for predicting risk of transfusion for patients undergoing cardiac surgery. This and other scores can be used in research studies for risk adjustment when assessing outcomes, and might also be incorporated into a Patient Blood Management programme. © The Author 2017. Published by Oxford University Press on behalf of the British Journal of Anaesthesia. All rights reserved. For Permissions, please email: journals.permissions@oup.com
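For readers unfamiliar with the AUC statistic used above, the sketch below shows one standard way to compute it for an integer risk score: as the proportion of (transfused, not transfused) patient pairs in which the transfused patient has the higher score, with ties counted as one half. The scores are hypothetical and are not ACTA-PORT data; the function name `auc` is an assumption for illustration.

```python
def auc(scores_with_event, scores_without_event):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    Counts the fraction of event/non-event pairs in which the event case
    has the higher score; tied pairs contribute 0.5.
    """
    wins = 0.0
    for p in scores_with_event:
        for n in scores_without_event:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_with_event) * len(scores_without_event))

# Hypothetical integer risk scores for illustration only.
transfused = [12, 18, 9, 20]
not_transfused = [5, 9, 14, 16]
print(auc(transfused, not_transfused))
```

An AUC of 0.5 corresponds to a score that ranks patients no better than chance, and 1.0 to perfect separation of the two groups.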
Wells, Erica L; Kofler, Michael J; Soto, Elia F; Schaefer, Hillary S; Sarver, Dustin E
2018-01-01
Pediatric ADHD is associated with impairments in working memory, but these deficits often go undetected when using clinic-based tests such as digit span backward. The current study pilot-tested minor administration/scoring modifications to improve digit span backward's construct and predictive validities in a well-characterized sample of children with ADHD. WISC-IV digit span was modified to administer all trials (i.e., ignore the discontinue rule) and to count digits rather than trials correct. Traditional and modified scores were compared to a battery of criterion working memory (construct validity) and academic achievement tests (predictive validity) for 34 children with ADHD ages 8-13 (M=10.41; 11 girls). Traditional digit span backward scores failed to predict working memory or KTEA-2 achievement (all ns). Alternate administration/scoring of digit span backward significantly improved its associations with working memory reordering (r=.58), working memory dual-processing (r=.53), working memory updating (r=.28), and KTEA-2 achievement (r=.49). Consistent with prior work, these findings urge caution when interpreting digit span performance. Minor test modifications may address test validity concerns, and should be considered in future test revisions. Digit span backward becomes a valid measure of working memory at exactly the point that testing is traditionally discontinued. Copyright © 2017 Elsevier Ltd. All rights reserved.
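The two scoring schemes contrasted above can be sketched as follows. This is an illustrative reading, not the published scoring code: it assumes two trials per span length, a discontinue rule that stops after both trials at one length are failed, and that "counting digits" means crediting each digit recalled in its correct serial position. The trial data and the function names `traditional_score` and `modified_score` are hypothetical.

```python
def traditional_score(trials):
    """Trials-correct scoring with a discontinue rule.

    `trials` is a list of (target, response) tuples ordered by span length,
    two trials per length. Scoring stops after both trials at a length fail.
    """
    score = 0
    for i in range(0, len(trials), 2):
        pair = trials[i:i + 2]
        correct = [response == target for target, response in pair]
        score += sum(correct)
        if not any(correct):      # both trials at this length failed
            break
    return score

def modified_score(trials):
    """All trials administered; one point per digit in the correct position."""
    return sum(
        sum(t == r for t, r in zip(target, response))
        for target, response in trials
    )

# Hypothetical protocol: targets are the expected (already reversed) sequences.
trials = [
    ((2, 1), (2, 1)), ((4, 3), (4, 3)),                    # length 2: both correct
    ((5, 9, 3), (5, 3, 9)), ((7, 2, 6), (7, 6, 2)),        # length 3: both failed
    ((8, 1, 4, 2), (8, 1, 4, 2)), ((3, 9, 5, 6), (3, 9, 6, 5)),  # length 4
]
print(traditional_score(trials), modified_score(trials))
```

In this toy protocol the traditional score stops at length 3 and misses the fully correct length-4 trial, whereas the modified score credits it along with the partially correct responses.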
Conservatism and Cognitive Ability
ERIC Educational Resources Information Center
Stankov, Lazar
2009-01-01
Conservatism and cognitive ability are negatively correlated. The evidence is based on 1254 community college students and 1600 foreign students seeking entry to United States' universities. At the individual level of analysis, conservatism scores correlate negatively with SAT, Vocabulary, and Analogy test scores. At the national level of…
Mausbach, Brent T; Tiznado, Denisse; Cardenas, Veronica; Jeste, Dilip V; Patterson, Thomas L
2016-10-30
The UCSD Performance-based Skills Assessment (UPSA) is a widely used measure of functional capacity with strong reliability and validity. However, there is a lack of psychometric data on Hispanics. The purpose of this study was to determine the impact of acculturation and education on UPSA performance among 62 Hispanic participants with schizophrenia or schizoaffective disorder and 46 healthy comparison subjects. Functional capacity was measured using the UPSA. Acculturation was measured using the Acculturation Rating Scale for Mexican Americans (ARSMA). Independent t-tests indicated that participants with schizophrenia had significantly lower UPSA total scores and scored lower on all UPSA sub-scales relative to the comparison group. Multiple regression also indicated that education and acculturation were significant predictors of UPSA total scores. These data provide a better understanding of UPSA scores in Hispanics with and without schizophrenia, and suggest that education and acculturation adjustments may be required to improve interpretation of test results. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Coriolano, Kamary; Aiken, Alice; Pukall, Caroline; Harrison, Mark
2015-01-01
The purposes of this study are three-fold: (1) to examine whether the WOMAC questionnaire should be obtained before or after performance-based tests; (2) to assess whether self-reported disability scores before and after performance-based tests differ between obese and non-obese individuals; and (3) to observe whether physical activity and BMI predict self-reported disability before and after performance-based tests. This longitudinal study included thirty-one participants diagnosed with knee osteoarthritis (OA) by an orthopedic surgeon using the Kellgren-Lawrence Scale. All WOMAC scores were significantly higher after than before the completion of performance-based tests. This pattern of results suggested that the WOMAC questionnaire should be administered to individuals with OA after performance-based tests. The obese OA group differed significantly from the non-obese OA group on all WOMAC scores. Physical activity and BMI explained a significant proportion of the variance in self-reported disability. Obese individuals with knee OA may over-estimate their ability to perform physical activities, and may under-estimate their level of disability compared to non-obese individuals with knee OA. In addition, self-reported physical activity seems to be a strong indicator of disability in individuals with knee OA, particularly for individuals with a sedentary lifestyle. Implications for Rehabilitation: Osteoarthritis is a progressive, disabling joint condition that restricts physical function and participation in daily activities, particularly in elderly individuals. Obesity is a comorbidity commonly associated with osteoarthritis and it appears to increase self-reported disability in those diagnosed with osteoarthritis of the knee.
In a relatively small sample, this study recommends that rehabilitation professionals obtain self-report questionnaires of disability after performance-based tests in obese individuals with osteoarthritis of the knee as they are more likely to give an accurate representation of their level of ability at this time.
Factors contributing to speech perception scores in long-term pediatric cochlear implant users.
Davidson, Lisa S; Geers, Ann E; Blamey, Peter J; Tobey, Emily A; Brenner, Christine A
2011-02-01
The objectives of this report are to (1) describe the speech perception abilities of long-term pediatric cochlear implant (CI) recipients by comparing scores obtained at elementary school (CI-E, 8 to 9 yrs) with scores obtained at high school (CI-HS, 15 to 18 yrs); (2) evaluate speech perception abilities in demanding listening conditions (i.e., noise and lower intensity levels) at adolescence; and (3) examine the relation of speech perception scores to speech and language development over this longitudinal timeframe. All 112 teenagers were part of a previous nationwide study of 8- and 9-yr-olds (N = 181) who received a CI between 2 and 5 yrs of age. The test battery included (1) the Lexical Neighborhood Test (LNT; hard and easy word lists); (2) the Bamford Kowal Bench sentence test; (3) the Children's Auditory-Visual Enhancement Test; (4) the Test of Auditory Comprehension of Language at CI-E; (5) the Peabody Picture Vocabulary Test at CI-HS; and (6) the McGarr sentences (consonants correct) at CI-E and CI-HS. CI-HS speech perception was measured in both optimal and demanding listening conditions (i.e., background noise and low-intensity level). Speech perception scores were compared based on age at test, lexical difficulty of stimuli, listening environment (optimal and demanding), input mode (visual and auditory-visual), and language age. All group mean scores significantly increased with age across the two test sessions. Scores of adolescents significantly decreased in demanding listening conditions. The effect of lexical difficulty on the LNT scores, as evidenced by the difference in performance between easy versus hard lists, increased with age and decreased for adolescents in challenging listening conditions. 
Calculated curves for percent correct speech perception scores (LNT and Bamford Kowal Bench) and consonants correct on the McGarr sentences plotted against age-equivalent language scores on the Test of Auditory Comprehension of Language and Peabody Picture Vocabulary Test achieved asymptote at similar ages, around 10 to 11 yrs. On average, children receiving CIs between 2 and 5 yrs of age exhibited significant improvement on tests of speech perception, lipreading, speech production, and language skills measured between primary grades and adolescence. Evidence suggests that improvement in speech perception scores with age reflects increased spoken language level up to a language age of about 10 yrs. Speech perception performance significantly decreased with softer stimulus intensity level and with introduction of background noise. Upgrades to newer speech processing strategies and greater use of frequency-modulated systems may be beneficial for ameliorating performance under these demanding listening conditions.
Chan, Kelvin; Phadke, Chetan P; Stremler, Denise; Suter, Lynn; Pauley, Tim; Ismail, Farooq; Boulias, Chris
2017-05-01
Water-based exercises have been used in the rehabilitation of people with stroke, but little is known about the impact of this treatment on balance. This study examined the effect of water-based exercises compared to land-based exercises on the balance of people with sub-acute stroke. In this single-blind randomized controlled study, 32 patients with first-time stroke discharged from inpatient rehabilitation at West Park Healthcare Centre were recruited. Participants were randomized into W (water-based + land; n = 17) or L (land only; n = 15) exercise groups. Both groups attended therapy two times per week for six weeks. Initial and progression protocols for the water-based exercises (a combination of balance, stretching, and strengthening and endurance training) and land therapy (balance, strength, transfer, gait, and stair training) were devised. Outcomes included the Berg Balance Score, Community Balance and Mobility Score, Timed Up and Go Test, and 2 Minute Walk Test. Baseline characteristics of groups W and L were similar in age, side of stroke, time since stroke, and wait time between inpatient discharge and outpatient therapy, as were baseline values on all four outcomes. Pooled change scores from all outcomes showed that a significantly greater number of patients in the W-group improved post-training compared to the L-group (p < 0.05). More patients in the W-group showed change scores exceeding the published minimal detectable change scores. A combination of water- and land-based exercises has potential for improving balance. The results of this study extend the work showing benefit of water-based exercise in chronic and less-impaired stroke groups to patients with sub-acute stroke.
ERIC Educational Resources Information Center
Bielinski, John; Minnema, Jane; Thurlow, Martha
A Web-based survey of 25 experts in testing theory and large-scale assessment examined the utility of out-of-level testing for making decisions about students and schools. Survey respondents were given a series of scenarios and asked to judge the degree to which out-of-level testing would affect the reliability and validity of test scores within…
NASA Astrophysics Data System (ADS)
Sukji, Paweena; Wichaidit, Pacharee Rompayom; Wichaidit, Sittichai
2018-01-01
The objectives of this study were to: 1) compare the learning achievement and analytical thinking ability of Mathayomsuksa 3 students before and after learning through inquiry-based learning activities integrated with the local learning resource, and 2) compare the average post-test scores for learning achievement and analytical thinking ability with their cut scores. The target group was 23 Mathayomsuksa 3 students studying in the second semester of the 2016 academic year at Banchatfang School, Chainat Province. The research instruments consisted of: 1) 6 lesson plans on Environment and Natural Resources, 2) the learning achievement test, and 3) the analytical thinking ability test. The results showed that 1) students' learning achievement and analytical thinking ability after learning were higher than before at the .05 level of statistical significance, and 2) students' average post-test scores for learning achievement and analytical thinking ability were higher than the cut scores at the .05 level of statistical significance. The implication of this research is that science teachers and curriculum developers should design inquiry activities that relate to students' context.
Spring, C; French, L
1990-01-01
A method of identifying children with specific reading disabilities by identifying discrepancies between their reading and listening comprehension scores was validated with disabled and nondisabled readers in Grades 4, 5, and 6. The method is based on a modification of the reading comprehension subtest of the Peabody Individual Achievement Test (Dunn & Markwardt, 1970). In this modification, even-numbered sentences are read by subjects, and odd-numbered sentences are read by the test administrator as subjects listen. The features of this test that reduce demands on working memory, thereby making it suitable for the detection of a discrepancy between reading and listening comprehension in readers with disabilities, are discussed. A significant group-by-modality interaction was obtained. Children with reading disabilities scored significantly lower on reading than on listening comprehension, while nondisabled readers scored slightly higher, but not significantly so, on reading than on listening comprehension. The appropriateness of this method as a substitute for the traditional method, which is based on the detection of a discrepancy between intelligence and reading and which has recently been proscribed in certain school districts, is discussed. Issues concerning the listening comprehension skills of disabled readers are also discussed.
Paige, John T; Garbee, Deborah D; Kozmenko, Valeriy; Yu, Qingzhao; Kozmenko, Lyubov; Yang, Tong; Bonanno, Laura; Swartz, William
2014-01-01
Effective teamwork in the operating room (OR) is often undermined by the "silo mentality" of the differing professions. Such thinking is formed early in one's professional experience and is fostered by undergraduate medical and nursing curricula lacking interprofessional education. We investigated the immediate impact of conducting interprofessional student OR team training using high-fidelity simulation (HFS) on students' team-related attitudes and behaviors. Ten HFS OR interprofessional student team training sessions were conducted involving 2 standardized HFS scenarios, each of which was followed by a structured debriefing that targeted team-based competencies. Pre- and post-session mean scores were calculated and analyzed for 15 Likert-type items measuring self-efficacy in teamwork competencies using the t-test. Additionally, mean scores of observer ratings of team performance after each scenario and participant ratings after the second scenario for an 11-item Likert-type teamwork scale were calculated and analyzed using one-way ANOVA and t-test. Eighteen nursing students, 20 nurse anesthetist students, and 28 medical students participated in the training. Statistically significant gains from mean pre- to post-training scores occurred on 11 of the 15 self-efficacy items. Statistically significant gains in mean observer performance scores were present on all 3 subscales of the teamwork scale from the first scenario to the second. A statistically significant difference was found in comparisons of mean observer scores with mean participant scores for the team-based behaviors subscale. High-fidelity simulation OR interprofessional student team training improves students' team-based attitudes and behaviors. Students tend to overestimate their team-based behaviors. Copyright © 2014 American College of Surgeons. Published by Elsevier Inc. All rights reserved.
A new IRT-based standard setting method: application to eCat-listening.
García, Pablo Eduardo; Abad, Francisco José; Olea, Julio; Aguado, David
2013-01-01
Criterion-referenced interpretations of tests are highly necessary, and they usually involve the difficult task of establishing cut scores. In contrast to other Item Response Theory (IRT)-based standard setting methods, a non-judgmental approach is proposed in this study, in which Item Characteristic Curve (ICC) transformations lead to the final cut scores. eCat-Listening, a computerized adaptive test for the evaluation of English Listening, was administered to 1,576 participants, and the proposed standard setting method was applied to classify them into the performance standards of the Common European Framework of Reference for Languages (CEFR). The results showed a classification closely related to relevant external measures of the English language domain, according to the CEFR. It is concluded that the proposed method is a practical and valid standard setting alternative for interpreting IRT-based tests.
A more powerful exact test of noninferiority from binary matched-pairs data.
Lloyd, Chris J; Moldovan, Max V
2008-08-15
Assessing the therapeutic noninferiority of one medical treatment compared with another is often based on the difference in response rates from a matched binary pairs design. This paper develops a new exact unconditional test for noninferiority that is more powerful than available alternatives. There are two new elements presented in this paper. First, we introduce the likelihood ratio statistic as an alternative to the previously proposed score statistic of Nam (Biometrics 1997; 53:1422-1430). Second, we eliminate the nuisance parameter by estimation followed by maximization as an alternative to the partial maximization of Berger and Boos (Am. Stat. Assoc. 1994; 89:1012-1016) or traditional full maximization. Based on an extensive numerical study, we recommend tests based on the score statistic, the nuisance parameter being controlled by estimation followed by maximization. 2008 John Wiley & Sons, Ltd
Napoli, Anthony M
2014-04-01
Cardiology consensus guidelines recommend use of the Diamond and Forrester (D&F) score to augment the decision to pursue stress testing. However, recent work has reported no association between pretest probability of coronary artery disease (CAD) as measured by D&F and physician discretion in stress test utilization for inpatients. The author hypothesized that D&F pretest probability would predict the likelihood of acute coronary syndrome (ACS) and a positive stress test and that there would be limited yield to diagnostic testing of patients categorized as low pretest probability by D&F score who are admitted to a chest pain observation unit (CPU). This was a prospective observational cohort study of consecutively admitted CPU patients in a large-volume academic urban emergency department (ED). Cardiologists rounded on all patients and stress test utilization was driven by their recommendations. Inclusion criteria were as follows: age>18 years, American Heart Association (AHA) low/intermediate risk, nondynamic electrocardiograms (ECGs), and normal initial troponin I. Exclusion criteria were as follows: age older than 75 years with a history of CAD. A D&F score for likelihood of CAD was calculated on each patient independent of patient care. Based on the D&F score, patients were assigned a priori to low-, intermediate-, and high-risk groups (<10, 10 to 90, and >90%, respectively). ACS was defined by ischemia on stress test, coronary artery occlusion of ≥70% in at least one vessel, or elevations in troponin I consistent with consensus guidelines. A true-positive stress test was defined by evidence of reversible ischemia and subsequent angiographic evidence of critical stenosis or a discharge diagnosis of ACS. An estimated 3,500 patients would be necessary to have 1% precision around a potential 0.3% event rate in low-pretest-probability patients. Categorical comparisons were made using Pearson chi-square testing. 
A total of 3,552 patients with index visits were enrolled over a 29-month period. The mean (±standard deviation [SD]) age was 51.3 (±9.3) years. Forty-nine percent of patients received stress testing. Pretest probability based on D&F score was associated with stress test utilization (p<0.01), risk of ACS (p<0.01), and true-positive stress tests (p=0.03). No patients with low pretest probability were subsequently diagnosed with ACS (95% CI=0 to 0.66%) or had a true-positive stress test (95% CI=0 to 1.6%). Physician discretionary decision-making regarding stress test use is associated with pretest probability of CAD. However, based on the D&F score, low-pretest-probability patients who meet CPU admission criteria are very unlikely to have a true-positive stress test or eventually receive a diagnosis of ACS, such that observation and stress test utilization may be obviated. © 2014 by the Society for Academic Emergency Medicine.
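The confidence intervals reported for zero observed events (for example, 0 to 0.66% for ACS) are the kind of bound an exact binomial interval yields when no events occur. The sketch below is a minimal illustration, assuming a two-sided exact (Clopper-Pearson) interval; the abstract does not state which interval method was used or how many patients were in the low-pretest-probability group, so `n_low_risk` is a hypothetical value.

```python
def zero_event_upper_bound(n, conf=0.95):
    """Upper limit of the two-sided exact (Clopper-Pearson) interval
    for an event rate when 0 events are observed among n patients.

    With x = 0 the upper limit has the closed form 1 - (alpha/2)**(1/n).
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    alpha = 1.0 - conf
    return 1.0 - (alpha / 2.0) ** (1.0 / n)


# Hypothetical group size, not taken from the abstract.
n_low_risk = 500
upper = zero_event_upper_bound(n_low_risk)
print(f"0/{n_low_risk} events: 95% CI = 0 to {100 * upper:.2f}%")
```

The bound shrinks roughly in proportion to 1/n, which is why a large cohort is needed before a zero-event finding supports a claim of very low risk.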
Zahid, Muhammad A; Varghese, Ramani; Mohammed, Ahmed M; Ayed, Adel K
2016-06-12
To compare problem-based learning (PBL) with traditional lecture-based curricula. The single best answer Multiple Choice Questions (MCQ) and the Objective Structured Clinical Examination (OSCE) were used to compare the performance of the lecture-based curriculum and PBL medical student groups. The reliability of the MCQs and OSCE was calculated with the Kuder-Richardson formula and Cronbach's alpha, respectively. The content validity of the MCQs and OSCE was tested by the Independent Subject Experts (ISE). The Student's t-test for independent samples was used to compare the item difficulty of the MCQs and OSCEs, and the Chi-square test was used to compare the grades between the two student groups. The PBL students outperformed the old curriculum students in overall grades, theoretical knowledge base (tested with K2 type MCQs) and OSCE. The number of PBL students with scores between 80-90% (grade B) was significantly (p=0.035) higher, while the number with scores between 60 and 69% (grade C) was significantly (p=0.001) lower, than among the old curriculum students. Similarly, the mean MCQ and OSCE scores of the new curriculum students were significantly higher (p = 0.001 and p = 0.025, respectively) than those of the old curriculum students. Lastly, the old curriculum students found the K2-MCQs to be more difficult (p = 0.001) than the single correct answer (K1 type) MCQs, while no such difference was found for the new curriculum students. Suitably designed MCQs can be used to tap the higher cognitive knowledge base acquired in the PBL setting.
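The abstract refers to a Kuder-Richardson formula without saying which one. As a minimal sketch, assuming KR-20 and dichotomously scored (0/1) items, reliability can be computed from an examinee-by-item response matrix as follows; the function name and the data are illustrative only.

```python
def kr20(responses):
    """Kuder-Richardson formula 20 for dichotomous (0/1) item scores.

    responses: list of examinee rows, each a list of 0/1 item scores.
    Uses the population variance of total scores, consistent with
    the p*(1-p) item variances.
    """
    n = len(responses)
    k = len(responses[0])
    if n < 2 or k < 2:
        raise ValueError("need at least 2 examinees and 2 items")
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    if var_total == 0:
        raise ValueError("total scores have zero variance")
    pq_sum = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n  # proportion correct on item j
        pq_sum += p * (1.0 - p)
    return (k / (k - 1)) * (1.0 - pq_sum / var_total)


# Made-up responses: 4 examinees, 3 items.
example = [[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]]
print(round(kr20(example), 3))
```

KR-20 is the special case of Cronbach's alpha for 0/1 items, which is why the two coefficients are reported side by side for the MCQ and OSCE components.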
Logistical Consideration in Computer-Based Screening of Astronaut Applicants
NASA Technical Reports Server (NTRS)
Galarza, Laura
2000-01-01
This presentation reviews the logistical, ergonomic, and psychometric issues and data related to the development and operational use of a computer-based system for the psychological screening of astronaut applicants. The Behavioral Health and Performance Group (BHPG) at the Johnson Space Center upgraded its astronaut psychological screening and selection procedures for the 1999 astronaut applicants and subsequent astronaut selection cycles. The questionnaires, tests, and inventories were upgraded from a paper-and-pencil system to a computer-based system. Members of the BHPG and a computer programmer designed and developed needed interfaces (screens, buttons, etc.) and programs for the astronaut psychological assessment system. This intranet-based system included the user-friendly computer-based administration of tests, test scoring, generation of reports, the integration of test administration and test output to a single system, and a complete database for past, present, and future selection data. Upon completion of the system development phase, four beta and usability tests were conducted with the newly developed system. The first three tests included 1 to 3 participants each. The final system test was conducted with 23 participants tested simultaneously. Usability and ergonomic data were collected from the system (beta) test participants and from 1999 astronaut applicants who volunteered the information in exchange for anonymity. Beta and usability test data were analyzed to examine operational, ergonomic, programming, test administration and scoring issues related to computer-based testing. Results showed a preference for computer-based testing over paper-and-pencil procedures. The data also reflected specific ergonomic, usability, psychometric, and logistical concerns that should be taken into account in future selection cycles. Conclusion: Psychological, psychometric, human and logistical factors must be examined and considered carefully when developing and using a computer-based system for psychological screening and selection.
General Framework for Meta-analysis of Rare Variants in Sequencing Association Studies
Lee, Seunggeun; Teslovich, Tanya M.; Boehnke, Michael; Lin, Xihong
2013-01-01
We propose a general statistical framework for meta-analysis of gene- or region-based multimarker rare variant association tests in sequencing association studies. In genome-wide association studies, single-marker meta-analysis has been widely used to increase statistical power by combining results via regression coefficients and standard errors from different studies. In analysis of rare variants in sequencing studies, region-based multimarker tests are often used to increase power. We propose meta-analysis methods for commonly used gene- or region-based rare variants tests, such as burden tests and variance component tests. Because estimation of regression coefficients of individual rare variants is often unstable or not feasible, the proposed method avoids this difficulty by calculating score statistics instead that only require fitting the null model for each study and then aggregating these score statistics across studies. Our proposed meta-analysis rare variant association tests are conducted based on study-specific summary statistics, specifically score statistics for each variant and between-variant covariance-type (linkage disequilibrium) relationship statistics for each gene or region. The proposed methods are able to incorporate different levels of heterogeneity of genetic effects across studies and are applicable to meta-analysis of multiple ancestry groups. We show that the proposed methods are essentially as powerful as joint analysis by directly pooling individual level genotype data. We conduct extensive simulations to evaluate the performance of our methods by varying levels of heterogeneity across studies, and we apply the proposed methods to meta-analysis of rare variant effects in a multicohort study of the genetics of blood lipid levels. PMID:23768515
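The aggregation step described above, combining per-study score statistics and between-variant covariance summaries instead of regression coefficients, can be sketched for the simplest case. The example below is a minimal illustration, assuming homogeneous genetic effects across studies and a burden-type statistic with fixed variant weights; it is not the authors' software, and all names and numbers are hypothetical.

```python
import math


def meta_burden_test(score_vectors, cov_matrices, weights):
    """Burden-type meta-analysis from per-study summary statistics.

    score_vectors: one list of per-variant score statistics per study.
    cov_matrices:  one m x m covariance-type matrix per study.
    weights:       per-variant weights (length m).
    Returns (statistic, p_value); the statistic is referred to a
    chi-square distribution with 1 degree of freedom under the null.
    """
    m = len(weights)
    # Sum the study-specific summaries across studies.
    u = [sum(s[j] for s in score_vectors) for j in range(m)]
    phi = [[sum(c[i][j] for c in cov_matrices) for j in range(m)]
           for i in range(m)]
    # Collapse variants into one weighted score and its variance.
    numerator = sum(weights[j] * u[j] for j in range(m))
    variance = sum(weights[i] * phi[i][j] * weights[j]
                   for i in range(m) for j in range(m))
    if variance <= 0:
        raise ValueError("weighted variance must be positive")
    statistic = numerator ** 2 / variance
    # Survival function of chi-square(1): P(Z^2 > q) = erfc(sqrt(q / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Two hypothetical studies, two variants.
scores = [[1.0, 0.0], [1.0, 2.0]]
covs = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
print(meta_burden_test(scores, covs, weights=[1.0, 1.0]))
```

Only the null model needs to be fitted in each study to obtain these summaries, which is what makes the approach workable when individual rare-variant coefficients cannot be estimated stably.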
KATTS: a framework for maximizing NCLEX-RN performance.
McDowell, Betsy M
2008-04-01
A key indicator of the quality of a nursing education program is the performance of its graduates as first-time takers of the NCLEX-RN. As a result, nursing schools are open to strategies that strengthen the performance of their graduates on the examination. The Knowledge base, Anxiety control, Test-Taking Skills (KATTS) framework focuses on the three components of achieving a maximum score on an examination. In KATTS, all three components must be present and in proper balance to maximize a test taker's score. By strengthening not just one but all of these components, graduates can improve their overall test scores significantly. Suggested strategies for strengthening each component of KATTS are provided. This framework has been used successfully in designing remedial tutoring programs and in assisting first-time NCLEX test takers in preparing for the licensing examination.
Hamilton-Craig, Christian R; Chow, Clara K; Younger, John F; Jelinek, V M; Chan, Jonathan; Liew, Gary Yh
2017-10-16
Introduction: This article summarises the Cardiac Society of Australia and New Zealand position statement on coronary artery calcium (CAC) scoring. CAC scoring is a non-invasive method for quantifying coronary artery calcification using computed tomography. It is a marker of atherosclerotic plaque burden and the strongest independent predictor of future myocardial infarction and mortality. CAC scoring provides incremental risk information beyond traditional risk calculators such as the Framingham Risk Score. Its use for risk stratification is confined to primary prevention of cardiovascular events, and can be considered as individualised coronary risk scoring for intermediate risk patients, allowing reclassification to low or high risk based on the score. Medical practitioners should carefully counsel patients before CAC testing, which should only be undertaken if an alteration in therapy, including embarking on pharmacotherapy, is being considered based on the test result. Main recommendations: CAC scoring should primarily be performed on individuals without coronary disease aged 45-75 years (absolute 5-year cardiovascular risk of 10-15%) who are asymptomatic. CAC scoring is also reasonable in lower risk groups (absolute 5-year cardiovascular risk, < 10%) where risk scores traditionally underestimate risk (eg, family history of premature CVD) and in patients with diabetes aged 40-60 years. We recommend aspirin and a high efficacy statin in high risk patients, defined as those with a CAC score ≥ 400, or a CAC score of 100-399 and above the 75th percentile for age and sex. It is reasonable to treat patients with CAC scores ≥ 100 with aspirin and a statin. It is reasonable not to treat asymptomatic patients with a CAC score of zero. Changes in management as a result of this statement: Cardiovascular risk is reclassified according to CAC score. High risk patients are treated with a high efficacy statin and aspirin.
Very low risk patients (ie, CAC score of zero) do not benefit from treatment.
Hilton, C; Fisher, W; Lopez, A; Sanders, C
1997-09-01
To design and test a simple, easily modifiable system for calculating faculty productivity in teaching, research, administration, and patient care in which all areas of endeavor would be recognized and high productivity in one area would produce results similar to high productivity in another at the Louisiana State University School of Medicine in New Orleans. A relative-value and time-based system was designed in 1996 so that similar efforts in the four areas would produce similar scores, and a profile reflecting the authors' estimates of high productivity ("super faculty") was developed for each area. The activity profiles of 17 faculty members were used to test the system. "Super-faculty" scores in all areas were similar. The faculty members' mean scores were higher for teaching and research than for administration and patient care, and all four mean scores were substantially lower than the respective totals for the "super faculty". In each category the scores of those faculty members who scored above the mean in that category were used to calculate new mean scores. The mean scores for these faculty members were similar to those for the "super faculty" in teaching and research but were substantially lower for administration and patient care. When the mean total score of the eight faculty members predicted to have total scores below the group mean was compared with the mean total score of the nine faculty members predicted to have total scores above the group mean, the difference was significant (p < .0001). For the former, every score in each category was below the mean, with the exception of one faculty member's score in one category. Of the latter, eight had higher scores in teaching and four had higher scores in teaching and research combined. This system provides a quantitative method for the equal recognition of faculty productivity in a number of areas, and it may be useful as a starting point for other academic units exploring similar issues.
Left behind by Design: Proficiency Counts and Test-Based Accountability. Working Paper
ERIC Educational Resources Information Center
Neal, Derek; Schanzenbach, Diane Whitmore
2009-01-01
Many test-based accountability systems, including the No Child Left Behind Act of 2001 (NCLB), place great weight on the numbers of students who score at or above specified proficiency levels in various subjects. Accountability systems based on these metrics often provide incentives for teachers and principals to target children near current…
ERIC Educational Resources Information Center
Neal, Derek; Schanzenbach, Diane Whitmore
2007-01-01
Many test-based accountability systems, including the No Child Left Behind Act of 2001 (NCLB), place great weight on the numbers of students who score at or above specified proficiency levels in various subjects. Accountability systems based on these metrics often provide incentives for teachers and principals to target children near current…
NASA Astrophysics Data System (ADS)
Powers, Angela R.
2000-10-01
This study explored the relationship between secondary chemistry students' conceptual representations of acid-base chemistry, as shown in student-constructed concept maps, and their ability to solve acid-base problems, represented by their score on an 18-item paper and pencil test, the Acid-Base Concept Assessment (ABCA). The ABCA, consisting of both multiple-choice and short-answer items, was originally designed using a question-type by subtopic matrix, validated by a panel of experts, and refined through pilot studies and factor analysis to create the final instrument. The concept map task included a short introduction to concept mapping, a prototype concept map, a practice concept-mapping activity, and the instructions for the acid-base concept map task. The instruments were administered to chemistry students at two high schools; 108 subjects completed both instruments for this study. Factor analysis of ABCA results indicated that the test was unifactorial for these students, despite the intention to create an instrument with multiple "question-type" scales. Concept maps were scored both holistically and by counting valid concepts. The two approaches were highly correlated (r = 0.75). The correlation between ABCA score and concept-map score was 0.29 for holistically-scored concept maps and 0.33 for counted-concept maps. Although both correlations were significant, they accounted for only 8.8 and 10.2% of variance in ABCA scores, respectively. However, when the reliability of the instruments used is considered, more than 20% of the variance in ABCA scores may be explained by concept map scores. MANOVAs for ABCA and concept map scores by instructor, student gender, and year in school showed significant differences for both holistic and counted concept-map scores. Discriminant analysis revealed that the source of these differences was the instruction variable. 
Significant differences between classes receiving different instruction were found in the frequency of concepts listed by students for 9 of 10 concepts evaluated. Mean ABCA scores did not differ significantly between the two instruction groups. The results of this study failed to provide evidence of conceptual distinctions among different "types" of problem-solving items. The results suggested that several factors influence success in chemistry problem solving, including concept knowledge and organization. Further research into the nature of chemistry problems and problem solving is recommended.
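The remark that more than 20% of the variance may be explained once instrument reliability is considered matches the general form of the classical correction for attenuation, in which an observed correlation is divided by the square root of the product of the two reliabilities. The sketch below assumes that correction; the abstract does not report the reliabilities, so the values used here are hypothetical.

```python
import math


def disattenuated_r(r_observed, reliability_x, reliability_y):
    """Classical (Spearman) correction for attenuation.

    Estimates the correlation between true scores from the observed
    correlation and the reliabilities of the two instruments.
    """
    for rel in (reliability_x, reliability_y):
        if not 0.0 < rel <= 1.0:
            raise ValueError("reliabilities must be in (0, 1]")
    return r_observed / math.sqrt(reliability_x * reliability_y)


# Observed r from the abstract; both reliabilities are hypothetical.
r_true = disattenuated_r(0.29, reliability_x=0.70, reliability_y=0.60)
print(f"corrected r = {r_true:.2f}, variance explained = {100 * r_true ** 2:.1f}%")
```

The correction can only raise the estimate, and it grows quickly as either reliability falls, so corrected values should be read as an upper-range estimate.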
Derakhshandeh, Zahra; Amini, Mitra; Kojuri, Javad; Dehbozorgian, Marziyeh
2018-01-01
Clinical reasoning is one of the most important skills in the process of training a medical student to become an efficient physician. Assessment of reasoning skills in a medical school program is important to direct students' learning. One of the tests for measuring clinical reasoning ability is Clinical Reasoning Problems (CRPs). The major aim of this study is to measure the psychometric qualities of CRPs and determine the correlation between this test and the routine MCQ in the cardiology department of Shiraz medical school. This descriptive study was conducted on all cardiology residents of Shiraz Medical School. The study population consisted of 40 residents in 2014. The routine CRPs and MCQ tests were designed based on similar objectives and were administered simultaneously. Reliability, item difficulty, item discrimination, and the correlation between each item and the total CRPs score were all computed with Excel and SPSS software to check the psychometric properties of the CRPs test. Furthermore, we calculated the correlation between the CRPs test and the MCQ test. The mean differences in CRPs test scores between residents' academic years (second, third and fourth year) were also evaluated by one-way analysis of variance (ANOVA) using SPSS software (version 20) (α=0.05). The mean and standard deviation of the CRPs score was 10.19±3.39 out of 20; for the MCQ, it was 13.15±3.81 out of 20. Item difficulty was in the range of 0.27-0.72; item discrimination was 0.30-0.75, with question No. 3 being the exception (0.24). The correlation between each item and the total CRPs score was 0.26-0.87; the correlation between the CRPs test and the MCQ test was 0.68 (p<0.001). The reliability of the CRPs was 0.72, as calculated using Cronbach's alpha. The mean CRPs score differed among residents by academic year, and this difference was statistically significant (p<0.001).
The results of the present investigation revealed that CRPs could be a reliable test for measuring clinical reasoning in residents. It can be included in cardiology residency assessment programs.
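Item difficulty and item discrimination are reported above without their formulas. A common classical-test-theory version, sketched here under the assumption of dichotomously scored items and an upper-lower group discrimination index (27% groups by convention), is shown below; the study itself may have scored CRP items differently, and the data are made up.

```python
def item_difficulty(responses, item):
    """Proportion of examinees answering the item correctly (0/1 scores)."""
    return sum(row[item] for row in responses) / len(responses)


def item_discrimination(responses, item, fraction=0.27):
    """Upper-lower discrimination index.

    Ranks examinees by total score and returns the difference in the
    proportion correct between the top and bottom `fraction` groups.
    """
    ranked = sorted(responses, key=sum, reverse=True)
    group_size = max(1, round(len(ranked) * fraction))
    upper = ranked[:group_size]
    lower = ranked[-group_size:]
    p_upper = sum(row[item] for row in upper) / group_size
    p_lower = sum(row[item] for row in lower) / group_size
    return p_upper - p_lower


# Made-up responses: 4 examinees, 2 items.
data = [[1, 1], [1, 1], [1, 0], [0, 0]]
print(item_difficulty(data, 0), item_discrimination(data, 1, fraction=0.5))
```

Under this convention, higher difficulty values mean easier items, and discrimination values of about 0.30 or more are usually treated as acceptable, which is consistent with question No. 3 being singled out at 0.24.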
A Method for the Alignment of Heterogeneous Macromolecules from Electron Microscopy
Shatsky, Maxim; Hall, Richard J.; Brenner, Steven E.; Glaeser, Robert M.
2009-01-01
We propose a feature-based image alignment method for single-particle electron microscopy that is able to accommodate various similarity scoring functions while efficiently sampling the two-dimensional transformational space. We use this image alignment method to evaluate the performance of a scoring function that is based on the Mutual Information (MI) of two images rather than one that is based on the cross-correlation function. We show that alignment using MI for the scoring function has far less model-dependent bias than is found with cross-correlation based alignment. We also demonstrate that MI improves the alignment of some types of heterogeneous data, provided that the signal to noise ratio is relatively high. These results indicate, therefore, that use of MI as the scoring function is well suited for the alignment of class-averages computed from single particle images. Our method is tested on data from three model structures and one real dataset. PMID:19166941
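To make the scoring function concrete, the mutual information of two images can be estimated from their joint intensity histogram. The sketch below is a minimal illustration, assuming flattened images with intensities scaled to a known range and a fixed number of equal-width bins; the binning scheme and parameter names are assumptions, not details from the paper.

```python
import math
from collections import Counter


def mutual_information(img_a, img_b, bins=8, lo=0.0, hi=1.0):
    """Histogram estimate of mutual information (in nats) between two
    equal-length flat lists of pixel intensities in [lo, hi]."""
    if len(img_a) != len(img_b) or not img_a:
        raise ValueError("images must be non-empty and the same size")

    def bin_index(value):
        idx = int((value - lo) / (hi - lo) * bins)
        return min(max(idx, 0), bins - 1)  # clamp values at the range edges

    a = [bin_index(v) for v in img_a]
    b = [bin_index(v) for v in img_b]
    n = len(a)
    joint = Counter(zip(a, b))
    marg_a = Counter(a)
    marg_b = Counter(b)
    mi = 0.0
    for (i, j), count in joint.items():
        p_ij = count / n
        mi += p_ij * math.log(p_ij / ((marg_a[i] / n) * (marg_b[j] / n)))
    return mi


# Identical two-level images share all their information.
print(mutual_information([0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 1.0]))
```

Unlike cross-correlation, this score depends only on how predictable one image's intensities are from the other's, not on a linear relationship between them. An alignment search would evaluate it for each candidate transformation and keep the maximum.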
Academic performance in adolescents born after ART-a nationwide registry-based cohort study.
Spangmose, A L; Malchau, S S; Schmidt, L; Vassard, D; Rasmussen, S; Loft, A; Forman, J; Pinborg, A
2017-02-01
Is academic performance in adolescents aged 15-16 years and conceived after ART, measured as test scores in ninth grade, comparable to that for spontaneously conceived (SC) adolescents? ART singletons had a significantly lower mean test score in the adjusted analysis when compared with SC singletons, yet the differences were small and probably not of clinical relevance. Previous studies have shown similar intelligence quotient (IQ) levels in ART and SC children, but only a few have been on adolescents. Academic performance measured with standardized national tests has not previously been explored in a complete national cohort of adolescents conceived after ART. A Danish national registry-based cohort including all 4766 ART adolescents (n = 2836 singletons and n = 1930 twins) born in 1995-1998 were compared with two SC control cohorts: a randomly selected singleton population (n = 5660) and all twins (n = 7064) born from 1995 to 1998 in Denmark. Nine children who died during the follow-up period were excluded from the study. Mean test scores on a 7-point-marking scale from -3 to 12 were compared, and adjustments were made for relevant reproductive and socio-demographic covariates including occupational and educational level of the parents. The crude mean test score was higher in both ART singletons and ART twins compared with SC adolescents. The crude mean differences were +0.41 (95% CI 0.30-0.53) and +0.45 (95% CI 0.28-0.62) between ART and SC singletons and between ART and SC twins, respectively. However, the adjusted mean overall test score was significantly lower for ART singletons compared with SC singletons (adjusted mean difference -0.15 (95% CI -0.29-(-0.02))). For comparison, the adjusted mean difference was +2.05 (95% CI 1.82-2.28) between the highest and the lowest parental educational level, suggesting that the effect of ART is weak compared with the conventional predictors. 
The adjusted analyses showed significantly lower mean test scores in mathematics and physics/chemistry for ART singletons compared with SC singletons. Comparing ART twins with SC twins yielded no difference in academic performance in the adjusted analyses. Similar crude and adjusted overall mean test scores were found when comparing ART singletons and ART twins. Missing data on educational test scores occurred in 6.6% of adolescents aged 15-16 years for the birth cohorts 1995-1997, where all of the children according to their age should have passed the ninth grade exam at the time of data retrieval. As sensitivity analyses yielded no significant difference in the adjusted risk of having missing test scores between any of the groups, it is unlikely that this should bias our results. Adjustment for body mass index and smoking during pregnancy was not possible. As our results are based on national data, our findings can be applied to other populations. The findings of this paper suggest that a possible small negative effect of parental subfertility or ART treatment is counterbalanced by the higher educational level in the ART parents. The Danish Medical Association in Copenhagen (KMS) funded this study with a scholarship grant. None of the authors had any competing interests. 704676. © The Author 2017. Published by Oxford University Press on behalf of the European Society of Human Reproduction and Embryology. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Exploring the gender gap in the conceptual survey of electricity and magnetism
NASA Astrophysics Data System (ADS)
Henderson, Rachel; Stewart, Gay; Stewart, John; Michaluk, Lynnette; Traxler, Adrienne
2017-12-01
The "gender gap" on various physics conceptual evaluations has been extensively studied. Men's average pretest scores on the Force Concept Inventory and Force and Motion Conceptual Evaluation are 13% higher than women's, and post-test scores are on average 12% higher than women's. This study analyzed the gender differences within the Conceptual Survey of Electricity and Magnetism (CSEM) in which the gender gap has been less well studied and is less consistent. In the current study, data collected from 1407 students (77% men, 23% women) in a calculus-based physics course over ten semesters showed that male students outperformed female students on the CSEM pretest (5%) and post-test (6%). Separate analyses were conducted for qualitative and quantitative problems on lab quizzes and course exams and showed that male students outperformed female students by 3% on qualitative quiz and exam problems. Male and female students performed equally on the quantitative course exam problems. The gender gaps within CSEM post-test scores, qualitative lab quiz scores, and qualitative exam scores were insignificant for students with a CSEM pretest score of 25% or less but grew as pretest scores increased. Structural equation modeling demonstrated that a latent variable, called Conceptual Physics Performance/Non-Quantitative (CPP/NonQnt), orthogonal to quantitative test performance was useful in explaining the differences observed in qualitative performance; this variable was most strongly related to CSEM post-test scores. The CPP/NonQnt of male students was 0.44 standard deviations higher than female students. The CSEM pretest measured CPP/NonQnt much less accurately for women (R² = 4%) than for men (R² = 17%). The failure to detect a gender gap for students scoring 25% or less on the pretest suggests that the CSEM instrument itself is not gender biased.
The failure to find a performance difference in quantitative test performance while detecting a gap in qualitative performance suggests the qualitative differences do not result from psychological factors such as science anxiety or stereotype threat.
Age- and Gender-Specific Normative Information from Children Assessed with a Dichotic Words Test.
Moncrieff, Deborah
2015-01-01
The most widely used assessment in the clinical auditory processing disorder (APD) battery is the dichotic listening test. New tests with normative information are helpful for assessment and cross-check of results for reliable diagnosis. The Dichotic Words Test was developed for use in the clinical test battery for diagnosis of APD. The test stimuli were common single syllable words matched for average root-mean-square amplitude and each pair was temporally aligned at both onset and offset. The study was conducted to collect performance results from typically developing children to create normative information for the test. The study follows a cross-sectional design. Typically developing children (n = 416) between the ages of 5 and 12 yr were recruited from schools in the community. There were 217 males and 199 females in the study sample. Only children who passed a hearing screening were eligible to participate. Scores for each ear were recorded during administration of the first free recall version of the test. Ear advantages based on results recorded for left and right ears were used to measure prevalence of right, left, and no ear advantages. Results for each listener's dominant and non-dominant ears and the absolute difference between them were put into the data analysis. Results were analyzed for normality and because no results were normally distributed, all further analyses were done with nonparametric statistical tests. Normative data for dominant and non-dominant ear scores and ear advantages were determined at the 95% confidence interval through bootstrapping methods with 1,000 samples. Children were divided into four age groups based on results in their dominant ears. Females generally performed better than males and the prevalence of a right-ear advantage was ∼60% across all children tested. Normative lower-bound cut-off scores were established for males and females within each age group for dominant and non-dominant ear scores. 
Normative upper-bound cut-off scores were established for males and females within each age group for ear advantage scores. Normative information specific to age group and gender will be useful in clinical assessment for APD. Prevalence of left-ear advantage results in the sample may have been partly due to uncontrolled influences of voice-onset time in arranging the dichotic pairs. American Academy of Audiology.
Karimi, Zahra; Dehkordi, Mahnaz Aliakbari; Alipour, Ahmad; Mohtashami, Tayebeh
2018-03-01
Premenstrual syndrome (PMS) consists of repetitious physical and psychological symptoms. The symptoms occur during the luteal phase of the menstrual period and cease when the menstrual period starts. This study included pre-test and post-test experiments between a control group and a test group. The statistical population involved 40 females, chosen based on multistage cluster sampling. The participants were then divided into four groups to undergo treatment with calcium supplement plus vitamin D together with cognitive behavioral therapy (CBT), and were screened with the Premenstrual Syndrome Screening Test (PSST). The pre-test and post-test scores in the PSST, the General Health Questionnaire (GHQ-28), and Bell's Adjustment Inventory (BAI) were used as assessment tools (p < .05). According to the parameters of PMS symptoms, when evaluating the pre-test and post-test scores, the overall score of each individual in the experimental group was improved and a significant effect for the combination of calcium supplement plus vitamin D together with CBT was observed in comparison to the post-test control group. A comparison of multivariate analysis of covariance (MANCOVA) results collected from the pre-test and post-test scores revealed that the method of treatment was beneficial for PMS, adjustment, and general health. © 2018 The Institute of Psychology, Chinese Academy of Sciences and John Wiley & Sons Australia, Ltd.
Hong, Hye Jeong; Kim, Jin Sung; Seo, Wan Seok; Koo, Bon Hoon; Bai, Dai Seg; Jeong, Jin Young
2010-01-01
Objective: We investigated executive functions (EFs), as evaluated by the Wisconsin Card Sorting Test (WCST), and other EFs between lower grades (LG) and higher grades (HG) in elementary-school-age attention deficit hyperactivity disorder (ADHD) children. Methods: We classified a sample of 112 ADHD children into 4 groups (composed of 28 each) based on age (LG vs. HG) and WCST performance [lower vs. higher performance on WCST, defined by the number of completed categories (CC)]. Participants in each group were matched according to age, gender, ADHD subtype, and intelligence. We used the Wechsler Intelligence Scale for Children 3rd edition to test intelligence and the Computerized Neurocognitive Function Test-IV, which included the WCST, to test EF. Results: Comparisons of EF scores in LG ADHD children showed statistically significant differences in performing digit spans backward, some verbal learning scores, including all memory scores, and Stroop test scores. However, comparisons of EF scores in HG ADHD children did not show any statistically significant differences. Correlation analyses of the CC and EF variables and stepwise multiple regression analysis in LG ADHD children showed that a combination of the backward form of the Digit span test and the Visual span test in lower-performance ADHD participants significantly predicted the number of CC (R² = 0.273, p < 0.001). Conclusion: This study suggests that the design of any battery of neuropsychological tests for measuring EF in ADHD children should first consider age before interpreting developmental variations and neuropsychological test results. Researchers should consider the dynamics of relationships within EF, as measured by neuropsychological tests. PMID:20927306
Bakker, Marjan; Wicherts, Jelte M
2014-09-01
In psychology, outliers are often excluded before running an independent samples t test, and data are often nonnormal because of the use of sum scores based on tests and questionnaires. This article concerns the handling of outliers in the context of independent samples t tests applied to nonnormal sum scores. After reviewing common practice, we present results of simulations of artificial and actual psychological data, which show that the removal of outliers based on commonly used Z value thresholds severely increases the Type I error rate. We found Type I error rates of above 20% after removing outliers with a threshold value of Z = 2 in a short and difficult test. Inflations of Type I error rates are particularly severe when researchers are given the freedom to alter threshold values of Z after having seen the effects thereof on outcomes. We recommend the use of nonparametric Mann-Whitney-Wilcoxon tests or robust Yuen-Welch tests without removing outliers. These alternatives to independent samples t tests are found to have nominal Type I error rates with a minimal loss of power when no outliers are present in the data and to have nominal Type I error rates and good power when outliers are present. PsycINFO Database Record (c) 2014 APA, all rights reserved.
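As a rough illustration of why the rank-based test recommended in the abstract above can be run without removing outliers, the sketch below computes the Mann-Whitney U statistic directly from its pairwise definition. The function name `mann_whitney_u` and the toy scores are assumptions made for illustration and do not come from the article; p-values are deliberately left out.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two independent samples.

    U_x counts the pairs (xi, yj) in which xi exceeds yj; ties count as half.
    This is the plain O(n*m) definition, chosen for clarity over speed.
    """
    u_x = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u_x += 1.0
            elif xi == yj:
                u_x += 0.5
    # The two statistics always sum to the number of pairs.
    u_y = len(x) * len(y) - u_x
    return u_x, u_y


# Hypothetical sum scores: only the ordering matters, so an extreme value
# (100 instead of 7) leaves the statistic unchanged.
print(mann_whitney_u([1, 2, 7], [4, 5, 6]))
print(mann_whitney_u([1, 2, 100], [4, 5, 6]))
```

Because the statistic depends only on ranks, the size of an extreme score has no extra influence, which is one intuition behind the abstract's recommendation.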
Mac Giolla Phadraig, Caoimhin; Guerin, Suzanne; Nunn, June
2013-04-01
To assess the impact of a multi-tiered oral health education programme on care staff caring for people with intellectual disability (ID). Postal questionnaires were sent to all care staff of a community-based residential care service for adults, randomly assigned to control and intervention groups. A specifically developed training programme was delivered to residential staff nominees, who then trained all staff within the intervention group. The control group received no training. Post-test questionnaires were sent to both groups. Paired-samples t-test was used to compare oral health-related knowledge (K) and behaviour, attitude and self-efficacy (BAS) scores. Of the initial 219 respondents, 154 (response rate between 40% and 35.8%, with attrition rate of 29.7% from baseline to repeat) returned completed questionnaires at post-test (M=8.5 months, range=6.5-11 months). Control and intervention groups were comparable for general training, employment and demographic variables. In the intervention group, mean Knowledge Index score rose from K=7.2 to K=7.9 (P<0.001) and mean BAS scale score rose from BAS=4.7 to BAS=5.4 (P<0.001). There was no statistically significant increase in mean scores from test (K=7.0, BAS=4.7) to post-test (K=7.2, BAS=4.9) for the control group. Mean scores regarding knowledge, attitude, self-efficacy and reported behaviour increased significantly at 8.5 months in staff where training was provided. The results indicate that a multi-tiered training programme improved knowledge, attitude, self-efficacy and reported behaviour amongst staff caring for people with ID. © 2012 John Wiley & Sons A/S.
Learning and Motivational Impacts of a Multimedia Science Game
ERIC Educational Resources Information Center
Miller, Leslie M.; Chang, Ching-I.; Wang, Shu; Beier, Margaret E.; Klisch, Yvonne
2011-01-01
The power of a web-based forensic science game to teach content and motivate STEM careers was tested among secondary students. More than 700 secondary school students were exposed to one of the three web-based forensic cases for approximately 60 min. Gain scores from pre-test to a delayed post-test indicated significant gains in content knowledge.…
Cardiac Society of Australia and New Zealand Position Statement: Coronary Artery Calcium Scoring.
Liew, Gary; Chow, Clara; van Pelt, Niels; Younger, John; Jelinek, Michael; Chan, Jonathan; Hamilton-Craig, Christian
2017-12-01
Coronary Artery Calcium Scoring (CAC) is a non-invasive quantitation of coronary artery calcification using computed tomography (CT). It is a marker of atherosclerotic plaque burden and an independent predictor of future myocardial infarction and mortality. Coronary Artery Calcium Scoring provides incremental risk information beyond traditional risk calculators (e.g. Framingham Risk Score). Its use for risk stratification is confined to primary prevention of cardiovascular events, and can be considered as "individualised coronary risk scoring" for those not considered to be of high or low risk. Medical practitioners should carefully counsel patients prior to CAC. Coronary Artery Calcium Scoring should only be undertaken if an alteration in therapy including embarking on pharmacotherapy is being considered based on the test result.
Patient Groups to Consider Coronary Calcium Scoring:
Patient Groups in Whom Coronary Calcium Scoring Should Not be Considered: Coronary Artery Calcium Scoring is not recommended for patients who are:
Interpretation of CAC:
CAC=0: A zero score confers a very low risk of death, <1% at 10 years.
CAC=1-100: Low risk, <10%.
CAC=101-400: Intermediate risk, 10-20%.
CAC=101-400 & >75th centile: Moderately high risk, 15-20%.
CAC >400: High risk, >20%.
Management Recommendations Based on CAC: Optimal diet and lifestyle measures are encouraged in all risk groups and form the basis of primary prevention strategies. Patients with moderately-high or high risk based on CAC score are recommended to receive preventative medical therapy such as aspirin and statins. The evidence for pharmacotherapy is less robust in patients at intermediate levels of CAC 100-400, with modest benefit for aspirin use; though statins may be reasonable if they are above 75th centile. Aspirin and statins are generally not recommended in patients with CAC <100.
Repeat CAC Testing: In patients with a CAC of 0, a repeat CAC may be considered in 5 years but not sooner. In patients with positive calcium score, routine re-scanning is not currently recommended. However, an annual increase in CAC of >15% or annual increase of CAC >100 units are predictive of future myocardial infarction and mortality.
Cost Effectiveness of CAC Based Primary Prevention Recommendations: There is currently no data in Australia and New Zealand that CAC is cost-effective in informing primary prevention decisions. Given the cost of testing is currently borne entirely by the patient, discussion regarding the implications of CAC results should occur before CAC is recommended and undertaken. Copyright © 2017 Australian and New Zealand Society of Cardiac and Thoracic Surgeons (ANZSCTS) and the Cardiac Society of Australia and New Zealand (CSANZ). Published by Elsevier B.V. All rights reserved.
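The interpretation bands quoted in the abstract above can be written as a small lookup. This is only a sketch of the quoted bands, not a clinical tool: the function name `cac_risk_category` and the `above_75th_centile` flag are hypothetical, and treating scores between 0 and 1 as part of the 1-100 band is an assumption, since the abstract does not address them.

```python
def cac_risk_category(cac, above_75th_centile=False):
    """Return the risk band the abstract quotes for a given CAC score."""
    if cac < 0:
        raise ValueError("CAC score cannot be negative")
    if cac == 0:
        return "very low risk (<1% at 10 years)"
    if cac <= 100:
        # Assumption: any positive score up to 100 falls in the 1-100 band.
        return "low risk (<10%)"
    if cac <= 400:
        # The abstract splits the 101-400 band by centile.
        if above_75th_centile:
            return "moderately high risk (15-20%)"
        return "intermediate risk (10-20%)"
    return "high risk (>20%)"


print(cac_risk_category(250))
print(cac_risk_category(250, above_75th_centile=True))
```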
Predictive power of the grace score in population with diabetes.
Baeza-Román, Anna; de Miguel-Balsa, Eva; Latour-Pérez, Jaime; Carrillo-López, Andrés
2017-12-01
Current clinical practice guidelines recommend risk stratification in patients with acute coronary syndrome (ACS) upon admission to hospital. Diabetes mellitus (DM) is widely recognized as an independent predictor of mortality in these patients, although it is not included in the GRACE risk score. The objective of this study is to validate the GRACE risk score in a contemporary population and particularly in the subgroup of patients with diabetes, and to test the effects of including the DM variable in the model. Retrospective cohort study in patients included in the ARIAM-SEMICYUC registry, with a diagnosis of ACS and with available in-hospital mortality data. We tested the predictive power of the GRACE score, calculating the area under the ROC curve. We assessed the calibration of the score and the predictive ability based on type of ACS and the presence of DM. Finally, we evaluated the effect of including the DM variable in the model by calculating the net reclassification improvement. The GRACE score shows good predictive power for hospital mortality in the study population, with a moderate degree of calibration and no significant differences based on ACS type or the presence of DM. Including DM as a variable did not add any predictive value to the GRACE model. The GRACE score has an appropriate predictive power, with good calibration and clinical applicability in the subgroup of diabetic patients. Copyright © 2017 Elsevier Ireland Ltd. All rights reserved.
ERIC Educational Resources Information Center
Brooks, Brian L.
2010-01-01
Low scores across a battery of tests are common in healthy people and vary by demographic characteristics. The purpose of the present article was to present the base rates of low scores for the Wechsler Intelligence Scale for Children, fourth edition (WISC-IV; D. Wechsler, 2003). Participants included 2,200 children and adolescents between 6 and…
2013-01-01
Background: All rigorous primary cardiovascular disease (CVD) prevention guidelines recommend absolute CVD risk scores to identify high- and low-risk patients, but laboratory testing can be impractical in low- and middle-income countries. The purpose of this study was to compare the ranking performance of a simple, non-laboratory-based risk score to laboratory-based scores in various South African populations. Methods: We calculated and compared 10-year CVD (or coronary heart disease (CHD)) risk for 14,772 adults from thirteen cross-sectional South African populations (data collected from 1987 to 2009). Risk characterization performance for the non-laboratory-based score was assessed by comparing rankings of risk with six laboratory-based scores (three versions of Framingham risk, SCORE for high- and low-risk countries, and CUORE) using Spearman rank correlation and percent of population equivalently characterized as ‘high’ or ‘low’ risk. Total 10-year non-laboratory-based risk of CVD death was also calculated for a representative cross-section from the 1998 South African Demographic Health Survey (DHS, n = 9,379) to estimate the national burden of CVD mortality risk. Results: Spearman correlation coefficients for the non-laboratory-based score with the laboratory-based scores ranged from 0.88 to 0.986. Using conventional thresholds for CVD risk (10% to 20% 10-year CVD risk), 90% to 92% of men and 94% to 97% of women were equivalently characterized as ‘high’ or ‘low’ risk using the non-laboratory-based and Framingham (2008) CVD risk score. These results were robust across the six risk scores evaluated and the thirteen cross-sectional datasets, with few exceptions (lower agreement between the non-laboratory-based and Framingham (1991) CHD risk scores). Approximately 18% of adults in the DHS population were characterized as ‘high CVD risk’ (10-year CVD death risk >20%) using the non-laboratory-based score. 
Conclusions: We found a high level of correlation between a simple, non-laboratory-based CVD risk score and commonly-used laboratory-based risk scores. The burden of CVD mortality risk was high for men and women in South Africa. The policy and clinical implications are that fast, low-cost screening tools can lead to similar risk assessment results compared to time- and resource-intensive approaches. Until setting-specific cohort studies can derive and validate country-specific risk scores, non-laboratory-based CVD risk assessment could be an effective and efficient primary CVD screening approach in South Africa. PMID:23880010
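The two comparison measures named in the abstract above, Spearman rank correlation and the share of people classified the same way by two scores, can be sketched in plain Python. The function names and the toy risk values are assumptions for illustration; the study's own data and software are not described in the abstract.

```python
def midranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman correlation: Pearson correlation of the midranks.

    Assumes equal-length inputs that are not constant.
    """
    ra, rb = midranks(a), midranks(b)
    n = len(ra)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5


def classification_agreement(a, b, threshold):
    """Fraction of people both scores place on the same side of the threshold."""
    same = sum((x > threshold) == (y > threshold) for x, y in zip(a, b))
    return same / len(a)


# Hypothetical 10-year risks from two scores for three people.
non_lab = [0.05, 0.25, 0.30]
lab = [0.10, 0.22, 0.15]
print(spearman(non_lab, lab))
print(classification_agreement(non_lab, lab, threshold=0.20))
```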
Hammer, Joseph H; Brenner, Rachel E
2017-07-14
This study extended our theoretical and applied understanding of gratitude through a psychometric examination of the most popular multidimensional measure of gratitude, the Gratitude, Resentment, and Appreciation Test-Revised Short form (GRAT-RS). Namely, the dimensionality of the GRAT-RS, the model-based reliability of the GRAT-RS total score and 3 subscale scores, and the incremental evidence of validity for its latent factors were assessed. Dimensionality measures (e.g., explained common variance) and confirmatory factor analysis results with 426 community adults indicated that the GRAT-RS conformed to a multidimensional (bifactor) structure. Model-based reliability measures (e.g., omega hierarchical) provided support for the future use of the Lack of a Sense of Deprivation raw subscale score, but not for the raw GRAT-RS total score, Simple Appreciation subscale score, or Appreciation of Others subscale score. Structural equation modeling results indicated that only the general gratitude factor and the lack of a sense of deprivation specific factor accounted for significant variance in life satisfaction, positive affect, and distress. These findings support the 3 pillars of gratitude conceptualization of gratitude over competing conceptualizations, the position that the specific forms of gratitude are theoretically distinct, and the argument that appreciation is distinct from the superordinate construct of gratitude.
Shi, Xiaohu; Zhang, Jingfen; He, Zhiquan; Shang, Yi; Xu, Dong
2011-09-01
One of the major challenges in protein tertiary structure prediction is structure quality assessment. In many cases, protein structure prediction tools generate good structural models, but fail to select the best models from a huge number of candidates as the final output. In this study, we developed a sampling-based machine-learning method to rank protein structural models by integrating multiple scores and features. First, features such as predicted secondary structure, solvent accessibility and residue-residue contact information are integrated by two Radial Basis Function (RBF) models trained from different datasets. Then, the two RBF scores and five selected scoring functions developed by others, i.e., Opus-CA, Opus-PSP, DFIRE, RAPDF, and Cheng Score, are synthesized by a sampling method. Finally, another integrated RBF model ranks the structural models according to the features of the sampling distribution. We tested the proposed method by using two different datasets, including the CASP server prediction models of all CASP8 targets and a set of models generated by our in-house software MUFOLD. The test result shows that our method outperforms any individual scoring function on both best-model selection and overall correlation between the predicted ranking and the actual ranking of structural quality.
Systematic Review of Plant-Based Homeopathic Basic Research: An Update.
Ücker, Annekathrin; Baumgartner, Stephan; Sokol, Anezka; Huber, Roman; Doesburg, Paul; Jäger, Tim
2018-05-01
Plant-based test systems have been described as a useful tool for investigating possible effects of homeopathic preparations. The last reviews of this research field were published in 2009/2011. Due to recent developments in the field, an update is warranted. Publications on plant-based test systems were analysed with regard to publication quality, reproducibility and potential for further research. A literature search was conducted in online databases and specific journals, including publications from 2008 to 2017 dealing with plant-based test systems in homeopathic basic research. To be included, they had to contain statistical analysis and fulfil quality criteria according to a pre-defined manuscript information score (MIS). Publications scoring at least 5 points (maximum 10 points) were assumed to be adequate. They were analysed for the use of adequate controls, outcome and reproducibility. Seventy-four publications on plant-based test systems were found. Thirty-nine publications were either abstracts or proceedings of conferences and were excluded. From the remaining 35 publications, 26 reached a score of 5 or higher in the MIS. Adequate controls were used in 13 of these publications. All of them described specific effects of homeopathic preparations. The publication quality still varied: a substantial number of publications (23%) did not adequately document the methods used. Four reported on replication trials. One replication trial found effects of homeopathic preparations comparable to the original study. Three replication trials failed to confirm the original study but identified possible external influencing factors. Five publications described novel plant-based test systems. Eight trials used systematic negative control experiments to document test system stability. Regarding research design, future trials should implement adequate controls to identify specific effects of homeopathic preparations and include systematic negative control experiments. 
Further external and internal replication trials, and control of influencing factors, are needed to verify results. Standardised test systems should be developed. The Faculty of Homeopathy.
A prognostic scoring system for arm exercise stress testing.
Xie, Yan; Xian, Hong; Chandiramani, Pooja; Bainter, Emily; Wan, Leping; Martin, Wade H
2016-01-01
Arm exercise stress testing may be an equivalent or better predictor of mortality outcome than pharmacological stress imaging for the ≥50% of patients unable to perform leg exercise. Thus, our objective was to develop an arm exercise ECG stress test scoring system, analogous to the Duke Treadmill Score, for predicting outcome in these individuals. In this retrospective observational cohort study, arm exercise ECG stress tests were performed in 443 consecutive veterans aged 64.1 (11.1) years (mean (SD)) between 1997 and 2002. From multivariate Cox models, arm exercise scores were developed for prediction of 5-year and 12-year all-cause and cardiovascular mortality and 5-year cardiovascular mortality or myocardial infarction (MI). Arm exercise capacity in resting metabolic equivalents (METs), 1 min heart rate recovery (HRR) and ST segment depression ≥1 mm were the stress test variables independently associated with all-cause and cardiovascular mortality by step-wise Cox analysis (all p<0.01). A score based on the relation HRR (bpm) + 7.3 × METs - 10.5 × ST depression (0=no; 1=yes) prognosticated 5-year cardiovascular mortality with a C-statistic of 0.81 before and 0.88 after adjustment for significant demographic and clinical covariates. Arm exercise scores for the other outcome end points yielded C-statistic values of 0.77-0.79 before and 0.82-0.86 after adjustment for significant covariates versus 0.64-0.72 for best fit pharmacological myocardial perfusion imaging models in a cohort of 1730 veterans who were evaluated over the same time period. Arm exercise scores, analogous to the Duke Treadmill Score, have good power for prediction of mortality or MI in patients who cannot perform leg exercise.
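The score relation quoted in the abstract above, HRR + 7.3 × METs - 10.5 × ST depression, is simple enough to write out. The sketch below only evaluates that relation; the function and parameter names are hypothetical, the example inputs are made up, and the abstract gives no cut-off values, so none are applied here.

```python
def arm_exercise_score(hrr_bpm, mets, st_depression):
    """Evaluate HRR (bpm) + 7.3 * METs - 10.5 * ST depression (0 = no, 1 = yes).

    hrr_bpm       -- 1 min heart rate recovery in beats per minute
    mets          -- arm exercise capacity in resting metabolic equivalents
    st_depression -- True if ST segment depression of at least 1 mm was seen
    """
    st_term = 1 if st_depression else 0
    return hrr_bpm + 7.3 * mets - 10.5 * st_term


# Made-up inputs: the same test result with and without ST depression.
print(arm_exercise_score(20, 5, st_depression=False))
print(arm_exercise_score(20, 5, st_depression=True))
```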
Haley, Stephen M; Fragala-Pinkham, Maria; Ni, Pengsheng
2006-07-01
To examine the relative sensitivity to detect functional mobility changes with a full-length parent questionnaire compared with a computerized adaptive testing version of the questionnaire after a 16-week group fitness programme. Prospective, pre- and posttest study with a 16-week group fitness intervention. Three community-based fitness centres. Convenience sample of children (n = 28) with physical or developmental disabilities. A 16-week group exercise programme held twice a week in a community setting. A full-length (161 items) paper version of a mobility parent questionnaire based on the Pediatric Evaluation of Disability Inventory, but expanded to include expected skills of children up to 15 years old was compared with a 15-item computer adaptive testing version. Both measures were administered at pre- and posttest intervals. Both the full-length Pediatric Evaluation of Disability Inventory and the 15-item computer adaptive testing version detected significant changes between pre- and posttest scores, had large effect sizes, and standardized response means, with a modest decrease in the computer adaptive test as compared with the 161-item paper version. Correlations between the computer adaptive and paper formats across pre- and posttest scores ranged from r = 0.76 to 0.86. Both functional mobility test versions were able to detect positive functional changes at the end of the intervention period. Greater variability in score estimates was generated by the computerized adaptive testing version, which led to a relative reduction in sensitivity as defined by the standardized response mean. Extreme scores were generally more difficult for the computer adaptive format to estimate with as much accuracy as scores in the mid-range of the scale. However, the reduction in accuracy and sensitivity, which did not influence the group effect results in this study, is counterbalanced by the large reduction in testing burden.
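The abstract above defines sensitivity through the standardized response mean, which is the mean of the pre-to-post change scores divided by their standard deviation. A minimal sketch, with a hypothetical function name and made-up scores, shows why greater variability in the change estimates lowers it:

```python
from statistics import mean, stdev


def standardized_response_mean(pre, post):
    """Mean change divided by the sample standard deviation of the changes."""
    if len(pre) != len(post):
        raise ValueError("pre and post must have the same length")
    changes = [after - before for before, after in zip(pre, post)]
    return mean(changes) / stdev(changes)


# Made-up scores: both sets have the same mean gain (2 points),
# but the second has more variable gains and so a smaller SRM.
print(standardized_response_mean([10, 10, 10], [11, 12, 13]))
print(standardized_response_mean([10, 10, 10], [10, 12, 14]))
```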
Barnes, Deborah E; Cenzer, Irena S; Yaffe, Kristine; Ritchie, Christine S; Lee, Sei J
2014-11-01
Our objective in this study was to develop a point-based tool to predict conversion from amnestic mild cognitive impairment (MCI) to probable Alzheimer's disease (AD). Subjects were participants in the first part of the Alzheimer's Disease Neuroimaging Initiative. Cox proportional hazards models were used to identify factors associated with development of AD, and a point score was created from predictors in the final model. The final point score could range from 0 to 9 (mean 4.8) and included: the Functional Assessment Questionnaire (2‒3 points); magnetic resonance imaging (MRI) middle temporal cortical thinning (1 point); MRI hippocampal subcortical volume (1 point); Alzheimer's Disease Cognitive Scale-cognitive subscale (2‒3 points); and the Clock Test (1 point). Prognostic accuracy was good (Harrell's c = 0.78; 95% CI 0.75, 0.81); 3-year conversion rates were 6% (0‒3 points), 53% (4‒6 points), and 91% (7‒9 points). A point-based risk score combining functional dependence, cerebral MRI measures, and neuropsychological test scores provided good accuracy for prediction of conversion from amnestic MCI to AD. Copyright © 2014 The Alzheimer's Association. All rights reserved.
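The three point bands and 3-year conversion rates reported in the abstract above can be expressed as a lookup. This sketch takes the total as given, because the abstract does not state the thresholds used to award each component's points; the function name `conversion_band` is hypothetical.

```python
def conversion_band(points):
    """Return (band label, reported 3-year conversion rate) for a 0-9 total."""
    if not 0 <= points <= 9:
        raise ValueError("total points must lie between 0 and 9")
    if points <= 3:
        return ("0-3 points", 0.06)
    if points <= 6:
        return ("4-6 points", 0.53)
    return ("7-9 points", 0.91)


print(conversion_band(5))
```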
Cook, David A; Gelula, Mark H; Dupras, Denise M; Schwartz, Alan
2007-09-01
Adapting web-based (WB) instruction to learners' individual differences may enhance learning. Objectives: This study aimed to investigate aptitude-treatment interactions between learning and cognitive styles and WB instructional methods. We carried out a factorial, randomised, controlled, crossover, post-test-only trial involving 89 internal medicine residents, family practice residents and medical students at 2 US medical schools. Parallel versions of a WB course in complementary medicine used either active or reflective questions and different end-of-module review activities ('create and study a summary table' or 'study an instructor-created table'). Participants were matched or mismatched to question type based on active or reflective learning style. Participants used each review activity for 1 course module (crossover design). Outcome measurements included the Index of Learning Styles, the Cognitive Styles Analysis test, knowledge post-test, course rating and preference. Post-test scores were similar for matched (mean ± standard error of the mean 77.4 ± 1.7) and mismatched (76.9 ± 1.7) learners (95% confidence interval [CI] for difference -4.3 to 5.2, P = 0.84), as were course ratings (P = 0.16). Post-test scores did not differ between active-type questions (77.1 ± 2.1) and reflective-type questions (77.2 ± 1.4; P = 0.97). Post-test scores correlated with course ratings (r = 0.45). There was no difference in post-test subscores for modules completed using the 'construct table' format (78.1 ± 1.4) or the 'table provided' format (76.1 ± 1.4; CI -1.1 to 5.0, P = 0.21), and wholist and analytic styles had no interaction (P = 0.75) or main effect (P = 0.18). There was no association between activity preference and wholist or analytic scores (P = 0.37). Cognitive and learning styles had no apparent influence on learning outcomes. There were no differences in outcome between these instructional methods.
Personalized Risk Scoring for Critical Care Prognosis Using Mixtures of Gaussian Processes.
Alaa, Ahmed M; Yoon, Jinsung; Hu, Scott; van der Schaar, Mihaela
2018-01-01
In this paper, we develop a personalized real-time risk scoring algorithm that provides timely and granular assessments for the clinical acuity of ward patients based on their (temporal) lab tests and vital signs; the proposed risk scoring system ensures timely intensive care unit admissions for clinically deteriorating patients. The risk scoring system is based on the idea of sequential hypothesis testing under an uncertain time horizon. The system learns a set of latent patient subtypes from the offline electronic health record data, and trains a mixture of Gaussian Process experts, where each expert models the physiological data streams associated with a specific patient subtype. Transfer learning techniques are used to learn the relationship between a patient's latent subtype and her static admission information (e.g., age, gender, transfer status, ICD-9 codes, etc). Experiments conducted on data from a heterogeneous cohort of 6321 patients admitted to Ronald Reagan UCLA medical center show that our score significantly outperforms the currently deployed risk scores, such as the Rothman index, MEWS, APACHE, and SOFA scores, in terms of timeliness, true positive rate, and positive predictive value. Our results reflect the importance of adopting the concepts of personalized medicine in critical care settings; significant accuracy and timeliness gains can be achieved by accounting for the patients' heterogeneity. The proposed risk scoring methodology can confer huge clinical and social benefits on a massive number of critically ill inpatients who exhibit adverse outcomes including, but not limited to, cardiac arrests, respiratory arrests, and septic shocks.
Understanding Elementary Astronomy by Making Drawing-Based Models
NASA Astrophysics Data System (ADS)
van Joolingen, W. R.; Aukes, Annika V. A.; Gijlers, H.; Bollen, L.
2015-04-01
Modeling is an important approach in the teaching and learning of science. In this study, we attempt to bring modeling within the reach of young children by creating the SimSketch modeling system, which is based on freehand drawings that can be turned into simulations. This system was used by 247 children (ages ranging from 7 to 15) to create a drawing-based model of the solar system. The results show that children in the target age group are capable of creating a drawing-based model of the solar system and can use it to show the situations in which eclipses occur. Structural equation modeling predicting post-test knowledge scores based on learners' pre-test knowledge scores, the quality of their drawings and motivational aspects yielded some evidence that such drawing contributes to learning. Consequences for using modeling with young children are considered.
IRT Equating of the MCAT. MCAT Monograph.
ERIC Educational Resources Information Center
Hendrickson, Amy B.; Kolen, Michael J.
This study compared various equating models and procedures for a sample of data from the Medical College Admission Test (MCAT), considering how item response theory (IRT) equating results compare with classical equipercentile results and how the results based on use of various IRT models, observed score versus true score, direct versus linked…
[Prognostic scores for pulmonary embolism].
Junod, Alain
2016-03-23
Nine prognostic scores for pulmonary embolism (PE), published between 2000 and 2014 and based on retrospective and prospective studies, have been analyzed and compared. Most of them aim to identify low-risk PE cases in order to validate ambulatory care for these patients. The scores differ substantially in the outcomes they consider (global mortality, PE-specific mortality, other complications) and in the sizes of their low-risk groups. The most popular score appears to be the PESI and its simplified version. Few good-quality studies have tested the applicability of these scores to PE outpatient care, although this approach is already becoming widespread in medical practice.
Development of an unlicensed assistive personnel job screening test.
Newhouse, Robin P; Steinhauser, Michele; Berk, Ron
2007-01-01
Unlicensed assistive personnel (UAP) competency is essential to the quality and safety of patient care. The purpose of this study was to construct a test with acceptable estimates of reliability and validity to identify UAPs who could successfully perform their job in four essential knowledge-based domains (math, patient data collection, medical terminology, and reporting abnormal data). An investigator-developed test was constructed. Psychometric testing was completed by administering the test to 145 UAPs. A cut score of 17/23 resulted in 79.7% sensitivity and 70.4% specificity. There were significant differences in mean scores between masters and nonmasters (t = -13.70, df = 79, p < .00). Master status was significantly related to the ability to take a patient's blood pressure (Phi = .503, p < .00). A score of 17 or greater indicates that the applicant demonstrates competency of basic knowledge required for the position. The test can be used as a screening tool for UAPs in nurse recruitment before candidates advance to the unit for an interview.
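As an illustration of how sensitivity and specificity follow from a cut score, the sketch below classifies examinees as passing when their score is at or above the cut and compares that with a reference "master" status. The data, the function name, and the convention that a master who passes counts as a true positive are assumptions made for illustration; they are not taken from the study.

```python
def sensitivity_specificity(scores, is_master, cut):
    """Return (sensitivity, specificity) for a pass rule of score >= cut.

    Assumed convention: a 'positive' is a true master who passes the test.
    """
    tp = fn = tn = fp = 0
    for score, master in zip(scores, is_master):
        passed = score >= cut
        if master and passed:
            tp += 1          # master correctly passed
        elif master and not passed:
            fn += 1          # master incorrectly failed
        elif not master and not passed:
            tn += 1          # nonmaster correctly failed
        else:
            fp += 1          # nonmaster incorrectly passed
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical scores out of 23; not the study's data.
scores = [20, 18, 16, 22, 15, 17, 12, 19]
is_master = [True, True, True, True, False, False, False, False]
sens, spec = sensitivity_specificity(scores, is_master, cut=17)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Raising the cut generally trades sensitivity for specificity, which is why a single cut score such as 17/23 is reported together with both values.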
Emergency Airway Response Team Simulation Training: A Nursing Perspective.
Crimlisk, Janet T; Krisciunas, Gintas P; Grillone, Gregory A; Gonzalez, R Mauricio; Winter, Michael R; Griever, Susan C; Fernandes, Eduarda; Medzon, Ron; Blansfield, Joseph S; Blumenthal, Adam
Simulation-based education is an important tool in the training of professionals in the medical field, especially for low-frequency, high-risk events. An interprofessional simulation-based training program was developed to enhance Emergency Airway Response Team (EART) knowledge, team dynamics, and personnel confidence. This quality improvement study evaluated the EART simulation training results of nurse participants. Twenty-four simulation-based classes of 4-hour sessions were conducted during a 12-week period. Sixty-three nurses from the emergency department (ED) and the intensive care units (ICUs) completed the simulation. Participants were evaluated before and after the simulation program with a knowledge-based test and a team dynamics and confidence questionnaire. Additional comparisons were made between ED and ICU nurses and between nurses with previous EART experience and those without previous EART experience. Comparison of presimulation (presim) and postsimulation (postsim) results indicated a statistically significant gain in both team dynamics and confidence and Knowledge Test scores (P < .01). There were no differences in scores between ED and ICU groups in presim or postsim scores; nurses with previous EART experience demonstrated significantly higher presim scores than nurses without EART experience, but there were no differences between these nurse groups at postsim. This project supports the use of simulation training to increase nurses' knowledge, confidence, and team dynamics in an EART response. Importantly, nurses with no previous experience achieved outcome scores similar to nurses who had experience, suggesting that emergency airway simulation is an effective way to train both new and experienced nurses.
Do collaborative practical tests encourage student-centered active learning of gross anatomy?
Green, Rodney A; Cates, Tanya; White, Lloyd; Farchione, Davide
2016-05-06
Benefits of collaborative testing have been identified in many disciplines. This study sought to determine whether collaborative practical tests encouraged active learning of anatomy. A gross anatomy course included a collaborative component in four practical tests. Two hundred and seven students initially completed the test as individuals and then worked as a team to complete the same test again immediately afterwards. The relationship between mean individual, team, and difference (between team and individual) test scores to overall performance on the final examination (representing overall learning in the course) was examined using regression analysis. The overall mark in the course increased by 9% with a decreased failure rate. There was a strong relationship between individual score and final examination mark (P < 0.001) but no relationship for team score (P = 0.095). A longitudinal analysis showed that the test difference scores increased after Test 1 which may be indicative of social loafing and this was confirmed by a significant negative relationship between difference score on Test 4 (indicating a weaker student) and final examination mark (P < 0.001). It appeared that for this cohort, there was little peer-to-peer learning occurring during the collaborative testing and that weaker students gained the benefit from team marks without significant active learning taking place. This negative outcome may be due to insufficient encouragement of the active learning strategies that were expected to occur during the collaborative testing process. An improved understanding of the efficacy of collaborative assessment could be achieved through the inclusion of questionnaire based data to allow a better interpretation of learning outcomes. Anat Sci Educ 9: 231-237. © 2015 American Association of Anatomists.
Bakare, Muideen O; Ubochi, Vincent N; Okoroikpa, Ifeoma N; Aguocha, Chinyere M; Ebigbo, Peter O
2009-09-15
There may be a need to assess intelligence quotient (IQ) scores in sub-Saharan African children with intellectual disability, either for educational needs assessment or for research. However, modern intelligence scales developed in the western parts of the world are limited in their widespread use because of the influence of socio-cultural variations across the world. This study examined the agreement between two estimates of IQ scores among Nigerian children with intellectual disability: clinicians' judgment based on International Classification of Diseases, Tenth Edition (ICD-10) criteria for mental retardation, and caregivers' judgment based on 'ratio IQ' scores calculated from estimated mental age in the context of the socio-cultural milieu of the children. It proposed a viable option of IQ score assessment among sub-Saharan African children with intellectual disability, using a ratio of culture-specific estimated mental age and chronological age of the child in the absence of standardized alternatives, a gap that arises from the great diversity in the socio-cultural context of sub-Saharan Africa. Clinicians and caregivers independently assessed the children in relation to their socio-cultural background. Clinicians assessed the IQ scores of the children based on the ICD-10 diagnostic criteria for mental retardation. 'Ratio IQ' scores were calculated from the ratio of estimated mental age and chronological age of each child. The IQ scores as assessed by the clinicians were then compared with the 'ratio IQ' scores using correlation statistics. A total of forty-four (44) children with intellectual disability were assessed. There was a significant correlation between clinicians' assessed IQ scores and the 'ratio IQ' scores employing zero-order correlation without controlling for the chronological age of the children (r = 0.47, df = 42, p = 0.001).
First order correlation controlling for the chronological age of the children showed higher correlation score between clinicians' assessed IQ scores and 'ratio IQ' scores (r = 0.75, df = 41, p = 0.000). Agreement between clinicians' assessed IQ scores and 'ratio IQ' scores was good. 'Ratio IQ' test would provide a viable option of assessing IQ scores in sub-Saharan African children with intellectual disability in the absence of culture-appropriate standardized intelligence scales, which is often the case because of great diversity in socio-cultural structures of sub-Saharan Africa.
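The two calculations described above can be sketched as follows. The multiplication by 100 is the conventional ratio IQ scaling and is an assumption here, since the abstract only mentions the ratio of mental age to chronological age. The first-order partial correlation uses the standard textbook formula; the function names and the input correlations are hypothetical and are not the study's values.

```python
import math


def ratio_iq(mental_age, chronological_age):
    """Conventional ratio IQ: 100 * mental age / chronological age (assumed scaling)."""
    return 100.0 * mental_age / chronological_age


def first_order_partial(r_xy, r_xz, r_yz):
    """Correlation of x and y after controlling for a single variable z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# A child with an estimated mental age of 6 and a chronological age of 12.
print(ratio_iq(6, 12))

# Hypothetical correlations: x = clinician IQ, y = ratio IQ, z = chronological age.
print(round(first_order_partial(0.6, 0.5, 0.5), 4))
```

The partial correlation can be higher or lower than the zero-order value depending on how the controlled variable relates to each measure, which is consistent with the abstract reporting a different coefficient once chronological age is controlled.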
Scoring in genetically modified organism proficiency tests based on log-transformed results.
Thompson, Michael; Ellison, Stephen L R; Owen, Linda; Mathieson, Kenneth; Powell, Joanne; Key, Pauline; Wood, Roger; Damant, Andrew P
2006-01-01
The study considers data from 2 UK-based proficiency schemes and includes data from a total of 29 rounds and 43 test materials over a period of 3 years. The results from the 2 schemes are similar and reinforce each other. The amplification process used in quantitative polymerase chain reaction determinations predicts a mixture of normal, binomial, and lognormal distributions dominated by the latter 2. As predicted, the study results consistently follow a positively skewed distribution. Log-transformation prior to calculating z-scores is effective in establishing near-symmetric distributions that are sufficiently close to normal to justify interpretation on the basis of the normal distribution.
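A minimal sketch of a z-score computed after log-transformation, as the abstract describes. The base-10 logarithm, the function name, and the value used for the standard deviation on the log scale are all assumptions for illustration; the abstract does not state which base or target standard deviation the schemes used.

```python
import math


def log_z_score(result, assigned_value, sigma_log):
    """z-score on the log10 scale.

    result, assigned_value: positive measured and assigned concentrations.
    sigma_log: standard deviation for proficiency assessment on the log10
               scale (hypothetical value supplied by the caller).
    """
    if result <= 0 or assigned_value <= 0:
        raise ValueError("log-transformation requires positive values")
    return (math.log10(result) - math.log10(assigned_value)) / sigma_log


# On the log scale, a result twice the assigned value and a result half the
# assigned value are equally far from it, in opposite directions.
print(round(log_z_score(2.0, 1.0, 0.2), 3))
print(round(log_z_score(0.5, 1.0, 0.2), 3))
```

This symmetry around the assigned value is what makes the transformed scores easier to interpret against a normal distribution than raw results from a positively skewed distribution.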
Beckstead, D Joel; Lambert, Michael J; DuBose, Anthony P; Linehan, Marsha
2015-12-01
This pilot study examined pre- to post-treatment change in patients at a substance use residential treatment center that incorporated Dialectical Behavior Therapy (DBT) with specific cultural, traditional and spiritual practices for American Indian/Alaska Native adolescents. Specifically, the cultural, spiritual and traditional practices were incorporated while still maintaining fidelity to the evidence-based treatment (DBT). A total of 229 adolescents participated in the study and were given the Youth Outcome Questionnaire-Self-Report version at pre-treatment and post-treatment, and the total scores were compared. The results showed that 96% of adolescents were either "recovered" or "improved" using clinically significant change criteria. Additionally, differences between the group's pre-test scores and post-test scores were statistically significant using a matched standard t-test comparison. Finally, the effect size that was calculated using Cohen's criteria was found to be large. The results are discussed in terms of the implications for integrating western and traditional methods of care in addressing substance use disorders and other mental health disorders with American Indian/Alaska Native adolescents. Published by Elsevier Ltd.
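The pre/post comparison described above can be sketched with a paired t statistic and a standardized effect size. Several variants of Cohen's d exist for paired designs; this sketch assumes the version that divides the mean difference by the standard deviation of the differences, and the abstract does not say which variant the authors used. The data are hypothetical, and the 0.2 / 0.5 / 0.8 labels are Cohen's conventional benchmarks.

```python
import math
import statistics


def paired_t_and_effect_size(pre, post):
    """Return (t, d, df) for matched pre/post scores.

    d is the mean difference divided by the SD of the differences
    (an assumed variant of Cohen's d for paired data).
    """
    diffs = [a - b for a, b in zip(pre, post)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)        # sample SD (n - 1 denominator)
    t = mean_diff / (sd_diff / math.sqrt(n))
    d = mean_diff / sd_diff
    return t, d, n - 1


def cohen_label(d):
    """Map |d| to Cohen's conventional size labels."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"


# Hypothetical symptom scores (lower is better); not the study's data.
pre = [10, 12, 14, 16]
post = [8, 9, 12, 11]
t, d, df = paired_t_and_effect_size(pre, post)
print(f"t({df}) = {t:.2f}, d = {d:.2f} ({cohen_label(d)})")
```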
Benkert, Pascal; Schwede, Torsten; Tosatto, Silvio Ce
2009-05-20
The selection of the most accurate protein model from a set of alternatives is a crucial step in protein structure prediction both in template-based and ab initio approaches. Scoring functions have been developed which can either return a quality estimate for a single model or derive a score from the information contained in the ensemble of models for a given sequence. Local structural features occurring more frequently in the ensemble have a greater probability of being correct. Within the context of the CASP experiment, these so-called consensus methods have been shown to perform considerably better in selecting good candidate models, but tend to fail if the best models are far from the dominant structural cluster. In this paper we show that model selection can be improved if both approaches are combined by pre-filtering the models used during the calculation of the structural consensus. Our recently published QMEAN composite scoring function has been improved by including an all-atom interaction potential term. The preliminary model ranking based on the new QMEAN score is used to select a subset of reliable models against which the structural consensus score is calculated. This scoring function, called QMEANclust, achieves a correlation coefficient between predicted quality score and GDT_TS of 0.9 averaged over the 98 CASP7 targets and performs significantly better in selecting good models from the ensemble of server models than any other group participating in the quality estimation category of CASP7. Both scoring functions are also benchmarked on the MOULDER test set consisting of 20 target proteins, each with 300 alternative models generated by MODELLER. QMEAN outperforms all other tested scoring functions operating on individual models, while the consensus method QMEANclust only works properly on decoy sets containing a certain fraction of near-native conformations.
We also present a local version of QMEAN for the per-residue estimation of model quality (QMEANlocal) and compare it to a new local consensus-based approach. Improved model selection is obtained by using a composite scoring function operating on single models in order to enrich higher quality models which are subsequently used to calculate the structural consensus. The performance of consensus-based methods such as QMEANclust highly depends on the composition and quality of the model ensemble to be analysed. Therefore, performance estimates for consensus methods based on large meta-datasets (e.g. CASP) might overrate their applicability in more realistic modelling situations with smaller sets of models based on individual methods.
Implications for social policy of variability in racial groups.
Helms, Janet E
2008-11-01
Social policy and federal and state legislation require the use of single cut scores when tests of cognitive ability, knowledge, or skills (CAKS) are used to make high-stakes assessment decisions, such as whether students or employees may be promoted. Rationales offered for the requirement are that cut scores provide objective standards and are fairer than using subjective criteria, such as racial group membership. It is argued that failure to consider threats to statistical conclusion validity, such as differences in variability between groups, obscures the differential impact of using a common cut score as the basis for high-stakes decisions. Analyses of 40 Black and White samples revealed that (a) Whites might be considerably advantaged and Blacks might be considerably disadvantaged by the same cut score and (b) depending on where the cut score is set, decisions based on ratios of numbers of Whites to numbers of Blacks might be fairer than use of CAKS test cut scores. Implications for assessment practice and social policy are discussed.
Bishop, Somer L.; Farmer, Cristan; Thurm, Audrey
2014-01-01
Nonverbal IQ (NVIQ) was examined in 84 individuals with ASD followed from age 2 to 19. Most adults who scored in the range of ID also received scores below 70 as children, and the majority of adults with scores in the average range had scored in this range by age 3. However, within the lower ranges of ability, actual scores declined from age 2 to 19, likely due in part to limitations of appropriate tests. Use of Vineland-II DLS scores in place of NVIQ did not statistically improve the correspondence between age 2 and age 19 scores. Clinicians and researchers should use caution when making comparisons based on exact scores or specific ability ranges within or across individuals with ASD of different ages. PMID:25239176
The impact of service-learning on cultural competence.
Amerson, Roxanne
2010-01-01
Service-learning provides an excellent pedagogy for introducing students to clients of different cultural backgrounds, helping students become aware of the issues these clients face related to culture and health care, and teaching culturally appropriate care. The Transcultural Self-Efficacy Tool was used to evaluate self-perceived cultural competence in a convenience sample of 60 baccalaureate nursing students enrolled in a community health nursing course following the completion of service-learning projects with local and international communities. Pre- and posttests were analyzed based on total scores and subscale (cognitive, practical, and affective) scores. A paired-samples t test compared the mean pretest total score to the mean posttest total score, which demonstrated a significant increase. In addition, paired-samples t tests demonstrated a significant increase in each subscale.
An Item Response Theory Model for Test Bias.
ERIC Educational Resources Information Center
Shealy, Robin; Stout, William
This paper presents a conceptualization of test bias for standardized ability tests which is based on multidimensional, non-parametric, item response theory. An explanation of how individually-biased items can combine through a test score to produce test bias is provided. It is contended that bias, although expressed at the item level, should be…
Barchi, Francis H; Kasimatis-Singleton, Megan; Kasule, Mary; Khulumani, Pilate; Merz, Jon F
2013-02-01
Little empirical data are available on the extent to which capacity-building programs in research ethics prepare trainees to apply ethical reasoning skills to the design, conduct, or review of research. A randomized controlled trial was conducted in Botswana in 2010 to assess the effectiveness of a case-based intervention using email to augment in-person seminars. University faculty and current and prospective IRB/REC members took part in a semester-long training program in research ethics. Participants attended two 2-day seminars and were assigned at random to one of two on-line arms of the trial. Participants in both arms completed on-line international modules from the Collaborative Institutional Training Initiative. Between seminars, intervention-arm participants were also emailed a weekly case to analyze in response to set questions; responses and individualized faculty feedback were exchanged via email. Tests assessing ethics knowledge were administered at the start of each seminar. The post-test included an additional section in which participants were asked to identify the ethical issues highlighted in five case studies from a list of multiple-choice responses. Results were analyzed using regression and ANOVA. Of the 71 participants (36 control, 35 intervention) enrolled at the first seminar, 41 (57.7%) attended the second seminar (19 control, 22 intervention). In the intervention arm, 19 (54.3%) participants fully completed and 8 (22.9%) partially completed all six weekly cases. The mean score was higher on the post-test (30.3/40) than on the pre-test (28.0/40), and individual post- and pre-test scores were highly correlated (r = 0.65, p < 0.0001). Group assignment alone did not have an effect on test scores (p > 0.84), but intervention-arm subjects who completed all assigned cases answered an average of 3.2 more questions correctly on the post-test than others, controlling for pre-test scores (p = 0.003). 
Completion of the case-based intervention improved respondents' test scores, with those who completed all six email cases scoring roughly 10% better than those who failed to complete this task and those in the control arm. There was only suggestive evidence that intensive case work improved ethical issue identification, although there was limited ability to assess this outcome due to a high drop-out rate.
2013-01-01
Background Little empirical data are available on the extent to which capacity-building programs in research ethics prepare trainees to apply ethical reasoning skills to the design, conduct, or review of research. A randomized controlled trial was conducted in Botswana in 2010 to assess the effectiveness of a case-based intervention using email to augment in-person seminars. Methods University faculty and current and prospective IRB/REC members took part in a semester-long training program in research ethics. Participants attended two 2-day seminars and were assigned at random to one of two on-line arms of the trial. Participants in both arms completed on-line international modules from the Collaborative Institutional Training Initiative. Between seminars, intervention-arm participants were also emailed a weekly case to analyze in response to set questions; responses and individualized faculty feedback were exchanged via email. Tests assessing ethics knowledge were administered at the start of each seminar. The post-test included an additional section in which participants were asked to identify the ethical issues highlighted in five case studies from a list of multiple-choice responses. Results were analyzed using regression and ANOVA. Results Of the 71 participants (36 control, 35 intervention) enrolled at the first seminar, 41 (57.7%) attended the second seminar (19 control, 22 intervention). In the intervention arm, 19 (54.3%) participants fully completed and 8 (22.9%) partially completed all six weekly cases. The mean score was higher on the post-test (30.3/40) than on the pre-test (28.0/40), and individual post- and pre-test scores were highly correlated (r = 0.65, p < 0.0001). Group assignment alone did not have an effect on test scores (p > 0.84), but intervention-arm subjects who completed all assigned cases answered an average of 3.2 more questions correctly on the post-test than others, controlling for pre-test scores (p = 0.003). 
Conclusions Completion of the case-based intervention improved respondents’ test scores, with those who completed all six email cases scoring roughly 10% better than those who failed to complete this task and those in the control arm. There was only suggestive evidence that intensive case work improved ethical issue identification, although there was limited ability to assess this outcome due to a high drop-out rate. PMID:23368699
Test Information Targeting Strategies for Adaptive Multistage Testing Designs.
ERIC Educational Resources Information Center
Luecht, Richard M.; Burgin, William
Adaptive multistage testlet (MST) designs appear to be gaining popularity for many large-scale computer-based testing programs. These adaptive MST designs use a modularized configuration of preconstructed testlets and embedded score-routing schemes to prepackage different forms of an adaptive test. The conditional information targeting (CIT)…
Nornoo, Adwoa O; Jackson, Jonathan; Axtell, Samantha
2017-03-25
Objective. To determine whether there is a correlation between pharmacy students' scores on the Health Science Reasoning Test (HSRT) and their grade on a package insert assignment designed to assess critical thinking. Methods. The HSRT was administered to first-year pharmacy students during a critical-thinking course in the spring semester. In the same semester, a required package insert assignment was completed in a pharmacokinetics course. To determine whether there was a relationship between HSRT scores and grades on the assignment, a Spearman's rho correlation test was performed. Results. A very weak but significant positive correlation was found between students' grades on the assignment and their overall HSRT score (r=0.19, p <0.05), as well as deduction (a scale score of the HSRT; r=0.26, p <0.01). Conclusion. Based on a very weak but significant correlation to HSRT scores, this study demonstrated the potential of a package insert assignment to be used as one of the components to measure critical-thinking skills in pharmacy students.
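For readers unfamiliar with the statistic used above, the following sketch computes Spearman's rho as the Pearson correlation of the ranks, with tied values given their average rank. The function names and the data are hypothetical and do not come from the study.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical test scores and assignment grades.
hsrt = [18, 22, 25, 20, 27, 15]
grade = [70, 85, 80, 75, 90, 72]
print(round(spearman_rho(hsrt, grade), 3))
```

Because it works on ranks, the statistic captures any monotonic relationship rather than only a linear one, which suits ordinal or skewed grade data.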
Jackson, Jonathan; Axtell, Samantha
2017-01-01
Objective. To determine whether there is a correlation between pharmacy students’ scores on the Health Science Reasoning Test (HSRT) and their grade on a package insert assignment designed to assess critical thinking. Methods. The HSRT was administered to first-year pharmacy students during a critical-thinking course in the spring semester. In the same semester, a required package insert assignment was completed in a pharmacokinetics course. To determine whether there was a relationship between HSRT scores and grades on the assignment, a Spearman’s rho correlation test was performed. Results. A very weak but significant positive correlation was found between students’ grades on the assignment and their overall HSRT score (r=0.19, p<0.05), as well as deduction (a scale score of the HSRT; r=0.26, p<0.01). Conclusion. Based on a very weak but significant correlation to HSRT scores, this study demonstrated the potential of a package insert assignment to be used as one of the components to measure critical-thinking skills in pharmacy students. PMID:28381884
Wrzus, Cornelia; Egloff, Boris; Riediger, Michaela
2017-08-01
Implicit association tests (IATs) are increasingly used to indirectly assess people's traits, attitudes, or other characteristics. In addition to measuring traits or attitudes, IAT scores also reflect differences in cognitive abilities because scores are based on reaction times (RTs) and errors. As cognitive abilities change with age, questions arise concerning the usage and interpretation of IATs for people of different age. To address these questions, the current study examined how cognitive abilities and cognitive processes (i.e., quad model parameters) contribute to IAT results in a large age-heterogeneous sample. Participants (N = 549; 51% female) in an age-stratified sample (range = 12-88 years) completed different IATs and 2 tasks to assess cognitive processing speed and verbal ability. From the IAT data, D2-scores were computed based on RTs, and quad process parameters (activation of associations, overcoming bias, detection, guessing) were estimated from individual error rates. Substantial IAT scores and quad processes except guessing varied with age. Quad processes AC and D predicted D2-scores of the content-specific IAT. Importantly, the effects of cognitive abilities and quad processes on IAT scores were not significantly moderated by participants' age. These findings suggest that IATs seem suitable for age-heterogeneous studies from adolescence to old age when IATs are constructed and analyzed appropriately, for example with D-scores and process parameters. We offer further insight into how D-scoring controls for method effects in IATs and what IAT scores capture in addition to implicit representations of characteristics. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Madani, Amin; Watanabe, Yusuke; Bilgic, Elif; Pucher, Philip H; Vassiliou, Melina C; Aggarwal, Rajesh; Fried, Gerald M; Mitmaker, Elliot J; Feldman, Liane S
2017-03-01
Errors in judgment during laparoscopic cholecystectomy can lead to bile duct injuries and other complications. Despite correlations between outcomes, expertise and advanced cognitive skills, current methods to evaluate these skills remain subjective, rater- and situation-dependent and non-systematic. The purpose of this study was to develop objective metrics using a Web-based platform and to obtain validity evidence for their assessment of decision-making during laparoscopic cholecystectomy. An interactive online learning platform was developed ( www.thinklikeasurgeon.com ). Trainees and surgeons from six institutions completed a 12-item assessment, developed based on a cognitive task analysis. Five items required subjects to draw their answer on the surgical field, and accuracy scores were calculated based on an algorithm derived from experts' responses ("visual concordance test", VCT). Test-retest reliability, internal consistency, and correlation with self-reported experience, Global Operative Assessment of Laparoscopic Skills (GOALS) score and Objective Performance Rating Scale (OPRS) score were calculated. Questionnaires were administered to evaluate the platform's usability, feasibility and educational value. Thirty-nine subjects (17 surgeons, 22 trainees) participated. There was high test-retest reliability (intraclass correlation coefficient = 0.95; n = 10) and internal consistency (Cronbach's α = 0.87). The assessment demonstrated significant differences between novices, intermediates and experts in total score (p < 0.01) and VCT score (p < 0.01). There was high correlation between total case number and total score (ρ = 0.83, p < 0.01) and between total case number and VCT (ρ = 0.82, p < 0.01), and moderate to high correlations between total score and GOALS (ρ = 0.66, p = 0.05), VCT and GOALS (ρ = 0.83, p < 0.01), total score and OPRS (ρ = 0.67, p = 0.04), and VCT and OPRS (ρ = 0.78, p = 0.01). 
Most subjects agreed or strongly agreed that the platform and assessment were easy to use [n = 29 (78%)], facilitate learning intra-operative decision-making [n = 28 (81%)], and should be integrated into surgical training [n = 28 (76%)]. This study provides preliminary validity evidence for a novel interactive platform to objectively assess decision-making during laparoscopic cholecystectomy.
Dutagaci, Bercem; Wittayanarakul, Kitiyaporn; Mori, Takaharu; Feig, Michael
2017-06-13
A scoring protocol based on implicit membrane-based scoring functions and a new protocol for optimizing the positioning of proteins inside the membrane was evaluated for its capacity to discriminate native-like states from misfolded decoys. A decoy set previously established by the Baker lab (Proteins: Struct., Funct., Genet. 2006, 62, 1010-1025) was used along with a second set that was generated to cover higher resolution models. The Implicit Membrane Model 1 (IMM1), IMM1 model with CHARMM 36 parameters (IMM1-p36), generalized Born with simple switching (GBSW), and heterogeneous dielectric generalized Born versions 2 (HDGBv2) and 3 (HDGBv3) were tested along with the new HDGB van der Waals (HDGBvdW) model that adds implicit van der Waals contributions to the solvation free energy. For comparison, scores were also calculated with the distance-scaled finite ideal-gas reference (DFIRE) scoring function. Z-scores for native state discrimination, energy vs root-mean-square deviation (RMSD) correlations, and the ability to select the most native-like structures as top-scoring decoys were evaluated to assess the performance of the scoring functions. Ranking of the decoys in the Baker set that were relatively far from the native state was challenging and dominated largely by packing interactions that were captured best by DFIRE with less benefit of the implicit membrane-based models. Accounting for the membrane environment was much more important in the second decoy set where especially the HDGB-based scoring functions performed very well in ranking decoys and providing significant correlations between scores and RMSD, which shows promise for improving membrane protein structure prediction and refinement applications. The new membrane structure scoring protocol was implemented in the MEMScore web server ( http://feiglab.org/memscore ).
Fundamentals of endoscopic surgery: creation and validation of the hands-on test.
Vassiliou, Melina C; Dunkin, Brian J; Fried, Gerald M; Mellinger, John D; Trus, Thadeus; Kaneva, Pepa; Lyons, Calvin; Korndorffer, James R; Ujiki, Michael; Velanovich, Vic; Kochman, Michael L; Tsuda, Shawn; Martinez, Jose; Scott, Daniel J; Korus, Gary; Park, Adrian; Marks, Jeffrey M
2014-03-01
The Fundamentals of Endoscopic Surgery™ (FES) program consists of online materials and didactic and skills-based tests. All components were designed to measure the skills and knowledge required to perform safe flexible endoscopy. The purpose of this multicenter study was to evaluate the reliability and validity of the hands-on component of the FES examination, and to establish the pass score. Expert endoscopists identified the critical skill set required for flexible endoscopy. These skills were then modeled in a virtual reality simulator (GI Mentor™ II, Simbionix™ Ltd., Airport City, Israel) to create five tasks and metrics. Scores were designed to measure both speed and precision. Validity evidence was assessed by correlating performance with self-reported endoscopic experience (surgeons and gastroenterologists [GIs]). Internal consistency of each test task was assessed using Cronbach's alpha. Test-retest reliability was determined by having the same participant perform the test a second time and comparing their scores. Passing scores were determined by a contrasting groups methodology and use of receiver operating characteristic curves. A total of 160 participants (17% GIs) performed the simulator test. Scores on the five tasks showed good internal consistency reliability and all had significant correlations with endoscopic experience. Total FES scores correlated 0.73 with participants' level of endoscopic experience, providing evidence of their validity, and their internal consistency reliability (Cronbach's alpha) was 0.82. Test-retest reliability was assessed in 11 participants, and the intraclass correlation was 0.85. The passing score was determined and is estimated to have a sensitivity (true positive rate) of 0.81 and a 1-specificity (false positive rate) of 0.21. The FES hands-on skills test examines the basic procedural components required to perform safe flexible endoscopy.
It meets rigorous standards of reliability and validity required for high-stakes examinations, and, together with the knowledge component, may help contribute to the definition and determination of competence in endoscopy.
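The internal-consistency figure reported above is a Cronbach's alpha. As a minimal sketch of how such a coefficient is commonly computed (the function name and the sample data are hypothetical, and the abstract does not describe the authors' own computation), the standard formula compares the sum of the item variances with the variance of the total score:

```python
import statistics

def cronbach_alpha(item_scores):
    """Cronbach's alpha from a list of per-item score lists.

    item_scores[i][j] is the score of participant j on item (task) i.
    Uses sample variances (n - 1 denominator).
    """
    k = len(item_scores)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Total score per participant, summed across items.
    totals = [sum(per_participant) for per_participant in zip(*item_scores)]
    sum_item_variances = sum(statistics.variance(item) for item in item_scores)
    total_variance = statistics.variance(totals)
    return (k / (k - 1)) * (1.0 - sum_item_variances / total_variance)

# Hypothetical scores: 3 tasks, 4 participants (not data from the study).
example_items = [[1, 2, 3, 4], [2, 3, 4, 5], [1, 3, 3, 5]]
alpha = cronbach_alpha(example_items)
```

With real data each inner list would hold one task's scores across all participants, in the same participant order.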
Naive scoring of human sleep based on a hidden Markov model of the electroencephalogram.
Yaghouby, Farid; Modur, Pradeep; Sunderam, Sridhar
2014-01-01
Clinical sleep scoring involves tedious visual review of overnight polysomnograms by a human expert. Many attempts have been made to automate the process by training computer algorithms such as support vector machines and hidden Markov models (HMMs) to replicate human scoring. Such supervised classifiers are typically trained on scored data and then validated on scored out-of-sample data. Here we describe a methodology based on HMMs for scoring an overnight sleep recording without the benefit of a trained initial model. The number of states in the data is not known a priori and is optimized using a Bayes information criterion. When tested on a 22-subject database, this unsupervised classifier agreed well with human scores (mean of Cohen's kappa > 0.7). The HMM also outperformed other unsupervised classifiers (Gaussian mixture models, k-means, and linkage trees), that are capable of naive classification but do not model dynamics, by a significant margin (p < 0.05).
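Agreement with human scoring is summarized above with Cohen's kappa. The following is a minimal sketch of the usual kappa computation for two label sequences; the stage labels are hypothetical, and it assumes the unsupervised model's states have already been mapped onto the human scorer's stage names, a step the abstract does not detail.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed proportion of epochs on which the two raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)

# Hypothetical epoch-by-epoch labels (not data from the study).
human = ["W", "W", "N1", "N2", "N2", "N3", "REM", "REM"]
auto = ["W", "W", "N2", "N2", "N2", "N3", "REM", "N2"]
kappa = cohens_kappa(human, auto)
```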
van der Maas, Nico Arie
2017-03-16
The Multiple Sclerosis Questionnaire for Physical Therapists (MSQPT) is a patient-rated outcome questionnaire for evaluating the rehabilitation of persons with multiple sclerosis (MS). Responsiveness was evaluated, and minimal important difference (MID) estimates were calculated to provide thresholds for clinical change for four items, three sections and the total score of the MSQPT. This multicentre study used a combined distribution- and anchor-based approach with multiple anchors and multiple rating of change questions. Responsiveness was evaluated using effect size, standardized response mean (SRM), modified SRM and relative efficiency. For distribution-based MID estimates, 0.2 and 0.33 standard deviations (SD), standard error of measurement (SEM) and minimal detectable change were used. Triangulation of anchor- and distribution-based MID estimates provided a range of MID values for each of the four items, the three sections and the total score of the MSQPT. The MID values were tested for their sensitivity and specificity for amelioration and deterioration for each of the four items, the three sections and the total score of the MSQPT. The MID values of each item and section and of the total score with the best sensitivity and specificity were selected as thresholds for clinical change. The outcome measures were the MSQPT, Hamburg Quality of Life Questionnaire for Multiple Sclerosis (HAQUAMS), rating of change questionnaires, Expanded Disability Status Scale, 6-metre timed walking test, Berg Balance Scale and 6-minute walking test. The effect size ranged from 0.46 to 1.49. The SRM data showed comparable results. The modified SRM ranged from 0.00 to 0.60. Anchor-based MID estimates were very low and were comparable with SD- and SEM-based estimates. The MSQPT was more responsive than the HAQUAMS in detecting improvement but less responsive in finding deterioration. 
The best MID estimates of the items, sections and total score, expressed in percentage of their maximum score, were between 5.4% (activity) and 22% (item 10) change for improvement and between 5.7% (total score) and 22% (item 10) change for deterioration. The MSQPT is a responsive questionnaire with an adequate MID that may be used as threshold for change during rehabilitation of MS patients. This trial was retrospectively (01/24/2015) registered in ClinicalTrials.gov as NCT02346279.
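The distribution-based quantities named in this abstract have widely used textbook definitions. The sketch below assumes those common forms (effect size as mean change over baseline SD, SRM as mean change over the SD of change, SEM as SD times the square root of one minus reliability, and a 95% minimal detectable change); the abstract does not state which variants the study applied, and all function names and numbers here are illustrative.

```python
import math
import statistics

def effect_size(baseline, follow_up):
    """Mean change divided by the SD of the baseline scores."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return statistics.mean(changes) / statistics.stdev(baseline)

def standardized_response_mean(baseline, follow_up):
    """Mean change divided by the SD of the change scores."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return statistics.mean(changes) / statistics.stdev(changes)

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability coefficient)."""
    return sd * math.sqrt(1.0 - reliability)

def minimal_detectable_change(sem, z=1.96):
    """MDC at the confidence level implied by z (1.96 for 95%)."""
    return z * math.sqrt(2.0) * sem

# Hypothetical scores for four patients (not data from the study).
before = [10, 12, 14, 16]
after = [13, 14, 18, 19]
es = effect_size(before, after)
srm = standardized_response_mean(before, after)
sem = standard_error_of_measurement(sd=10.0, reliability=0.84)
mdc = minimal_detectable_change(sem)
```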
O'Grady, Anthony; Allen, David; Happerfield, Lisa; Johnson, Nicola; Provenzano, Elena; Pinder, Sarah E; Tee, Lilian; Gu, Mai; Kay, Elaine W
2010-12-01
Immunohistochemistry (IHC) is used as the frontline assay to determine HER2 status in invasive breast cancer patients. The aim of the study was to compare the performance of the Leica Oracle HER2 Bond IHC System (Oracle) with the current most readily accepted Dako HercepTest (HercepTest), using both commercially validated and modified ASCO/CAP and UK HER2 IHC scoring guidelines. A total of 445 breast cancer samples from 3 international clinical HER2 referral centers were stained with the 2 test systems and scored in a blinded fashion by experienced pathologists. The overall agreement between the 2 tests in a 3×3 (negative, equivocal and positive) analysis shows a concordance of 86.7% and 86.3%, respectively when analyzed using commercially validated and modified ASCO/CAP and UK HER2 IHC scoring guidelines. There is a good concordance between the Oracle and the HercepTest. The advantages of a complete fully automated test such as the Oracle include standardization of key analytical factors and improved turn around time. The implementation of the modified ASCO/CAP and UK HER2 IHC scoring guidelines has minimal effect on either assay interpretation, showing that Oracle can be used as a methodology for accurately determining HER2 IHC status in formalin fixed, paraffin-embedded breast cancer tissue.
Biases and Power for Groups Comparison on Subjective Health Measurements
Hamel, Jean-François; Hardouin, Jean-Benoit; Le Neel, Tanguy; Kubis, Gildas; Roquelaure, Yves; Sébille, Véronique
2012-01-01
Subjective health measurements are increasingly used in clinical research, particularly for patient groups comparisons. Two main types of analytical strategies can be used for such data: so-called classical test theory (CTT), relying on observed scores and models coming from Item Response Theory (IRT) relying on a response model relating the items responses to a latent parameter, often called latent trait. Whether IRT or CTT would be the most appropriate method to compare two independent groups of patients on a patient reported outcomes measurement remains unknown and was investigated using simulations. For CTT-based analyses, groups comparison was performed using t-test on the scores. For IRT-based analyses, several methods were compared, according to whether the Rasch model was considered with random effects or with fixed effects, and the group effect was included as a covariate or not. Individual latent traits values were estimated using either a deterministic method or by stochastic approaches. Latent traits were then compared with a t-test. Finally, a two-steps method was performed to compare the latent trait distributions, and a Wald test was performed to test the group effect in the Rasch model including group covariates. The only unbiased IRT-based method was the group covariate Wald’s test, performed on the random effects Rasch model. This model displayed the highest observed power, which was similar to the power using the score t-test. These results need to be extended to the case frequently encountered in practice where data are missing and possibly informative. PMID:23115620
Cervical cancer screening among university students in South Africa: a theory based study.
Hoque, Muhammad Ehsanu; Ghuman, Shanaz; Coopoosmay, Roger; Van Hal, Guido
2014-01-01
Cervical cancer is a serious public health problem in South Africa. Even though the screening is free in health facilities in South Africa, the Pap smear uptake is very low. The objective of the study is to investigate the knowledge and beliefs of female university students in South Africa. A cross sectional study was conducted among university women in South Africa to elicit information about knowledge and beliefs, and screening history. A total of 440 students completed the questionnaire. The average age of the participants was 20.39 years (SD = 1.71 years). Regarding cervical cancer, 55.2% (n = 243) had ever heard about it. Results indicated that only 15% (22/147) of the students who had ever had sex and had heard about cervical cancer had taken a Pap test. Pearson correlation analysis showed that cervical cancer knowledge had a significantly negative relationship with barriers to cervical cancer screening. Susceptibility and seriousness score were significantly moderately correlated with benefit and motivation score as well as barrier score. Self-efficacy score also had a moderate correlation with benefit and motivation score. Students who had had a Pap test showed a significantly lower score in barriers to being screened compared to students who had not had a Pap test. This study showed that educated women in South Africa lack complete information on cervical cancer. Students who had had a Pap test had significantly lower barriers to cervical cancer screening than those students who had not had a Pap test.
Sawle, Leanne; Freeman, Jennifer; Marsden, Jonathan
2017-04-01
Balance is a complex construct, affected by multiple components such as strength and co-ordination. However, whilst assessing an athlete's dynamic balance is an important part of clinical examination, there is no gold standard measure. The multiple single-leg hop-stabilization test is a functional test which may offer a method of evaluating the dynamic attributes of balance, but it needs to show adequate intra-tester reliability. The purpose of this study was to assess the intra-rater reliability of a dynamic balance test, the multiple single-leg hop-stabilization test, on the dominant and non-dominant legs. Intra-rater reliability study. Fifteen active participants were tested twice with a 10-minute break between tests. The outcome measure was the multiple single-leg hop-stabilization test score, based on a clinically assessed numerical scoring system. Results were analysed using an Intraclass Correlation Coefficient (ICC 2,1) and Bland-Altman plots. Regression analyses explored relationships between test scores, leg dominance, age and training (an alpha level of p = 0.05 was selected). ICCs for intra-rater reliability were 0.85 for the dominant and non-dominant legs (confidence intervals = 0.62-0.95 and 0.61-0.95 respectively). Bland-Altman plots showed scores within two standard deviations. A significant correlation was observed between the dominant and non-dominant leg on balance scores (R² = 0.49, p<0.05), and better balance was associated with younger participants in their non-dominant leg (R² = 0.28, p<0.05) and their dominant leg (R² = 0.39, p<0.05), and with a higher number of hours spent training for the non-dominant leg (R² = 0.37, p<0.05). The multiple single-leg hop-stabilisation test demonstrated strong intra-tester reliability with active participants. Younger participants who trained more have better balance scores. This test may be a useful measure for evaluating the dynamic attributes of balance.
Kaufman, Kathryn; Beale, Brian S; Thames, Howard D; Saunders, W Brian
2017-01-01
To compare articular cartilage scores in cranial cruciate ligament (CCL)-deficient dogs with or without concurrent bucket handle tears (BHT) of the medial meniscus. Retrospective case series. Client-owned dogs treated with arthroscopy and tibial plateau leveling osteotomy or extracapsular repair for complete CCL rupture (290 stifles from 264 dogs). Medical records and arthroscopic images were reviewed. Medial femoral condyle (MFC) and medial tibial plateau (MTP) cartilage was scored using the modified Outerbridge scale. Periarticular osteophytosis (PAO) and injury to the medial meniscus were recorded. Data were analyzed using Student's t-tests, Wilcoxon rank-sum test, and Fisher's exact test for changes in the stifle based on meniscal condition, body weight, and duration of lameness. PAO, MFC, and MTP articular cartilage scores were not significantly different in dogs with or without BHT. There were no significant differences in MFC or MTP scores when dogs were evaluated based on bodyweight and the presence or absence of a BHT. However, PAO formation was significantly increased in dogs weighing >13.6 kg and concurrent meniscal injury vs. dogs weighing <13.6 kg and concurrent meniscal injury (P < .001). Significantly more stifles with chronic lameness (40 of 89; 44.9%) had the highest PAO score of 2 reported compared to only 42 of 182 stifles (23.1%) with acute lameness (P < .001). The presence of a BHT of the medial meniscus was not associated with more severe arthroscopic articular cartilage lesions in the medial joint compartment at the time of surgery. © 2016 The American College of Veterinary Surgeons.
34 CFR 668.150 - Agreement between the Secretary and a test publisher or a State.
Code of Federal Regulations, 2011 CFR
2011-07-01
... computer-based test is used, provide the test administrator with software that will— (i) Immediately... any changes in test taker responses or test scores; (11) Promptly send to the student and the... during the period of test approval; (14) Upon request, give the Secretary, a State agency, an accrediting...
A Comparison of Student Understanding of Seasons Using Inquiry and Didactic Teaching Methods
NASA Astrophysics Data System (ADS)
Ashcraft, Paul G.
2006-02-01
Student performance on open-ended questions concerning seasons in a university physical science content course was examined to note differences between classes that experienced inquiry using a 5-E lesson planning model and those that experienced the same content with a traditional, didactic lesson. The class examined is a required content course for elementary education majors and understanding the seasons is part of the university's state's elementary science standards. The two self-selected groups of students showed no statistically significant differences in pre-test scores, while there were statistically significant differences between the groups' post-test scores with those who participated in inquiry-based activities scoring higher. There were no statistically significant differences between the pre-test and the post-test for the students who experienced didactic teaching, while there were statistically significant improvements for the students who experienced the 5-E lesson.
Standard of practice and Flynn Effect testimony in death penalty cases.
Gresham, Frank M; Reschly, Daniel J
2011-06-01
The Flynn Effect is a well-established psychometric fact documenting substantial increases in measured intelligence test performance over time. Flynn's (1984) review of the literature established that Americans gain approximately 0.3 points per year or 3 points per decade in measured intelligence. The accurate assessment and interpretation of intellectual functioning becomes critical in death penalty cases that seek to determine whether an individual meets the criteria for intellectual disability and thereby is ineligible for execution under Atkins v. Virginia (2002). We reviewed the literature on the Flynn Effect and demonstrated how failure to adjust intelligence test scores based on this phenomenon invalidates test scores and may be in violation of the Standards for Educational and Psychological Testing as well as the "Ethical Principles for Psychologists and Code of Conduct." Application of the Flynn Effect and score adjustments for obsolete norms clearly is supported by science and should be implemented by practicing psychologists.
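The adjustment discussed above amounts to simple arithmetic. As an illustrative sketch only, assuming the 0.3 points per year figure quoted in the abstract and a straight-line correction for the years elapsed since the test was normed (the function name, parameters, and example values are hypothetical and are not a procedure taken from the article):

```python
def flynn_adjusted_score(obtained_score, test_year, norming_year,
                         points_per_year=0.3):
    """Subtract the expected norm inflation from an obtained IQ score.

    points_per_year defaults to the 0.3 figure quoted in the abstract;
    the linear form of the correction is an assumption of this sketch.
    """
    years_elapsed = test_year - norming_year
    if years_elapsed < 0:
        raise ValueError("test cannot be administered before it was normed")
    return obtained_score - points_per_year * years_elapsed

# Hypothetical case: obtained score of 73 on a test normed 15 years earlier.
adjusted = flynn_adjusted_score(73, test_year=2010, norming_year=1995)
```

Here 15 years of norm obsolescence corresponds to 4.5 points, so the obtained score of 73 would be read as 68.5 under these assumptions.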
Petrović, Ivana B.; Vukelić, Milica; Čizmić, Svetlana
2017-01-01
Researchers are still searching for the ways to identify different categories of employees according to their exposure to negative acts and psychological experience of workplace bullying. We followed Notelaers and Einarsen’s application of the ROC analysis to determine the NAQ-R cut-off scores applying a “lower” and “higher” threshold. The main goal of this research was to develop and test different gold standards of personal and organizational relevance in determining the NAQ-R cut-off scores in a specific cultural and economic context of Serbia. Apart from combining self-labeling as a victim with self-perceived health, the objectives were to test the gold standards developed as a combination of self-labeling with life satisfaction, self-labeling with intention to leave and a complex gold standard based on self-labeling, self-perceived health, life satisfaction and intention to leave taken together. The ROC analysis on Serbian workforce data supports applying of different gold standards. For identifying employees in a preliminary stage of bullying, the most applicable was the gold standard based on self-labeling and intention to leave (score 34 and higher). The most accurate identification of victims could be based on the most complex gold standard (score 81 and higher). This research encourages further investigation of gold standards in different cultures. PMID:28119652
Chang, Jasper O; Levy, Susan S; Seay, Seth W; Goble, Daniel J
2014-05-01
Recent guidelines advocate sports medicine professionals to use balance tests to assess sensorimotor status in the management of concussions. The present study sought to determine whether a low-cost balance board could provide a valid, reliable, and objective means of performing this balance testing. Criterion validity testing relative to a gold standard and 7 day test-retest reliability. University biomechanics laboratory. Thirty healthy young adults. Balance ability was assessed on 2 days separated by 1 week using (1) a gold standard measure (ie, scientific grade force plate), (2) a low-cost Nintendo Wii Balance Board (WBB), and (3) the Balance Error Scoring System (BESS). Validity of the WBB center of pressure path length and BESS scores were determined relative to the force plate data. Test-retest reliability was established based on intraclass correlation coefficients. Composite scores for the WBB had excellent validity (r = 0.99) and test-retest reliability (R = 0.88). Both the validity (r = 0.10-0.52) and test-retest reliability (r = 0.61-0.78) were lower for the BESS. These findings demonstrate that a low-cost balance board can provide improved balance testing accuracy/reliability compared with the BESS. This approach provides a potentially more valid/reliable, yet affordable, means of assessing sports-related concussion compared with current methods.
Cognitive Style as a Factor Affecting Task-Based Reading Comprehension Test Scores
ERIC Educational Resources Information Center
Salmani-Nodoushan, Mohammad Ali
2005-01-01
For purposes of the present study, it was hypothesized that field (in)dependence would introduce systematic variance into Iranian EFL learners' overall and task-specific performance on task-based reading comprehension tests. 1743 freshman, sophomore, junior, and senior students all majoring in English at different Iranian universities and colleges…
Multilingual Data Selection for Low Resource Speech Recognition
2016-09-12
Figure 1: Identification of language clusters using scores from an LID system training languages used in the Base and OP1 evaluation periods of the Babel...the posterior scores over frames. For a set of languages that are used to train the language identification (LID) network, pairs of languages that...which are combined during test time to produce 10 dimensional language 3854 Figure 3: Identification of language clusters using scores from individually
Poisson Approximation-Based Score Test for Detecting Association of Rare Variants.
Fang, Hongyan; Zhang, Hong; Yang, Yaning
2016-07-01
Genome-wide association study (GWAS) has achieved great success in identifying genetic variants, but the nature of GWAS has determined its inherent limitations. Under the common disease rare variants (CDRV) hypothesis, the traditional association analysis methods commonly used in GWAS for common variants do not have enough power for detecting rare variants with a limited sample size. As a solution to this problem, pooling rare variants by their functions provides an efficient way for identifying susceptible genes. Rare variant typically have low frequencies of minor alleles, and the distribution of the total number of minor alleles of the rare variants can be approximated by a Poisson distribution. Based on this fact, we propose a new test method, the Poisson Approximation-based Score Test (PAST), for association analysis of rare variants. Two testing methods, namely, ePAST and mPAST, are proposed based on different strategies of pooling rare variants. Simulation results and application to the CRESCENDO cohort data show that our methods are more powerful than the existing methods. © 2016 John Wiley & Sons Ltd/University College London.
The effects of academic grouping on student performance in science
NASA Astrophysics Data System (ADS)
Scoggins, Sally Smykla
The current action research study explored how student placement in heterogeneous or homogeneous classes in seventh-grade science affected students' eighth-grade Science State of Texas Assessment of Academic Readiness (STAAR) scores, and how ability grouping affected students' scores based on race and socioeconomic status. The population included all eighth-grade students in the target district who took the regular eighth-grade science STAAR over four academic school years. The researcher ran three statistical tests: a t-test for independent samples, a one-way between-subjects analysis of variance (ANOVA) and a two-way between-subjects ANOVA. The results showed no statistically significant difference between eighth-grade Pre-AP students from seventh-grade Pre-AP classes and eighth-grade Pre-AP students from heterogeneous seventh-grade classes and no statistically significant difference between Pre-AP students' scores based on socioeconomic status. There was no statistically significant interaction between socioeconomic status and the seventh-grade science classes. The scores of regular eighth-grade students who were in heterogeneous seventh-grade classes were statistically significantly higher than the scores of regular eighth-grade students who were in regular seventh-grade classes. The results also revealed that the scores of students who were White were statistically significantly higher than the scores of students who were Black and Hispanic. Black and Hispanic scores did not differ significantly. Further results indicated that the STAAR Level II and Level III scores were statistically significantly higher for the Pre-AP eighth-grade students who were in heterogeneous seventh-grade classes than the STAAR Level II and Level III scores of Pre-AP eighth-grade students who were in Pre-AP seventh-grade classes.
Sefton, Gerri; Lane, Steven; Killen, Roger; Black, Stuart; Lyon, Max; Ampah, Pearl; Sproule, Cathryn; Loren-Gosling, Dominic; Richards, Caitlin; Spinty, Jean; Holloway, Colette; Davies, Coral; Wilson, April; Chean, Chung Shen; Carter, Bernie; Carrol, E D
2017-05-01
Pediatric Early Warning Scores are advocated to assist health professionals to identify early signs of serious illness or deterioration in hospitalized children. Scores are derived from the weighting applied to recorded vital signs and clinical observations reflecting deviation from a predetermined "norm." Higher aggregate scores trigger an escalation in care aimed at preventing critical deterioration. Process errors made while recording these data, including plotting or calculation errors, have the potential to impede the reliability of the score. To test this hypothesis, we conducted a controlled study of documentation using five clinical vignettes. We measured the accuracy of vital sign recording, score calculation, and time taken to complete documentation using a handheld electronic physiological surveillance system, VitalPAC Pediatric, compared with traditional paper-based charts. We explored the user acceptability of both methods using a Web-based survey. Twenty-three staff participated in the controlled study. The electronic physiological surveillance system improved the accuracy of vital sign recording, 98.5% versus 85.6%, P < .02, Pediatric Early Warning Score calculation, 94.6% versus 55.7%, P < .02, and saved time, 68 versus 98 seconds, compared with paper-based documentation, P < .002. Twenty-nine staff completed the Web-based survey. They perceived that the electronic physiological surveillance system offered safety benefits by reducing human error while providing instant visibility of recorded data to the entire clinical team.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wegner, Rodney E.; Oysul, Kaan; Pollock, Bruce E.
Purpose: The Pittsburgh radiosurgery-based arteriovenous malformation (AVM) grading scale was developed to predict patient outcomes after radiosurgery and was later modified with location as a two-tiered variable (deep vs. other). The purpose of this study was to test the modified radiosurgery-based AVM score in a separate set of AVM patients managed with radiosurgery. Methods and Materials: The AVM score is calculated as follows: AVM score = (0.1)(volume, cc) + (0.02)(age, years) + (0.5)(location; frontal/temporal/parietal/occipital/intraventricular/corpus callosum/cerebellar = 0, basal ganglia/thalamus/brainstem = 1). Testing of the modified system was performed on 293 patients having AVM radiosurgery from 1992 to 2004 at the University of Pittsburgh with dose planning based on a combination of stereotactic angiography and MRI. The median patient age was 38 years, the median AVM volume was 3.3 cc, and 57 patients (19%) had deep AVMs. The median modified AVM score was 1.25. The median patient follow-up was 39 months. Results: The modified AVM scale correlated with the percentage of patients with AVM obliteration without new deficits (≤1.00, 62%; 1.01-1.50, 51%; 1.51-2.00, 53%; and >2.00, 32%; F = 11.002, R² = 0.8117, p = 0.001). Linear regression also showed a statistically significant correlation between outcome and dose prescribed to the margin (F = 25.815, p <0.001). Conclusions: The modified radiosurgery-based AVM grading scale using location as a two-tiered variable correlated with outcomes when tested on a cohort of patients who underwent both angiography and MRI for dose planning. This system can be used to guide choices among observation, endovascular, surgical, and radiosurgical management strategies for individual AVM patients.
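The scoring formula stated in the abstract translates directly into a few lines of code. The sketch below follows the coefficients and the two-tiered location rule exactly as quoted; the function name, the string-based location argument, and the example patients are assumptions made for illustration.

```python
# Locations the abstract assigns a value of 1; every other site scores 0.
DEEP_LOCATIONS = {"basal ganglia", "thalamus", "brainstem"}

def modified_avm_score(volume_cc, age_years, location):
    """AVM score = 0.1 * volume (cc) + 0.02 * age (years) + 0.5 * location."""
    location_term = 1 if location.strip().lower() in DEEP_LOCATIONS else 0
    return 0.1 * volume_cc + 0.02 * age_years + 0.5 * location_term

# Hypothetical patients: same volume and age, superficial vs. deep site.
superficial = modified_avm_score(3.3, 38, "frontal")
deep = modified_avm_score(3.3, 38, "thalamus")
```

The example inputs reuse the reported median volume and age purely for convenience; a score computed from two medians is not the same thing as the cohort's median score.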
Shear Bond Strength of Ceramic Brackets with Different Base Designs: Comparative In-vitro Study
Ansari, Mohd. Younus; Agarwal, Deepak K; Bhattacharya, Preeti; Ansar, Juhi; Bhandari, Ravi
2016-01-01
Introduction Knowledge about the Shear Bond Strength (SBS) of ceramic brackets with different base design is essential as it affects bond strength to enamel. Aim The aim of the present study was to evaluate and compare the effect of base designs of different ceramic brackets on SBS, and to determine the fracture site after debonding. Materials and Methods Four groups of ceramic brackets and one group of metal brackets with different base designs were used. Adhesive precoated base of Clarity Advanced (APC Flash-free) (Unitek/3M, Monrovia, California), microcrystalline base of Clarity Advanced (Unitek/3M, Monrovia, California), polymer mesh base of InVu (TP Orthodontics, Inc., La Porte, IN, United States), patented bead ball base of Inspire Ice (Ormco, Glendora, California), and a mechanical mesh base of Gemini Metal bracket (Unitek/3M, Monrovia, California). Ten brackets of each type were bonded to 50 maxillary premolars with Transbond XT (Unitek/3M). Samples were stored in distilled water at room temperature for 24 hours and subsequently tested in shear mode on a universal testing machine (Model 3382; Instron Corp., Canton, Massachusetts, USA) at a cross head speed of 1mm/minute with the help of a chisel. The debonded interface was recorded and analyzed to determine the predominant bond failure site under an optical microscope (Stereomicroscope) at 10X magnification. One way analysis of variance (ANOVA) was used to compare SBS. Tukey’s significant differences tests were used for post-hoc comparisons. The Adhesive Remnant Index (ARI) scores were compared by chi-square test. Results Mean SBS of microcrystalline base (27.26±1.73), was the highest followed by bead ball base (23.45±5.09), adhesive precoated base (20.13±5.20), polymer mesh base (17.54±1.91), and mechanical mesh base (17.50±2.41) the least. Comparing the frequency (%) of ARI Score among the groups, chi-square test showed significantly different ARI scores among the groups (χ2 = 34.07, p<0.001). 
Conclusion Different base designs of metal and ceramic brackets influence SBS to enamel and all were clinically acceptable. PMID:28050507
Screening for cognitive impairment in older individuals. Validation study of a computer-based test.
Green, R C; Green, J; Harrison, J M; Kutner, M H
1994-08-01
This study examined the validity of a computer-based cognitive test that was recently designed to screen the elderly for cognitive impairment. Criterion-related validity was examined by comparing test scores of impaired patients and normal control subjects. Construct-related validity was computed through correlations between computer-based subtests and related conventional neuropsychological subtests. University center for memory disorders. Fifty-two patients with mild cognitive impairment by strict clinical criteria and 50 unimpaired, age- and education-matched control subjects. Control subjects were rigorously screened by neurological, neuropsychological, imaging, and electrophysiological criteria to identify and exclude individuals with occult abnormalities. Using a cut-off total score of 126, this computer-based instrument had a sensitivity of 0.83 and a specificity of 0.96. Using a prevalence estimate of 10%, predictive values, positive and negative, were 0.70 and 0.96, respectively. Computer-based subtests correlated significantly with conventional neuropsychological tests measuring similar cognitive domains. Thirteen (17.8%) of 73 volunteers with normal medical histories were excluded from the control group, with unsuspected abnormalities on standard neuropsychological tests, electroencephalograms, or magnetic resonance imaging scans. Computer-based testing is a valid screening methodology for the detection of mild cognitive impairment in the elderly, although this particular test has important limitations. Broader applications of computer-based testing will require extensive population-based validation. Future studies should recognize that normal control subjects without a history of disease who are typically used in validation studies may have a high incidence of unsuspected abnormalities on neurodiagnostic studies.
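Predictive values such as those quoted above depend on sensitivity, specificity, and an assumed prevalence. A minimal sketch of the standard Bayes' theorem calculation follows; the function name is hypothetical, and since the abstract does not describe how its own figures were derived, values computed this way need not coincide with every number it reports.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (positive predictive value, negative predictive value)."""
    # Expected proportions of the screened population in each outcome cell.
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

# Inputs taken from the abstract: sensitivity 0.83, specificity 0.96,
# prevalence estimate 10%.
ppv, npv = predictive_values(0.83, 0.96, 0.10)
```

Changing the prevalence argument shows how strongly the positive predictive value of a screening test depends on how common the condition is in the screened population.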
The MMPI Assistant: A Microcomputer Based Expert System to Assist in Interpreting MMPI Profiles
Tanner, Barry A.
1989-01-01
The Assistant is an MS DOS program to aid clinical psychologists in interpreting the results of the Minnesota Multiphasic Personality Inventory (MMPI). Interpretive hypotheses are based on the professional literature and the author's experience. After scores are entered manually, the Assistant produces a hard copy which is intended for use by a psychologist knowledgeable about the MMPI. The rules for each hypothesis appear first on the monitor, and then in the printed output, followed by the patient's scores on the relevant scales, and narrative hypotheses for the scores. The data base includes hypotheses for 23 validity configurations, 45 two-point clinical codes, 10 high scoring single-point clinical scales, and 10 low scoring single-point clinical scales. The program can accelerate the production of test reports, while insuring that actuarial rules are not overlooked. It has been especially useful as a teaching tool with graduate students. The Assistant requires an IBM PC compatible with 128k available memory, DOS 2.x or higher, and a printer.
Wu, Wei; West, Stephen G.; Hughes, Jan N.
2008-01-01
We investigated the effects of grade retention in first grade on the growth of the Woodcock-Johnson broad mathematics and reading scores over three years using linear growth curve modeling on an academically at-risk sample. A large sample (n = 784) of first grade children who were at risk for retention were initially identified based on low literacy scores. Scores representing propensity for retention were constructed based on 72 variables collected in comprehensive baseline testing in first grade. We closely matched 97 pairs of retained and promoted children based on their propensity scores using optimal matching procedures. This procedure adjusted for baseline differences between the retained and promoted children. We found that grade retention decreased the growth rate of mathematical skills but had no significant effect on reading skills. In addition, several potential moderators of the effect of retention on growth of mathematical and reading skills were identified including limited English language proficiency and children's conduct problems. PMID:19083352
Wu, Wei; West, Stephen G; Hughes, Jan N
2008-02-01
We investigated the effects of grade retention in first grade on the growth of the Woodcock-Johnson broad mathematics and reading scores over three years using linear growth curve modeling on an academically at-risk sample. A large sample (n=784) of first grade children who were at risk for retention was initially identified based on low literacy scores. Scores representing propensity for retention were constructed based on 72 variables collected in comprehensive baseline testing in first grade. We closely matched 97 pairs of retained and promoted children based on their propensity scores using optimal matching procedures. This procedure adjusted for baseline differences between the retained and promoted children. We found that grade retention decreased the growth rate of mathematical skills but had no significant effect on reading skills. In addition, several potential moderators of the effect of retention on growth of mathematical and reading skills were identified including limited English language proficiency and children's conduct problems.
Promoting students' conceptual understanding using STEM-based e-book
NASA Astrophysics Data System (ADS)
Komarudin, U.; Rustaman, N. Y.; Hasanah, L.
2017-05-01
This study aims to examine the effect of a Science, Technology, Engineering, and Mathematics (STEM) based e-book in promoting students' conceptual understanding of the lever system in the human body. The e-book used was the one published by the National Ministry of Science Education. The research used a quasi-experimental pretest-posttest design. The subjects were two classes of 8th grade junior high school students in Pangkalpinang, Indonesia, divided into an experimental group (n=34) and a control group (n=32). The students in the experimental group were taught with the STEM-based e-book, while the control group learned with a non-STEM-based e-book. The data were collected using a pretest and posttest instrument. Pretest and posttest responses were scored and then analyzed using descriptive statistics and an independent t-test. The independent sample t-test shows no significant difference in students' pretest scores between the control and experimental groups. However, there were significant differences in students' posttest scores and N-gain scores between the control and experimental groups, with sig = 0.000 (p<0.005). N-gain analysis shows higher performance among students in the experimental group (mean = 66.03) than in the control group (mean = 47.66) in answering conceptual understanding questions. Based on the results, it can be concluded that the STEM-based e-book has a positive impact in promoting students' understanding of the lever system in the human body. Therefore, this learning approach has the potential to be used as an alternative to trigger the enhancement of students' understanding in science.
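The abstract reports N-gain scores without defining them. The sketch below assumes the commonly used normalized gain, 100 × (post − pre) / (max − pre), expressed as a percentage; the formula, the function names, and the sample scores are illustrative assumptions, not taken from the study.

```python
def n_gain(pre, post, max_score=100.0):
    """Normalized gain in percent: 100 * (post - pre) / (max_score - pre).

    Assumed definition; the abstract does not state which formula was used.
    """
    if pre >= max_score:
        raise ValueError("pre-test score must be below the maximum score")
    return 100.0 * (post - pre) / (max_score - pre)


def mean_n_gain(pairs, max_score=100.0):
    """Average the per-student normalized gains for (pre, post) pairs."""
    gains = [n_gain(pre, post, max_score) for pre, post in pairs]
    return sum(gains) / len(gains)


# Hypothetical (pre, post) score pairs, not data from the study.
example = [(40, 80), (50, 75), (20, 60)]
print(round(mean_n_gain(example), 2))
```

Averaging per-student gains, as done here, is one of two common conventions; the other computes a single gain from the group mean scores, and the two can differ.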
Aslami, Elahe; Alipour, Ahmad; Najib, Fatemeh Sadat; Aghayosefi, Alireza
2017-04-01
Anxiety and depression during pregnancy are among the factors that contribute to undesirable pregnancy and delivery outcomes. One way of controlling anxiety and depression is mindfulness and cognitive behavioral therapy. The purpose of this study was to compare the efficiency of mindfulness based on the Islamic-spiritual schemas and group cognitive behavioral therapy on reduction of anxiety and depression in pregnant women. The research design was semi-experimental in the form of pretest-posttest using a control group. Among the pregnant women in the 16th to 32nd weeks of pregnancy who were referred to the health center, 30 pregnant women with high anxiety level and 30 pregnant women with high depression participated in the research. Fifteen participants with high depression and 15 participants with high anxiety were randomly assigned to the intervention group under the treatment of mindfulness based on Islamic-spiritual schemes. In addition, 15 participants with high scores regarding depression and 15 with high scores in anxiety were considered in the other group. The control group consisted of 15 pregnant women with high anxiety and depression. The Beck anxiety-depression questionnaire was used in two steps of pre-test and post-test. Data were analyzed using SPSS, version 20, and P≤0.05 was considered as significant. The results of the multivariate analysis of variance test and the follow-up Tukey test showed that there was a significant difference between the mean scores of anxiety and depression in the two groups of mindfulness based on spiritual-Islamic scheme (P<0.001) and the group of cognitive behavioral therapy with each other (P<0.001) and with the control group (P<0.001). The mean of anxiety and depression scores decreased in the intervention group, but it increased in the control group. Both therapy methods were effective in reduction of anxiety and depression of pregnant women, but the effect of mindfulness based on spiritual-Islamic schemes was greater.
Vingerhoets, Johan; Nijs, Steven; Tambuyzer, Lotke; Hoogstoel, Annemie; Anderson, David; Picchio, Gaston
2012-01-01
The aims of this study were to compare various genotypic scoring systems commonly used to predict virological outcome to etravirine, and examine their concordance with etravirine phenotypic susceptibility. Six etravirine genotypic scoring systems were assessed: Tibotec 2010 (based on 20 mutations; TBT 20), Monogram, Stanford HIVdb, ANRS, Rega (based on 37, 30, 27 and 49 mutations, respectively) and virco®TYPE HIV-1 (predicted fold change based on genotype). Samples from treatment-experienced patients who participated in the DUET trials and with both genotypic and phenotypic data (n=403) were assessed using each scoring system. Results were retrospectively correlated with virological response in DUET. κ coefficients were calculated to estimate the degree of correlation between the different scoring systems. Correlation between the five scoring systems and the TBT 20 system was approximately 90%. Virological response by etravirine susceptibility was comparable regardless of which scoring system was utilized, with 70-74% of DUET patients determined as susceptible to etravirine by the different scoring systems achieving plasma viral load <50 HIV-1 RNA copies/ml. In samples classed as phenotypically susceptible to etravirine (fold change in 50% effective concentration ≤3), correlations with genotypic score were consistently high across scoring systems (≥70%). In general, the etravirine genotypic scoring systems produced similar results, and genotype-phenotype concordance was high. As such, phenotypic interpretations, and in their absence all genotypic scoring systems investigated, may be used to reliably predict the activity of etravirine.
Brooks, Brian L
2010-09-01
Low scores across a battery of tests are common in healthy people and vary by demographic characteristics. The purpose of the present article was to present the base rates of low scores for the Wechsler Intelligence Scale for Children, fourth edition (WISC-IV; D. Wechsler, 2003). Participants included 2,200 children and adolescents between 6 and 16 years of age from the WISC-IV U.S. standardization sample. Measures considered in the base rates analyses included the 10 core subtests and the 4 index scores. Analyses were conducted for the entire standardization sample as well as stratified by different classifications of intelligence and different years of parental education. In the total sample, it is uncommon to have 6 or more subtest scores or 2 or more Index scores
Willmes, K
1985-08-01
Methods for the analysis of a single subject's test profile(s) proposed by Huber (1973) are applied to the Aachen Aphasia Test (AAT). The procedures are based on the classical test theory model (Lord & Novick, 1968) and are suited for any (achievement) test with standard norms from a large standardization sample and satisfactory reliability estimates. Two test profiles of a Wernicke's aphasic, obtained before and after a 3-month period of speech therapy, are analyzed using inferential comparisons between (groups of) subtest scores on one test application and between two test administrations for single (groups of) subtests. For each of these comparisons, the two aspects of (i) significant (reliable) differences in performance beyond measurement error and (ii) the diagnostic validity of that difference in the reference population of aphasic patients are assessed. Significant differences between standardized subtest scores and a remarkably better preserved reading and writing ability could be found for both test administrations using the multiple test procedure of Holm (1979). Comparison of both profiles revealed an overall increase in performance for each subtest as well as changes in level of performance relations between pairs of subtests.
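The abstract names the multiple test procedure of Holm (1979) for the subtest comparisons. As a rough illustration of how that step-down procedure works in general, here is a minimal sketch; the function name and the p-values are hypothetical and do not come from the Aachen Aphasia Test analysis.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm's step-down procedure.

    Returns a list of booleans aligned with `p_values`: True where the
    corresponding hypothesis is rejected at family-wise level `alpha`.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is compared
        # with alpha / (m - k + 1).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values are retained
    return reject


# Hypothetical p-values for four pairwise subtest comparisons.
print(holm_reject([0.01, 0.04, 0.03, 0.005]))
```

The early `break` is what distinguishes Holm's procedure from a plain Bonferroni correction: thresholds relax as hypotheses are rejected, but testing stops at the first non-rejection.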
Investigating the mental abilities of rural Zulu primary school children in South Africa.
Jinabhai, C C; Taylor, M; Rangongo, M F; Mkhize, N J; Anderson, S; Pillay, B J; Sullivan, K R
2004-02-01
Maximising the full potential of health and educational interventions in South African schools requires assessment of the current level of mental abilities of the school children as measured by cognitive and scholastic tests and the identification of any barriers to improved performance. This study reports on the application and interpretation of a selected battery of mental ability tests among Zulu school children and the methodological and analytical issues that need to be addressed. The test scores of 806 primary school children from a rural community are presented, based on four tests: Raven's Coloured Progressive Matrices (CPM), an Auditory Verbal Learning Test (AVLT), the Symbol Digit Modalities Test (SDMT) and Young's Group Mathematics Test (GMT). Significant gender differences were found in the test scores, and the mean scores of Zulu children in this study were lower than those reported in other studies. The results of this selected test battery provide data for the further development of appropriate test instruments for South African conditions. These results can contribute towards the development of a test battery for South African children that can be used to assess and improve their school performance.
Bleau Lavigne, Maude; Reeves, Isabelle; Sasseville, Marie-Josée; Loignon, Christine
The primary purpose of this study was to develop 2 survey tools to explore factors influencing adoption of best practices for diabetic foot ulcer offloading treatment in primary health care settings. One survey was intended for the patients receiving care for a diabetic foot ulcer in primary health care settings and the other was intended for the health professionals providing treatment. The second purpose of this study was to evaluate the psychometric properties of the 2 surveys. Development and validation of survey instruments. Two surveys were developed using a published guide. Following review of pertinent literature and identification of variables to be measured, a bank of items was developed and pretested to determine clarity of the item and responses. Psychometric testing comprised measurement of content validity index (CVI) and intraclass correlation coefficient (ICC). Only items obtaining satisfactory CVI and ICC scores were included in the final version of the surveys. The final version of the patient survey contained 41 items and the final version of the survey for health care professionals contained 21 items. The patient-intended survey's items demonstrate high content validity scores and satisfactory test-retest reliability scores. The overall CVI score was 0.98. Forty of the 49 items eligible for testing obtain satisfactory ICC scores. One item's test-retest reliability could not be tested but it was retained based on its high CVI. The health professional-intended survey, an overall CVI score of 0.91 but items had lower ICC scores (63%, 31 of the 49 items), did not achieve a satisfactory ICC score for inclusion in the final instrument. This project led to development of 2 instruments designed to identify and explore factors influencing adoption of best practices for diabetic foot ulcer offloading treatment in the primary health care setting. 
Future research and testing is required to translate these French surveys into English and additional languages, in order to reach a broader population.
Assessing Student Understanding of Physical Hydrology
NASA Astrophysics Data System (ADS)
Castillo, A. J.; Marshall, J.; Cardenas, M. B.
2012-12-01
Our objective is to characterize and assess upper division and graduate student thinking by developing and testing an assessment tool for a physical hydrology class. The class' learning goals are: (1) Quantitative process-based understanding of hydrologic processes, (2) Experience with different methods in hydrology, (3) Learning, problem solving, communication skills. These goals were translated into two measurable tasks asked of students in a questionnaire: (1) Describe the significant processes in the hydrological cycle and (2) Describe laws governing these processes. A third question below assessed the students' ability to apply their knowledge: You have been hired as a consultant by __ to (1) assess how urbanization and the current drought have affected a local spring and (2) predict what the effects will be in the future if the drought continues. What information would you need to gather? What measurements would you make? What analyses would you perform? Student and expert responses to the questions were then used to develop a rubric to score responses. Using the rubric, 3 researchers independently blind-coded the full set of pre and post artifacts, resulting in 89% inter-rater agreement on the pre-tests and 83% agreement on the post-tests. We present student scores to illustrate the use of the rubric and to characterize student thinking prior to and following a traditional course. Most students interpreted Q1 in terms of physical processes affecting the water cycle, the primary organizing framework for hydrology, as intended. On the pre-test, one student scored 0, indicating no response, on this question. Twenty students scored 1, indicating rudimentary understanding, 2 students scored a 2, indicating a basic understanding, and no student scored a 3. Student scores on this question improved on the post-test. 
On the 22 post-tests that were blind scored, 11 students demonstrated some recognition of concepts, 9 students showed a basic understanding, and 2 students had a full understanding of the processes linked to hydrology. Half the students had provided evidence of the desired understanding; however, half still demonstrated only a rudimentary understanding. Results on Q2 were similar. On the pre-test, 2 students scored 0, 21 students scored 1, indicating rudimentary understanding, 2 students scored a 2, and no student scored a 3. On the post-test, again approximately half the students achieved the desired understanding: 9 students showed some recognition of concepts, 12 students demonstrated a basic understanding; only one student exhibited full understanding. On Q3, no student scored 0, 9 scored 1, 15 scored 2 and 1 student scored 3. On the post-test, one student scored 1, 16 students scored 2, and 5 students scored 3. Students were significantly better at responding to Q3 (the application) as opposed to Q1 and Q2, which were more abstract. Research has shown that students are often better able to solve contextualized problems when they are unable to deal with more abstract tasks. This result has limitations including the small number of participants, all from one institution, and the fact that the rubric was still under development. Nevertheless, the high inter-rater agreement by a group of experts is significant; the rubric we developed is a potentially useful tool for assessment of learning and understanding physical hydrology. Supported by NSF CAREER grant (EAR-0955750).
Gera, G; Freeman, D L; Blackinton, M T; Horak, F B; King, L
2016-02-01
Balance deficits in people with Parkinson's disease can affect any of the multiple systems encompassing balance control. Thus, identification of the specific deficit is crucial in customizing balance rehabilitation. The sensory organization test, a test of sensory integration for balance control, is sometimes used in isolation to identify balance deficits in people with Parkinson's disease. More recently, the Mini-Balance Evaluations Systems Test, a clinical scale that tests multiple domains of balance control, has begun to be used to assess balance in patients with Parkinson's disease. The purpose of our study was to compare the use of Sensory Organization Test and Mini-Balance Evaluations Systems Test in identifying balance deficits in people with Parkinson's disease. 45 participants (27M, 18F; 65.2 ± 8.2 years) with idiopathic Parkinson's disease participated in the cross-sectional study. Balance assessment was performed using the Sensory Organization Test and the Mini-Balance Evaluations Systems Test. People were classified into normal and abnormal balance based on the established cutoff scores (normal balance: Sensory Organization Test >69; Mini-Balance Evaluations Systems Test >73). More subjects were classified as having abnormal balance with the Mini-Balance Evaluations Systems Test (71% abnormal) than with the Sensory Organization Test (24% abnormal) in our cohort of people with Parkinson's disease. There were no subjects with a normal Mini-Balance Evaluations Systems Test score but abnormal Sensory Organization Test score. In contrast, there were 21 subjects who had an abnormal Mini-Balance Evaluations Systems Test score but normal Sensory Organization Test scores. Findings from this study suggest that investigation of sensory integration deficits, alone, may not be able to identify all types of balance deficits found in patients with Parkinson's disease. 
Thus, a comprehensive approach should be used to test multiple balance systems in order to provide customized rehabilitation.
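The classification rule in the abstract above (normal balance when the Sensory Organization Test score is above 69 and when the Mini-Balance Evaluations Systems Test score is above 73) can be sketched as a small cross-tabulation. The cutoffs are the ones quoted in the abstract; the function names and the score pairs are hypothetical and are not the study data.

```python
from collections import Counter

SOT_CUTOFF = 69        # normal if Sensory Organization Test score > 69
MINIBEST_CUTOFF = 73   # normal if Mini-Balance Evaluations Systems Test score > 73


def classify(sot, minibest):
    """Return (SOT label, Mini-BESTest label) for one participant."""
    return (
        "normal" if sot > SOT_CUTOFF else "abnormal",
        "normal" if minibest > MINIBEST_CUTOFF else "abnormal",
    )


def cross_tabulate(scores):
    """Count participants in each cell of the 2x2 agreement table."""
    return Counter(classify(sot, mb) for sot, mb in scores)


# Hypothetical (SOT, Mini-BESTest) score pairs for five participants.
scores = [(80, 85), (75, 60), (65, 55), (72, 70), (90, 74)]
table = cross_tabulate(scores)
for cell, count in sorted(table.items()):
    print(cell, count)
```

The off-diagonal cells are the ones the study focuses on: participants who are normal on one test but abnormal on the other.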
Wijsman, Liselotte Willemijn; Cachucho, Ricardo; Hoevenaar-Blom, Marieke Peternella; Mooijaart, Simon Pieter; Richard, Edo
2017-01-01
Background: Smartphone-assisted technologies potentially provide the opportunity for large-scale, long-term, repeated monitoring of cognitive functioning at home. Objective: The aim of this proof-of-principle study was to evaluate the feasibility and validity of performing cognitive tests in people at increased risk of dementia using smartphone-based technology during a 6-month follow-up period. Methods: We used the smartphone-based app iVitality to evaluate five cognitive tests based on conventional neuropsychological tests (Memory-Word, Trail Making, Stroop, Reaction Time, and Letter-N-Back) in healthy adults. Feasibility was tested by studying adherence of all participants to perform smartphone-based cognitive tests. Validity was studied by assessing the correlation between conventional neuropsychological tests and smartphone-based cognitive tests and by studying the effect of repeated testing. Results: We included 151 participants (mean age in years=57.3, standard deviation=5.3). Mean adherence to assigned smartphone tests during 6 months was 60% (SD 24.7). There was moderate correlation between the first smartphone-based test attempt and the conventional test for the Stroop test and the Trail Making test, with Spearman ρ=.3-.5 (P<.001). Correlation increased for both tests when comparing the conventional test with the mean score of all attempts a participant had made, with the highest correlation for Stroop panel 3 (ρ=.62, P<.001). Performance on the Stroop and the Trail Making tests improved over time, suggesting a learning effect, but the scores on the Letter-N-Back, the Memory-Word, and the Reaction Time tests remained stable. Conclusions: Repeated smartphone-assisted cognitive testing is feasible with reasonable adherence and moderate relative validity for the Stroop and the Trail Making tests compared with conventional neuropsychological tests. Smartphone-based cognitive testing seems promising for large-scale data collection in population studies. PMID:28546139
Murphy, Jennifer; Ahmed, Fizaa; Lomen-Hoerth, Catherine
2015-03-01
The University of California San Francisco (UCSF) Screening Battery provides clinicians with a uniquely tailored tool to measure ALS patients' cognitive and behavioral changes, adjusting for dysarthria and hand weakness. The battery consists of the ALS-CBS ( 1 ), Written Fluency Test ( 2 ), and a new revision of the Frontal Behavior Inventory (FBI-ALS) ( 3 ). The validity of each component was tested by comparing results with a gold standard neuropsychological exam (GNE). Consensus criteria-based GNE diagnoses ( 4 ) were assigned (n = 24) and concurrent validity was tested for each screening exam component. Results showed that each of the four cognitive and behavioral screening test components were significantly associated with diagnoses confirmed by GNE. GNE diagnoses were significantly associated with FBI-ALS negative score, written S-words score, and ALS-CBS cognitive score. The total FBI-ALS score and C-words tests were less predictive of GNE-diagnosed impairment. In conclusion, the UCSF Cognitive Screening Battery demonstrates good external validity compared with GNE in this modest sample, encouraging its use in larger investigations. These data suggest that this battery may provide an effective screen to identify ALS patients who will then benefit from a full examination to confirm their diagnosis.
CAT Procedures for Passage-Based Tests.
ERIC Educational Resources Information Center
Thompson, Tony D.; Davey, Tim
Methods to control the test construct and the efficiency of a computerized adaptive test (CAT) were studied in the context of a reading comprehension test given as a part of a battery of tests for college admission. A goal of the study was to create test scores that were interchangeable with those from a fixed form paper and pencil test. The first…
Measuring change in critical thinking skills of dental students educated in a PBL curriculum.
Pardamean, Bens
2012-04-01
This study measured the change in critical thinking skills of dental students educated in a problem-based learning (PBL) pedagogical method. The quantitative analysis was focused on measuring students' critical thinking skills achievement from their first through third years of dental education at the University of Southern California. This non-experimental evaluation was based on a volunteer sample of ninety-eight dental students who completed a demographics/academic questionnaire and a psychometric assessment known as the Health Sciences Reasoning Test (HSRT). The HSRT produced the overall critical thinking skills score. Additionally, the HSRT generated five subscale scores: analysis, inference, evaluation, deductive reasoning, and inductive reasoning. The results of this study concluded that the students showed no continuous and significant incremental improvement in their overall critical thinking skills score achievement during their PBL-based dental education. Except for the inductive reasoning score, this result was very consistent with the four subscale scores. Moreover, after performing the statistical adjustment on total score and subscale scores, no significant statistical differences were found among the three student groups. However, the results of this study found some aspects of critical thinking achievements that differed by categories of gender, race, English as first language, and education level.
Allele-sharing models: LOD scores and accurate linkage tests.
Kong, A; Cox, N J
1997-11-01
Starting with a test statistic for linkage analysis based on allele sharing, we propose an associated one-parameter model. Under general missing-data patterns, this model allows exact calculation of likelihood ratios and LOD scores and has been implemented by a simple modification of existing software. Most important, accurate linkage tests can be performed. Using an example, we show that some previously suggested approaches to handling less than perfectly informative data can be unacceptably conservative. Situations in which this model may not perform well are discussed, and an alternative model that requires additional computations is suggested.
Allele-sharing models: LOD scores and accurate linkage tests.
Kong, A; Cox, N J
1997-01-01
Starting with a test statistic for linkage analysis based on allele sharing, we propose an associated one-parameter model. Under general missing-data patterns, this model allows exact calculation of likelihood ratios and LOD scores and has been implemented by a simple modification of existing software. Most important, accurate linkage tests can be performed. Using an example, we show that some previously suggested approaches to handling less than perfectly informative data can be unacceptably conservative. Situations in which this model may not perform well are discussed, and an alternative model that requires additional computations is suggested. PMID:9345087
Gattellari, Melina; Ward, Jeanette E
2005-05-01
Randomised evaluations of resources to facilitate informed decisions about prostate cancer screening are rarely conducted. In this study, 421 men recruited from the community were randomly allocated to receive a leaflet (n = 140) or one of two resources meeting criteria for a decision-aid: a video (n = 141) or an evidence-based booklet, developed by the authors (n = 140). Men in all three groups demonstrated significant increases in knowledge scores from pre to post-test. Scores were significantly higher at post-test amongst those who had received our evidence-based booklet compared with men who received the leaflet or video (P < 0.001). Scores were significantly modified by men's preferences for decisional control (P = 0.002). Decisional conflict was significantly lower amongst men receiving the evidence-based booklet (P = 0.038). Men receiving the evidence-based booklet also were less likely to accept a recommendation by a GP to undergo prostate-specific-antigen (PSA) screening (P = 0.003). Men require detailed information about the pros and cons of PSA screening in order to make an informed decision. Resources are not equivalent in achieving these outcomes.
Allen, D D; Bond, C A
2001-07-01
Good admissions decisions are essential for identifying successful students and good practitioners. Various parameters have been shown to have predictive power for academic success. Previous academic performance, the Pharmacy College Admissions Test (PCAT), and specific prepharmacy courses have been suggested as academic performance indicators. However, critical thinking abilities have not been evaluated. We evaluated the connection between academic success and each of the following predictive parameters: the California Critical Thinking Skills Test (CCTST) score, PCAT score, interview score, overall academic performance prior to admission at a pharmacy school, and performance in specific prepharmacy courses. We confirmed previous reports but demonstrated intriguing results in predicting practice-based skills. Critical thinking skills predict practice-based course success. Also, the CCTST and PCAT scores (Pearson correlation [pc] = 0.448, p < 0.001) were closely related in our students. The strongest predictors of practice-related courses and clerkship success were PCAT (pc=0.237, p<0.001) and CCTST (pc = 0.201, p < 0.001). These findings and other analyses suggest that PCAT may predict critical thinking skills in pharmacy practice courses and clerkships. Further study is needed to confirm this finding and determine which PCAT components predict critical thinking abilities.
ERIC Educational Resources Information Center
Rock, JoAnn Leah; Adler, Rachel M.
2014-01-01
The purpose of this study was to investigate the ways in which universities use the "GRE"® General Test scores to award merit-based fellowships to first-year graduate students in doctoral programs. While GRE use in fellowship award decisions is a common practice, there is very little validity evidence to support its use in this context.…
Validation of the Female Sexual Function Index (FSFI) for web-based administration.
Crisp, Catrina C; Fellner, Angela N; Pauls, Rachel N
2015-02-01
Web-based questionnaires are becoming increasingly valuable for clinical research. The Female Sexual Function Index (FSFI) is the gold standard for evaluating female sexual function; yet, it has not been validated in this format. We sought to validate the Female Sexual Function Index (FSFI) for web-based administration. Subjects enrolled in a web-based research survey of sexual function from the general population were invited to participate in this validation study. The first 151 respondents were included. Validation participants completed the web-based version of the FSFI followed by a mailed paper-based version. Demographic data were collected for all subjects. Scores were compared using the paired t test and the intraclass correlation coefficient. One hundred fifty-one subjects completed both web- and paper-based versions of the FSFI. Those subjects participating in the validation study did not differ in demographics or FSFI scores from the remaining subjects in the general population study. Total web-based and paper-based FSFI scores were not significantly different (mean 20.31 and 20.29 respectively, p = 0.931). The six domains or subscales of the FSFI were similar when comparing web and paper scores. Finally, intraclass correlation analysis revealed a high degree of correlation between total and subscale scores, r = 0.848-0.943, p < 0.001. Web-based administration of the FSFI is a valid alternative to the paper-based version.
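The web versus paper comparison above relies on a paired t test. A minimal sketch of the paired t statistic is given below, using only the Python standard library; the function name and the scores are hypothetical, and the p-value lookup is left out because it needs a t-distribution routine from a statistics package.

```python
import math
import statistics


def paired_t(first, second):
    """Paired t statistic and degrees of freedom for two matched score lists.

    t = mean(d) / (sd(d) / sqrt(n)), where d are the within-subject differences.
    Raises ZeroDivisionError if all differences are identical (sd = 0).
    """
    if len(first) != len(second):
        raise ValueError("both lists must have one score per subject")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return mean_d / (sd_d / math.sqrt(n)), n - 1


# Hypothetical total scores for five subjects (web version, paper version).
web = [20, 22, 19, 25, 30]
paper = [21, 21, 20, 24, 29]
t_stat, df = paired_t(web, paper)
print(round(t_stat, 3), df)
```

A t statistic near zero, as in this made-up example, corresponds to the kind of non-significant difference the abstract reports between the two administration modes.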
NASA Astrophysics Data System (ADS)
Patke, Usha
Achievement data from the 3rd International Mathematics and Sciences Study and Program for International Student Assessment in science have indicated that Black students from economically disadvantaged families underachieve at alarming rates in comparison to White and economically advantaged peer groups. The study site was a predominately Black, urban school district experiencing underachievement. The purpose of this correlational study was to examine the relationship between students' use of inquiry-based laboratory investigations and their performance on the Biology End of Course Test, as well as to examine the relationship while partialling out the effects of student gender. Constructivist theory formed the theoretical foundation of the study. Students' perceived levels of experience with inquiry-based laboratory investigations were measured using the Laboratory Program Variable Inventory (LPVI) survey. LPVI scores of 256 students were correlated with test scores and were examined by student gender. The Pearson correlation coefficient revealed a small direct correlation between students' experience in inquiry-based laboratory investigation classes and standardized test scores on the Biology EOCT. A partial correlational analysis indicated that the correlation remained after controlling for gender. This study may prompt a change from teacher-centered to student-centered pedagogy at the local site in order to increase academic achievement for all students. The results of this study may also influence administrators and policy makers to initiate local, state, or nationwide curricular development. A change in curriculum may promote social change as students become more competent, and more able, to succeed in life beyond secondary school.
Automatic detection of cardiovascular risk in CT attenuation correction maps in Rb-82 PET/CTs
NASA Astrophysics Data System (ADS)
Išgum, Ivana; de Vos, Bob D.; Wolterink, Jelmer M.; Dey, Damini; Berman, Daniel S.; Rubeaux, Mathieu; Leiner, Tim; Slomka, Piotr J.
2016-03-01
CT attenuation correction (CTAC) images acquired with PET/CT visualize coronary artery calcium (CAC) and enable CAC quantification. CAC scores acquired with CTAC have been suggested as a marker of cardiovascular disease (CVD). In this work, an algorithm previously developed for automatic CAC scoring in dedicated cardiac CT was applied to automatic CAC detection in CTAC. The study included 134 consecutive patients undergoing 82-Rb PET/CT. Low-dose rest CTAC scans were acquired (100 kV, 11 mAs, 1.4 mm × 1.4 mm × 3 mm voxel size). An experienced observer defined the reference standard with the clinically used intensity level threshold for calcium identification (130 HU). Five scans were removed from analysis due to artifacts. The algorithm extracted potential CAC by intensity-based thresholding and 3D connected component labeling. Each candidate was described by location, size, shape and intensity features. An ensemble of extremely randomized decision trees was used to identify CAC. The data set was randomly divided into training and test sets. Automatically identified CAC was quantified using volume and Agatston scores. In 33 test scans, the system detected on average 469 mm³/730 mm³ (64%) of CAC with 36 mm³ false positive volume per scan. The intraclass correlation coefficient for volume scores was 0.84. Each patient was assigned to one of four CVD risk categories based on the Agatston score (0-10, 11-100, 101-400, >400). The correct CVD category was assigned to 85% of patients (Cohen's linearly weighted κ = 0.82). Automatic detection of CVD risk based on CAC scoring in rest CTAC images is feasible. This may enable large scale studies evaluating the clinical value of CAC scoring in CTAC data.
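The abstract evaluates risk categorization with Cohen's linearly weighted κ over four Agatston-score categories. The sketch below shows one way such an evaluation might be computed; the category boundaries follow the abstract (with the top category read as above 400), while the handling of non-integer scores, the function names, and the sample scores are assumptions for illustration only.

```python
def risk_category(agatston):
    """Map an Agatston score to a category index: 0-10, 11-100, 101-400, above 400."""
    if agatston <= 10:
        return 0
    if agatston <= 100:
        return 1
    if agatston <= 400:
        return 2
    return 3


def linear_weighted_kappa(reference, predicted, n_categories=4):
    """Cohen's kappa with linear disagreement weights |i - j| / (k - 1)."""
    n = len(reference)
    k = n_categories
    observed = [[0] * k for _ in range(k)]
    for r, p in zip(reference, predicted):
        observed[r][p] += 1
    row_totals = [sum(observed[i]) for i in range(k)]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * row_totals[i] * col_totals[j] / n
    if weighted_expected == 0:
        raise ValueError("kappa is undefined when all cases fall in one category")
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical reference and automatic Agatston scores for four patients.
reference = [risk_category(s) for s in (0, 50, 250, 900)]
automatic = [risk_category(s) for s in (5, 80, 300, 350)]
print(round(linear_weighted_kappa(reference, automatic), 3))
```

Linear weights penalize a miss by one category less than a miss by three, which suits ordered risk categories better than unweighted κ.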
Detection of suboptimal effort with symbol span: development of a new embedded index.
Young, J Christopher; Caron, Joshua E; Baughman, Brandon C; Sawyer, R John
2012-03-01
Developing embedded indicators of suboptimal effort on objective neurocognitive testing is essential for detecting increasingly sophisticated forms of symptom feigning. The current study explored whether Symbol Span, a novel Wechsler Memory Scale-fourth edition measure of supraspan visual attention, could be used to discriminate adequate effort from suboptimal effort. Archival data were collected from 136 veterans classified into Poor Effort (n = 42) and Good Effort (n = 94) groups based on symptom validity test (SVT) performance. The Poor Effort group had significantly lower raw scores (p < .001) and age-corrected scaled scores (p < .001) than the Good Effort group on the Symbol Span test. A raw score cutoff of <14 produced 83% specificity and 50% sensitivity for detection of Poor Effort. Similarly, sensitivity was 52% and specificity was 84% when employing a cutoff of <7 for Age-Corrected Scale Score. Collectively, present results suggest that Symbol Span can effectively differentiate veterans with multiple failures on established free-standing and embedded SVTs.
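To make the cutoff statistics in the abstract above concrete, the sketch below computes sensitivity and specificity for a "score below cutoff" rule. The function name and the eight data points are hypothetical and chosen only for illustration; they are not the study's data.

```python
def sensitivity_specificity(scores, poor_effort, cutoff):
    """Flag scores below `cutoff` as suboptimal effort.

    Returns (sensitivity, specificity) against the reference labels.
    """
    tp = fn = tn = fp = 0
    for score, is_poor in zip(scores, poor_effort):
        flagged = score < cutoff
        if is_poor and flagged:
            tp += 1          # poor effort, correctly flagged
        elif is_poor:
            fn += 1          # poor effort, missed
        elif flagged:
            fp += 1          # good effort, wrongly flagged
        else:
            tn += 1          # good effort, correctly passed
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical raw scores and reference group labels.
scores = [10, 13, 15, 20, 12, 18, 22, 25]
poor_effort = [True, True, True, True, False, False, False, False]

sens, spec = sensitivity_specificity(scores, poor_effort, cutoff=14)
print(sens, spec)
```

Lowering the cutoff flags fewer examinees, which generally trades sensitivity for specificity; embedded validity indicators are typically tuned toward high specificity, as in the figures the abstract reports.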
The Effectiveness of Language! in Raising Reading Scores in One Middle School
ERIC Educational Resources Information Center
Carmichel-Hall, Cathy
2010-01-01
This study sought to determine the effectiveness of the "Language!" reading curriculum and a school district-developed grade-level reading curriculum in raising student reading scores on Tennessee state-mandated tests and to determine if student attitudes toward academic and recreational reading differ based on reading curriculum. The…
Identifying and Evaluating External Validity Evidence for Passing Scores
ERIC Educational Resources Information Center
Davis-Becker, Susan L.; Buckendahl, Chad W.
2013-01-01
A critical component of the standard setting process is collecting evidence to evaluate the recommended cut scores and their use for making decisions and classifying students based on test performance. Kane (1994, 2001) proposed a framework by which practitioners can identify and evaluate evidence of the results of the standard setting from (1)…
Predicting End-of-Year Achievement Test Performance: A Comparison of Assessment Methods
ERIC Educational Resources Information Center
Kettler, Ryan J.; Elliott, Stephen N.; Kurz, Alexander; Zigmond, Naomi; Lemons, Christopher J.; Kloo, Amanda; Shrago, Jacqueline; Beddow, Peter A.; Williams, Leila; Bruen, Charles; Lupp, Lynda; Farmer, Jeanie; Mosiman, Melanie
2014-01-01
Motivated by the multiple-measures clause of recent federal policy regarding student eligibility for alternate assessments based on modified academic achievement standards (AA-MASs), this study examined how scores or combinations of scores from a diverse set of assessments predicted students' end-of-year proficiency status on statewide achievement…
Gurnani, Ashita S; John, Samantha E; Gavett, Brandon E
2015-05-01
The current study developed regression-based normative adjustments for a bi-factor model of the Brief Test of Adult Cognition by Telephone (BTACT). Archival data from the Midlife Development in the United States-II Cognitive Project were used to develop eight separate linear regression models that predicted bi-factor BTACT scores, accounting for age, education, gender, and occupation, alone and in various combinations. All regression models provided statistically significant fit to the data. A three-predictor regression model fit best and accounted for 32.8% of the variance in the global bi-factor BTACT score. The fit of the regression models was not improved by gender. Eight different regression models are presented to allow the user flexibility in applying demographic corrections to the bi-factor BTACT scores. Occupation corrections, while not widely used, may provide useful demographic adjustments for adult populations or for those individuals who have attained an occupational status not commensurate with expected educational attainment. © The Author 2015. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
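A regression-based normative adjustment of the kind described above is commonly applied by comparing an observed score with the score predicted from demographics and scaling the difference by the residual standard deviation. The sketch below shows that general form only. The abstract does not give the models' coefficients or say which three predictors fit best, so every number and name here is a hypothetical placeholder.

```python
def adjusted_z(observed, predictors, intercept, coefs, residual_sd):
    """Demographically adjusted z-score: (observed - predicted) / residual SD."""
    predicted = intercept + sum(coefs[name] * value
                                for name, value in predictors.items())
    return (observed - predicted) / residual_sd

# Hypothetical coefficients; NOT taken from the published models.
coefs = {"age": -0.02, "education": 0.1, "occupation": 0.05}
person = {"age": 50, "education": 16, "occupation": 2}

z = adjusted_z(observed=0.7, predictors=person,
               intercept=0.5, coefs=coefs, residual_sd=0.5)
print(round(z, 3))
```

A negative value means the person scored below what the model predicts for someone with the same demographic profile, which is the quantity a normative correction is meant to expose.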
Performance of machine-learning scoring functions in structure-based virtual screening.
Wójcikowski, Maciej; Ballester, Pedro J; Siedlecki, Pawel
2017-04-25
Classical scoring functions have reached a plateau in their performance in virtual screening and binding affinity prediction. Recently, machine-learning scoring functions trained on protein-ligand complexes have shown great promise in small tailored studies. They have also raised controversy, specifically concerning model overfitting and applicability to novel targets. Here we provide a new ready-to-use scoring function (RF-Score-VS) trained on 15 426 active and 893 897 inactive molecules docked to a set of 102 targets. We use the full DUD-E data sets along with three docking tools, five classical and three machine-learning scoring functions for model building and performance assessment. Our results show RF-Score-VS can substantially improve virtual screening performance: the RF-Score-VS top 1% provides a 55.6% hit rate, whereas that of Vina provides only 16.2% (for smaller percentages the difference is even more encouraging: the RF-Score-VS top 0.1% achieves an 88.6% hit rate versus 27.5% using Vina). In addition, RF-Score-VS provides much better prediction of measured binding affinity than Vina (Pearson correlation of 0.56 and -0.18, respectively). Lastly, we tested RF-Score-VS on an independent test set from the DEKOIS benchmark and observed comparable results. We provide full data sets to facilitate further research in this area (http://github.com/oddt/rfscorevs) as well as ready-to-use RF-Score-VS (http://github.com/oddt/rfscorevs_binary).
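The hit rates quoted above are the fraction of true actives among the top-ranked share of a screened library. A minimal sketch of that calculation follows; the function name and the ten-molecule list are hypothetical, and the code assumes that a higher score means a better predicted binder (for tools that report energies where lower is better, the ranking would need to be reversed).

```python
def hit_rate_at_top(scores, is_active, fraction):
    """Fraction of actives among the top `fraction` of molecules by score."""
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    # Keep at least one molecule so very small fractions remain defined.
    n_top = max(1, round(fraction * len(ranked)))
    top = ranked[:n_top]
    return sum(1 for _, active in top if active) / n_top

# Hypothetical screening results: predicted scores and true activity labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
is_active = [True, False, True, False, False,
             False, False, False, False, False]

print(hit_rate_at_top(scores, is_active, 0.2))
print(hit_rate_at_top(scores, is_active, 1.0))
```

The hit rate over the whole list equals the base rate of actives, so comparing it with the hit rate in the top slice shows how much a scoring function enriches actives at the head of the ranking.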
Chao, Serena H; Brett, Belle; Wiecha, John M; Norton, Lisa E; Levine, Sharon A
2012-07-01
Web-based learning methods are being used increasingly to teach core curriculum in medical school clerkships, but few studies have compared the effectiveness of online methods with that of live lectures in teaching the same topics to students. Boston University School of Medicine has implemented an online, case-based, interactive curriculum using videos and text to teach delirium to fourth-year medical students during their required 1-month Geriatrics and Home Medical Care clerkship. A control group of 56 students who received a 1-hour live delirium lecture only was compared with 111 intervention group students who completed the online delirium curriculum only. Evaluation consisted of a short-answer test with two cases given as a pre- and posttest to both groups. The total possible maximum test score was 34 points, and the lowest possible score was -8 points. Mean pre- and posttest scores were 10.5 ± 4.0 and 12.7 ± 4.4, respectively, in the intervention group and 9.9 ± 3.5 and 11.2 ± 4.5, respectively, in the control group. The intervention group had statistically significant improvement between the pre- and posttest scores (2.21-point difference; P < .001), as did the control group (1.36-point difference; P = .03); the difference in test score improvement between the two groups was not statistically significant. An interactive case-based online curriculum in delirium is as effective as a live lecture in teaching delirium, although neither of these educational methods alone produces robust increases in knowledge. © 2012, Copyright the Authors Journal compilation © 2012, The American Geriatrics Society.