On the Validity of Useless Tests
ERIC Educational Resources Information Center
Sireci, Stephen G.
2016-01-01
A misconception exists that validity may refer only to the "interpretation" of test scores and not to the "uses" of those scores. The development and evolution of validity theory illustrate that test score interpretation was a primary focus in the earliest days of modern testing, and that validating interpretations derived from test…
Can patients interpret health information? An assessment of the medical data interpretation test.
Schwartz, Lisa M; Woloshin, Steven; Welch, H Gilbert
2005-01-01
To establish the reliability/validity of an 18-item test of patients' medical data interpretation skills. Survey with retest after 2 weeks. Subjects: 178 people recruited from advertisements in local newspapers, an outpatient clinic, and a hospital open house. The percentage of correct answers to individual items ranged from 20% to 87%, and medical data interpretation test scores (on a 0-100 scale) were normally distributed (median 61.1, mean 61.0, range 6-94). Reliability was good (test-retest correlation=0.67, Cronbach's alpha=0.71). Construct validity was supported in several ways. Higher scores were found among people with highest versus lowest numeracy (71 v. 36, P<0.001), highest quantitative literacy (65 v. 28, P<0.001), and highest education (69 v. 42, P=0.004). Scores for 15 physician experts also completing the survey were significantly higher than participants with other postgraduate degrees (mean score 89 v. 69, P<0.001). The medical data interpretation test is a reliable and valid measure of the ability to interpret medical statistics.
Utility of proverb interpretation measures with cardiac transplant candidates.
Dugbartey, A T
1998-12-01
To assess metaphorical understanding and proverb interpretation in cardiac transplant candidates, the neuropsychological assessment records of 22 adults with end-stage cardiac disease under consideration for transplantation were analyzed. Neuropsychological tests consisted of the Controlled Oral Word Association Test, Halstead Category Test, Rey-Osterrieth Complex Figure Test (Copy), Trail Making Test, and summed scores for the proverb items of the WAIS-R Comprehension subtest. Analysis showed that the group tended to interpret proverbs literally. Proverb scores were significantly associated with scores on the Similarities and Picture Arrangement subtests of the WAIS-R. There was a moderate negative association between number of reported heart attacks and Proverb scores. The need for brief yet robust assessments including measures of inferential thinking and conceptualization in transplant candidates is highlighted.
Shifting the Focus of Validity for Test Use
ERIC Educational Resources Information Center
Moss, Pamela A.
2016-01-01
The conventional focus of validity in educational measurement has been on intended interpretations and uses of test scores. Empirical studies of test use by teachers, administrators and policy-makers show that actual interpretations and uses of test scores in context are invariably shaped by local users' questions, which frequently require…
ERIC Educational Resources Information Center
Han, Chao
2016-01-01
As a property of test scores, reliability/dependability constitutes an important psychometric consideration, and it underpins the validity of measurement results. A review of interpreter certification performance tests (ICPTs) reveals that (a) although reliability/dependability checking has been recognized as an important concern, its theoretical…
Interpretation and Utilization of Scores on the Air Force Officer Qualifying Test.
ERIC Educational Resources Information Center
Miller, Robert E.
The report summarizes a large body of data relevant to the proper interpretation and use of aptitude scores on the Air Force Officer Qualifying Test (AFOQT). Included are descriptions of the AFOQT testing program and the test itself. Technical data include an extensive sampling of validation studies covering predictors of success in pilot…
Commentary: Student Cognition, the Situated Learning Context, and Test Score Interpretation
ERIC Educational Resources Information Center
La Marca, Paul M.
2006-01-01
Although it is assumed that student cognition contributes to student performance on achievement tests, it may be that current testing models lack the degree of specification necessary to warrant such inferences. With test score interpretations as the referent, the authors in this special issue address the role of student cognition in learning and…
Do Examinees Understand Score Reports for Alternate Methods of Scoring Computer Based Tests?
ERIC Educational Resources Information Center
Whittaker, Tiffany A.; Williams, Natasha J.; Dodd, Barbara G.
2011-01-01
This study assessed the interpretability of scaled scores based on either number correct (NC) scoring for a paper-and-pencil test or one of two methods of scoring computer-based tests: an item pattern (IP) scoring method and a method based on equated NC scoring. The equated NC scoring method for computer-based tests was proposed as an alternative…
Gavett, Brandon E
2015-03-01
The base rates of abnormal test scores in cognitively normal samples have been a focus of recent research. The goal of the current study is to illustrate how Bayes' theorem uses these base rates--along with the same base rates in cognitively impaired samples and prevalence rates of cognitive impairment--to yield probability values that are more useful for making judgments about the absence or presence of cognitive impairment. Correlation matrices, means, and standard deviations were obtained from the Wechsler Memory Scale--4th Edition (WMS-IV) Technical and Interpretive Manual and used in Monte Carlo simulations to estimate the base rates of abnormal test scores in the standardization and special groups (mixed clinical) samples. Bayes' theorem was applied to these estimates to identify probabilities of normal cognition based on the number of abnormal test scores observed. Abnormal scores were common in the standardization sample (65.4% scoring below a scaled score of 7 on at least one subtest) and more common in the mixed clinical sample (85.6% scoring below a scaled score of 7 on at least one subtest). Probabilities varied according to the number of abnormal test scores, base rates of normal cognition, and cutoff scores. The results suggest that interpretation of base rates obtained from cognitively healthy samples must also account for data from cognitively impaired samples. Bayes' theorem can help neuropsychologists answer questions about the probability that an individual examinee is cognitively healthy based on the number of abnormal test scores observed.
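The Bayes' theorem step described in this abstract can be sketched in a few lines of Python. The two base rates (65.4% and 85.6% of examinees with at least one subtest below a scaled score of 7) come from the abstract; the prevalence values in the loop and the function name `posterior_normal` are illustrative assumptions, not figures or methods taken from the study.

```python
def posterior_normal(p_flag_normal, p_flag_impaired, prevalence_impaired):
    """Return P(cognitively normal | flag observed) using Bayes' theorem.

    p_flag_normal:       P(flag | normal), base rate in the healthy sample
    p_flag_impaired:     P(flag | impaired), base rate in the clinical sample
    prevalence_impaired: prior P(impaired) in the referral population
    """
    prior_normal = 1.0 - prevalence_impaired
    numerator = p_flag_normal * prior_normal
    denominator = numerator + p_flag_impaired * prevalence_impaired
    return numerator / denominator


# Base rates reported in the abstract for "at least one subtest below 7".
P_FLAG_NORMAL = 0.654
P_FLAG_CLINICAL = 0.856

# Prevalence values below are assumptions chosen only for illustration.
for prevalence in (0.10, 0.30, 0.50):
    p = posterior_normal(P_FLAG_NORMAL, P_FLAG_CLINICAL, prevalence)
    print(f"prevalence={prevalence:.2f}  P(normal | >=1 low score)={p:.3f}")
```

Because a single low score is common in both samples, observing one shifts the probability of normal cognition only modestly away from the prior, which is the abstract's point about needing both sets of base rates.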
An Approach to Scoring and Equating Tests with Binary Items: Piloting With Large-Scale Assessments
ERIC Educational Resources Information Center
Dimitrov, Dimiter M.
2016-01-01
This article describes an approach to test scoring, referred to as "delta scoring" (D-scoring), for tests with dichotomously scored items. The D-scoring uses information from item response theory (IRT) calibration to facilitate computations and interpretations in the context of large-scale assessments. The D-score is computed from the…
Karr, Justin E; Garcia-Barrera, Mauricio A; Holdnack, James A; Iverson, Grant L
2018-01-01
Multivariate base rates allow for the simultaneous statistical interpretation of multiple test scores, quantifying the normal frequency of low scores on a test battery. This study provides multivariate base rates for the Delis-Kaplan Executive Function System (D-KEFS). The D-KEFS consists of 9 tests with 16 Total Achievement scores (i.e. primary indicators of executive function ability). Stratified by education and intelligence, multivariate base rates were derived for the full D-KEFS and an abbreviated four-test battery (i.e. Trail Making, Color-Word Interference, Verbal Fluency, and Tower Test) using the adult portion of the normative sample (ages 16-89). Multivariate base rates are provided for the full and four-test D-KEFS batteries, calculated using five low score cutoffs (i.e. ≤25th, 16th, 9th, 5th, and 2nd percentiles). Low scores occurred commonly among the D-KEFS normative sample, with 82.6 and 71.8% of participants obtaining at least one score ≤16th percentile for the full and four-test batteries, respectively. Intelligence and education were inversely related to low score frequency. The base rates provided herein allow clinicians to interpret multiple D-KEFS scores simultaneously for the full D-KEFS and an abbreviated battery of commonly administered tests. The use of these base rates will support clinicians when differentiating between normal variations in cognitive performance and true executive function deficits.
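A simple baseline shows why multivariate base rates matter. If the 16 Total Achievement scores were statistically independent (an assumption made here purely for illustration; scores within a real battery are correlated), the chance of at least one score at or below a percentile cutoff follows from the complement rule. The helper name `p_at_least_one_low` is hypothetical.

```python
def p_at_least_one_low(n_scores, cutoff_percentile):
    """P(at least one of n independent scores falls at or below the cutoff).

    Assumes independence between scores, which real test batteries violate.
    """
    p_low = cutoff_percentile / 100.0
    return 1.0 - (1.0 - p_low) ** n_scores


# 16 scores, 16th-percentile cutoff, independence assumed.
print(round(p_at_least_one_low(16, 16), 3))
```

Under independence the result is roughly 94%, higher than the 82.6% reported for the full battery. That gap is consistent with positively correlated scores, and it is why empirically derived base rates are preferable to this back-of-the-envelope figure.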
Predictors of Knowledge and Image Interpretation Skill Development in Radiology Residents.
Ravesloot, Cécile J; van der Schaaf, Marieke F; Kruitwagen, Cas L J J; van der Gijp, Anouk; Rutgers, Dirk R; Haaring, Cees; Ten Cate, Olle; van Schaik, Jan P J
2017-09-01
Purpose To investigate knowledge and image interpretation skill development in residency by studying scores on knowledge and image questions on radiology tests, mediated by the training environment. Materials and Methods Ethical approval for the study was obtained from the ethical review board of the Netherlands Association for Medical Education. Longitudinal test data of 577 of 2884 radiology residents who took semiannual progress tests during 5 years were retrospectively analyzed by using a nonlinear mixed-effects model taking training length as input variable. Tests included nonimage and image questions that assessed knowledge and image interpretation skill. Hypothesized predictors were hospital type (academic or nonacademic), training hospital, enrollment age, sex, and test date. Results Scores showed a curvilinear growth during residency. Image scores increased faster during the first 3 years of residency and reached a higher maximum than knowledge scores (55.8% vs 45.1%). The slope of image score development versus knowledge question scores of 1st-year residents was 16.8% versus 12.4%, respectively. Training hospital environment appeared to be an important predictor in both knowledge and image interpretation skill development (maximum score difference between training hospitals was 23.2%; P < .001). Conclusion Expertise developed rapidly in the initial years of radiology residency and leveled off in the 3rd and 4th training year. The shape of the curve was mainly influenced by the specific training hospital. © RSNA, 2017
Commentary on "Validating the Interpretations and Uses of Test Scores"
ERIC Educational Resources Information Center
Brennan, Robert L.
2013-01-01
Kane's paper "Validating the Interpretations and Uses of Test Scores" is the most complete and clearest discussion yet available of the argument-based approach to validation. At its most basic level, validation as formulated by Kane is fundamentally a simply-stated two-step enterprise: (1) specify the claims inherent in a particular interpretation…
ERIC Educational Resources Information Center
Powers, Donald; Schedl, Mary; Papageorgiou, Spiros
2017-01-01
The aim of this study was to develop, for the benefit of both test takers and test score users, enhanced "TOEFL ITP"® test score reports that go beyond the simple numerical scores that are currently reported. To do so, we applied traditional scale anchoring (proficiency scaling) to item difficulty data in order to develop performance…
Evaluation of Score Interpretive Information from the Perspective of Failed and Passed Test-Takers.
ERIC Educational Resources Information Center
Shannon, Gregory A.
Candidates who had taken examinations for certification required by the American Production and Inventory Control Society (APICS) were surveyed to acquire feedback about the effectiveness of score interpretive information given to test takers. Those sampled included 488 passers and 389 failers of the Inventory Management (IM) examination and 457…
A 2-year study of Gram stain competency assessment in 40 clinical laboratories.
Goodyear, Nancy; Kim, Sara; Reeves, Mary; Astion, Michael L
2006-01-01
We used a computer-based competency assessment tool for Gram stain interpretation to assess the performance of 278 laboratory staff from 40 laboratories on 40 multiple-choice questions. We report test reliability, mean scores, median, item difficulty, discrimination, and analysis of the highest- and lowest-scoring questions. The questions were reliable (KR-20 coefficient, 0.80). Overall mean score was 88% (range, 63%-98%). When categorized by cell type, the means were host cells, 93%; other cells (eg, yeast), 92%; gram-positive, 90%; and gram-negative, 88%. When categorized by type of interpretation, the means were other (eg, underdecolorization), 92%; identify by structure (eg, bacterial morphologic features), 91%; and identify by name (eg, genus and species), 87%. Of the 6 highest-scoring questions (mean scores, ≥ 99%) 5 were identify by structure and 1 was identify by name. Of the 6 lowest-scoring questions (mean scores, < 75%) 5 were gram-negative and 1 was host cells. By type of interpretation, 2 were identify by structure and 4 were identify by name. Computer-based Gram stain competency assessment examinations are reliable. Our analysis helps laboratories identify areas for continuing education in Gram stain interpretation and will direct future revisions of the tests.
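The KR-20 coefficient cited above is the standard internal-consistency statistic for dichotomously scored items. A minimal Python sketch of the textbook formula follows; the function name `kr20` and the toy response matrix are assumptions for illustration and are not the study's data or code. Population variance is used for the total scores so that it matches the p(1 - p) item variances.

```python
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson Formula 20 for a matrix of 0/1 item scores.

    responses: list of examinee rows, each a list of 0/1 values.
    Raises ZeroDivisionError if every examinee has the same total score.
    """
    n_people = len(responses)
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    total_var = pvariance(totals)  # population variance of total scores

    # Sum of item variances p * (1 - p), where p is the proportion correct.
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_people
        sum_pq += p * (1.0 - p)

    return (n_items / (n_items - 1)) * (1.0 - sum_pq / total_var)


# Toy data: 5 examinees x 4 items (illustrative only).
toy = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(toy), 3))
```

With real data the matrix would have one row per staff member and one column per question; missing responses would need to be handled before calling the function.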
Why Education Practitioners and Stakeholders Should Care about Person Fit in Educational Assessments
ERIC Educational Resources Information Center
Walker, A. Adrienne
2017-01-01
In this article, A. Adrienne Walker introduces the concept of person fit to education stakeholders as a source of evidence to inform the trustworthiness of a test score for interpretation and use (validity). Person fit analyses are used in educational measurement research to explore the degree to which a person's test score can be interpreted as a…
ERIC Educational Resources Information Center
Kremmel, Benjamin; Schmitt, Norbert
2016-01-01
The scores from vocabulary size tests have typically been interpreted as demonstrating that the target words are "known" or "learned." But "knowing" a word should entail the ability to use it in real language communication in one or more of the four skills. It should also entail deeper knowledge, such as knowing the…
Validating Test Score Meaning and Defending Test Score Use: Different Aims, Different Methods
ERIC Educational Resources Information Center
Cizek, Gregory J.
2016-01-01
Advances in validity theory and alacrity in validation practice have suffered because the term "validity" has been used to refer to two incompatible concerns: (1) the degree of support for specified interpretations of test scores (i.e. intended score meaning) and (2) the degree of support for specified applications (i.e. intended test…
Summary of Score Changes (in other Tests).
ERIC Educational Resources Information Center
Cleary, T. Anne; McCandless, Sam A.
Scholastic Aptitude Test (SAT) scores have declined during the last 14 years. Similar score declines have been observed in many different testing programs, many groups, and tested areas. The declines, while not large in any given year, have been consistent over time, area, and group. The period around 1965 is critical for the interpretation of…
ERIC Educational Resources Information Center
Sullivan, Jeremy R.; Winter, Suzanne M.; Sass, Daniel A.; Svenkerud, Nicole
2014-01-01
Many tests provide users with several different types of scores to facilitate interpretation and description of students' performance. Common examples include raw scores, age- and grade-equivalent scores, and standard scores. However, when used within the context of assessing growth among young children, these scores should not be interchangeable…
Score Reporting in Teacher Certification Testing: A Review, Design, and Interview/Focus Group Study
ERIC Educational Resources Information Center
Klesch, Heather S.
2010-01-01
The reporting of scores on educational tests is at times misunderstood, misinterpreted, and potentially confusing to examinees and other stakeholders who may need to interpret test scores. In reporting test results to examinees, there is a need for clarity in the message communicated. As pressure rises for students to demonstrate performance at a…
Aligning Scales of Certification Tests. Research Report. ETS RR-10-07
ERIC Educational Resources Information Center
Dorans, Neil J.; Liang, Longjuan; Puhan, Gautam
2010-01-01
Scores are the most visible and widely used products of a testing program. The choice of score scale has implications for test specifications, equating, and test reliability and validity, as well as for test interpretation. At the same time, the score scale should be viewed as infrastructure likely to require repair at some point. In this report…
Interpreting Measures of Moral Development to Individuals.
ERIC Educational Resources Information Center
Lutwak, Nita; Hennessy, James
1985-01-01
Examined the interpretation of measures of moral development to individuals from three phases of test interpretation. Interpretations based on the meaning of the scale, the meaning of the score, and specific meaning to the individual are explored. (Author)
Eckner, James T; Rettmann, Ashley; Narisetty, Naveen; Greer, Jacob; Moore, Brandon; Brimacombe, Susan; He, Xuming; Broglio, Steven P
2016-01-01
To determine test-retest reliabilities of novel Evoked Response Potential (ERP)-based Brain Network Activation (BNA) scores in healthy athletes. Observational, repeated-measures study. Forty-two healthy male and female high school and collegiate athletes completed auditory oddball and go/no-go ERP assessments at baseline, 1 week, 6 weeks and 1 year. The BNA algorithm was applied to the ERP data, considering electrode location, frequency band, peak latency and normalized amplitude to generate seven unique BNA scores for each testing session. Mean BNA scores, intra-class correlation coefficient (ICC) values and reliable change (RC) values were calculated for each of the seven BNA networks. BNA scores ranged from 46.3 ± 34.9 to 69.9 ± 22.8, ICC values ranged from 0.46-0.65 and 95% RC values ranged from 38.3-68.1 across the seven networks. The wide range of BNA scores observed in this population of healthy athletes suggests that a single BNA score or set of BNA scores from a single after-injury test session may be difficult to interpret in isolation without knowledge of the athlete's own baseline BNA score(s) and/or the results of serial tests performed at additional time points. The stability of each BNA network should be considered when interpreting test-retest BNA score changes.
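The abstract does not state how its reliable change values were computed. One common approach, sketched below in Python, derives a 95% reliable change threshold from the standard deviation and a test-retest reliability coefficient (the Jacobson-Truax style formula). Using the ICC as the reliability coefficient, the function name `reliable_change_95`, and the input values are all assumptions for illustration.

```python
import math


def reliable_change_95(sd, reliability):
    """95% reliable change threshold from SD and test-retest reliability.

    SEM     = sd * sqrt(1 - reliability)   (standard error of measurement)
    SE_diff = SEM * sqrt(2)                (standard error of a difference)
    RC_95   = 1.96 * SE_diff
    """
    sem = sd * math.sqrt(1.0 - reliability)
    se_diff = sem * math.sqrt(2.0)
    return 1.96 * se_diff


# Illustrative inputs only: an SD of 25 points and a reliability of 0.60.
print(round(reliable_change_95(25.0, 0.60), 1))
```

A retest score would need to differ from baseline by more than this threshold before the change could be distinguished from measurement error, which illustrates why modest reliabilities produce the wide RC ranges the abstract reports.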
The Efficacy of Mammography Boot Camp to Improve the Performance of Radiologists
Lee, Eun Hye; Jung, Seung Eun; Kim, You Me; Choi, Nami
2014-01-01
Objective To evaluate the efficacy of a mammography boot camp (MBC) to improve radiologists' performance in interpreting mammograms in the National Cancer Screening Program (NCSP) in Korea. Materials and Methods Between January and July of 2013, 141 radiologists were invited to a 3-day educational program composed of lectures and group practice readings using 250 digital mammography cases. The radiologists' performance in interpreting mammograms was evaluated using a pre- and post-camp test set of 25 cases validated prior to the camp by experienced breast radiologists. Factors affecting the radiologists' performance, including age, type of attending institution, and type of test set cases, were analyzed. Results The average scores of the pre- and post-camp tests were 56.0 ± 12.2 and 78.3 ± 9.2, respectively (p < 0.001). The post-camp test scores were higher than the pre-camp test scores for all age groups and all types of attending institutions (p < 0.001). The rate of incorrect answers in the post-camp test decreased compared to the pre-camp test for all suspicious cases, but not for negative cases (p > 0.05). Conclusion The MBC improves radiologists' performance in interpreting mammograms irrespective of age and type of attending institution. Improved interpretation is observed for suspicious cases, but not for negative cases. PMID:25246818
Interpreting Linked Psychomotor Performance Scores
ERIC Educational Resources Information Center
Looney, Marilyn A.
2013-01-01
Given that equating/linking applications are now appearing in kinesiology literature, this article provides an overview of the different types of linked test scores: equated, concordant, and predicted. It also addresses the different types of evidence required to determine whether the scores from two different field tests (measuring the same…
ERIC Educational Resources Information Center
Reynolds, Matthew R.
2013-01-01
The linear loadings of intelligence test composite scores on a general factor ("g") have been investigated recently in factor analytic studies. Spearman's law of diminishing returns (SLODR), however, implies that the "g" loadings of test scores likely decrease in magnitude as g increases, or they are nonlinear. The purpose of…
Analysis of Added Value of Subscores with Respect to Classification
ERIC Educational Resources Information Center
Sinharay, Sandip
2014-01-01
Brennan noted that users of test scores often want (indeed, demand) that subscores be reported, along with total test scores, for diagnostic purposes. Haberman suggested a method based on classical test theory (CTT) to determine if subscores have added value over the total score. One way to interpret the method is that a subscore has added value…
ERIC Educational Resources Information Center
Ramos, Erica; Alfonso, Vincent C.; Schermerhorn, Susan M.
2009-01-01
The interpretation of cognitive test scores often leads to decisions concerning the diagnosis, educational placement, and types of interventions used for children. Therefore, it is important that practitioners administer and score cognitive tests without error. This study assesses the frequency and types of examiner errors that occur during the…
Keeping Your Audience in Mind: Applying Audience Analysis to the Design of Interactive Score Reports
ERIC Educational Resources Information Center
Zapata-Rivera, Juan Diego; Katz, Irvin R.
2014-01-01
Score reports have one or more intended audiences: the people who use the reports to make decisions about test takers, including teachers, administrators, parents and test takers. Attention to audience when designing a score report supports assessment validity by increasing the likelihood that score users will interpret and use assessment results…
Hawkins, Melanie; Elsworth, Gerald R; Osborne, Richard H
2018-07-01
Data from subjective patient-reported outcome measures (PROMs) are now being used in the health sector to make or support decisions about individuals, groups and populations. Contemporary validity theorists define validity not as a statistical property of the test but as the extent to which empirical evidence supports the interpretation of test scores for an intended use. However, validity testing theory and methodology are rarely evident in the PROM validation literature. Application of this theory and methodology would provide structure for comprehensive validation planning to support improved PROM development and sound arguments for the validity of PROM score interpretation and use in each new context. This paper proposes the application of contemporary validity theory and methodology to PROM validity testing. The validity testing principles will be applied to a hypothetical case study with a focus on the interpretation and use of scores from a translated PROM that measures health literacy (the Health Literacy Questionnaire or HLQ). Although robust psychometric properties of a PROM are a pre-condition to its use, a PROM's validity lies in the sound argument that a network of empirical evidence supports the intended interpretation and use of PROM scores for decision making in a particular context. The health sector is yet to apply contemporary theory and methodology to PROM development and validation. The theoretical and methodological processes in this paper are offered as an advancement of the theory and practice of PROM validity testing in the health sector.
ERIC Educational Resources Information Center
Qu, Yanxuan; Huo, Yan; Chan, Eric; Shotts, Matthew
2017-01-01
For educational tests, it is critical to maintain consistency of score scales and to understand the sources of variation in score means over time. This practice helps to ensure that interpretations about test takers' abilities are comparable from one administration (or one form) to another. This study examines the consistency of reported scores…
How Is Testing Supposed to Improve Schooling?
ERIC Educational Resources Information Center
Haertel, Edward
2013-01-01
Validation research for educational achievement tests is often limited to an examination of intended test score interpretations. This article calls for an expansion of validation research in three dimensions. First, validation must attend to actual test use and its consequences, not just score meaning. Second, validation must attend to unintended…
Test Takers and the Validity of Score Interpretations
ERIC Educational Resources Information Center
Kopriva, Rebecca J.; Thurlow, Martha L.; Perie, Marianne; Lazarus, Sheryl S.; Clark, Amy
2016-01-01
This article argues that test takers are as integral to determining validity of test scores as defining target content and conditioning inferences on test use. A principled sustained attention to how students interact with assessment opportunities is essential, as is a principled sustained evaluation of evidence confirming the validity or calling…
George, J M; Wagner, E E
1995-06-01
Pearson correlations between the Hand Test Pathology (PATH) score and Personality Assessment Inventory scales produced a cluster of relationships characteristic of an antisocial orientation. Likewise, PATH significantly differentiated between a "P" (Pathology) group flagged by a high Negative Impression score on the inventory, and an "N" (Normal) group of 100 pain patients. It was suggested that the interpretive simplicity of Hand Test scores renders the scores amenable to further correlational studies involving the inventory.
Fent, Graham; Gosai, Jivendra; Purva, Makani
2016-01-01
Accurate interpretation of the electrocardiogram (ECG) remains an essential skill for medical students and junior doctors. While many techniques for teaching ECG interpretation are described, no single method has been shown to be superior. This randomized controlled trial is the first to investigate whether teaching ECG interpretation using a computer simulator program or traditional teaching leads to improved scores in a test of ECG interpretation among medical students and postgraduate doctors immediately after and 3 months following teaching. Participants' opinions of the program were assessed using a questionnaire. There were no differences in ECG interpretation test scores immediately after or 3 months after teaching in the lecture or simulator groups. At present, therefore, there is insufficient evidence to suggest that ECG simulator programs are superior to traditional teaching.
Evaluating Test Validity: Reprise and Progress
ERIC Educational Resources Information Center
Shepard, Lorrie A.
2016-01-01
The AERA, APA, NCME Standards define validity as "the degree to which evidence and theory support the interpretations of test scores for proposed uses of tests". A century of disagreement about validity does not mean that there has not been substantial progress. This consensus definition brings together interpretations and use so that it…
Psychometric Properties of Raw and Scale Scores on Mixed-Format Tests
ERIC Educational Resources Information Center
Kolen, Michael J.; Lee, Won-Chan
2011-01-01
This paper illustrates that the psychometric properties of scores and scales that are used with mixed-format educational tests can impact the use and interpretation of the scores that are reported to examinees. Psychometric properties that include reliability and conditional standard errors of measurement are considered in this paper. The focus is…
ERIC Educational Resources Information Center
Macmann, Gregg M.; Barnett, David W.
1997-01-01
Used computer simulation to examine the reliability of interpretations for Kaufman's "intelligent testing" approach to the Wechsler Intelligence Scale for Children (3rd ed.) (WISC-III). Findings indicate that factor index-score differences and other measures could not be interpreted with confidence. Argues that limitations of IQ testing…
ERIC Educational Resources Information Center
Carmichael, Karla Delle
The Developmental Indicators for the Assessment of Learning-Revised (Dial-R) Test, the Peabody Picture Vocabulary Test (PPVT), and the Motor-Free Visual Perception Test (MFVPT) were used for kindergarten screening in three rural schools in Texas. Teachers in the schools requested a handbook that would help them interpret test scores and plan…
Interactional Competence: Challenges for Validity.
ERIC Educational Resources Information Center
Young, Richard F.
One of the ways in which language testing interfaces with applied linguistics is in the definition and validation of the constructs that underlie language tests. When language testers and score users interpret scores on a test, they do so by implicit and explicit reference to the construct on which the test is based. Equally, when applied to new…
Developmental prosopagnosia and the Benton Facial Recognition Test.
Duchaine, Bradley C; Nakayama, Ken
2004-04-13
The Benton Facial Recognition Test is used for clinical and research purposes, but evidence suggests that it is possible to pass the test with impaired face discrimination abilities. The authors tested 11 patients with developmental prosopagnosia using this test, and a majority scored in the normal range. Consequently, scores in the normal range should be interpreted cautiously, and testing should always be supplemented by other face tests.
ERIC Educational Resources Information Center
Weisenburger, Susan M.; Harkness, Allan R.; McNulty, John L.; Graham, John R.; Ben-Porath, Yossef S.
2008-01-01
The Minnesota Multiphasic Personality Inventory-2 (MMPI-2)-based Personality Psychopathology-Five (PSY-5) scales provide an overview of personality individual differences. Several textbooks and a test report offer instruction on interpreting MMPI-2 PSY-5 scores. On the basis of an earlier item response theory article (S. V. Rouse, M. S. Finger,…
ERIC Educational Resources Information Center
Atkinson, Becky M.
2012-01-01
The study reported in this article examines how teachers read and respond to their students' Stanford Achievement Test 10 (SAT 10) scores with the goal of investigating the assumption that data-based teaching practice is more "objective" and less susceptible to divergent teacher interpretation. The study uses reader response theory to…
A Note on the Use of the Hiskey-Nebraska Test of Learning Aptitude with Deaf Children.
ERIC Educational Resources Information Center
Watson, Betty U.; Goldgar, David E.
1985-01-01
Comparing distribution of scores on the Hiskey-Nebraska Test of Learning Aptitude (H-NTLA) with those from the Wechsler Performance Scales for 71 hearing impaired Ss revealed a correlation of .85. However, the H-NTLA yielded more Ss with extreme scores. Findings stress the need for caution in interpreting extreme H-NTLA scores. (CL)
Improving Analysis: Dealing with Information Processing Errors
2006-11-01
obviating this issue, psychological test data provides information that is normed and scored in a common standardized metric (e.g., a z score. A z score is a ... to take these into account when interpreting psychological test information. Clinicians are not alone in their relative inability to outperform ... 1980); M. Snyder and B. Campbell, "Testing hypotheses about other people: The role of the hypothesis," Personality and Social Psychology Bulletin, No. 6
ERIC Educational Resources Information Center
Papageorgiou, Spiros; Morgan, Rick; Becker, Valerie
2015-01-01
The purpose of this study was to enhance the meaning of the scores of an English-language test by developing performance levels and descriptors for reporting overall test performance. The levels and descriptors were intended to accompany the total scale scores of TOEFL Junior® Standard, an international test of English as a second/foreign…
NASA Astrophysics Data System (ADS)
Young, Jerry Wayne
The purpose of this study was to determine the effects of four instructional methods (direct instruction, computer-aided instruction, video observation, and microcomputer-based lab activities), gender, and time of testing (pretest, immediate posttest for determining the immediate effect of instruction, and a delayed posttest two weeks later to determine the retained effect of the instruction) on the achievement of sixth graders who were learning to interpret graphs of displacement and velocity. The dependent variable of achievement was reflected in the scores earned by students on a testing instrument of established validity and reliability. The 107 students participating in the study were divided by gender and were then randomly assigned to the four treatment groups, each taught by a different teacher. Each group had approximately equal numbers of males and females. The students were pretested and then involved in two class periods of the instructional method which was unique to their group. Immediately following treatment they were posttested and two weeks later they were posttested again. The data in the form of test scores were analyzed with a two-way split-plot analysis of variance to determine if there was significant interaction among technique, gender, and time of testing. When significant interaction was indicated, the Tukey HSD test was used to determine specific mean differences. The results of the analysis indicated no gender effect. Only students in the direct instruction group and the microcomputer-based laboratory group had significantly higher posttest-1 scores than pretest scores. They also had significantly higher posttest-2 scores than pretest scores. This suggests that the learning was retained. The other groups experienced no significant differences among pretest, posttest-1, and posttest-2 scores. 
Recommendations are that direct instruction and microcomputer-based laboratory activities should be considered as effective stand-alone methods for teaching sixth grade students to interpret graphs of displacement and velocity. However, video and computer instruction may serve as supplemental activities.
Electrocardiogram interpretation skills among ambulance nurses.
Werner, Kristoffer; Kander, Kristofer; Axelsson, Christer
2016-06-01
To describe ambulance nurses' practical electrocardiogram (ECG) interpretation skills and to measure the correlation between these skills and factors that may impact on the level of knowledge. This study was conducted using a prospective quantitative survey with questionnaires and a knowledge test. A convenience sample collection was conducted among ambulance nurses in three different districts in western Sweden. The knowledge test consisted of nine different ECGs. The scores of the ECG test were correlated against the questions in the questionnaire regarding both general ECG interpretation skill and ability to identify acute myocardial infarction using Mann-Whitney U test, Kruskal-Wallis test and Spearman's rank correlation. On average, the respondents had 54% correct answers on the test and identified 46% of the ECGs indicating acute myocardial infarction. The median total score was 9 of 16 (interquartile range 7-11) and 1 of 3 (IQR 1-2) in infarction points. No correlation between ECG interpretation skill and factors such as education and professional experience was found, except that coronary care unit experience was associated with better results on the ECG test. Ambulance nurses have deficiencies in their ECG interpretation skills. This also applies to conditions where the ambulance crew has great potential to improve the outcome of the patient's health, such as myocardial infarction and cardiac arrest. Neither education, extensive experience in ambulance service nor in nursing contributed to an improved result. The only factor of importance for higher ECG interpretation knowledge was prior experience of working in a coronary care unit. © The European Society of Cardiology 2014.
Middle Grade Students' Interpretations of Contour Maps
ERIC Educational Resources Information Center
Carter, Glenda; Cook, Michelle; Park, John C.; Wiebe, Eric N.; Butler, Susan M.
2008-01-01
This study examined eighth graders' approach to three tasks implemented to assist students with learning to interpret contour maps. Students' approach to and interpretation of these three tasks were analyzed qualitatively. When students were rank ordered according to their scores on a standardized test of spatial ability, the Minnesota Paper Form…
Interpretation of the Rasch Ability and Difficulty Scales for Educational Purposes.
ERIC Educational Resources Information Center
Woodcock, Richard W.
Though many test developers have utilized item response theory in their work, few have taken advantage of the potential of item response theory for providing new interpretation procedures that accentuate the educational implications to be drawn from test scores. This paper describes several features, based upon the Rasch difficulty and ability…
ERIC Educational Resources Information Center
Sklar, Jeffrey C.; Zwick, Rebecca
2009-01-01
Proper interpretation of standardized test scores is a crucial skill for K-12 teachers and school personnel; however, many do not have sufficient knowledge of measurement concepts to appropriately interpret and communicate test results. In a recent four-year project funded by the National Science Foundation, three web-based instructional…
Discriminant Validity of the WISC-IV Culture-Language Interpretive Matrix
ERIC Educational Resources Information Center
Styck, Kara M.; Watkins, Marley W.
2014-01-01
The Culture-Language Interpretive Matrix (C-LIM) was developed to help practitioners determine the validity of test scores obtained from students who are culturally and linguistically different from the normative group of a test. The present study used an idiographic approach to investigate the diagnostic utility of the C-LIM for the Wechsler…
Aawar, Nadine; Moore, Richard; Riley, Stephen; Salek, Sam
2016-07-01
High Renal Quality of Life Profile (RQLP) scores are associated with impaired health-related quality of life; however, the clinical meaning of the scores is difficult for clinicians and healthcare planners to interpret. The aim of this study was to determine clinical significance of RQLP scores which could be used to aid clinical decision-making. The anchor-based technique (a method for categorizing numeric scores to ease interpretation) was used to develop a categorization system for the RQLP scores using a global question (GQ). The GQ scores (i.e. no effect to extremely large effect) were mapped against the RQLP scores, and intraclass correlation coefficient (ICC) was used to test their agreement. The RQLP and the GQ were administered to 260 adult patients (males = 165 and females = 95) with chronic renal failure (CRF). The mean RQLP score was 67.2, median = 61, SD = 41.5, and range 0-172. The mean GQ score was 1.74, median = 2, SD = 1.27, and range 0-4. The mean, mode, and median of the GQ scores for each RQLP score were used to devise several sets of categories of RQLP score, and the ICC test of agreement was calculated. The proposed set of RQLP score banding for adoption includes: 0-20 = no effect on patient's life (GQ = 0, n = 35); 21-51 = small effect on patient's life (GQ = 1, n = 66); 52-93 = moderate effect on patient's life (GQ = 2, n = 87); 94-134 = very large effect on patient's life (GQ = 3, n = 54); and 135-172 = extremely large effect on patient's life (GQ = 4, n = 18). The ICC coefficient for the proposed banding system was 0.80. The proposed categorization of the RQLP will aid the clinical interpretation of change in RQLP score informing treatment decision-making in routine practice.
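The banding reported in this abstract amounts to a lookup from an RQLP score to a global-question (GQ) category. The following is a minimal sketch of that lookup, assuming integer scores in the reported 0-172 range; the function name `rqlp_band` and the shortened labels are hypothetical and are not taken from the study.

```python
from bisect import bisect_left

# Inclusive upper bounds of the first four bands, as listed in the abstract.
# Scores above the last bound (135-172) fall into the fifth band.
_UPPER_BOUNDS = [20, 51, 93, 134]
_LABELS = [
    "no effect",
    "small effect",
    "moderate effect",
    "very large effect",
    "extremely large effect",
]


def rqlp_band(score):
    """Return (GQ category, label) for an integer RQLP score (hypothetical helper)."""
    if not 0 <= score <= 172:
        raise ValueError("RQLP score must lie in the reported range 0-172")
    # bisect_left keeps each inclusive upper bound inside its own band.
    gq = bisect_left(_UPPER_BOUNDS, score)
    return gq, _LABELS[gq]


category, label = rqlp_band(67)  # 67 is the rounded mean score reported above
```

A sorted list of cut points with a binary search keeps the band edges in one place, which makes it easy to swap in a different set of categories if another banding is adopted.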
ERIC Educational Resources Information Center
Tong, Ye; Kolen, Michael J.
2010-01-01
"Scaling" is the process of constructing a score scale that associates numbers or other ordered indicators with the performance of examinees. Scaling typically is conducted to aid users in interpreting test results. This module describes different types of raw scores and scale scores, illustrates how to incorporate various sources of…
ERIC Educational Resources Information Center
Evans, Richard M.; Surkan, Alvin J.
The recent arrival of portable computer systems with high-level language interpreters now makes it practical to rapidly develop complex testing and scoring programs. These programs permit undergraduates access, at arbitrary times, to testing as an integral part of a mastery learning strategy. Effects of introducing the computer were studied by…
Breen, Cathal; Zhu, Tingting; Bond, Raymond; Finlay, Dewar; Clifford, Gari
2016-01-01
The aim of this study is to present and evaluate the integration of a low resource JavaScript based ECG training interface (CrowdLabel) and a standardised curriculum for self-guided tuition in ECG interpretation. Participants practiced interpreting ECGs weekly using the CrowdLabel interface to assist with the learning of the traditional didactic taught course material during a 6 week training period. To determine competency students were tested during week 7. A total of 245 unique ECG cases were submitted by each student. Accuracy scores during the training period ranged from 0-59.5% (median = 33.3%). Conversely accuracy scores during the test ranged from 30 - 70% (median = 37.5%) (p < 0.05). There was no correlation between students who interpreted high numbers of ECGs during the training period and their marks obtained. CrowdLabel is shown to be a readily accessible dedicated learning platform to support ECG interpretation competency. Copyright © 2016 Elsevier Inc. All rights reserved.
ERIC Educational Resources Information Center
Stanley, Leanne M.; Edwards, Michael C.
2016-01-01
The purpose of this article is to highlight the distinction between the reliability of test scores and the fit of psychometric measurement models, reminding readers why it is important to consider both when evaluating whether test scores are valid for a proposed interpretation and/or use. It is often the case that an investigator judges both the…
Item Response Theory Modeling of the Philadelphia Naming Test
ERIC Educational Resources Information Center
Fergadiotis, Gerasimos; Kellough, Stacey; Hula, William D.
2015-01-01
Purpose: In this study, we investigated the fit of the Philadelphia Naming Test (PNT; Roach, Schwartz, Martin, Grewal, & Brecher, 1996) to an item-response-theory measurement model, estimated the precision of the resulting scores and item parameters, and provided a theoretical rationale for the interpretation of PNT overall scores by relating…
Comparing Standard Deviation Effects across Contexts
ERIC Educational Resources Information Center
Ost, Ben; Gangopadhyaya, Anuj; Schiman, Jeffrey C.
2017-01-01
Studies using test scores as the dependent variable often report point estimates in student standard deviation units. We note that a standard deviation is not a standard unit of measurement since the distribution of test scores can vary across contexts. As such, researchers should be cautious when interpreting differences in the numerical size of…
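The point about standard deviation units can be illustrated with a small arithmetic sketch. All numbers below are hypothetical and chosen only to show that the same raw gain yields different standardized effects when the spread of scores differs between contexts.

```python
def standardized_effect(raw_gain, score_sd):
    """Express a raw score gain in student standard deviation units."""
    return raw_gain / score_sd


# Hypothetical: an identical 5-point gain in two contexts whose
# test score distributions have different spreads.
effect_context_a = standardized_effect(raw_gain=5.0, score_sd=10.0)
effect_context_b = standardized_effect(raw_gain=5.0, score_sd=20.0)

# The standardized estimates differ by a factor of two even though
# the raw gain is the same in both contexts.
ratio = effect_context_a / effect_context_b
```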
The Effect of Stakes on Accountability Test Scores and Pass Rates
ERIC Educational Resources Information Center
Steedle, Jeffrey T.; Grochowalski, Joseph
2017-01-01
Students may not fully demonstrate their knowledge and skills on accountability tests if there are no stakes attached to individual performance. In that case, assessment results may not accurately reflect student achievement, so the validity of score interpretations and uses suffers. For this study, matched samples of students taking state…
Multi-group measurement invariance of the multiple sclerosis walking scale-12?
Motl, Robert W; Mullen, Sean; McAuley, Edward
2012-03-01
One primary assumption underlying the interpretation of composite multiple sclerosis walking scale-12 (MSWS-12) scores across levels of disability status is multi-group measurement invariance. This assumption was tested in the present study between samples that differed in self-reported disability status. Participants (n = 867) completed a battery of questionnaires that included the MSWS-12 and patient-determined disease step (PDDS) scale. The multi-group invariance was tested between samples that had PDDS scores of ≤2 (i.e. no mobility limitation; n = 470) and PDDS scores ≥3 (onset of mobility limitation; n = 397) using Mplus 6·0. The omnibus test of equal covariance matrices indicated that the MSWS-12 was not invariant between the two samples that differed in disability status. The source of non-invariance occurred with the initial equivalence test of the factor structure itself. We provide evidence that questions the unambiguous interpretation of scores from the MSWS-12 as a measure of walking impairment between samples of persons with multiple sclerosis who differ in disability status.
Huynh-Thu, Vân Anh; Saeys, Yvan; Wehenkel, Louis; Geurts, Pierre
2012-07-01
Univariate statistical tests are widely used for biomarker discovery in bioinformatics. These procedures are simple, fast and their output is easily interpretable by biologists but they can only identify variables that provide a significant amount of information in isolation from the other variables. As biological processes are expected to involve complex interactions between variables, univariate methods thus potentially miss some informative biomarkers. Variable relevance scores provided by machine learning techniques, however, are potentially able to highlight multivariate interacting effects, but unlike the p-values returned by univariate tests, these relevance scores are usually not statistically interpretable. This lack of interpretability hampers the determination of a relevance threshold for extracting a feature subset from the rankings and also prevents the wide adoption of these methods by practicians. We evaluated several, existing and novel, procedures that extract relevant features from rankings derived from machine learning approaches. These procedures replace the relevance scores with measures that can be interpreted in a statistical way, such as p-values, false discovery rates, or family wise error rates, for which it is easier to determine a significance level. Experiments were performed on several artificial problems as well as on real microarray datasets. Although the methods differ in terms of computing times and the tradeoff they achieve in terms of false positives and false negatives, some of them greatly help in the extraction of truly relevant biomarkers and should thus be of great practical interest for biologists and physicians. As a side conclusion, our experiments also clearly highlight that using model performance as a criterion for feature selection is often counter-productive. Python source codes of all tested methods, as well as the MATLAB scripts used for data simulation, can be found in the Supplementary Material.
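One generic way to replace a relevance score with a p-value is a label permutation test. The sketch below is an illustration under stated assumptions and is not claimed to be one of the procedures evaluated in the paper or part of its supplementary code: the relevance score is a simple absolute Pearson correlation standing in for a machine learning importance measure, and the names `abs_correlation` and `permutation_p_value` are hypothetical.

```python
import random
from statistics import mean


def abs_correlation(feature, labels):
    """Stand-in relevance score: absolute Pearson correlation with the labels."""
    mx, my = mean(feature), mean(labels)
    cov = sum((a - mx) * (b - my) for a, b in zip(feature, labels))
    var_x = sum((a - mx) ** 2 for a in feature)
    var_y = sum((b - my) ** 2 for b in labels)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant variable carries no information
    return abs(cov) / (var_x * var_y) ** 0.5


def permutation_p_value(feature, labels, score=abs_correlation, n_perm=999, seed=0):
    """Estimate how often shuffled labels give a score at least as large as the observed one."""
    rng = random.Random(seed)
    observed = score(feature, labels)
    shuffled = list(labels)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if score(feature, shuffled) >= observed:
            at_least_as_large += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_large + 1) / (n_perm + 1)


# Hypothetical data: 20 samples, two classes, one feature that tracks the class.
labels = [0] * 10 + [1] * 10
informative_feature = list(range(20))
p_informative = permutation_p_value(informative_feature, labels)
```

Unlike a raw relevance score, the resulting value can be compared against a conventional significance level, and p-values for many features could then be passed to a multiple-testing correction.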
Truth and Evidence in Validity Theory
ERIC Educational Resources Information Center
Borsboom, Denny; Markus, Keith A.
2013-01-01
According to Kane (this issue), "the validity of a proposed interpretation or use depends on how well the evidence supports" the claims being made. Because truth and evidence are distinct, this means that the validity of a test score interpretation could be high even though the interpretation is false. As an illustration, we discuss the case of…
A Factor Comparison of Old and New MCAT Scales.
ERIC Educational Resources Information Center
Jones, Robert F.; Thomae-Forgues, Maria
1981-01-01
The old Medical College Admission Test (MCAT) and the New MCAT were compared by factor-analyzing the scores of a sample of 1,484 examinees who took both tests during 1976-77. Three common factors are interpreted: a general science quantitative factor, verbal ability, and interpretation skills. Variances are noted and implications of the data for…
The effect of lab based instruction on ACT science scores
NASA Astrophysics Data System (ADS)
Hamilton, Michelle
Standardized tests, although unpopular, are required for a multitude of reasons. One of these tests is the ACT. The ACT is a college readiness test that many high school juniors take to gain college admittance. Students throughout the United States are unprepared for this assessment. The average high school junior is three points behind twenty-four, the ACT recommended score, for the science section. The science section focuses on reading text and interpreting graphs, charts, tables and diagrams with an emphasis on experimental design and relationships among variables. For students to become better at interpreting and understanding scientific graphics they must have vast experience developing their own graphics. The purpose of this study was to provide students the opportunity to generate their own graphics to master interpretation of them on the ACT. According to a t-test the results show that students who are continually exposed to creating graphs are able to understand and locate information from graphs at a significantly faster rate.
Barthelemy, Francois X; Segard, Julien; Fradin, Philippe; Hourdin, Nicolas; Batard, Eric; Pottier, Pierre; Potel, Gilles; Montassier, Emmanuel
2017-04-01
ECG interpretation is a pivotal skill to acquire during residency, especially for Emergency Department (ED) residents. Previous studies reported that ECG interpretation competency among residents was rather low. However, the optimal resource to improve ECG interpretation skills remains unclear. The aim of our study was to compare two teaching modalities to improve the ECG interpretation skills of ED residents: e-learning and lecture-based courses. The participants were first-year and second-year ED residents, assigned randomly to the two groups. The ED residents were evaluated by means of a precourse test at the beginning of the study and a postcourse test after the e-learning and lecture-based courses. These evaluations consisted of the interpretation of 10 different ECGs. We included 39 ED residents from four different hospitals. The precourse test showed that the overall average score of ECG interpretation was 40%. Nineteen participants were then assigned to the e-learning course and 20 to the lecture-based course. Globally, there was a significant improvement in ECG interpretation skills (accuracy score=55%, P=0.0002). However, this difference was not significant between the two groups (P=0.14). Our findings showed that the ECG interpretation was not optimal and that our e-learning program may be an effective tool for enhancing ECG interpretation skills among ED residents. A large European study should be carried out to evaluate ECG interpretation skills among ED residents before the implementation of ECG learning, including e-learning strategies, during ED residency.
Reynolds, Matthew R
2013-03-01
The linear loadings of intelligence test composite scores on a general factor (g) have been investigated recently in factor analytic studies. Spearman's law of diminishing returns (SLODR), however, implies that the g loadings of test scores likely decrease in magnitude as g increases, or they are nonlinear. The purpose of this study was to (a) investigate whether the g loadings of composite scores from the Differential Ability Scales (2nd ed.) (DAS-II, C. D. Elliott, 2007a, Differential Ability Scales (2nd ed.). San Antonio, TX: Pearson) were nonlinear and (b) if they were nonlinear, to compare them with linear g loadings to demonstrate how SLODR alters the interpretation of these loadings. Linear and nonlinear confirmatory factor analysis (CFA) models were used to model Nonverbal Reasoning, Verbal Ability, Visual Spatial Ability, Working Memory, and Processing Speed composite scores in four age groups (5-6, 7-8, 9-13, and 14-17) from the DAS-II norming sample. The nonlinear CFA models provided better fit to the data than did the linear models. In support of SLODR, estimates obtained from the nonlinear CFAs indicated that g loadings decreased as g level increased. The nonlinear portion for the nonverbal reasoning loading, however, was not statistically significant across the age groups. Knowledge of general ability level informs composite score interpretation because g is less likely to produce differences, or is measured less, in those scores at higher g levels. One implication is that it may be more important to examine the pattern of specific abilities at higher general ability levels. PsycINFO Database Record (c) 2013 APA, all rights reserved.
The MMPI Assistant: A Microcomputer Based Expert System to Assist in Interpreting MMPI Profiles
Tanner, Barry A.
1989-01-01
The Assistant is an MS DOS program to aid clinical psychologists in interpreting the results of the Minnesota Multiphasic Personality Inventory (MMPI). Interpretive hypotheses are based on the professional literature and the author's experience. After scores are entered manually, the Assistant produces a hard copy which is intended for use by a psychologist knowledgeable about the MMPI. The rules for each hypothesis appear first on the monitor, and then in the printed output, followed by the patient's scores on the relevant scales, and narrative hypotheses for the scores. The data base includes hypotheses for 23 validity configurations, 45 two-point clinical codes, 10 high scoring single-point clinical scales, and 10 low scoring single-point clinical scales. The program can accelerate the production of test reports, while insuring that actuarial rules are not overlooked. It has been especially useful as a teaching tool with graduate students. The Assistant requires an IBM PC compatible with 128k available memory, DOS 2.x or higher, and a printer.
ERIC Educational Resources Information Center
Boote, Stacy K.
2014-01-01
This study examined how 12- and 13-year-old students' mathematics and science background knowledge affected line graph interpretations and how interpretations were affected by graph question levels. A purposive sample of 14 students engaged in think aloud interviews while completing an excerpted Test of Graphing in Science. Data were…
Pessoa, Rebeca Rodrigues; Araújo, Sarah Cueva Cândido Soares de; Isotani, Selma Mie; Puccini, Rosana Fiorini; Perissinoto, Jacy
To assess the development of language regarding the ability to recognize and interpret lexical ambiguity in low-birth-weight schoolchildren enrolled at the school system in the municipality of Embu das Artes, Sao Paulo state, compared with that of schoolchildren with normal birth weight. A case-control, retrospective, cross-sectional study conducted with 378 schoolchildren, both genders, aged 5 to 9.9 years, from the municipal schools of Embu das Artes. Study Group (SG) comprising 210 schoolchildren with birth weight < 2500 g. Control Group (CG) composed of 168 school children with birth weight ≥ 2500 g. Participants of both groups were compared with respect to the skills of recognition and verbal interpretation of sentences containing lexical ambiguity using the Test of Language Competence. Variables of interest: Age and gender of children; age and schooling of mothers. Statistical analysis: Descriptive analysis to characterize the sample and score per group; Student's t test for comparison between the total scores of each skill/subtest; Chi-square test to compare items within each subtest; multiple regression analysis for the intervening variables. Participants of the SG presented lower scores for ambiguous sentences compared with those of participants of the CG. Multiple regression analysis showed that child's current age was a predictor for all metalinguistic skills regarding interpretation of ambiguities in both groups. Participants of the SG presented lower specific and total scores than those of participants of the CG for ambiguity skills. The child's current age factor positively influenced the ambiguity skills in both groups.
The Composition of Normative Groups and Diagnostic Decision Making: Shooting Ourselves in the Foot
ERIC Educational Resources Information Center
Pena, Elizabeth D.; Spaulding, Tammie J.; Plante, Elena
2006-01-01
Purpose: The normative group of a norm-referenced test is intended to provide a basis for interpreting test scores. However, the composition of the normative group may facilitate or impede different types of diagnostic interpretations. This article considers who should be included in a normative sample and how this decision must be made relative…
Montassier, Emmanuel; Hardouin, Jean-Benoît; Segard, Julien; Batard, Eric; Potel, Gilles; Planchon, Bernard; Trochu, Jean-Noël; Pottier, Pierre
2016-04-01
An ECG is pivotal for the diagnosis of coronary heart disease. Previous studies have reported deficiencies in ECG interpretation skills that have been responsible for misdiagnosis. However, the optimal way to acquire ECG interpretation skills is still under discussion. Thus, our objective was to compare the effectiveness of e-learning and lecture-based courses for learning ECG interpretation skills in a large randomized study. We conducted a prospective, randomized, controlled, noninferiority study. Participants were recruited from among fifth-year medical students and were assigned to the e-learning group or the lecture-based group using a computer-generated random allocation sequence. The e-learning and lecture-based groups were compared on a score of effectiveness, comparing the 95% unilateral confidence interval (95% UCI) of the score of effectiveness with the mean effectiveness in the lecture-based group, adjusted for a noninferiority margin. Ninety-eight students were enrolled. As compared with the lecture-based course, e-learning was noninferior with regard to the postcourse test score (15.1; 95% UCI 14.2; +∞), which can be compared with 12.5 [the mean effectiveness in the lecture-based group (15.0) minus the noninferiority margin (2.5)]. Furthermore, there was a significant increase in the test score points in both the e-learning and lecture-based groups during the study period (both P<0.0001). Our randomized study showed that the e-learning course is an effective tool for the acquisition of ECG interpretation skills by medical students. These preliminary results should be confirmed with further multicenter studies before the implementation of e-learning courses for learning ECG interpretation skills during medical school.
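The noninferiority comparison in this abstract reduces to checking whether the lower one-sided confidence bound for the e-learning score lies above the lecture-based mean minus the margin. The sketch below reproduces only that arithmetic with the figures quoted above; the function name `is_noninferior` is hypothetical and the code does not reproduce how the authors estimated the confidence bound.

```python
def is_noninferior(lower_bound_new, mean_reference, margin):
    """Return (decision, threshold) for a one-sided noninferiority comparison."""
    threshold = mean_reference - margin
    return lower_bound_new > threshold, threshold


# Figures quoted in the abstract: lower 95% bound 14.2 for e-learning,
# lecture-based mean 15.0, noninferiority margin 2.5.
noninferior, threshold = is_noninferior(
    lower_bound_new=14.2, mean_reference=15.0, margin=2.5
)
```

With these inputs the threshold is 12.5 and the lower bound of 14.2 exceeds it, which matches the conclusion stated in the abstract.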
A contemporary approach to validity arguments: a practical guide to Kane's framework.
Cook, David A; Brydges, Ryan; Ginsburg, Shiphra; Hatala, Rose
2015-06-01
Assessment is central to medical education and the validation of assessments is vital to their use. Earlier validity frameworks suffer from a multiplicity of types of validity or failure to prioritise among sources of validity evidence. Kane's framework addresses both concerns by emphasising key inferences as the assessment progresses from a single observation to a final decision. Evidence evaluating these inferences is planned and presented as a validity argument. We aim to offer a practical introduction to the key concepts of Kane's framework that educators will find accessible and applicable to a wide range of assessment tools and activities. All assessments are ultimately intended to facilitate a defensible decision about the person being assessed. Validation is the process of collecting and interpreting evidence to support that decision. Rigorous validation involves articulating the claims and assumptions associated with the proposed decision (the interpretation/use argument), empirically testing these assumptions, and organising evidence into a coherent validity argument. Kane identifies four inferences in the validity argument: Scoring (translating an observation into one or more scores); Generalisation (using the score[s] as a reflection of performance in a test setting); Extrapolation (using the score[s] as a reflection of real-world performance), and Implications (applying the score[s] to inform a decision or action). Evidence should be collected to support each of these inferences and should focus on the most questionable assumptions in the chain of inference. Key assumptions (and needed evidence) vary depending on the assessment's intended use or associated decision. Kane's framework applies to quantitative and qualitative assessments, and to individual tests and programmes of assessment. Validation focuses on evaluating the key claims, assumptions and inferences that link assessment scores with their intended interpretations and uses. 
The Implications and associated decisions are the most important inferences in the validity argument. © 2015 John Wiley & Sons Ltd.
Chevalier, Thérèse M.; Stewart, Garth; Nelson, Monty; McInerney, Robert J.; Brodie, Norman
2016-01-01
It has been well documented that IQ scores calculated using Canadian norms are generally 2–5 points lower than those calculated using American norms on the Wechsler IQ scales. However, recent findings have demonstrated that the difference may be significantly larger for individuals with certain demographic characteristics, and this has prompted discussion about the appropriateness of using the Canadian normative system with a clinical population in Canada. This study compared the interpretive effects of applying the American and Canadian normative systems in a clinical sample. We used a multivariate analysis of variance (ANOVA) to calculate differences between IQ and Index scores in a clinical sample, and mixed model ANOVAs to assess the pattern of differences across age and ability level. As expected, Full Scale IQ scores calculated using Canadian norms were systematically lower than those calculated using American norms, but differences were significantly larger for individuals classified as having extremely low or borderline intellectual functioning when compared with those who scored in the average range. Implications of clinically different conclusions for up to 52.8% of patients based on these discrepancies highlight a unique dilemma facing Canadian clinicians, and underscore the need for caution when choosing a normative system with which to interpret WAIS-IV results in the context of a neuropsychological test battery in Canada. Based on these findings, we offer guidelines for best practice for Canadian clinicians when interpreting data from neuropsychological test batteries that include different normative systems, and suggestions to assist with future test development. PMID:27246955
A "Nonbiased Assessment" of Intelligence Testing.
ERIC Educational Resources Information Center
Vandivier, Phillip L.; Vandivier, Stella Sue
1979-01-01
Arguments and prejudices against the use of individually administered intelligence tests are considered and compared with possible values that may be obtained. Cautions about test score interpretation are discussed. Implications of abolishing intelligence testing are considered and recommendations for effective testing policies are presented. (CTM)
Consistency of response and image recognition, pulmonary nodules
Liu, M A Q; Galvan, E; Bassett, R; Murphy, W A; Matamoros, A; Marom, E M
2014-01-01
Objective: To investigate the effect of recognition of a previously encountered radiograph on consistency of response in localized pulmonary nodules. Methods: 13 radiologists interpreted 40 radiographs each to locate pulmonary nodules. A few days later, they again interpreted 40 radiographs. Half of the images in the second set were new. We asked the radiologists whether each image had been in the first set. We used Fisher's exact test and Kruskal–Wallis test to evaluate the correlation between recognition of an image and consistency in its interpretation. We evaluated the data using all possible recognition levels—definitely, probably or possibly included vs definitely, probably or possibly not included by collapsing the recognition levels into two and by eliminating the “possibly included” and “possibly not included” scores. Results: With all but one of six methods of looking at the data, there was no significant correlation between consistency in interpretation and recognition of the image. When the possibly included and possibly not included scores were eliminated, there was a borderline statistical significance (p = 0.04) with slightly greater consistency in interpretation of recognized than that of non-recognized images. Conclusion: We found no convincing evidence that radiologists' recognition of images in an observer performance study affects their interpretation on a second encounter. Advances in knowledge: Conscious recognition of chest radiographs did not result in a greater degree of consistency in the tested interpretation than that in the interpretation of images that were not recognized. PMID:24697724
The criterion and discriminant validity of the Referential Thinking (REF) scale.
Startup, Mike; Sakrouge, Rebecca; Mason, Oliver J
2010-03-01
The Referential Thinking (REF) scale was designed to be a comprehensive self-report measure of both simple and guilty ideas of reference in the general population. One aim of the present study was to test the proposed interpretations of REF scores by comparing REF scores with ratings of delusions among psychotic patients. A 2nd aim was to test whether REF scores are better predicted by the severity of patients' delusions of reference (DoRs) than by the severity of their auditory verbal hallucinations (AVHs), thus supporting the scores' ability to discriminate between proneness to the 2 different symptoms. The REF scale was completed by 56 healthy controls and 53 acutely psychotic patients. The severity of the patients' DoRs and AVHs was assessed in structured clinical interviews. REF scores differed significantly not only between the patients and controls but also between patients with versus without DoRs. REF scores correlated significantly with the severity of the patients' DoRs but not their AVHs. The interpretation of REF scores as a measure of proneness to simple and guilty ideas of reference was supported. PsycINFO Database Record (c) 2010 APA, all rights reserved.
Standard Errors and Confidence Intervals of Norm Statistics for Educational and Psychological Tests.
Oosterhuis, Hannah E M; van der Ark, L Andries; Sijtsma, Klaas
2016-11-14
Norm statistics allow for the interpretation of scores on psychological and educational tests, by relating the test score of an individual test taker to the test scores of individuals belonging to the same gender, age, or education groups, et cetera. Given the uncertainty due to sampling error, one would expect researchers to report standard errors for norm statistics. In practice, standard errors are seldom reported; they are either unavailable or derived under strong distributional assumptions that may not be realistic for test scores. We derived standard errors for four norm statistics (standard deviation, percentile ranks, stanine boundaries and Z-scores) under the mild assumption that the test scores are multinomially distributed. A simulation study showed that the standard errors were unbiased and that corresponding Wald-based confidence intervals had good coverage. Finally, we discuss the possibilities for applying the standard errors in practical test use in education and psychology. The procedure is provided via the R function check.norms, which is available in the mokken package.
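As a rough illustration of the norm statistics named in this abstract, the sketch below computes a percentile rank and a Z-score for one test taker against a norm sample and attaches a bootstrap standard error to each. It is not the analytic multinomial derivation from the paper and does not reproduce the check.norms function; the bootstrap is swapped in as a simple stand-in, and the norm sample, the mid-rank definition of the percentile rank, and all function names are assumptions made for the example.

```python
import random
import statistics

# Hypothetical norm sample of raw test scores (illustrative only).
norm_sample = [12, 15, 15, 17, 18, 18, 18, 20, 21, 23, 24, 27]

def percentile_rank(norm_scores, x):
    """Mid-rank percentile rank (0-100) of score x within the norm sample."""
    below = sum(1 for s in norm_scores if s < x)
    equal = sum(1 for s in norm_scores if s == x)
    return 100.0 * (below + 0.5 * equal) / len(norm_scores)

def z_score(norm_scores, x):
    """Z-score of x relative to the norm sample mean and sample SD."""
    mean = statistics.mean(norm_scores)
    sd = statistics.stdev(norm_scores)
    return (x - mean) / sd

def bootstrap_se(norm_scores, x, statistic, n_boot=2000, seed=0):
    """Bootstrap standard error of a norm statistic (assumed stand-in,
    not the analytic standard error derived in the paper)."""
    rng = random.Random(seed)
    n = len(norm_scores)
    estimates = []
    for _ in range(n_boot):
        # Resample the norm group with replacement and recompute the statistic.
        resample = [rng.choice(norm_scores) for _ in range(n)]
        estimates.append(statistic(resample, x))
    return statistics.stdev(estimates)

if __name__ == "__main__":
    x = 20
    print("percentile rank:", percentile_rank(norm_sample, x))
    print("SE (bootstrap):", bootstrap_se(norm_sample, x, percentile_rank))
    print("Z-score:", z_score(norm_sample, x))
    print("SE (bootstrap):", bootstrap_se(norm_sample, x, z_score))
```

With a norm group this small the standard errors are large, which is the practical point the abstract makes: a reported percentile rank or Z-score is an estimate and should carry its uncertainty.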
ERIC Educational Resources Information Center
Linn, Robert L.; And Others
Norm-referenced test results reported by states and school districts and factors related to those scores were studied through mail and telephone surveys of 35 states and a nationally representative sample of 153 school districts to determine the degree to which "above average" results were being reported. Part of the stimulus for this…
Meijer, Rob R; Niessen, A Susan M; Tendeiro, Jorge N
2016-02-01
Although there are many studies devoted to person-fit statistics to detect inconsistent item score patterns, most studies are difficult to understand for nonspecialists. The aim of this tutorial is to explain the principles of these statistics for researchers and clinicians who are interested in applying these statistics. In particular, we first explain how invalid test scores can be detected using person-fit statistics; second, we provide the reader practical examples of existing studies that used person-fit statistics to detect and to interpret inconsistent item score patterns; and third, we discuss a new R-package that can be used to identify and interpret inconsistent score patterns. © The Author(s) 2015.
Houssaini, Allal; Assoumou, Lambert; Miller, Veronica; Calvez, Vincent; Marcelin, Anne-Geneviève; Flandre, Philippe
2013-01-01
Background Several attempts have been made to determine HIV-1 resistance from genotype resistance testing. We compare scoring methods for building weighted genotyping scores and commonly used systems to determine whether the virus of a HIV-infected patient is resistant. Methods and Principal Findings Three statistical methods (linear discriminant analysis, support vector machine and logistic regression) are used to determine the weight of mutations involved in HIV resistance. We compared these weighted scores with known interpretation systems (ANRS, REGA and Stanford HIV-db) to classify patients as resistant or not. Our methodology is illustrated on the Forum for Collaborative HIV Research didanosine database (N = 1453). The database was divided into four samples according to the country of enrolment (France, USA/Canada, Italy and Spain/UK/Switzerland). The total sample and the four country-based samples allow external validation (one sample is used to estimate a score and the other samples are used to validate it). We used the observed precision to compare the performance of newly derived scores with other interpretation systems. Our results show that newly derived scores performed better than or similar to existing interpretation systems, even with external validation sets. No difference was found between the three methods investigated. Our analysis identified four new mutations associated with didanosine resistance: D123S, Q207K, H208Y and K223Q. Conclusions We explored the potential of three statistical methods to construct weighted scores for didanosine resistance. Our proposed scores performed at least as well as already existing interpretation systems and previously unrecognized didanosine-resistance associated mutations were identified. This approach could be used for building scores of genotypic resistance to other antiretroviral drugs. PMID:23555613
ERIC Educational Resources Information Center
Freund, Philipp Alexander; Holling, Heinz
2011-01-01
The interpretation of retest scores is problematic because they are potentially affected by measurement and predictive bias, which impact construct validity, and because their size differs as a function of various factors. This paper investigates the construct stability of scores on a figural matrices test and models retest effects at the level of…
Developmental Levels of Processing in Metaphor Interpretation.
ERIC Educational Resources Information Center
Johnson, Janice; Pascual-Leone, Juan
1989-01-01
Outlines a theory of metaphor that posits varying levels of semantic processing and formalizes the levels in terms of kinds of semantic mapping operators. Predicted complexity of semantic mapping operators was tested using metaphor interpretations of 204 children aged 6-12 years and 24 adults. Processing score increased predictably with age. (SAK)
Adequate proverb interpretation is associated with performance on the independent living scales.
Ahmed, Fayeza S; Miller, L Stephen
2015-01-01
The purpose of this study was to examine proverb interpretation performance and functional independence in older adults. From the limited literature on proverb interpretation in aging and its conceptualization as an executive function, it was hypothesized that proverb interpretation would be related to functional independence similar to other executive functions. Tests of proverb interpretation, additional executive functions, and functional ability were administered to nondemented older adults. Results showed that proverb interpretation accounted for a significant amount of unique variance of functional ability scores. This supports including a measure of proverb interpretation to the assessment of older adults.
Effort, symptom validity testing, performance validity testing and traumatic brain injury.
Bigler, Erin D
2014-01-01
To understand the neurocognitive effects of brain injury, valid neuropsychological test findings are paramount. This review examines the research on what has been referred to as symptom validity testing (SVT). Performance above a designated cut-score signifies a 'passing' SVT performance, which is likely the best indicator of valid neuropsychological test findings. Likewise, substantially below cut-point performance that nears chance or is at chance signifies invalid test performance. Significantly below chance is the sine qua non neuropsychological indicator for malingering. However, the interpretative problems with SVT performance below the cut-point yet far above chance are substantial, as pointed out in this review. This intermediate, border-zone performance on SVT measures is where substantial interpretative challenges exist. Case studies are used to highlight the many areas where additional research is needed. Historical perspectives are reviewed along with the neurobiology of effort. Reasons why performance validity testing (PVT) may be better than the SVT term are reviewed. Advances in neuroimaging techniques may be key in better understanding the meaning of border zone SVT failure. The review demonstrates the problems with rigidity in interpretation with established cut-scores. A better understanding of how certain types of neurological, neuropsychiatric and/or even test conditions may affect SVT performance is needed.
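The notion of "significantly below chance" can be made concrete with a one-sided binomial test. The sketch below assumes a two-alternative forced-choice format with 50 items, so that guessing yields a 0.5 probability of a correct answer per item; the item count, the format, and the function name are assumptions for illustration and are not taken from the review.

```python
from math import comb

def below_chance_p(n_correct, n_items, p_chance=0.5):
    """One-sided binomial probability of scoring n_correct or fewer
    out of n_items if every answer were a pure guess."""
    return sum(
        comb(n_items, k) * p_chance**k * (1 - p_chance) ** (n_items - k)
        for k in range(n_correct + 1)
    )

if __name__ == "__main__":
    # Hypothetical scores on a 50-item, two-choice measure.
    for score in (25, 20, 17, 12):
        print(score, round(below_chance_p(score, 50), 4))
```

A small probability here says only that the score is unlikely under guessing. Scores between chance and the published cut-score, the border zone the review focuses on, are not addressed by this calculation at all.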
NASA Astrophysics Data System (ADS)
Peterman, Karen; Cranston, Kayla A.; Pryor, Marie; Kermish-Allen, Ruth
2015-11-01
This case study was conducted within the context of a place-based education project that was implemented with primary school students in the USA. The authors and participating teachers created a performance assessment of standards-aligned tasks to examine 6-10-year-old students' graph interpretation skills as part of an exploratory research project. Fifty-five students participated in a performance assessment interview at the beginning and end of a place-based investigation. Two forms of the assessment were created and counterbalanced within class at pre and post. In situ scoring was conducted such that responses were scored as correct versus incorrect during the assessment's administration. Criterion validity analysis demonstrated an age-level progression in student scores. Tests of discriminant validity showed that the instrument detected variability in interpretation skills across each of three graph types (line, bar, dot plot). Convergent validity was established by correlating in situ scores with those from the Graph Interpretation Scoring Rubric. Students' proficiency with interpreting different types of graphs matched expectations based on age and the standards-based progression of graphs across primary school grades. The assessment tasks were also effective at detecting pre-post gains in students' interpretation of line graphs and dot plots after the place-based project. The results of the case study are discussed in relation to the common challenges associated with performance assessment. Implications are presented in relation to the need for authentic and performance-based instructional and assessment tasks to respond to the Common Core State Standards and the Next Generation Science Standards.
Normative data for the Maryland CNC Test.
Mendel, Lisa Lucks; Mustain, William D; Magro, Jessica
2014-09-01
The Maryland consonant-vowel nucleus-consonant (CNC) Test is routinely used in Veterans Administration medical centers, yet there is a paucity of published normative data for this test. The purpose of this study was to provide information on the means and distribution of word-recognition scores on the Maryland CNC Test as a function of degree of hearing loss for a veteran population. A retrospective, descriptive design was conducted. The sample consisted of records from veterans who had Compensation and Pension (C&P) examinations at a Veterans Administration medical center (N = 1,760 ears). Audiometric records of veterans who had C&P examinations during a 10 yr period were reviewed, and the pure-tone averages (PTA4) at four frequencies (1000, 2000, 3000, and 4000 Hz) were documented. The maximum word-recognition score (PBmax) was determined from the performance-intensity functions obtained using the Maryland CNC Test. Correlations were made between PBmax and PTA4. A wide range of word-recognition scores were obtained at all levels of PTA4 for this population. In addition, a strong negative correlation between the PBmax and the PTA4 was observed, indicating that as PTA4 increased, PBmax decreased. Word-recognition scores decreased significantly as hearing loss increased beyond a mild hearing loss. Although threshold was influenced by age, no statistically significant relationship was found between word-recognition score and the age of the participants. Results from this study provide normative data in table and figure format to assist audiologists in interpreting patient results on the Maryland CNC test for a veteran population. These results provide a quantitative method for audiologists to use to interpret word-recognition scores based on pure-tone hearing loss. American Academy of Audiology.
Contrasting OLS and Quantile Regression Approaches to Student "Growth" Percentiles
ERIC Educational Resources Information Center
Castellano, Katherine Elizabeth; Ho, Andrew Dean
2013-01-01
Regression methods can locate student test scores in a conditional distribution, given past scores. This article contrasts and clarifies two approaches to describing these locations in terms of readily interpretable percentile ranks or "conditional status percentile ranks." The first is Betebenner's quantile regression approach that results in…
Socioeconomic Status and MMPI-2 Interpretation.
ERIC Educational Resources Information Center
Long, Kathleen A.; And Others
1994-01-01
Examined differences in Minnesota Multiphasic Personality Inventory-2 (MMPI-2) scores between persons of differing educational levels and family income in the MMPI-2 normative sample to determine if MMPI-2 scores are differentially accurate in predicting relevant extra-test characteristics of persons of differing socioeconomic levels. MMPI-2…
Ho, Andrew D; Yu, Carol C
2015-06-01
Many statistical analyses benefit from the assumption that unconditional or conditional distributions are continuous and normal. More than 50 years ago in this journal, Lord and Cook chronicled departures from normality in educational tests, and Micceri similarly showed that the normality assumption is met rarely in educational and psychological practice. In this article, the authors extend these previous analyses to state-level educational test score distributions that are an increasingly common target of high-stakes analysis and interpretation. Among 504 scale-score and raw-score distributions from state testing programs from recent years, nonnormal distributions are common and are often associated with particular state programs. The authors explain how scaling procedures from item response theory lead to nonnormal distributions as well as unusual patterns of discreteness. The authors recommend that distributional descriptive statistics be calculated routinely to inform model selection for large-scale test score data, and they illustrate consequences of nonnormality using sensitivity studies that compare baseline results to those from normalized score scales.
Evidence of Construct Validity in Published Achievement Tests.
ERIC Educational Resources Information Center
Nolet, Victor; Tindal, Gerald
Valid interpretation of test scores is the shared responsibility of the test designer and the test user. Test publishers must provide evidence of the validity of the decisions their tests are intended to support, while test users are responsible for analyzing this evidence and subsequently using the test in the manner indicated by the publisher.…
Volumetric CT-images improve testing of radiological image interpretation skills.
Ravesloot, Cécile J; van der Schaaf, Marieke F; van Schaik, Jan P J; ten Cate, Olle Th J; van der Gijp, Anouk; Mol, Christian P; Vincken, Koen L
2015-05-01
Current radiology practice increasingly involves interpretation of volumetric data sets. In contrast, most radiology tests still contain only 2D images. We introduced a new testing tool that allows for stack viewing of volumetric images in our undergraduate radiology program. We hypothesized that tests with volumetric CT-images enhance test quality, in comparison with traditional completely 2D image-based tests, because they might better reflect required skills for clinical practice. Two groups of medical students (n=139; n=143), trained with 2D and volumetric CT-images, took a digital radiology test in two versions (A and B), each containing both 2D and volumetric CT-image questions. In a questionnaire, they were asked to comment on the representativeness for clinical practice, difficulty and user-friendliness of the test questions and testing program. Students' test scores and reliabilities, measured with Cronbach's alpha, of 2D and volumetric CT-image tests were compared. Estimated reliabilities (Cronbach's alphas) were higher for volumetric CT-image scores (version A: .51 and version B: .54), than for 2D CT-image scores (version A: .24 and version B: .37). Participants found volumetric CT-image tests more representative of clinical practice, and considered them to be less difficult than volumetric CT-image questions. However, in one version (A), volumetric CT-image scores (M 80.9, SD 14.8) were significantly lower than 2D CT-image scores (M 88.4, SD 10.4) (p<.001). The volumetric CT-image testing program was considered user-friendly. This study shows that volumetric image questions can be successfully integrated in students' radiology testing. Results suggest that the inclusion of volumetric CT-images might improve the quality of radiology tests by positively impacting perceived representativeness for clinical practice and increasing reliability of the test. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
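For readers unfamiliar with the reliability coefficient compared in this abstract, the following is a minimal sketch of Cronbach's alpha computed from a persons-by-items score matrix, using the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name and the small data set are hypothetical and do not come from the study.

```python
import statistics

def cronbach_alpha(score_matrix):
    """Cronbach's alpha for a list of rows, one row per person and
    one column per item (all rows must have the same length)."""
    k = len(score_matrix[0])  # number of items
    # Variance of each item across persons.
    item_variances = [
        statistics.variance([row[i] for row in score_matrix]) for i in range(k)
    ]
    # Variance of each person's total score.
    total_variance = statistics.variance([sum(row) for row in score_matrix])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

if __name__ == "__main__":
    # Hypothetical 0/1 item scores for six students on four questions.
    scores = [
        [1, 1, 1, 0],
        [1, 0, 1, 1],
        [0, 0, 1, 0],
        [1, 1, 1, 1],
        [0, 0, 0, 0],
        [1, 1, 0, 1],
    ]
    print(round(cronbach_alpha(scores), 3))
```

Because alpha rises with the number of items and with inter-item covariance, comparing alphas across subtests is most informative when the subtests contain a similar number of questions, something the abstract does not report.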
A new IRT-based standard setting method: application to eCat-listening.
García, Pablo Eduardo; Abad, Francisco José; Olea, Julio; Aguado, David
2013-01-01
Criterion-referenced interpretations of tests are highly necessary, which usually involves the difficult task of establishing cut scores. Contrasting with other Item Response Theory (IRT)-based standard setting methods, a non-judgmental approach is proposed in this study, in which Item Characteristic Curve (ICC) transformations lead to the final cut scores. eCat-Listening, a computerized adaptive test for the evaluation of English Listening, was administered to 1,576 participants, and the proposed standard setting method was applied to classify them into the performance standards of the Common European Framework of Reference for Languages (CEFR). The results showed a classification closely related to relevant external measures of the English language domain, according to the CEFR. It is concluded that the proposed method is a practical and valid standard setting alternative for IRT-based tests interpretations.
Storey, Jennifer E; Hart, Stephen D; Cooke, David J; Michie, Christine
2016-04-01
The Hare Psychopathy Checklist-Revised (PCL-R; Hare, 2003) is a commonly used psychological test for assessing traits of psychopathic personality disorder. Despite the abundance of research using the PCL-R, the vast majority of research used samples of convenience rather than systematic methods to minimize sampling bias and maximize the generalizability of findings. This potentially complicates the interpretation of test scores and research findings, including the "norms" for offenders from the United States and Canada included in the PCL-R manual. In the current study, we evaluated the psychometric properties of PCL-R scores for all male offenders admitted to a regional reception center of the Correctional Service of Canada during a 1-year period (n = 375). Because offenders were admitted for assessment prior to institutional classification, they comprise a sample that was heterogeneous with respect to correctional risks and needs yet representative of all offenders in that region of the service. We examined the distribution of PCL-R scores, classical test theory indices of its structural reliability, the factor structure of test items, and the external correlates of test scores. The findings were highly consistent with those typically reported in previous studies. We interpret these results as indicating it is unlikely any sampling limitations of past research using the PCL-R resulted in findings that were, overall, strongly biased or unrepresentative. (c) 2016 APA, all rights reserved.
ERIC Educational Resources Information Center
Domenech, Daniel A.
2000-01-01
The question of validity, or how high-stakes tests are being used and interpreted, threatens to undermine the entire standards movement. Joint standards developed by three professional associations say decisions affecting students' life chances should not be based on test scores alone. Objectivity and teaching to tests are real concerns. (MLH)
Barth, A; Küfferle, B
2001-11-01
Concretism is considered an important aspect of schizophrenic thought disorder. Traditionally it is measured using the method of proverb interpretation, in which metaphoric proverbs are presented with the request that the subject tell their meaning. Interpretations are recorded and scored on concretistic tendencies. However, this method has two problems: its reliability is doubtful and it is rather complicated to perform. In this paper, a new version of a multiple choice proverb test is presented which can solve these problems in a reliable and economic manner. Using the new test, it has been shown that schizophrenic patients have greater deficits in proverb interpretation than depressive patients.
Esmaeili, Alireza; Stewart, Andrew M; Hopkins, William G; Elias, George P; Lazarus, Brendan H; Rowell, Amber E; Aughey, Robert J
2018-01-01
Aim: The sit and reach test (S&R), dorsiflexion lunge test (DLT), and adductor squeeze test (AST) are commonly used in weekly musculoskeletal screening for athlete monitoring and injury prevention purposes. The aim of this study was to determine the normal week to week variability of the test scores, individual differences in variability, and the effects of training load on the scores. Methods: Forty-four elite Australian rules footballers from one club completed the weekly screening tests on day 2 or 3 post-main training (pre-season) or post-match (in-season) over a 10 month season. Ratings of perceived exertion and session duration for all training sessions were used to derive various measures of training load via both simple summations and exponentially weighted moving averages. Data were analyzed via linear and quadratic mixed modeling and interpreted using magnitude-based inference. Results: Substantial small to moderate variability was found for the tests at both season phases; for example over the in-season, the normal variability ±90% confidence limits were as follows: S&R ±1.01 cm, ±0.12; DLT ±0.48 cm, ±0.06; AST ±7.4%, ±0.6%. Small individual differences in variability existed for the S&R and AST (factor standard deviations between 1.31 and 1.66). All measures of training load had trivial effects on the screening scores. Conclusion: A change in a test score larger than the normal variability is required to be considered a true change. Athlete monitoring and flagging systems need to account for the individual differences in variability. The tests are not sensitive to internal training load when conducted 2 or 3 days post-training or post-match, and the scores should be interpreted cautiously when used as measures of recovery.
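The abstract mentions deriving training load from ratings of perceived exertion and session duration via exponentially weighted moving averages. A minimal sketch of that idea follows. It assumes the common session-RPE convention (load = RPE × duration in minutes) and the decay constant lambda = 2 / (N + 1); the abstract states neither the exact load formula nor the time constants used, so both are assumptions here, as are the function names and the sample week.

```python
def session_load(rpe, duration_min):
    """Session load in arbitrary units, assuming load = RPE x minutes."""
    return rpe * duration_min

def ewma(loads, time_constant_days):
    """Exponentially weighted moving average of a daily load series.

    lambda = 2 / (N + 1) is an assumed, commonly used decay constant;
    the series is seeded with the first observed load.
    """
    if not loads:
        return []
    lam = 2.0 / (time_constant_days + 1)
    smoothed = []
    previous = loads[0]
    for load in loads:
        # Recent days get weight lam; the accumulated history gets 1 - lam.
        previous = lam * load + (1 - lam) * previous
        smoothed.append(previous)
    return smoothed

if __name__ == "__main__":
    # Hypothetical week: (RPE, minutes) per day, with rest days as (0, 0).
    week = [(6, 90), (4, 60), (0, 0), (7, 100), (3, 45), (0, 0), (8, 120)]
    daily = [session_load(rpe, minutes) for rpe, minutes in week]
    print([round(v, 1) for v in ewma(daily, time_constant_days=7)])
```

A simple summation over the same window weights every day equally, whereas the moving average discounts older sessions, which is why the two can rank athletes differently.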
Predicting clinical concussion measures at baseline based on motivation and academic profile.
Trinidad, Katrina J; Schmidt, Julianne D; Register-Mihalik, Johna K; Groff, Diane; Goto, Shiho; Guskiewicz, Kevin M
2013-11-01
The purpose of this study was to predict baseline neurocognitive and postural control performance using a measure of motivation, high school grade point average (hsGPA), and Scholastic Aptitude Test (SAT) score. Cross-sectional. Clinical research center. Eighty-eight National Collegiate Athletic Association Division I incoming student-athletes (freshman and transfers). Participants completed baseline clinical concussion measures, including a neurocognitive test battery (CNS Vital Signs), a balance assessment [Sensory Organization Test (SOT)], and motivation testing (Rey Dot Counting). Participants granted permission to access hsGPA and SAT total score. Standard scores for each CNS Vital Signs domain and SOT composite score. Baseline motivation, hsGPA, and SAT explained a small percentage of the variance of complex attention (11%), processing speed (12%), and composite SOT score (20%). Motivation, hsGPA, and total SAT score do not explain a significant amount of the variance in neurocognitive and postural control measures but may still be valuable to consider when interpreting neurocognitive and postural control measures.
Clarifying the Consensus Definition of Validity
ERIC Educational Resources Information Center
Newton, Paul E.
2012-01-01
The 1999 "Standards for Educational and Psychological Testing" defines validity as the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests. Although quite explicit, there are ways in which this definition lacks precision, consistency, and clarity. The history of validity has taught us…
Teachers' Interpretations of Exit Exam Scores and College Readiness
ERIC Educational Resources Information Center
McIntosh, Shelby
2013-01-01
This study examined teachers' interpretations of Virginia's high school exit exam policy through the teachers' responses to a survey. The survey was administered to teachers from one school district in Northern Virginia. The teachers selected for the survey taught a subject in which students must pass a Standards of Learning (SOL) test in order to…
Hasbún Avalos, Oswaldo; Pennington, Kaylin; Osterberg, Lars
2013-12-01
In our ever-increasingly multicultural, multilingual society, medical interpreters serve an important role in the provision of care. Though it is known that using untrained interpreters leads to decreased quality of care for limited English proficiency patients, because of a short supply of professionals and a lack of formalized, feasible education programs for volunteers, community health centers and internal medicine practices continue to rely on untrained interpreters. To develop and formally evaluate a novel medical interpreter education program that encompasses major tenets of interpretation, tailored to the needs of volunteer medical interpreters. One-armed, quasi-experimental retro-pre-post study using survey ratings and feedback correlated by assessment scores to determine educational intervention effects. Thirty-eight students; 24 Spanish, nine Mandarin, and five Vietnamese. The majority had prior interpreting experience but no formal medical interpreter training. Students completed retrospective pre-test and post-test surveys measuring confidence in and perceived knowledge of key skills of interpretation. Primary outcome measures were a 10-point Likert scale for survey questions of knowledge, skills, and confidence, written and oral assessments of interpreter skills, and qualitative evidence of newfound knowledge in written reflections. Analyses showed a statistically significant (P <0.001) change of about two points in mean self-ratings on knowledge, skills, and confidence, with large effect sizes (d > 0.8). The second half of the program was also quantitatively and qualitatively shown to be a vital learning experience, resulting in 18 % more students passing the oral assessments; a 19 % increase in mean scores for written assessments; and a newfound understanding of interpreter roles and ways to navigate them. 
This innovative program was successful in increasing volunteer interpreters' skills and knowledge of interpretation, as well as confidence in their own abilities. Additionally, the program effectively taught how to navigate the roles of the interpreter to maintain clear communication.
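The "large effect sizes (d > 0.8)" reported above refer to Cohen's d. The sketch below shows one common variant, the mean difference divided by the pooled standard deviation of the two sets of ratings. The abstract does not say which variant the authors used, and a paired design may call for a different denominator, so this is an illustration under that assumption, with hypothetical ratings and function name.

```python
import statistics

def cohens_d(pre, post):
    """Cohen's d using the pooled sample standard deviation
    (one common definition; assumed here, not taken from the study)."""
    n_pre, n_post = len(pre), len(post)
    var_pre = statistics.variance(pre)
    var_post = statistics.variance(post)
    pooled_sd = (
        ((n_pre - 1) * var_pre + (n_post - 1) * var_post) / (n_pre + n_post - 2)
    ) ** 0.5
    return (statistics.mean(post) - statistics.mean(pre)) / pooled_sd

if __name__ == "__main__":
    # Hypothetical 10-point self-ratings before and after a program.
    pre_ratings = [4, 5, 6, 5, 4, 6]
    post_ratings = [6, 7, 8, 7, 6, 8]
    print(round(cohens_d(pre_ratings, post_ratings), 2))
```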
Habets, Petra; Jeandarme, Inge; Uzieblo, Kasia; Oei, Karel; Bogaerts, Stefan
2015-05-01
A stable assessment of cognition is of paramount importance for forensic psychiatric patients (FPP). The purpose of this study was to compare repeated measures of IQ scores in FPPs with and without intellectual disability. Repeated measurements of IQ scores in FPPs (n = 176) were collected. Differences between tests were computed, and each IQ score was categorized. Additionally, t-tests and regression analyses were performed. Differences of 10 points or more were found in 66% of the cases comparing WAIS-III with RAVEN scores. Fisher's exact test revealed differences between two WAIS-III scores and the WAIS categories. The WAIS-III did not predict other IQs (WAIS or RAVEN) in participants with intellectual disability. This study showed that stability or interchangeability of scores is lacking, especially in individuals with intellectual disability. Caution in interpreting IQ scores is therefore recommended, and the use of the unitary concept of IQ should be discouraged. © 2014 John Wiley & Sons Ltd.
ERIC Educational Resources Information Center
Teglasi, Hedwig
This book provides guidance into the use of storytelling techniques as an approach to personality assessment and explains how to administer, score, and interpret such tests. The tests discussed include the Thematic Apperception Test (TAT), the Roberts Apperception Test for Children, and the TEMAS (Tell-Me-a-Story). Each chapter contains callout…
Identifying Aberrant Responding: Use of Multiple Measures
ERIC Educational Resources Information Center
Steinkamp, Susan Christa
2017-01-01
For test scores that rely on the accurate estimation of ability via an IRT model, their use and interpretation is dependent upon the assumption that the IRT model fits the data. Examinees who do not put forth full effort in answering test questions, have prior knowledge of test content, or do not approach a test with the intent of answering…
Assessment of American Indian Children as Measured by the SON-R and WISC-III.
ERIC Educational Resources Information Center
Curran, Lisa; And Others
A major criticism of standardized intelligence tests is their improper use in measuring the intellectual competence of culturally diverse children. Factors which complicate the issue are the definition of intelligence, content bias in intelligence tests, and the interpretation of test scores between white middle class children and children of…
Test Review: An Interview with Amy Gabel--About the WISC-V
ERIC Educational Resources Information Center
Greathouse, Dan; Shaughnessy, Michael F.
2016-01-01
Whenever a major intelligence or achievement test is revised, there is always renewed interest in the underlying structure of the test as well as a renewed interest in the scoring, administration, and interpretation changes. In this interview, Amy Gabel discusses the most recent revision of the "Wechsler Intelligence Scale for Children-Fifth…
ERIC Educational Resources Information Center
Patalino, Marianne
Problems in current course evaluation methods are discussed and an alternative method is described for the construction, analysis, and interpretation of a test to evaluate instructional programs. The method presented represents a different approach to the traditional overreliance on standardized achievement tests and the total scores they provide.…
Appropriateness of the TOEIC[R] Bridge Test for Students in Three Countries of South America
ERIC Educational Resources Information Center
Sinharay, Sandip; Powers, Donald E.; Feng, Ying; Saldivia, Luis; Giunta, Anthony; Simpson, Annabelle; Weng, Vincent
2009-01-01
In order to facilitate the interpretation of test scores from the TOEIC[R] "Bridge" as a measure of English language proficiency, one form of the test was administered to more than 6000 test takers in three South American countries--Colombia, Chile and Ecuador. The appropriateness of the TOEIC "Bridge" test as a measure of…
ERIC Educational Resources Information Center
Eckes, Thomas
2014-01-01
Testlets are subsets of test items that are based on the same stimulus and are administered together. Tests that contain testlets are in widespread use in language testing, but they also share a fundamental problem: Items within a testlet are locally dependent with possibly adverse consequences for test score interpretation and use. Building on…
ERIC Educational Resources Information Center
Ferrara, Steve
2017-01-01
Test security is not an end in itself; it is important because we want to be able to make valid interpretations from test scores. In this article, I propose a framework for comprehensive test security systems: prevention, detection, investigation, and resolution. The article discusses threats to test security, roles and responsibilities, rigorous…
Gispen, Fiona E; Magid, Donna
2016-05-01
Correct selection of imaging tests is essential for clinicians but until recently has been largely neglected in medical education. How and when students acquire such non-interpretive skills are unknown. This study will assess student knowledge of imaging test selection before and after a general radiology elective. Between 2008 and 2015, an unannounced 13-item test was administered to second, third, and fourth-year students on the first and last days of the Johns Hopkins School of Medicine radiology elective. Scores (0–13) were based on the American College of Radiology Appropriateness Criteria. Pre- and posttest means were compared using paired samples t tests. Whether performance on the pretest and posttest differed by class year was assessed using analysis of variance and Kruskal-Wallis, respectively, and whether year was associated with posttest score after controlling for pretest score was assessed using analysis of covariance. Posttest means were significantly higher than pretest means for students in all years (P values <.0001). Pretest scores differed by year (F(2, 360) = 66.85, P <.0001): fourth-year students scored highest (mean = 9.96 of 13) and second-year students scored lowest (mean = 7.01 of 13). Posttest scores did not differ (χ2(2, 270) = 0.348, P = .841). Year in school had no independent effect on posttest score (F(2, 239) = 0.45, P = .637). Knowledge of modality selection increases with clinical training, but room for improvement remains. A general radiology elective increases this knowledge. Second-year students improve most, suggesting that taking radiology early is efficient, but further research to evaluate retention of this knowledge is needed. Medical student education in radiology must increasingly recognize and address non-interpretive skills and intelligent imaging utilization.
A two-factor theory for concussion assessment using ImPACT: memory and speed.
Schatz, Philip; Maerlender, Arthur
2013-12-01
We present the initial validation of a two-factor structure of Immediate Post-Concussion Assessment and Cognitive Testing (ImPACT) using ImPACT composite scores and document the reliability and validity of this factor structure. Factor analyses were conducted for baseline (N = 21,537) and post-concussion (N = 560) data, yielding "Memory" (Verbal and Visual) and "Speed" (Visual Motor Speed and Reaction Time) Factors; inclusion of Total Symptom Scores resulted in a third discrete factor. Speed and Memory z-scores were calculated, and test-retest reliability (using intra-class correlation coefficients) at 1 month (0.88/0.81), 1 year (0.85/0.75), and 2 years (0.76/0.74) were higher than published data using Composite scores. Speed and Memory scores yielded 89% sensitivity and 70% specificity, which was higher than composites (80%/62%) and comparable with subscales (91%/69%). This emergent two-factor structure has improved test-retest reliability with no loss of sensitivity/specificity and may improve understanding and interpretability of ImPACT test results.
[Propensity score matching in SPSS].
Huang, Fuqiang; DU, Chunlin; Sun, Menghui; Ning, Bing; Luo, Ying; An, Shengli
2015-11-01
To realize propensity score matching in PS Matching module of SPSS and interpret the analysis results. The R software and plug-in that could link with the corresponding versions of SPSS and propensity score matching package were installed. A PS matching module was added in the SPSS interface, and its use was demonstrated with test data. Score estimation and nearest neighbor matching was achieved with the PS matching module, and the results of qualitative and quantitative statistical description and evaluation were presented in the form of a graph matching. Propensity score matching can be accomplished conveniently using SPSS software.
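As a rough illustration of the nearest neighbor matching step mentioned in this abstract, the following Python sketch greedily pairs each treated unit with the closest unmatched control on an already estimated propensity score. It is not the SPSS PS Matching module or its R plug-in; the function name, the `caliper` parameter, and the example scores are all assumptions made for illustration.

```python
def nearest_neighbor_match(treated, controls, caliper=0.1):
    """Greedy 1:1 nearest neighbor matching without replacement.

    treated, controls: dicts mapping a unit id to its propensity score.
    caliper: largest score gap allowed for a match (hypothetical default).
    Returns a list of (treated_id, control_id) pairs.
    """
    available = dict(controls)  # controls not yet used in a pair
    pairs = []
    # Visit treated units in order of score so the result is deterministic.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        # Closest remaining control on the propensity score.
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]
    return pairs


# Made-up propensity scores, for illustration only.
treated = {"t1": 0.30, "t2": 0.62, "t3": 0.90}
controls = {"c1": 0.28, "c2": 0.35, "c3": 0.60, "c4": 0.10}
matches = nearest_neighbor_match(treated, controls)
```

In this toy data the third treated unit stays unmatched because no remaining control falls within the caliper, which is the usual trade-off between match quality and sample size.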
The Criterion and Discriminant Validity of the Referential Thinking (REF) Scale
ERIC Educational Resources Information Center
Startup, Mike; Sakrouge, Rebecca; Mason, Oliver J.
2010-01-01
The Referential Thinking (REF) scale was designed to be a comprehensive self-report measure of both simple and guilty ideas of reference in the general population. One aim of the present study was to test the proposed interpretations of REF scores by comparing REF scores with ratings of delusions among psychotic patients. A 2nd aim was to test…
ERIC Educational Resources Information Center
McGill, Ryan J.; Spurgin, Angelia R.
2016-01-01
The current study examined the incremental validity of the Luria interpretive scheme for the Kaufman Assessment Battery for Children-Second Edition (KABC-II) for predicting scores on the Kaufman Test of Educational Achievement-Second Edition (KTEA-II). All participants were children and adolescents (N = 2,025) drawn from the nationally…
Effects of handcuffs on neuropsychological testing: Implications for criminal forensic evaluations.
Biddle, Christine M; Fazio, Rachel L; Dyshniku, Fiona; Denney, Robert L
2018-01-01
Neuropsychological evaluations are increasingly performed in forensic contexts, including in criminal settings where security sometimes cannot be compromised to facilitate evaluation according to standardized procedures. Interpretation of nonstandardized assessment results poses significant challenges for the neuropsychologist. Research is limited in regard to the validation of neuropsychological test accommodation and modification practices that deviate from standard test administration; there is no published research regarding the effects of hand restraints upon neuropsychological evaluation results. This study provides preliminary results regarding the impact of restraints on motor functioning and common neuropsychological tests with a motor component. When restrained, performance on nearly all tests utilized was significantly impacted, including Trail Making Test A/B, a coding test, and several tests of motor functioning. Significant performance decline was observed in both raw scores and normative scores. Regression models are also provided in order to help forensic neuropsychologists adjust for the effect of hand restraints on raw scores of these tests, as the hand restraints also resulted in significant differences in normative scores; in the most striking case there was nearly a full standard deviation of discrepancy.
Techniques for Teaching Projective Assessment.
ERIC Educational Resources Information Center
Stagner, Brian H.
1984-01-01
Describes innovative methods for teaching projective assessment in a one semester graduate level clinical psychology course. The course fosters competence in the basics of administration, scoring, and in-depth interpretation of at least one test. (RM)
ERIC Educational Resources Information Center
Zilberberg, Anna; Finney, Sara J.; Marsh, Kimberly R.; Anderson, Robin D.
2014-01-01
Given worldwide prevalence of low-stakes testing for monitoring educational quality and students' progress through school (e.g., Trends in International Mathematics and Science Study, Program for International Student Assessment), interpretability of resulting test scores is of global concern. The nonconsequential nature of low-stakes tests…
The Search for the Holy Grail: Content-Referenced Score Interpretations from Large-Scale Tests
ERIC Educational Resources Information Center
Marion, Scott F.
2015-01-01
The measurement industry is in crisis. The public outcry against "over testing" and the opt-out movement are symptoms of a larger sociopolitical battle being fought over Common Core, teacher evaluation, federal intrusion, and a host of other issues, but much of the vitriol is directed at the tests and the testing industry. If we, as…
ERIC Educational Resources Information Center
Yoo, Hanwook; Manna, Venessa F.
2017-01-01
This study assessed the factor structure of the Test of English for International Communication (TOEIC®) Listening and Reading test, and its invariance across subgroups of test-takers. The subgroups were defined by (a) gender, (b) age, (c) employment status, (d) time spent studying English, and (e) having lived in a country where English is the…
Predicting Reading and Interpreting I. Q. Differences
ERIC Educational Resources Information Center
Miller, Wallace D.
1973-01-01
The purposes of this study were to determine which of the tests, PPVT, SIT, or Lorge-Thorndike (LT), may be the most useful to the classroom teacher in predicting reading achievement, for use in reading expectancy formulas, and for analyzing I.Q. scores obtained on different tests. (Author/RK)
A Constructivist Technique Which Improves Reading Comprehension.
ERIC Educational Resources Information Center
Raleigh, June
This study investigated whether seventh- and ninth-grade students who did prewriting activities in English class preceding a related literature comprehension test would produce higher raw test scores on literal and interpretive questions than would students who did not use prewriting. The study took place in 1993 and 1995. Participants included…
The Influences of Linguistic Demand and Cultural Loading on Cognitive Test Scores
ERIC Educational Resources Information Center
Cormier, Damien C.; McGrew, Kevin S.; Ysseldyke, James E.
2014-01-01
The increasing diversity of the U.S. population has resulted in increased concerns about the psychological assessment of students from culturally and linguistically diverse backgrounds. To date, little empirical research supports recommendations in test selection and interpretation, such as those presented in the Culture-Language Interpretative…
Does the Test Work? Evaluating a Web-Based Language Placement Test
ERIC Educational Resources Information Center
Long, Avizia Y.; Shin, Sun-Young; Geeslin, Kimberly; Willis, Erik W.
2018-01-01
In response to the need for examples of test validation from which everyday language programs can benefit, this paper reports on a study that used Bachman's (2005) assessment use argument (AUA) framework to examine evidence to support claims made about the intended interpretations and uses of scores based on a new web-based Spanish language…
How Standardized Tests Shape--and Limit--Student Learning. A Policy Research Brief
ERIC Educational Resources Information Center
National Council of Teachers of English, 2014
2014-01-01
The term "standardized" tests is often heard along with "high-stakes." Standardized tests are administered, scored, and interpreted in a consistent way, so that the performances of large groups of students can be compared. They are not in themselves high-stakes, but they are often used for high-stakes purposes such as…
A New Interpretation of Augmented Subscores and Their Added Value in Terms of Parallel Forms
ERIC Educational Resources Information Center
Sinharay, Sandip
2018-01-01
The value-added method of Haberman is arguably one of the most popular methods to evaluate the quality of subscores. The method is based on the classical test theory and deems a subscore to be of added value if the subscore predicts the corresponding true subscore better than does the total score. Sinharay provided an interpretation of the added…
Nielsen, Dorte Guldbrand; Gotzsche, Ole; Sonne, Ole; Eika, Berit
2012-10-01
Two major views on the relationship between basic science knowledge and clinical knowledge stand out; the Two-world view seeing basic science and clinical science as two separate knowledge bases and the encapsulated knowledge view stating that basic science knowledge plays an overt role being encapsulated in the clinical knowledge. However, recent research has implied that a more complex relationship between the two knowledge bases exists. In this study, we explore the relationship between immediate relevant basic science (physiology) and clinical knowledge within a specific domain of medicine (echocardiography). Twenty eight medical students in their 3rd year and 45 physicians (15 interns, 15 cardiology residents and 15 cardiology consultants) took a multiple-choice test of physiology knowledge. The physicians also viewed images of a transthoracic echocardiography (TTE) examination and completed a checklist of possible pathologies found. A total score for each participant was calculated for the physiology test, and for all physicians also for the TTE checklist. Consultants scored significantly higher on the physiology test than did medical students and interns. A significant correlation between physiology test scores and TTE checklist scores was found for the cardiology residents only. Basic science knowledge of immediate relevance for daily clinical work expands with increased work experience within a specific domain. Consultants showed no relationship between physiology knowledge and TTE interpretation indicating that experts do not use basic science knowledge in routine daily practice, but knowledge of immediate relevance remains ready for use.
Lods, wrods, and mods: the interpretation of lod scores calculated under different models.
Hodge, S E; Elston, R C
1994-01-01
In this paper we examine the relationships among classical lod scores, "wrod" scores (lod scores calculated under the wrong genetic model), and "mod" scores (lod scores maximized over genetic model parameters). We compare the behavior of these scores when the state of nature is linkage to their behavior when the state of nature is no linkage. We describe sufficient conditions for mod scores to be valid and discuss their use to determine the correct genetic model. We show that lod scores represent a likelihood-ratio test for independence. We explain the "ascertainment-assumption-free" aspect of using mod scores to determine mode of inheritance and we set this aspect into a well-established statistical framework. Finally, we summarize practical guidelines for the use of mod scores.
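To make the classical lod score concrete, here is a minimal Python sketch for the simplest textbook case: phase-known, fully informative meioses, where the lod score is the log10 ratio of the likelihood at recombination fraction theta to the likelihood at theta = 0.5 (no linkage). The function names are hypothetical, and the sketch does not cover the "wrod" or "mod" scores discussed in the paper, which involve additional genetic model parameters.

```python
import math


def lod_score(recombinants, meioses, theta):
    """Two-point lod score for phase-known, fully informative meioses.

    lod(theta) = log10( theta^k * (1 - theta)^(n - k) / 0.5^n )
    with k recombinants among n meioses.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    k, n = recombinants, meioses
    log_l_theta = 0.0
    if k > 0:
        if theta == 0.0:
            return float("-inf")  # recombinants are impossible at theta = 0
        log_l_theta += k * math.log10(theta)
    if n - k > 0:
        log_l_theta += (n - k) * math.log10(1.0 - theta)
    log_l_null = n * math.log10(0.5)  # likelihood under no linkage
    return log_l_theta - log_l_null


def max_lod(recombinants, meioses):
    """Lod score at the maximum likelihood estimate of theta (capped at 0.5)."""
    theta_hat = min(recombinants / meioses, 0.5)
    return theta_hat, lod_score(recombinants, meioses, theta_hat)


# Illustrative counts only: 1 recombinant among 10 meioses.
theta_hat, best_lod = max_lod(1, 10)
```

Maximizing over theta alone, as `max_lod` does, is the classical procedure; a mod score would additionally maximize over parameters of the genetic model.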
NASA Astrophysics Data System (ADS)
Soh, BaoLin P.; Lee, Warwick B.; Wong, Jill; Sim, Llewellyn; Hillis, Stephen L.; Tapia, Kriscia A.; Brennan, Patrick C.
2016-03-01
Aim: To compare the performance of Australian and Singapore breast readers interpreting a single test-set that consisted of mammographic examinations collected from the Australian population. Background: In the teleradiology era, breast readers are interpreting mammographic examinations from different populations. The question arises whether two groups of readers with similar training backgrounds, demonstrate the same level of performance when presented with a population familiar only to one of the groups. Methods: Fifty-three Australian and 15 Singaporean breast radiologists participated in this study. All radiologists were trained in mammogram interpretation and had a median of 9 and 15 years of experience in reading mammograms respectively. Each reader interpreted the same BREAST test-set consisting of sixty de-identified mammographic examinations arising from an Australian population. Performance parameters including JAFROC, ROC, case sensitivity as well as specificity were compared between Australian and Singaporean readers using a Mann Whitney U test. Results: A significant difference (P=0.036) was demonstrated between the JAFROC scores of the Australian and Singaporean breast radiologists. No other significant differences were observed. Conclusion: JAFROC scores for Australian radiologists were higher than those obtained by the Singaporean counterparts. Whilst it is tempting to suggest this is down to reader expertise, this may be a simplistic explanation considering the very similar training and audit backgrounds of the two populations of radiologists. The influence of reading images that are different from those that radiologists normally encounter cannot be ruled out and requires further investigation, particularly in the light of increasing international outsourcing of radiologic reporting.
Ott, Summer; Schatz, Philip; Solomon, Gary; Ryan, Joseph J
2014-03-01
This study documented baseline neurocognitive performance of 23,815 athletes on the Immediate Post-Concussion Assessment and Cognitive Testing (ImPACT) test. Specifically, 9,733 Hispanic, Spanish-speaking athletes who completed the ImPACT test in English and 2,087 Hispanic, Spanish-speaking athletes who completed the test in Spanish were compared with 11,955 English-speaking athletes who completed the test in English. Athletes were assigned to age groups (13-15, 16-18). Results revealed a significant effect of language group (p < .001; partial η(2) = 0.06) and age (p < .001; partial η(2) = 0.01) on test performance. Younger athletes performed more poorly than older athletes, and Spanish-speaking athletes completing the test in Spanish scored more poorly than Spanish-speaking and English-speaking athletes completing the test in English, on all Composite scores and Total Symptom scores. Spanish-speaking athletes completing the test in English also performed more poorly than English-speaking athletes completing the test in English on three Composite scores. These differences in performance and reported symptoms highlight the need for caution in interpreting ImPACT test data for Hispanic Americans.
The creation, management, and use of data quality information for life cycle assessment.
Edelen, Ashley; Ingwersen, Wesley W
2018-04-01
Despite growing access to data, questions of "best fit" data and the appropriate use of results in supporting decision making still plague the life cycle assessment (LCA) community. This discussion paper addresses revisions to assessing data quality captured in a new US Environmental Protection Agency guidance document as well as additional recommendations on data quality creation, management, and use in LCA databases and studies. Existing data quality systems and approaches in LCA were reviewed and tested. The evaluations resulted in a revision to a commonly used pedigree matrix, for which flow and process level data quality indicators are described, more clarity for scoring criteria, and further guidance on interpretation are given. Increased training for practitioners on data quality application and its limits are recommended. A multi-faceted approach to data quality assessment utilizing the pedigree method alongside uncertainty analysis in result interpretation is recommended. A method of data quality score aggregation is proposed and recommendations for usage of data quality scores in existing data are made to enable improved use of data quality scores in LCA results interpretation. Roles for data generators, data repositories, and data users are described in LCA data quality management. Guidance is provided on using data with data quality scores from other systems alongside data with scores from the new system. The new pedigree matrix and recommended data quality aggregation procedure can now be implemented in openLCA software. Additional ways in which data quality assessment might be improved and expanded are described. Interoperability efforts in LCA data should focus on descriptors to enable user scoring of data quality rather than translation of existing scores. Developing and using data quality indicators for additional dimensions of LCA data, and automation of data quality scoring through metadata extraction and comparison to goal and scope are needed.
Keijzers, Gerben; Sithirasenan, Vasugi
2012-02-01
To assess the chest computed tomography (CT) imaging interpreting skills of emergency department (ED) doctors and to study the effect of a CT chest imaging interpretation lecture on these skills. Sixty doctors in two EDs were randomized, using computerized randomization, to either attend a chest CT interpretation lecture or not to attend this lecture. Within 2 weeks of the lecture, the participants completed a questionnaire on demographic variables, anatomical knowledge, and diagnostic interpretation of 10 chest CT studies. Outcome measures included anatomical knowledge score, diagnosis score, and the combined overall score, all expressed as a percentage of correctly answered questions (0-100). Data on 58 doctors were analyzed, of which 27 were randomized to attend the lecture. The CT interpretation lecture did not have an effect on anatomy knowledge scores (72.9 vs. 70.2%), diagnosis scores (71.2 vs. 69.2%), or overall scores (71.4 vs. 69.5%). Twenty-nine percent of doctors stated that they had a systematic approach to chest CT interpretation. Overall self-perceived competency for interpreting CT imaging (brain, chest, abdomen) was low (between 3.2 and 5.2 on a 10-point Visual Analogue Scale). A single chest CT interpretation lecture did not improve chest CT interpretation by ED doctors. Less than one-third of doctors had a systematic approach to chest CT interpretation. A standardized systematic approach may improve interpretation skills.
Stein, Janine; Luppa, Melanie; Luck, Tobias; Maier, Wolfgang; Wagner, Michael; Daerr, Moritz; van den Bussche, Hendrik; Zimmermann, Thomas; Köhler, Mirjam; Bickel, Horst; Mösch, Edelgard; Weyerer, Siegfried; Kaufeler, Teresa; Pentzek, Michael; Wiese, Birgitt; Wollny, Anja; König, Hans-Helmut; Riedel-Heller, Steffi G
2012-01-01
The Consortium to Establish a Registry for Alzheimer's Disease-Neuropsychological (CERAD-NP) battery represents a commonly used neuropsychological instrument to measure cognitive functioning in the elderly. This study provides normative data for changes in cognitive function that normally occur in cognitively healthy individuals to interpret changes in CERAD-NP test scores over longer time periods. Longitudinal cohort study with three assessments at 1.5-year intervals over a period of 3 years. Primary care medical record registry sample. As part of the German Study on Ageing, Cognition, and Dementia in Primary Care Patients, a sample of 1,450 cognitively healthy general practitioner patients, age 75 years and older, was assessed. Age-, education-, and gender-specific Reliable Change Indices (RCIs) were computed for a 90% confidence interval for selected subtests of the CERAD-NP battery. Across different age, education, and gender subgroups, changes from at least six to nine points in Verbal Fluency, four to eight points in Word List Memory, two to four points in Word List Recall, and one to four points in Word List Recognition indicated significant (i.e. reliable) changes in CERAD-NP test scores at the 90% confidence level. Furthermore, the calculation of RCIs for individual patients is demonstrated. Smaller changes in CERAD-NP test scores can be interpreted with only high uncertainty because of probable measurement error, practice effects, and normal age-related cognitive decline. This study, for the first time, provides age-, education-, and gender-specific CERAD-NP reference values on the basis of RCI methods for the interpretation of cognitive changes in older-age groups.
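For readers unfamiliar with Reliable Change Indices, the Python sketch below shows the basic form of the calculation: the standard error of measurement is derived from the baseline standard deviation and the test-retest reliability, and a change counts as reliable when it exceeds z times the standard error of the difference (z = 1.645 for a two-sided 90% interval). This is an assumption about the general approach, not the exact procedure of the study, which may also adjust for practice effects and age-related decline. The function names and the example values are hypothetical.

```python
import math


def reliable_change_threshold(sd_baseline, retest_reliability, z=1.645):
    """Smallest absolute score change counted as reliable.

    SEM     = SD * sqrt(1 - r)      (standard error of measurement)
    SE_diff = sqrt(2) * SEM         (standard error of a difference score)
    z = 1.645 corresponds to a two-sided 90% confidence level.
    """
    sem = sd_baseline * math.sqrt(1.0 - retest_reliability)
    se_diff = math.sqrt(2.0) * sem
    return z * se_diff


def is_reliable_change(score_1, score_2, sd_baseline, retest_reliability, z=1.645):
    """True when the observed change exceeds the reliable change threshold."""
    threshold = reliable_change_threshold(sd_baseline, retest_reliability, z)
    return abs(score_2 - score_1) > threshold


# Hypothetical subtest with SD = 5 points and test-retest reliability 0.75.
threshold = reliable_change_threshold(sd_baseline=5.0, retest_reliability=0.75)
```

With these made-up inputs the threshold is a little under six points, so a seven-point drop would count as reliable and a five-point drop would not.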
ERIC Educational Resources Information Center
Airola, Denise Tobin
2011-01-01
Changes to state tests impact the ability of State Education Agencies (SEAs) to monitor change in performance over time. The purpose of this study was to evaluate the Standardized Performance Growth Index (PGIz), a proposed statistical model for measuring change in student and school performance, across transitions in tests. The PGIz is a…
Motivation Filtering on a Multi-Institution Assessment of General College Outcomes
ERIC Educational Resources Information Center
Steedle, Jeffrey T.
2014-01-01
Possible lack of motivation is a perpetual concern when tests have no stakes attached to performance. Specifically, the validity of test score interpretations may be compromised when examinees are unmotivated to exert their best efforts. Motivation filtering, a procedure that filters out apparently unmotivated examinees, was applied to the…
ERIC Educational Resources Information Center
McFarland, Dennis J.
2014-01-01
Purpose: Factor analysis is a useful technique to aid in organizing multivariate data characterizing speech, language, and auditory abilities. However, knowledge of the limitations of factor analysis is essential for proper interpretation of results. The present study used simulated test scores to illustrate some characteristics of factor…
Alignment of Standards and Assessments as an Accountability Criterion. ERIC Digest.
ERIC Educational Resources Information Center
La Marca, Paul M.
This digest provides an overview of the concept of alignment and the role it plays in assessment and accountability systems. It also discusses methodological issues affecting the study of alignment and explores the relationship between alignment and test score interpretation. Alignment refers to the degree of match between test content and subject…
Dichotomous scoring of Trails B in patients referred for a dementia evaluation.
Schmitt, Andrew L; Livingston, Ronald B; Smernoff, Eric N; Waits, Bethany L; Harris, James B; Davis, Kent M
2010-04-01
The Trail Making Test is a popular neuropsychological test and its interpretation has traditionally used time-based scores. This study examined an alternative approach to scoring that is simply based on the examinees' ability to complete the test. If an examinee is able to complete Trails B successfully, they are coded as "completers"; if not, they are coded as "noncompleters." To assess this approach to scoring Trails B, the performance of 97 diagnostically heterogeneous individuals referred for a dementia evaluation was examined. In this sample, 55 individuals successfully completed Trails B and 42 individuals were unable to complete it. Point-biserial correlations indicated a moderate-to-strong association (r(pb)=.73) between the Trails B completion variable and the Total Scale score of the Repeatable Battery for the Assessment of Neurological Status (RBANS), which was larger than the correlation between the Trails B time-based score and the RBANS Total Scale score (r(pb)=.60). As a screen for dementia status, Trails B completion showed a sensitivity of 69% and a specificity of 100% in this sample. These results suggest that dichotomous scoring of Trails B might provide a brief and clinically useful measure of dementia status.
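The sensitivity and specificity figures in this abstract follow from a 2x2 classification table, which the short Python sketch below computes. The cell counts are not reported in the abstract: they are one hypothetical split of the 55 completers and 42 noncompleters that happens to reproduce the reported percentages, assuming that failure to complete Trails B is treated as a positive screen.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from 2x2 classification counts.

    sensitivity = TP / (TP + FN): share of true cases the screen flags.
    specificity = TN / (TN + FP): share of non-cases the screen clears.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical counts (not from the source), chosen to match the totals:
# 42 noncompleters, all with dementia (TP = 42, FP = 0);
# 55 completers, 19 with dementia (FN) and 36 without (TN).
sens, spec = sensitivity_specificity(tp=42, fn=19, tn=36, fp=0)
```

A specificity of 100% paired with a lower sensitivity means that, under these assumed counts, noncompletion would rule dementia in more strongly than completion would rule it out.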
The reliability and validity of qualitative scores for the Controlled Oral Word Association Test.
Ross, Thomas P; Calhoun, Emily; Cox, Tara; Wenner, Carolyn; Kono, Whitney; Pleasant, Morgan
2007-05-01
The reliability and validity of two qualitative scoring systems for the Controlled Oral Word Association Test [Benton, A. L., Hamsher, de S. K., & Sivan, A. B. (1983). Multilingual aphasia examination (2nd ed.). Iowa City, IA: AJA Associates] were examined in 108 healthy young adults. The scoring systems developed by Troyer et al. [Troyer, A. K., Moscovitch, M., & Winocur, G. (1997). Clustering and switching as two components of verbal fluency: Evidence from younger and older healthy adults. Neuropsychology, 11, 138-146] and by Abwender et al. [Abwender, D. A., Swan, J. G., Bowerman, J. T., & Connolly, S. W. (2001a). Qualitative analysis of verbal fluency output: Review and comparison of several scoring methods. Assessment, 8, 323-336] each demonstrated excellent interrater reliability (all indices at or above r(icc)=.9). Consistent with previous research [e.g., Ross, T. P. (2003). The reliability of cluster and switch scores for the COWAT. Archives of Clinical Neuropsychology, 18, 153-164], test-retest reliability coefficients (N=53; M interval 44.6 days) for the qualitative scores were modest to poor (r(icc)=.6 to .4 range). Correlations among COWAT scores, measures of executive functioning, verbal learning, working memory, and vocabulary were examined. The idea that qualitative scores represent distinct executive functions such as cognitive flexibility or strategy utilization was not supported. We offer the interpretation that COWAT performance may require the ability to retrieve words in a non-routine manner while suppressing habitual responses and associated processing interference, presumably due to a spread of activation across semantic or lexical networks. This interpretation, though speculative at present, implies that clustering and switching on the COWAT may not be entirely deliberate, but rather an artifact of a passive (i.e., state-dependent) process. Ideas for future research, most noticeably experimental studies using cognitive methods (e.g., priming), are discussed.
Kindermann, David; Schmid, Carolin; Derreza-Greeven, Cassandra; Huhn, Daniel; Kohl, Rupert Maria; Junne, Florian; Schleyer, Maritta; Daniels, Judith K; Ditzen, Beate; Herzog, Wolfgang; Nikendei, Christoph
2017-01-01
A substantial proportion of refugees, fleeing persecution, torture, and war, are estimated to suffer from psychological traumatization. After being sheltered in reception centers, the refugees come in close contact with different occupational groups, e.g., physicians, social workers, and interpreters. Previous studies ascertained that such interpreters themselves often suffer from primary psychological traumatization. Moreover, through translating refugees' potentially traumatic depictions, the interpreters are in danger of developing a so-called secondary traumatization. The present study aimed (1) to analyze the prevalence rates of primary traumatization in interpreters, (2) to assess the prevalence of secondary traumatization, depression, anxiety, and stress symptoms, (3) to examine the association between secondary traumatization symptoms and resilience factors in terms of sense of coherence, social support, and attachment style, and (4) to test whether these resilience factors mediate the relationship between primary and secondary traumatization. Participating interpreters (n = 64) were assessed for past exposure to potentially traumatic events as well as symptoms of posttraumatic stress disorder (PTSD), secondary traumatization, depressive symptoms, anxiety, and subjective stress levels. Furthermore, we conducted psychometric surveys to measure interpreters' sense of coherence, degree of social support, and attachment style as potential predictors. 
(1) 9% of the interpreters fulfilled all criteria for PTSD and a further 33% had subclinical PTSD; (2) a secondary traumatization was present in 21% of the examined interpreters - of these, 6% showed very high total scores indicating a severe secondary traumatization; furthermore, we found higher scores for depression, anxiety, and stress as compared to representative population samples, especially for females; (3) a present sense of coherence, an existing social support network, and a secure or preoccupied attachment style correlated significantly with low scores for secondary traumatization; and (4) a significant correlation emerged between primary and secondary traumatization (r = 0.595, p < 0.001); a mediation analysis revealed that this effect is partially mediated by secure attachment. A substantial proportion of interpreters working with refugees suffer from primary as well as secondary traumatization. However, high scores for sense of coherence and social support, male gender, and especially a secure attachment style were identified as resilience factors for secondary traumatization. The results may have implications for the selection, training, and supervision of interpreters. © 2017 S. Karger AG, Basel.
Training Effectiveness Assessment. Volume II. Problems, Concepts, and Evaluation Alternatives.
1976-12-01
information about areas where course improvement might be indicated. Percentiles, pretest and posttest scores, or other measures of amount...statistical sophistication. Interpretation of gain scores derived from pretests-posttests of trainees and other forms of trend analysis requires...CPM), computer-managed testing (CMI), time-series analysis, pretest/posttest design, and secondary analysis. Criterion-referenced measurement is
ERIC Educational Resources Information Center
Floyd, Randy G.; McGrew, Kevin S.; Barry, Amberly; Rafael, Fawziya; Rogers, Joshua
2009-01-01
Many school psychologists focus their interpretation on composite scores from intelligence test batteries designed to measure the broad abilities from the Cattell-Horn-Carroll theory. The purpose of this study was to investigate the general factor loadings and specificity of the broad ability composite scores from one such intelligence test…
Evidence-based practice knowledge, attitudes, and practice of online graduate nursing students.
Rojjanasrirat, Wilaiporn; Rice, Jan
2017-06-01
This study aimed to evaluate changes in evidence-based practice (EBP) knowledge, attitudes, and practice of nursing students before and after completing an online, graduate-level, introductory research/EBP course. A prospective one-group pretest-posttest design. A private university in the Midwestern USA. Sixty-three online nurse practitioner students in a Master's program. A convenience sample of online graduate nursing students who enrolled in the research/EBP course was invited to participate in the study. Study outcomes were measured using the Evidence-Based Practice Questionnaire (EBPQ) before and after completing the course. Descriptive statistics and paired-samples t-tests were used to assess the mean differences between pre- and post-test scores. Overall, students' post-test EBP scores were significantly improved over pre-test scores (t(63)=-9.034, p<0.001). Statistically significant differences were found for practice of EBP mean scores (t(63)=-12.78, p=0.001). No significant differences were found between pre- and post-test scores on knowledge and attitudes toward EBP. The most frequently cited barriers to EBP were lack of understanding of statistics, interpretation of findings, lack of time, and lack of library resources. Copyright © 2017 Elsevier Ltd. All rights reserved.
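The paired-samples comparison described in this abstract can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the function name `paired_t` and the four pre/post scores are hypothetical and are not data from the study.

```python
import math
import statistics


def paired_t(pre, post):
    """Return (t, df) for a paired-samples t-test on matched pre/post scores."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equally long lists with at least 2 pairs")
    # Work on the within-person differences (post minus pre).
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD, n - 1 in the denominator
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical scores for four students (not study data).
t_value, df = paired_t(pre=[1, 2, 3, 4], post=[2, 4, 5, 4])
print(f"t({df}) = {t_value:.3f}")
```

The sign of t depends only on the direction of subtraction: computing pre minus post instead would flip it, which is presumably why the abstract reports negative t values alongside improved post-test scores. The degrees of freedom for this test are the number of pairs minus one.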
Rubinstein, Jack; Dhoble, Abhijeet; Ferenchick, Gary
2009-01-13
Most medical professionals are expected to possess basic electrocardiogram (EKG) interpretation skills. However, published data suggest that residents' and physicians' EKG interpretation skills are suboptimal. Learning styles differ among medical students; individualization of teaching methods has been shown to be viable and may result in improved learning. Puzzles have been shown to facilitate learning in a relaxed environment. The objective of this study was to assess the efficacy of a teaching puzzle for EKG interpretation skills among medical students. This is a reader-blinded crossover trial. Third-year medical students from the College of Human Medicine, Michigan State University participated in this study. Two groups (n = 9) received two traditional EKG interpretation skills lectures followed by a standardized exam and two extra sessions with the teaching puzzle and a different exam. Two other groups (n = 6) received identical courses and exams with the puzzle session first followed by the traditional teaching. EKG interpretation scores on the final test were used as the main outcome measure. The average score after only traditional teaching was 4.07 +/- 2.08 while after only the puzzle session it was 4.04 +/- 2.36 (p = 0.97). The average improvement when the traditional session was followed by a puzzle session was 2.53 +/- 1.94 while the average improvement when the puzzle session was followed by the traditional session was 2.08 +/- 1.73 (p = 0.67). The final EKG exam score for this cohort (n = 15) was 84.1 compared to 86.6 (p = 0.22) for a comparable sample of medical students (n = 15) at a different campus. Teaching EKG interpretation with puzzles is comparable to traditional teaching and may be particularly useful for certain subgroups of students. Puzzle sessions are more interactive and relaxing, and warrant further investigation on a larger scale.
ERIC Educational Resources Information Center
Malone, Margaret E.; Montee, Megan
2014-01-01
The "TOEFL iBT"® test presents test takers with tasks meant to simulate the tasks required of students in English-medium universities. Research establishing the validity argument for the test provides evidence for score interpretation and the use of the test for university admissions and placement. Now that the test has been operational…
Do We Really Become Smarter When Our Fluid-Intelligence Test Scores Improve?
Hayes, Taylor R.; Petrov, Alexander A.; Sederberg, Per B.
2014-01-01
Recent reports of training-induced gains on fluid intelligence tests have fueled an explosion of interest in cognitive training—now a billion-dollar industry. The interpretation of these results is questionable because score gains can be dominated by factors that play marginal roles in the scores themselves, and because intelligence gain is not the only possible explanation for the observed control-adjusted far transfer across tasks. Here we present novel evidence that the test score gains used to measure the efficacy of cognitive training may reflect strategy refinement instead of intelligence gains. A novel scanpath analysis of eye movement data from 35 participants solving Raven’s Advanced Progressive Matrices on two separate sessions indicated that one-third of the variance of score gains could be attributed to test-taking strategy alone, as revealed by characteristic changes in eye-fixation patterns. When the strategic contaminant was partialled out, the residual score gains were no longer significant. These results are compatible with established theories of skill acquisition suggesting that procedural knowledge tacitly acquired during training can later be utilized at posttest. Our novel method and result both underline a reason to be wary of purported intelligence gains, but also provide a way forward for testing for them in the future. PMID:25395695
Do We Really Become Smarter When Our Fluid-Intelligence Test Scores Improve?
Hayes, Taylor R; Petrov, Alexander A; Sederberg, Per B
2015-01-01
Recent reports of training-induced gains on fluid intelligence tests have fueled an explosion of interest in cognitive training-now a billion-dollar industry. The interpretation of these results is questionable because score gains can be dominated by factors that play marginal roles in the scores themselves, and because intelligence gain is not the only possible explanation for the observed control-adjusted far transfer across tasks. Here we present novel evidence that the test score gains used to measure the efficacy of cognitive training may reflect strategy refinement instead of intelligence gains. A novel scanpath analysis of eye movement data from 35 participants solving Raven's Advanced Progressive Matrices on two separate sessions indicated that one-third of the variance of score gains could be attributed to test-taking strategy alone, as revealed by characteristic changes in eye-fixation patterns. When the strategic contaminant was partialled out, the residual score gains were no longer significant. These results are compatible with established theories of skill acquisition suggesting that procedural knowledge tacitly acquired during training can later be utilized at posttest. Our novel method and result both underline a reason to be wary of purported intelligence gains, but also provide a way forward for testing for them in the future.
Standard of practice and Flynn Effect testimony in death penalty cases.
Gresham, Frank M; Reschly, Daniel J
2011-06-01
The Flynn Effect is a well-established psychometric fact documenting substantial increases in measured intelligence test performance over time. Flynn's (1984) review of the literature established that Americans gain approximately 0.3 points per year or 3 points per decade in measured intelligence. The accurate assessment and interpretation of intellectual functioning becomes critical in death penalty cases that seek to determine whether an individual meets the criteria for intellectual disability and thereby is ineligible for execution under Atkins v. Virginia (2002). We reviewed the literature on the Flynn Effect and demonstrated how failure to adjust intelligence test scores based on this phenomenon invalidates test scores and may be in violation of the Standards for Educational and Psychological Testing as well as the "Ethical Principles for Psychologists and Code of Conduct." Application of the Flynn Effect and score adjustments for obsolete norms clearly is supported by science and should be implemented by practicing psychologists.
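The arithmetic behind a score adjustment for obsolete norms can be illustrated briefly. The sketch below assumes the commonly described form of the correction, in which 0.3 points are subtracted for each year between the test's norming and its administration; the abstract itself gives only the rate, so the function name, its parameters, and the example values are hypothetical.

```python
def flynn_adjusted_score(obtained_score, norming_year, testing_year,
                         points_per_year=0.3):
    """Subtract the expected inflation from aging norms from an obtained score.

    Assumes a constant gain of `points_per_year` (0.3 in the abstract) since
    the year the test was normed.
    """
    years_elapsed = testing_year - norming_year
    if years_elapsed < 0:
        raise ValueError("testing_year precedes norming_year")
    return obtained_score - points_per_year * years_elapsed


# Hypothetical case: a score of 75 on a test normed 15 years before testing.
adjusted = flynn_adjusted_score(75, norming_year=1995, testing_year=2010)
print(f"adjusted score: {adjusted:.1f}")
```

With these hypothetical inputs the adjustment amounts to 0.3 x 15 = 4.5 points, which is enough to move a score across a fixed cutoff and shows why the choice to adjust matters in the cases the abstract describes.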
Comparison of two teaching methods for cardiac arrhythmia interpretation among nursing students.
Varvaroussis, Dimitrios P; Kalafati, Maria; Pliatsika, Paraskevi; Castrén, Maaret; Lott, Carsten; Xanthos, Theodoros
2014-02-01
The aim of this study was to compare the six-stage method (SSM) for instructing primary cardiac arrhythmias interpretation to students without basic electrocardiogram (ECG) knowledge with a descriptive teaching method in a single educational intervention. This is a randomized trial. Following a brief instructional session, undergraduate nursing students, assigned to group A (SSM) and group B (descriptive teaching method), undertook a written test in cardiac rhythm recognition, immediately after the educational intervention (initial exam). Participants were also examined with an unannounced retention test (final exam), one month after instruction. Altogether 134 students completed the study. Interpretation accuracy for each cardiac arrhythmia was assessed. Mean score at the initial exam was 8.71±1.285 for group A and 8.74±1.303 for group B. Mean score at the final exam was 8.25±1.46 for group A vs 7.84±1.44 for group B. Overall results showed that the SSM was as effective as the descriptive teaching method. The study showed that in each group bradyarrhythmias were identified correctly by more students than tachyarrhythmias. No significant difference between the two teaching methods was seen for any specific cardiac arrhythmia. The SSM effectively develops competency for interpreting common cardiac arrhythmias in students without ECG knowledge. More research is needed to support this conclusion, and the method's effectiveness must be evaluated if it is implemented in trainee groups with preexisting basic ECG interpretation knowledge. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Caruso, J C
2001-06-01
The unreliability of difference scores is a well documented phenomenon in the social sciences and has led researchers and practitioners to interpret differences cautiously, if at all. In the case of the Kaufman Adult and Adolescent Intelligence Test (KAIT), the unreliability of the difference between the Fluid IQ and the Crystallized IQ is due to the high correlation between the two scales. The consequences of the lack of precision with which differences are identified are wide confidence intervals and unpowerful significance tests (i.e., large differences are required to be declared statistically significant). Reliable component analysis (RCA) was performed on the subtests of the KAIT in order to address these problems. RCA is a new data reduction technique that results in uncorrelated component scores with maximum proportions of reliable variance. Results indicate that the scores defined by RCA have discriminant and convergent validity (with respect to the equally weighted scores) and that differences between the scores, derived from a single testing session, were more reliable than differences derived from equal weighting for each age group (11-14 years, 15-34 years, 35-85+ years). This reliability advantage results in narrower confidence intervals around difference scores and smaller differences required for statistical significance.
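The link between a high inter-scale correlation and an unreliable difference score can be made concrete with the classical test theory formula for the reliability of a difference, r_DD = ((r_xx + r_yy) / 2 - r_xy) / (1 - r_xy), which assumes the two scales have equal variances. The sketch below applies that textbook formula; it is not the reliable component analysis described in the abstract, and the reliabilities and correlations used are hypothetical, not KAIT values.

```python
def difference_score_reliability(r_xx, r_yy, r_xy):
    """Classical reliability of X - Y, assuming equal variances for X and Y.

    r_xx, r_yy: reliabilities of the two scales.
    r_xy: correlation between the two scales.
    """
    if not -1 <= r_xy < 1:
        raise ValueError("r_xy must lie in [-1, 1)")
    return ((r_xx + r_yy) / 2 - r_xy) / (1 - r_xy)


# Hypothetical scales, each with reliability 0.95, at three correlations.
for r_xy in (0.2, 0.5, 0.8):
    r_dd = difference_score_reliability(0.95, 0.95, r_xy)
    print(f"r_xy = {r_xy:.1f} -> reliability of difference = {r_dd:.3f}")
```

Holding the scale reliabilities fixed, the reliability of the difference falls as the correlation between the scales rises, which is the pattern the abstract attributes to the Fluid and Crystallized IQ scales.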
Jenkinson, Toni-Marie; Muncer, Steven; Wheeler, Miranda; Brechin, Don; Evans, Stephen
2018-06-01
Neuropsychological assessment requires accurate estimation of an individual's premorbid cognitive abilities. Oral word reading tests, such as the test of premorbid functioning (TOPF), and demographic variables, such as age, sex, and level of education, provide a reasonable indication of premorbid intelligence, but their ability to predict other related cognitive abilities is less well understood. This study aimed to develop regression equations, based on the TOPF and demographic variables, to predict scores on tests of verbal fluency and naming ability. A sample of 119 healthy adults provided demographic information and were tested using the TOPF, FAS, animal naming test (ANT), and graded naming test (GNT). Multiple regression analyses, using the TOPF and demographics as predictor variables, were used to estimate verbal fluency and naming ability test scores. Change scores and cases of significant impairment were calculated for two clinical samples with diagnosed neurological conditions (TBI and meningioma) using the method in Knight, McMahon, Green, and Skeaff (). Demographic variables provided a significant contribution to the prediction of all verbal fluency and naming ability test scores; however, adding TOPF score to the equation considerably improved prediction beyond that afforded by demographic variables alone. The percentage of variance accounted for by demographic variables and/or TOPF score was 19 per cent (FAS), 28 per cent (ANT), and 41 per cent (GNT). Change scores revealed significant differences in performance in the clinical groups, particularly the TBI group. Demographic variables, particularly education level, and scores on the TOPF should be taken into consideration when interpreting performance on tests of verbal fluency and naming ability. © 2017 The British Psychological Society.
The Impact of Educational Policy on English Learners in a Rural Indiana School Corporation
ERIC Educational Resources Information Center
Burke, April M.
2015-01-01
Indiana English learners (ELs) constitute a rapidly growing portion of the state's school-aged population, and those classified as limited English proficient are low performers on the state test. The purpose of this embedded mixed methods study was to understand how school personnel respond to accountability mandates, interpret test scores, and…
Interpreting Mathematics Scores on the New Jersey College Basic Skills Placement Test.
ERIC Educational Resources Information Center
Dass, Jane; Pine, Charles
The New Jersey College Basic Skills Placement Test (NJCBSPT) is designed to measure certain basic language and mathematics skills of students entering New Jersey colleges. The primary purpose of the two mathematics sections is to determine whether students are prepared to begin certain college-level work without a handicap in computation or…
ERIC Educational Resources Information Center
Davis, Paul; Kvern, Brent; Donen, Neil; Andrews, Elaine; Nixon, Olga
2000-01-01
Pre/posttest data on 40 physicians who completed problem-based clinical scenarios on osteoporosis revealed that 39 showed improvement or modest change in postworkshop scores, especially in terms of management of male patients, determination of risk factors, and use and interpretation of bone density tests. (SK)
Karr, Justin E; Garcia-Barrera, Mauricio A; Holdnack, James A; Iverson, Grant L
2017-05-01
Executive function consists of multiple cognitive processes that operate as an interactive system to produce volitional goal-oriented behavior, governed in large part by frontal microstructural and physiological networks. Identification of deficits in executive function in those with neurological or psychiatric conditions can be difficult because the normal variation in executive function test scores, in healthy adults when multiple tests are used, is largely unknown. This study addresses that gap in the literature by examining the prevalence of low scores on a brief battery of executive function tests. The sample consisted of 1,050 healthy individuals (ages 16-89) from the standardization sample for the Delis-Kaplan Executive Function System (D-KEFS). Seven individual test scores from the Trail Making Test, Color-Word Interference Test, and Verbal Fluency Test were analyzed. Low test scores, as defined by commonly used clinical cut-offs (i.e., ≤25th, 16th, 9th, 5th, and 2nd percentiles), occurred commonly among the adult portion of the D-KEFS normative sample (e.g., 62.8% of the sample had one or more scores ≤16th percentile, 36.1% had one or more scores ≤5th percentile), and the prevalence of low scores increased with lower intelligence and fewer years of education. The multivariate base rates (BR) in this article allow clinicians to understand the normal frequency of low scores in the general population. By use of these BRs, clinicians and researchers can improve the accuracy with which they identify executive dysfunction in clinical groups, such as those with traumatic brain injury or neurodegenerative diseases. © The Author 2016. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com
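A multivariate base rate of the kind reported here is, at its simplest, the share of examinees who have at least a given number of scores at or below a cutoff. The sketch below shows that counting step only; the function name and the four-person data set are hypothetical, and the published base rates come from the D-KEFS standardization sample, not from this calculation.

```python
def base_rate_of_low_scores(percentile_rows, cutoff, min_low=1):
    """Share of examinees with at least `min_low` scores at or below `cutoff`.

    percentile_rows: one list of percentile ranks per examinee.
    """
    if not percentile_rows:
        raise ValueError("no examinees supplied")
    flagged = sum(
        1 for row in percentile_rows
        if sum(1 for p in row if p <= cutoff) >= min_low
    )
    return flagged / len(percentile_rows)


# Hypothetical percentile ranks for four examinees on four scores.
sample = [
    [50, 63, 12, 75],
    [37, 25, 50, 91],
    [5, 16, 9, 50],
    [84, 50, 63, 37],
]
for cutoff in (16, 5):
    rate = base_rate_of_low_scores(sample, cutoff)
    print(f"one or more scores <= {cutoff}th percentile: {rate:.0%}")
```

Because each additional test is another chance to fall below the cutoff, the base rate for a battery is higher than the cutoff percentile itself, so a single low score in a multi-test battery is weaker evidence of impairment than the same score on one test taken alone.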
Bello, Ajediran I; Ofori, Eric K; Alabi, Oluwasegun J; Adjei, David N
2014-03-29
Objective physical assessment of patients with lumbar spondylosis involves plain film radiographs (PFR) viewing and interpretation by the radiologists. Physiotherapists also routinely assess PFR within the scope of their practice. However, studies appraising the level of agreement of physiotherapists' PFR interpretation with radiologists are not common in Ghana. Forty-one (41) physiotherapists took part in the cross-sectional survey. An assessment guide was developed from findings of the interpretation of three PFR of patients with lumbar spondylosis by a radiologist. The three PFR were selected from a pool of different radiographs based on clarity, common visible pathological features, coverage of body segments and short post-production period. Physiotherapists were required to view the same PFR after which they were assessed with the assessment guide according to the number of features identified correctly or incorrectly. The score range on the assessment form was 0-24, interpreted as follows: 0-8 points (low), 9-16 points (moderate) and 17-24 points (high) levels of agreement. Data were analyzed using one-sample t-test and Fisher's exact test at α = 0.05. The mean score of interpretation for the physiotherapists was 12.7 ± 2.6 points compared to the radiologist's interpretation of 24 points (assessment guide). The physiotherapists' levels were found to be significantly associated with their academic qualification (p = 0.006) and sex (p = 0.001). However, their levels of agreement were not significantly associated with their age group (p = 0.098), work settings (p = 0.171), experience (p = 0.666), preferred PFR view (p = 0.088) and continuing education (p = 0.069). The physiotherapists' skills fall short of expectation for interpreting PFR of patients with lumbar spondylosis. The levels of agreement with radiologist's interpretation have no link with years of clinical practice, age, work settings and continuing education.
Thus, routine PFR viewing techniques should be made a priority in physiotherapists' continuing professional education.
Stroop Color-Word Interference Test: Normative data for Spanish-speaking pediatric population.
Rivera, D; Morlett-Paredes, A; Peñalver Guia, A I; Irías Escher, M J; Soto-Añari, M; Aguayo Arelis, A; Rute-Pérez, S; Rodríguez-Lorenzana, A; Rodríguez-Agudelo, Y; Albaladejo-Blázquez, N; García de la Cadena, C; Ibáñez-Alfonso, J A; Rodriguez-Irizarry, W; García-Guerrero, C E; Delgado-Mejía, I D; Padilla-López, A; Vergara-Moragues, E; Barrios Nevado, M D; Saracostti Schwartzman, M; Arango-Lasprilla, J C
2017-01-01
To generate normative data for the Stroop Word-Color Interference test in Spanish-speaking pediatric populations. The sample consisted of 4,373 healthy children from nine countries in Latin America (Chile, Cuba, Ecuador, Guatemala, Honduras, Mexico, Paraguay, Peru, and Puerto Rico) and Spain. Each participant was administered the Stroop Word-Color Interference test as part of a larger neuropsychological battery. The Stroop Word, Stroop Color, Stroop Word-Color, and Stroop Interference scores were normed using multiple linear regressions and standard deviations of residual values. Age, age², sex, and mean level of parental education (MLPE) were included as predictors in the analyses. The final multiple linear regression models showed main effects for age on all scores, except on Stroop Interference for Guatemala, such that scores increased linearly as a function of age. Age² affected Stroop Word scores for all countries, Stroop Color scores for Ecuador, Mexico, Peru, and Spain; Stroop Word-Color scores for Ecuador, Mexico, and Paraguay; and Stroop Interference scores for Cuba, Guatemala, and Spain. MLPE affected Stroop Word scores for Chile, Mexico, and Puerto Rico; Stroop Color scores for Mexico, Puerto Rico, and Spain; Stroop Word-Color scores for Ecuador, Guatemala, Mexico, Puerto Rico and Spain; and Stroop-Interference scores for Ecuador, Mexico, and Spain. Sex affected Stroop Word scores for Spain, Stroop Color scores for Mexico, and Stroop Interference for Honduras. This is the largest Spanish-speaking pediatric normative study in the world, and it will allow neuropsychologists from these countries to have a more accurate approach to interpret the Stroop Word-Color Interference test in pediatric populations.
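Norming with multiple linear regression and the standard deviation of the residuals typically means predicting a child's expected score from the predictors and expressing the observed score as a z-score around that prediction. The sketch below illustrates that general approach under stated assumptions: the coefficient values, the 0/1 coding of sex and MLPE, the use of uncentered age, and the residual SD are all hypothetical, since the abstract reports none of them.

```python
def regression_based_z(raw_score, age, sex, mlpe, coef, sd_residual):
    """z-score of a raw score relative to its regression-predicted value.

    coef: dict with keys 'intercept', 'age', 'age2', 'sex', 'mlpe'.
    sd_residual: standard deviation of the model's residuals.
    """
    if sd_residual <= 0:
        raise ValueError("sd_residual must be positive")
    predicted = (
        coef["intercept"]
        + coef["age"] * age
        + coef["age2"] * age ** 2
        + coef["sex"] * sex
        + coef["mlpe"] * mlpe
    )
    return (raw_score - predicted) / sd_residual


# Hypothetical coefficients for one score in one country (not from the study).
example_coef = {"intercept": 10.0, "age": 4.0, "age2": -0.1,
                "sex": 1.0, "mlpe": 2.0}
z = regression_based_z(raw_score=50, age=10, sex=1, mlpe=1,
                       coef=example_coef, sd_residual=7.0)
print(f"z = {z:.2f}")
```

A predictor that showed no effect for a given country and score would simply carry a coefficient of zero in that model, which is one way to read the country-by-country pattern of effects listed in the abstract.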
2014-05-01
hand and right hand on the piano, or strumming and chording on the guitar. Perceptual This skill category involves detecting and interpreting sensory...measured as the percent correct, # correct, accumulated points, task/test scoring correct action/timing/performance. This also includes quality rating by...competition and scoring, as well as constraints, privileges and penalties. Simulation-Based The primary delivery environment is an interactive synthetic
ERIC Educational Resources Information Center
Austin Independent School District, TX.
Designed for junior high and high school students and their parents, this brochure explains the structure, function, and method for interpretation of the Iowa Tests of Basic Skills and the Sequential Tests of Educational Progress. A question and answer format is used to provide information on scope and purposes of the tests, meaning and accuracy…
Cozzi-Lepri, Alessandro; Prosperi, Mattia C F; Kjær, Jesper; Dunn, David; Paredes, Roger; Sabin, Caroline A; Lundgren, Jens D; Phillips, Andrew N; Pillay, Deenan
2011-01-01
The question of whether a score for a specific antiretroviral (e.g. lopinavir/r in this analysis) that improves prediction of viral load response given by existing expert-based interpretation systems (IS) could be derived from analyzing the correlation between genotypic data and virological response using statistical methods remains largely unanswered. We used the data of the patients from the UK Collaborative HIV Cohort (UK CHIC) Study for whom genotypic data were stored in the UK HIV Drug Resistance Database (UK HDRD) to construct a training/validation dataset of treatment change episodes (TCE). We used the average square error (ASE) on a 10-fold cross-validation and on a test dataset (the EuroSIDA TCE database) to compare the performance of a newly derived lopinavir/r score with that of the 3 most widely used expert-based interpretation rules (ANRS, HIVDB and Rega). Our analysis identified mutations V82A, I54V, K20I and I62V, which were associated with reduced viral response and mutations I15V and V91S which determined lopinavir/r hypersensitivity. All models performed equally well (ASE on test ranging between 1.1 and 1.3, p = 0.34). We fully explored the potential of linear regression to construct a simple predictive model for lopinavir/r-based TCE. Although, the performance of our proposed score was similar to that of already existing IS, previously unrecognized lopinavir/r-associated mutations were identified. The analysis illustrates an approach of validation of expert-based IS that could be used in the future for other antiretrovirals and in other settings outside HIV research.
Dong, Ruimin; Yang, Xiaoyan; Xing, Bangrong; Zou, Zihao; Zheng, Zhenda; Xie, Xujing; Zhu, Jieming; Chen, Lin; Zhou, Hanjian
2015-01-01
Concept mapping is an effective method in teaching and learning; however, this strategy has not been evaluated for learning electrocardiogram (ECG) diagnosis. This study explored the use of concept maps to assist ECG study, and sought to analyze whether this method could improve undergraduate students’ ECG interpretation skills. A total of 126 undergraduate medical students were randomly selected and assigned to two groups, group A (n = 63) and group B (n = 63). Group A was taught to use concept maps to learn ECG diagnosis, while group B was taught by traditional methods. After the course, all of the students were assessed with an ECG diagnostic test. Quantitative data, which comprised test score and ECG features completion index, were compared between the two groups using the unpaired Student’s t-test. Further, a feedback questionnaire on the use of concept maps was completed by group A; comments were evaluated on a five-point Likert scale. The test score for ECG interpretation was 7.36 ± 1.23 in Group A and 6.12 ± 1.39 in Group B. A significant advantage (P = 0.018) of concept maps was observed in ECG interpretation accuracy. No difference in the average ECG features completion index was observed between Group A (66.75 ± 15.35%) and Group B (62.93 ± 13.17%). According to qualitative analysis, the majority of students accepted concept maps as a helpful tool. Difficulty of learning at the beginning and time consumption were the two problems with this method; nevertheless, most of the students indicated that they would continue using it. Concept maps could be a useful pedagogical tool in enhancing undergraduate medical students’ ECG interpretation skills. Furthermore, students indicated a positive attitude to it, and perceived it as a resource for learning. PMID:26221331
Patient movement characteristics and the impact on CBCT image quality and interpretability.
Spin-Neto, Rubens; Costa, Cláudio; Salgado, Daniela Mra; Zambrana, Nataly Rm; Gotfredsen, Erik; Wenzel, Ann
2018-01-01
To assess the impact of patient movement characteristics and metal/radiopaque materials in the field-of-view (FOV) on CBCT image quality and interpretability. 162 CBCT examinations were performed in 134 consecutive (i.e. prospective data collection) patients (age average: 27.2 years; range: 9-73). An accelerometer-gyroscope system registered patient's head position during examination. The threshold for movement definition was set at ≥0.5-mm movement distance based on accelerometer-gyroscope recording. Movement complexity was defined as uniplanar/multiplanar. Three observers scored independently: presence of stripe (i.e. streak) artefacts (absent/"enamel stripes"/"metal stripes"/"movement stripes"), overall unsharpness (absent/present) and image interpretability (interpretable/not interpretable). Kappa statistics assessed interobserver agreement. χ² tests analysed whether movement distance, movement complexity and metal/radiopaque material in the FOV affected image quality and image interpretability. Relevant risk factors (p ≤ 0.20) were entered into a multivariate logistic regression analysis with "not interpretable" as the outcome. Interobserver agreement for image interpretability was good (average = 0.65). Movement distance and presence of metal/radiopaque materials significantly affected image quality and interpretability. There were 22-28 cases, in which the observers stated the image was not interpretable. Small movements (i.e. <3 mm) did not significantly affect image interpretability. For movements ≥ 3 mm, the risk that a case was scored as "not interpretable" was significantly (p ≤ 0.05) increased [OR 3.2-11.3; 95% CI (0.70-65.47)]. Metal/radiopaque material was also a significant (p ≤ 0.05) risk factor (OR 3.61-5.05). Patient movement ≥3 mm and metal/radiopaque material in the FOV significantly affected CBCT image quality and interpretability.
Pourmand, Ali; Woodward, Christina; Shokoohi, Hamid; King, Jordan B; Taheri, M Reza; King, Jackson; Lawrence, Christopher
2018-01-01
Context Web-based learning (WBL) modules are effectively used to improve medical education curriculum; however, they have not been evaluated to improve head computed tomography (CT) scan interpretation in an emergency medicine (EM) setting. Objective To evaluate the effectiveness of a WBL module to aid identification of cranial structures on CT and to improve ability to distinguish between normal and abnormal findings. Design Prospective, before-and-after trial in the Emergency Department of an academic center. Baseline head CT knowledge was assessed via a standardized test containing ten head CT scans, including normal scans and those showing hemorrhagic stroke, trauma, and infection (abscess). All trainees then participated in a WBL intervention. Three weeks later, they were given the same ten CT scans to evaluate in a standardized posttest. Main Outcome Measures Improvement in test scores. Results A total of 131 EM clerkship students and 32 EM residents were enrolled. Pretest scores correlated with stage of training, with students and first-year residents demonstrating the lowest scores. Overall, there was a significant improvement in percentage of correctly classified CT images after the training intervention from a mean pretest score of 32% ± 12% to posttest score of 67% ± 13% (mean improvement = 35% ± 13%, p < 0.001). Among subsets by training level, all subgroups except first-year residents demonstrated a statistically significant increase in scores after the training. Conclusion Incorporating asynchronous WBL modules into EM clerkship and residency curriculum provides early radiographic exposure in their clinical training and can enhance diagnostic head CT scan interpretation. PMID:29272248
Validity threats: overcoming interference with proposed interpretations of assessment data.
Downing, Steven M; Haladyna, Thomas M
2004-03-01
Factors that interfere with the ability to interpret assessment scores or ratings in the proposed manner threaten validity. To be interpreted in a meaningful manner, all assessments in medical education require sound, scientific evidence of validity. The purpose of this essay is to discuss 2 major threats to validity: construct under-representation (CU) and construct-irrelevant variance (CIV). Examples of each type of threat for written, performance and clinical performance examinations are provided. The CU threat to validity refers to undersampling the content domain. Using too few items, cases or clinical performance observations to adequately generalise to the domain represents CU. Variables that systematically (rather than randomly) interfere with the ability to meaningfully interpret scores or ratings represent CIV. Issues such as flawed test items written at inappropriate reading levels or statistically biased questions represent CIV in written tests. For performance examinations, such as standardised patient examinations, flawed cases or cases that are too difficult for student ability contribute CIV to the assessment. For clinical performance data, systematic rater error, such as halo or central tendency error, represents CIV. The term face validity is rejected as representative of any type of legitimate validity evidence, although the fact that the appearance of the assessment may be an important characteristic other than validity is acknowledged. There are multiple threats to validity in all types of assessment in medical education. Methods to eliminate or control validity threats are suggested.
Performance of high school male athletes on the Functional Movement Screen™.
Smith, Laura J; Creps, James R; Bean, Ryan; Rodda, Becky; Alsalaheen, Bara
2017-09-01
(1) Describe the performance of the Functional Movement Screen™ (FMS™) by reporting the proportion of adolescents with a score of ≤14 and the frequency of asymmetries in a cross-sectional sample; (2) explore associations of the FMS™ with age and body mass, and explore the construct validity of the FMS™ against common postural stability measures; (3) examine the inter-rater and test-retest reliability of the FMS™ in adolescents. Cross-sectional. Field-setting. 94 male high-school athletes. The FMS™, Y-Balance Test (YBT) and Balance Error Scoring System (BESS). The median FMS™ composite score was 16 (9-21), 33% of participants scored at or below the suggested injury risk cutoff composite score of ≤14, and 62.8% had at least one asymmetry. No relationship was observed between the FMS™ and common static/dynamic balance tests. The inter-rater reliability of the FMS™ composite score was good (ICC = 0.88, 95% CI: 0.77, 0.94), and test-retest reliability for FMS™ composite scores was also good (ICC = 0.83, 95% CI: 0.56, 0.95). FMS™ results should be interpreted cautiously with attention to the asymmetries identified during the screen, regardless of composite score. The lack of relationship between the FMS™ and other balance measures supports the notion that multiple screening tests should be used in order to provide a comprehensive picture of the adolescent athlete. Copyright © 2017 Elsevier Ltd. All rights reserved.
Dench, Rosalie; Sulistyo, Fransiska; Fahroni, Agus; Philippa, Joost
2015-12-01
The tuberculin skin test (TST) has been the mainstay of tuberculosis (TB) testing in primates for decades, but its interpretation in orangutans (Pongo spp.) is challenging, because many animals react strongly, without evidence of infection with Mycobacterium tuberculosis complex. One explanation is cross-reactivity with environmental nontuberculous mycobacteria (NTM). The use of a comparative TST (CTST), comparing reactivity to avian (representing NTM) and bovine (representing tuberculous mycobacteria) tuberculins aids in distinguishing cross-reactivity due to sensitization by NTM from shared antigens. The specificity of the TST can be increased with the use of CTST. We considered three interpretations of the TST in rehabilitant Bornean orangutans ( Pongo pygmaeus ) using avian purified protein derivative (APPD; 25,000 IU/ml) and two concentrations of bovine purified protein derivative (BPPD; 100,000 and 32,500 IU/ml). The tests were evaluated for their ability to identify accurately seven orangutans previously diagnosed with and treated for TB from a group of presumed negative individuals (n = 288 and n = 161 for the two respective BPPD concentrations). BPPD at 32,500 IU/ml had poor diagnostic capacity, whereas BPPD at 100,000 IU/ml performed better. The BPPD-only interpretation had moderate sensitivity (57%) and poor specificity (40%) and accuracy (41%). The comparative interpretation at 72 hr had similar sensitivity (57%) but improved specificity (95%) and accuracy (94%). However, best results were obtained by a comparative interpretation incorporating the 48- and 72-hr scores, which had good sensitivity (86%), specificity (95%) and accuracy (95%). These data reinforce recommendations that a CTST be used in orangutans and support the use of APPD at 25,000 IU/ml and BPPD at 100,000 IU/ml. The highest score at each site from the 48- and 72-hr checks should be considered the result for that tuberculin. 
If the bovine result is greater than the avian result, the animal should be considered a TB suspect.
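The reading rule recommended in this abstract can be sketched in a few lines of Python. This is only an illustration: the function name, the argument names, and the use of plain numeric scores are assumptions made here, not the authors' scoring protocol.

```python
def ctst_tb_suspect(avian_48h, avian_72h, bovine_48h, bovine_72h):
    """Hypothetical helper applying the comparative TST rule from the abstract."""
    # The result for each tuberculin is the highest score from the 48- and 72-hr checks.
    avian_result = max(avian_48h, avian_72h)
    bovine_result = max(bovine_48h, bovine_72h)
    # A bovine result strictly greater than the avian result flags a TB suspect.
    return bovine_result > avian_result


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    print(ctst_tb_suspect(avian_48h=2, avian_72h=3, bovine_48h=1, bovine_72h=4))
```

Equal bovine and avian results are treated as not suspect here, because the abstract only names the "greater than" case.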
Paramedic electrocardiogram and rhythm identification: a convenient training device.
Hale, Peggy; Lowe, Robert; Seamon, Jason P; Jenkins, James J
2011-10-01
A common reason for utilizing local paramedics and the emergency medical services is for the recognition and immediate treatment of chest pain, a complaint that has multiple possible etiologies. While many of the disease processes responsible for chest pain are benign, some will be life-threatening and will require immediate identification and treatment. The ability of paramedics not only to perform field electrocardiograms (ECGs), but also to accurately diagnose various unstable cardiac rhythms has shown a significant reduction in time to specific treatments. Increasing the overall accuracy of ECG interpretation by paramedics has the potential to facilitate early and appropriate treatment and decrease patient morbidity and mortality. A convenient training device (flip book) on ambulances and in common areas in the fire station could improve field interpretation of certain cardiac rhythms. This training device consists of illustrated sample ECG tracings and their associated diagnostic criteria. The goal was to enhance the recognition and interpretation of ECGs, and thereby reduce delays in the initiation of treatment and potential complications associated with misinterpretation. This study was a prospective, observational study using a matched pre-test/post-test design. The study period was from November 2008 to December 2008. A total of 136 paramedics were approached to participate in this study. A pre-test consisting of 15 12-lead ECGs was given to all paramedics who agreed to participate in the study. Once the pre-tests were completed, the flip books were placed in common areas. Approximately one month after the flip books were made available to the paramedics, a post-test was administered. Statistical comparisons were made between the pre- and post-test scores for both the global test and each type of rhythm. Using these data, there were no statistically significant improvements in the global ECG interpretation or on individual rhythm interpretations. 
A flip book with multiple ECG rhythms and definitions without the benefit of any outside support was not effective in improving paramedic identification of ECG rhythms on a post-test. Suggestions for further research include repeating the study with a larger sample size; utilizing a lecturer to explain how to use the flip book in the most efficient manner; reiterating how to read and interpret ECGs; and answering questions. Comparing test scores of paramedic students, and newly certified paramedics as opposed to veteran paramedics also may indicate that the flip books are more suited for one group over another.
ERIC Educational Resources Information Center
Alaska Department of Education and Early Development, 2005
2005-01-01
The purpose of the High School Graduation Qualifying Examination (HSGQE) is to determine student competency in the areas of reading, English, and mathematics. The HSGQE provides this information in the form of test scores that reflect the essential skills that students should know as a result of their public school experience. The requirement to…
ERIC Educational Resources Information Center
Meloy, Linda L.; Deville, Craig; Frisbie, David
The effect of the Read Aloud accommodation on the performances of learning disabled in reading (LD-R) and non-learning disabled (non LD) middle school students was studied using selected texts from the Iowa Tests of Basic Skills (ITBS) achievement battery. Science, Usage and Expression, Math Problem Solving and Data Interpretation, and Reading…
Omoruyi, Emma A; Dunkle, Jesse; Dendy, Colby; McHugh, Erin; Barratt, Michelle S
2018-03-01
Telephone interpretation and recent technology advances assist patients with more timely access to rare languages, but no one has examined the role of this technology in the medical setting and how medical students can be prepared for its use. We sought to determine if a structured curriculum on interpretation would promote learners' self-reported competency in these encounters and if proficiency would be demonstrated in actual patient encounters. Training on the principles of interpreter use with a focus on communication technology was added to medical student education. The students later voluntarily completed a retrospective pre/post training competency self-assessment. A cohort of students rotating at a clinical site had a blinded review of their telephone interpretation encounters scored on a modified validated scale and compared to scored encounters with preintervention learners. Nested ANOVA models were used for audio file analysis. A total of 176 students who completed the training reported a statistically significant improvement in all 4 interpretation competency domains. Eighty-three audio files were analyzed from students before and after intervention. These scored encounters showed no statistical difference between the scores of the 2 groups. However, plotting the mean scores over time from each encounter suggests that those who received the curriculum started their rotation with higher scores and maintained those scores. In an evaluation of learners' ability to use interpreters in actual patient encounters, focused education led to earlier proficiency of using interpreters compared to peers who received no training. Copyright © 2018 Academic Pediatric Association. Published by Elsevier Inc. All rights reserved.
Beck, Irene R; Schmid, Nicole S; Berres, Manfred; Monsch, Andreas U
2014-06-01
The diagnosis of mild cognitive impairment (MCI) and dementia requires detailed neuropsychological examinations. These examinations typically yield a large number of outcome variables, which may complicate the interpretation and communication of results. The purposes of this study were the following: (i) to reduce a large data set of interrelated neuropsychological variables to a smaller number of cognitive dimensions; (ii) to create a common metric for these dimensions (z-scores); and (iii) to study the ability of the cognitive dimensions to distinguish between groups of patients with different types of cognitive impairment. We tested 1646 patients with different forms of dementia or with a major depression with a standard (n = 632) or, if cognitively less affected, a challenging neuropsychological battery (n = 1014). To identify the underlying cognitive dimensions of the two test batteries, maximum likelihood factor analyses with a promax rotation were conducted. To interpret the sum scores of the factors as standard scores, we divided them by the standard deviation of a cognitively healthy sample (n = 1145). The factor analyses yielded seven factors for each test battery. The cognitive dimensions in both test batteries distinguished patients with different forms of dementia (MCI, Alzheimer's dementia or frontotemporal dementia) and patients with major depression. Furthermore, patients with stable MCI could be separated from patients with progressing MCI. Discriminant analyses with an independent new sample of patients (n = 306) revealed that the new dimension scores distinguished new samples of patients with MCI from patients with Alzheimer's dementia with high accuracy. These findings suggest that these cognitive dimensions may benefit neuropsychological diagnostics. © 2013 The Authors International Journal of Geriatric Psychiatry Published by John Wiley & Sons Ltd.
Beck, Irene R; Schmid, Nicole S; Berres, Manfred; Monsch, Andreas U
2014-01-01
Objective: The diagnosis of mild cognitive impairment (MCI) and dementia requires detailed neuropsychological examinations. These examinations typically yield a large number of outcome variables, which may complicate the interpretation and communication of results. The purposes of this study were the following: (i) to reduce a large data set of interrelated neuropsychological variables to a smaller number of cognitive dimensions; (ii) to create a common metric for these dimensions (z-scores); and (iii) to study the ability of the cognitive dimensions to distinguish between groups of patients with different types of cognitive impairment. Methods: We tested 1646 patients with different forms of dementia or with a major depression with a standard (n = 632) or, if cognitively less affected, a challenging neuropsychological battery (n = 1014). To identify the underlying cognitive dimensions of the two test batteries, maximum likelihood factor analyses with a promax rotation were conducted. To interpret the sum scores of the factors as standard scores, we divided them by the standard deviation of a cognitively healthy sample (n = 1145). Results: The factor analyses yielded seven factors for each test battery. The cognitive dimensions in both test batteries distinguished patients with different forms of dementia (MCI, Alzheimer's dementia or frontotemporal dementia) and patients with major depression. Furthermore, patients with stable MCI could be separated from patients with progressing MCI. Discriminant analyses with an independent new sample of patients (n = 306) revealed that the new dimension scores distinguished new samples of patients with MCI from patients with Alzheimer's dementia with high accuracy. Conclusion: These findings suggest that these cognitive dimensions may benefit neuropsychological diagnostics. PMID:24227657
Adaptive Modulation Approach for Robust MPEG-4 AAC Encoded Audio Transmission
2011-11-01
as shown in Table 1. Table 1 specifies the perceptual interpretation of the ODG. Subjective Difference Grade (SDG) = Grade Signal under test... SDG using human hearing and cognitive model [8], [9]. Freely available PEAQ basic model, “PQevalAudio,” is used in this paper which is available as...PEAQ-ODG Score [6] Impairment ITU-R Five Grade Impairment Scale SDG/PEAQ-ODG Score Imperceptible 5.00 0.00 Perceptible, but not Annoying 4.00
Wells, Erica L; Kofler, Michael J; Soto, Elia F; Schaefer, Hillary S; Sarver, Dustin E
2018-01-01
Pediatric ADHD is associated with impairments in working memory, but these deficits often go undetected when using clinic-based tests such as digit span backward. The current study pilot-tested minor administration/scoring modifications to improve digit span backward's construct and predictive validities in a well-characterized sample of children with ADHD. WISC-IV digit span was modified to administer all trials (i.e., ignore discontinue rule) and count digits rather than trials correct. Traditional and modified scores were compared to a battery of criterion working memory (construct validity) and academic achievement tests (predictive validity) for 34 children with ADHD ages 8-13 (M = 10.41; 11 girls). Traditional digit span backward scores failed to predict working memory or KTEA-2 achievement (all ns). Alternate administration/scoring of digit span backward significantly improved its associations with working memory reordering (r = .58), working memory dual-processing (r = .53), working memory updating (r = .28), and KTEA-2 achievement (r = .49). Consistent with prior work, these findings urge caution when interpreting digit span performance. Minor test modifications may address test validity concerns, and should be considered in future test revisions. Digit span backward becomes a valid measure of working memory at exactly the point that testing is traditionally discontinued. Copyright © 2017 Elsevier Ltd. All rights reserved.
Hanson, Lisa C; McBurney, Helen; Taylor, Nicholas F
2012-03-01
The purpose of this paper was to determine if the Six-minute Walk Test (6MWT) was a reliable exercise test for patients referred to cardiac rehabilitation when up to three tests were performed and to determine if test scores differed according to between-test time interval. Thirty adults aged 63 ± 7.9 years referred to cardiac rehabilitation participated in a repeated measures reliability trial. Participants completed three 6MWTs within a one-week period. Participants were randomly allocated to one of three groups: on the first day, Group A completed three walks, Group B completed two walks and Group C completed one walk. Relative reliability was expressed in a ratio (ICC(2,1)), and absolute reliability was expressed in metres (95% confidence intervals) for group and individuals. The 6MWT demonstrated a high level of relative reliability (intraclass correlation coefficients [ICC] = 0.94) across the three walks. There was no statistically significant difference between the test scores of the three groups. However, there was an increase in distance walked from the first to the second to the third 6MWT. Absolute reliability indicated that a change of at least 44 m would be required to be interpreted as true change in a group, and at least 95 m to be interpreted as true change in an individual with 95% confidence. Three 6MWTs completed in relatively short timeframes were not sufficient for reliable results as there was an increase in the distance walked, and relatively large increases in distances would be required to be interpreted as change. It did not make any difference whether the tests were all completed on one day or over one week. This study highlighted problems that may arise when relying on reliability coefficients alone to interpret reliability. These results suggest that the 6MWT may not have sufficient reliability to be a suitable test to evaluate exercise tolerance in patients referred to cardiac rehabilitation. Copyright © 2011 John Wiley & Sons, Ltd.
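The abstract does not state how the 44 m and 95 m thresholds were derived. One common way to express absolute reliability for an individual, assumed here purely for illustration, is the minimal detectable change: MDC95 = 1.96 × √2 × SEM, with SEM = SD × √(1 − ICC). In the sketch below the function names are hypothetical and the standard deviation is a made-up value, not a figure from the study.

```python
import math


def standard_error_of_measurement(sd, icc):
    """SEM = SD * sqrt(1 - ICC); sd is the between-subject standard deviation."""
    return sd * math.sqrt(1.0 - icc)


def minimal_detectable_change_95(sd, icc):
    """MDC95 = 1.96 * sqrt(2) * SEM, the smallest individual change beyond measurement error."""
    return 1.96 * math.sqrt(2.0) * standard_error_of_measurement(sd, icc)


if __name__ == "__main__":
    # sd=100.0 metres is a hypothetical value; icc=0.94 is the coefficient reported in the abstract.
    print(round(minimal_detectable_change_95(sd=100.0, icc=0.94), 1))
```

The sketch shows why a high ICC alone can mislead: even at ICC = 0.94, a wide spread of walking distances still produces a large change threshold.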
Brooks, Brian L
2011-01-01
Knowing the prevalence of low neurocognitive scores for the WISC-IV Canadian normative sample (WISC-IV(CDN)) is an important supplement for clinical interpretation of test performance. On the WISC-IV(CDN), it is uncommon for children and adolescents to have 4 or more subtest scores or 2 or more Index scores ≤ 9th percentile when all scores on the battery are considered simultaneously. As the level of the child's intelligence increases or the number of years of parental education increases, the prevalence of low scores decreases. These results are consistent with existing studies of the base rates of low scores in children and adolescents on pediatric cognitive batteries, including the WISC-IV American normative sample. Tables provided are ready for clinical use.
A multiple reader scoring system for Nasal Potential Difference parameters.
Solomon, George M; Liu, Bo; Sermet-Gaudelus, Isabelle; Fajac, Isabelle; Wilschanski, Michael; Vermeulen, Francois; Rowe, Steven M
2017-09-01
Nasal Potential Difference (NPD) is a biomarker of CFTR activity used to diagnose CF and monitor experimental therapies. Limited studies have been performed to assess agreement between expert readers of NPD interpretation using a scoring algorithm. We developed a standardized scoring algorithm for "interpretability" and "confidence" for PD (potential difference) measures, and sought to determine the degree of agreement on NPD parameters between trained readers. There was excellent agreement for interpretability between NPD readers for CF and fair agreement for normal tracings but slight agreement of interpretability in indeterminate tracings. Amongst interpretable tracings, excellent correlation of mean scores for Ringer's Baseline PD, Δamiloride, and ΔCl-free+Isoproterenol was observed. There was slight agreement regarding confidence of the interpretable PD tracings, resulting in divergence of the Ringer's, Δamiloride, and ΔCl-free+Isoproterenol PDs between "high" and "low" confidence CF tracings. A multi-reader process with adjudication is important for scoring NPDs for diagnosis and in monitoring of CF clinical trials. Copyright © 2017 European Cystic Fibrosis Society. Published by Elsevier B.V. All rights reserved.
ERIC Educational Resources Information Center
Van Dijk, Rick; Christoffels, Ingrid; Postma, Albert; Hermans, Daan
2012-01-01
In two experiments we investigated the relationship between the working memory skills of sign language interpreters and the quality of their interpretations. In Experiment 1, we found that scores on 3-back tasks with signs and words were not related to the quality of interpreted narratives. In Experiment 2, we found that memory span scores for…
Evaluating Evidence Regarding Relationships with Criteria
ERIC Educational Resources Information Center
Balkin, Richard S.
2017-01-01
An overview of standards related to demonstrating evidence regarding relationships with criteria as it pertains to instrument development was presented, along with heuristic examples. Additional measures and a comprehensive design are necessary to establish evidence related to the use and interpretation of test scores for the validation of a…
ERIC Educational Resources Information Center
Arjoon, Janelle A.; Xu, Xiaoying; Lewis, Jennifer E.
2013-01-01
education community are relatively new. Because psychometric evidence dictates the validity of interpretations made from test scores, gathering and reporting validity and reliability evidence is of utmost importance. Therefore, the purpose of this study was to investigate what…
Black Self-Esteem and Desegregated Schools.
ERIC Educational Resources Information Center
Drury, Darrel W.
1980-01-01
Discusses a study to determine attitudes among Black and White students in 194 southern high schools regarding desegregation. Data are presented on differences between schools; test-score achievement; and variations in self-esteem among students in predominantly White, Black, and racially mixed schools. Findings are interpreted in light of…
Investigating the mental abilities of rural Zulu primary school children in South Africa.
Jinabhai, C C; Taylor, M; Rangongo, M F; Mkhize, N J; Anderson, S; Pillay, B J; Sullivan, K R
2004-02-01
Maximising the full potential of health and educational interventions in South African schools requires assessment of the current level of mental abilities of the school children as measured by cognitive and scholastic tests and the identification of any barriers to improved performance. This study reports on the application and interpretation of a selected battery of mental ability tests among Zulu school children and the methodological and analytical issues that need to be addressed. The test scores of 806 primary school children from a rural community are presented, based on four tests: Raven's Coloured Progressive Matrices (CPM), an Auditory Verbal Learning Test (AVLT), the Symbol Digit Modalities Test (SDMT) and Young's Group Mathematics Test (GMT). Significant gender differences were found in the test scores, and the mean scores of Zulu children in this study were lower than those reported in other studies. The results of this selected test battery provide data for the further development of appropriate test instruments for South African conditions. These results can contribute towards the development of a test battery for South African children that can be used to assess and improve their school performance.
Yamaguchi, Haruyasu; Maki, Yohko; Yamaguchi, Tomoharu
2011-12-01
Communicative disability is regarded as a prominent symptom of demented patients, and many studies have been devoted to analyzing deficits of lexical-semantic operations in demented patients. However, it is often observed that even patients with preserved lexical-semantic skills might fail in interactive social communication. Whereas social interaction requires pragmatic language skills, pragmatic language competencies in demented subjects have not been well understood. We propose here a brief stress-free test to detect pragmatic language deficits, focusing on non-literal understanding of figurative expression. We hypothesized that suppression of the literal interpretation was required for figurative language interpretation. We examined 69 demented subjects, 13 subjects with mild cognitive impairment and 61 healthy controls aged 65 years or more. The subjects were asked the meaning of a familiar proverb categorized as a figurative expression. The answers were analyzed based on five factors, and scored from 0 to 5. To consider the influence of cognitive inhibition on proverb comprehension, the scores of the Stroop Colour-Word Test were compared concerning correct and incorrect answers for each factor, respectively. Furthermore, the characteristics of answers were considered qualitatively in the light of excuse and confabulation. The proverb comprehension scores decreased significantly and gradually as dementia progressed. The literal interpretation of the proverb, which showed difficulties in figurative language comprehension, was related to disinhibition. The qualitative analysis showed that excuse and confabulation increased as the dementia stage progressed. Deficits in cognitive inhibition partly explain the difficulties in interactive social communication in dementia. 
With qualitative analysis, asking the meaning of a proverb can be a brief test applied in a clinical setting to evaluate the stage of dementia, and to illustrate disinhibition, confabulation and excuse, which might cause discommunication and psychosocial maladjustment in demented patients. © 2011 The Authors. Psychogeriatrics © 2011 Japanese Psychogeriatric Society.
Scoring in genetically modified organism proficiency tests based on log-transformed results.
Thompson, Michael; Ellison, Stephen L R; Owen, Linda; Mathieson, Kenneth; Powell, Joanne; Key, Pauline; Wood, Roger; Damant, Andrew P
2006-01-01
The study considers data from 2 UK-based proficiency schemes and includes data from a total of 29 rounds and 43 test materials over a period of 3 years. The results from the 2 schemes are similar and reinforce each other. The amplification process used in quantitative polymerase chain reaction determinations predicts a mixture of normal, binomial, and lognormal distributions dominated by the latter 2. As predicted, the study results consistently follow a positively skewed distribution. Log-transformation prior to calculating z-scores is effective in establishing near-symmetric distributions that are sufficiently close to normal to justify interpretation on the basis of the normal distribution.
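The scoring idea in this abstract, log-transforming results before computing z-scores, can be sketched as follows. The function name, the use of base-10 logarithms, and the parameters `assigned_value` and `sigma_log10` (a target standard deviation on the log scale) are assumptions for illustration; the abstract does not specify how the schemes set these values.

```python
import math


def log_z_score(result, assigned_value, sigma_log10):
    """Hypothetical z-score on the log10 scale for a positively skewed measurand."""
    if result <= 0 or assigned_value <= 0:
        raise ValueError("log-transformation requires strictly positive values")
    # Deviation from the assigned value, measured in log10 units.
    deviation = math.log10(result) - math.log10(assigned_value)
    # Scale by the assumed target standard deviation on the log10 scale.
    return deviation / sigma_log10


if __name__ == "__main__":
    # Made-up inputs: doubling and halving give z-scores of equal size and opposite sign.
    print(log_z_score(2.0, 1.0, 0.2), log_z_score(0.5, 1.0, 0.2))
```

On the log scale, a result twice the assigned value and a result half the assigned value are treated symmetrically, which is the property that makes interpretation against the normal distribution more defensible for skewed data.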
Application of exercise ECG stress test in the current high cost modern-era healthcare system.
Vaidya, Gaurang Nandkishor
The exercise electrocardiogram (ECG) test has the advantages of being more widely available, less resource intensive, lower in cost, and free of radiation. In the presence of a normal baseline ECG, an exercise ECG test is able to generate a reliable and reproducible result almost comparable to Technetium-99m sestamibi perfusion imaging. Exercise ECG changes, when combined with other clinical parameters obtained during the test, have the potential to allow effective redistribution of scarce resources by excluding low risk patients with significant accuracy. As we look towards a future of rising healthcare costs, increased prevalence of cardiovascular disease and the need for proper allocation of limited resources, the exercise ECG test offers low cost, vital and reliable disease interpretation. This article highlights the physiology of the exercise ECG test, patient selection, and effective interpretation, and describes previously reported scores and their clinical application in today's clinical practice. Copyright © 2017. Published by Elsevier B.V.
Briscoe, J; Rankin, P M
2009-01-01
Children with specific language impairment (SLI) often experience difficulties in the recall and repetition of verbal information. Archibald and Gathercole (2006) suggested that children with SLI are vulnerable across two separate components of a tripartite model of working memory (Baddeley and Hitch 1974). However, the hierarchical relationship between the 'slave' systems (temporary storage) and the central executive components places a particular challenge for interpreting working memory profiles within a tripartite model. This study aimed to examine whether a 'double-jeopardy' assumption is compatible with a hierarchical relationship between the phonological loop and central executive components of the working memory model in children with SLI. If a strong double-jeopardy assumption is valid for children with SLI, it was predicted that raw scores of working memory tests thought to tap phonological loop and central executive components of tripartite working memory would be lower than the scores of children matched for chronological age and those of children matched for language level, according to independent sources of constraint. In contrast, a hierarchical relationship would imply that a weakness in a slave component of working memory (the phonological loop) would also constrain performance on tests tapping a super-ordinate component (central executive). This locus of constraint would predict that scores of children with SLI on working memory tests that tap the central executive would be weaker relative to the scores of chronological age-matched controls only. 
Seven subtests of the Working Memory Test Battery for Children (Digit recall, Word recall, Non-word recall, Word matching, Listening recall, Backwards digit recall and Block recall; Pickering and Gathercole 2001) were administered to 14 children with SLI recruited via language resource bases and specialist schools, as well as two control groups matched on chronological age and vocabulary level, respectively. Mean group differences were ascertained by directly comparing raw scores on memory tests linked to different components of the tripartite model using a series of multivariate analyses. The majority of working memory scores of the SLI group were depressed relative to chronological age-matched controls, with the exception of spatial recall (block tapping) and word (order) matching tasks. Marked deficits in serial recall of words and digits were evident, with the SLI group scoring more poorly than the language-ability matched control group on these measures. Impairments of the SLI group on phonological loop tasks were robust, even when covariance with executive working memory scores was accounted for. There was no robust effect of group on complex working memory (central executive) tasks, despite a slight association between listening recall and phonological loop measures. A predominant feature of the working memory profile of SLI was a marked deficit on phonological loop tasks. Although scores on complex working memory tasks were also depressed, there was little evidence for a strong interpretation of double-jeopardy within working memory profiles for these children, rather these findings were consistent with an interpretation of a constraint on phonological loop for children with SLI that operated at all levels of a hierarchical tripartite model of working memory (Baddeley and Hitch 1974). 
These findings imply that low scores on complex working memory tasks alone do not unequivocally imply an independent deficit in central executive (domain-general) resources of working memory and should therefore be treated cautiously in a clinical context.
Binetruy, M; Mauny, F; Lavaux, M; Meyer, A; Sylvestre, G; Puyraveau, M; Berger, E; Magnin, E; Vandel, P; Galmiche, J; Chopard, G
Cognitive evaluation of young subjects is now widely carried out for non-traumatic diseases such as multiple sclerosis, HIV, or sleep disorders. This evaluation requires normative data based on healthy adult samples. However, most clinicians use a set of tests that were normed in an isolated manner from different samples using different cutoff criteria. Thus, the score of an individual may be considered either normal or impaired according to the norms used. It is well established that healthy adults obtain some low test scores when a battery of tests is administered. Thus, knowledge of the base rates of low scores is required so as to minimize false diagnosis of cognitive impairment. The aim of this study was twofold: (1) to provide normative data for the RAPID-II battery in healthy adults, and (2) to estimate the proportion of healthy adults having low scores across this battery. Norms for the 44 test scores of the RAPID-II test battery were developed using the overall sample of 335 individuals based on three categories of age (20 to 29, 30 to 39, and 40 to 49 years) and two educational levels: Baccalaureate or higher educational degree (high educational level), lower than baccalaureate (low educational level). The 5th, 25th, 50th, and 75th percentiles were calculated from the six age and education subsamples and used to define norms. The frequency of low scores on the RAPID-II battery was calculated by simultaneously examining the performance of 33 primary scores. A low score was defined as less than or equal to the 5th percentile drawn from the six age and education normative subsamples. In addition, the percentages of low scores were also determined when all possible combinations of two-test scores across the RAPID-II were considered in the overall normative sample. Our data showed that 59.4% of subjects in the normative sample obtained one or more low scores. For more than 9 low scores, this percentage was 0% in the normative sample. 
Among all combinations of two test scores, 96% had a false positive rate < 2%. Low scores are very common in young healthy subjects, become more apparent when test scores are analyzed simultaneously across a battery of tests, and are thus not necessarily indicative of cognitive impairment. The combinations of two test scores can be a useful tool to improve the interpretation of low scores. Copyright © 2017 Elsevier Masson SAS. All rights reserved.
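The base-rate argument in this abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that the test scores are statistically independent; the study does not claim this, and scores within a real battery are usually correlated. The function name `prob_at_least_one_low` is hypothetical.

```python
def prob_at_least_one_low(n_scores: int, cutoff: float = 0.05) -> float:
    """Probability that a healthy person obtains at least one low score.

    Assumes independent scores, each falling at or below the cutoff
    percentile with probability `cutoff` (an illustrative simplification).
    """
    if n_scores < 0 or not 0.0 <= cutoff <= 1.0:
        raise ValueError("n_scores must be >= 0 and cutoff must lie in [0, 1]")
    return 1.0 - (1.0 - cutoff) ** n_scores


if __name__ == "__main__":
    # Show how the chance of at least one low score grows with battery size.
    for n in (1, 2, 10, 33):
        print(f"{n:2d} scores: {prob_at_least_one_low(n):.3f}")
```

Under independence, 33 scores with a 5th-percentile cutoff would give roughly an 82% chance of at least one low score. The observed 59.4% is lower, which is consistent with scores within a battery being positively correlated.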
[Interpretation of proverbs and Alzheimer's disease].
Báez, S; Mendoza, L; Reyes, P; Matallana, D; Montañés, P
To evaluate the performance of patients with Alzheimer's disease (AD) in the mild-moderate stage in a verbal material abstraction task that involves interpreting the implicit meaning of proverbs and sayings. A qualitative-quantitative analysis was carried out of the performance of 30 patients with AD and 30 controls, paired by age, gender and level of education. Patients had significantly greater difficulties than the controls when it came to interpreting proverbs. A high correlation was found between subjects' years of schooling and the overall score on the proverb interpretation test. Results suggest that the processes that may be predominantly affected in patients with AD are the investigation of the conditions of the problem, together with selecting an alternative and formulating a cognitive plan to resolve the task. The results help to further our knowledge of the characteristics of performance of patients with AD in a test involving the interpretation of the implicit meaning of proverbs and also provide information about the processes that may be predominantly affected. Further research is needed, however, on this subject area in order to obtain more conclusive explanations.
Measuring Social Competence with the Wechsler Picture Arrangement and Comprehension Subtests.
ERIC Educational Resources Information Center
Campbell, Jonathan M.; McCord, David M.
1999-01-01
Tested the traditional assumption that the Wechsler Adult Intelligence Scale-Revised and the Wechsler Intelligence Scale for Children-Revised Picture Arrangement and Comprehension subtests are measures of social competence using scores from 136 children and adolescents. Cautions against interpreting either subtest as an indicator of social…
Measuring Growth with Vertical Scales
ERIC Educational Resources Information Center
Briggs, Derek C.
2013-01-01
A vertical score scale is needed to measure growth across multiple tests in terms of absolute changes in magnitude. Since the warrant for subsequent growth interpretations depends upon the assumption that the scale has interval properties, the validation of a vertical scale would seem to require methods for distinguishing interval scales from…
Alignment of Standards and Assessments as an Accountability Criterion.
ERIC Educational Resources Information Center
La Marca, Paul M.
2001-01-01
Provides an overview of the concept of alignment and the role it plays in assessment and accountability systems. Discusses some methodological issues affecting the study of alignment and explores the relationship between alignment and test score interpretation. Alignment is not only a methodological requirement but also an ethical requirement.…
Caetano, Ana Celia; Dias, Sara; Santa-Cruz, André; Rolanda, Carla
2018-01-01
Recently, the Obstructed Defecation Syndrome score (ODS score) was developed and validated by Renzi to assess clinical staging and to allow evaluation and comparison of the efficacy of treatment of this disorder. Our goal is to validate the Portuguese version of the Renzi ODS score, according to the Consensus-based Standards for the Selection of Health Measurement Instruments (COSMIN) checklist. Following guidelines for cross-cultural validity, the Renzi ODS score was translated into the Portuguese language. Then, a group of patients and healthy controls were invited to fill in the Renzi ODS score at baseline, after 2 weeks, and after 3 months. We assessed internal consistency, reliability and measurement error, content and construct validity, responsiveness and interpretability. A total of 113 individuals (77 patients; 36 healthy controls) completed the questionnaire. Seventy and 30 patients repeated the Renzi ODS score after 2 weeks and 3 months, respectively. Factor analysis confirmed the unidimensionality of the scale. A Cronbach's α coefficient of 0.77 supported item homogeneity. A quadratic weighted kappa of 0.89 established test-retest reliability. The smallest detectable change at the individual level was 2.66 and at the group level was 0.30. The Renzi ODS score correlated negatively with the total (-0.32) and physical (-0.43) SF-36 scores. Patient and control groups differed significantly (11 points). The change in Renzi ODS score between baseline and 3 months correlated negatively with the clinical evolution (-0.86). ROC analysis showed a minimal important change of 2.00 with an AUC of 0.97. Neither floor nor ceiling effects were observed. This work validated the Portuguese version of the Renzi ODS score. We can now use this reliable, responsive, and interpretable (at the group level) tool to evaluate Portuguese ODS patients.
Rubinstein, Jack; Dhoble, Abhijeet; Ferenchick, Gary
2009-01-01
Background: Most medical professionals are expected to possess basic electrocardiogram (EKG) interpretation skills. However, published data suggest that residents' and physicians' EKG interpretation skills are suboptimal. Learning styles differ among medical students; individualization of teaching methods has been shown to be viable and may result in improved learning. Puzzles have been shown to facilitate learning in a relaxed environment. The objective of this study was to assess the efficacy of a teaching puzzle for EKG interpretation skills among medical students. Methods: This is a reader-blinded crossover trial. Third-year medical students from the College of Human Medicine, Michigan State University participated in this study. Two groups (n = 9) received two traditional EKG interpretation skills lectures followed by a standardized exam and two extra sessions with the teaching puzzle and a different exam. Two other groups (n = 6) received identical courses and exams with the puzzle session first followed by the traditional teaching. EKG interpretation scores on the final test were used as the main outcome measure. Results: The average score after only traditional teaching was 4.07 ± 2.08 while after only the puzzle session it was 4.04 ± 2.36 (p = 0.97). The average improvement when the traditional session was followed by a puzzle session was 2.53 ± 1.94 while the average improvement when the puzzle session was followed by the traditional session was 2.08 ± 1.73 (p = 0.67). The final EKG exam score for this cohort (n = 15) was 84.1 compared to 86.6 (p = 0.22) for a comparable sample of medical students (n = 15) at a different campus. Conclusion: Teaching EKG interpretation with puzzles is comparable to traditional teaching and may be particularly useful for certain subgroups of students. Puzzle sessions are more interactive and relaxing, and warrant further investigation on a larger scale. PMID:19144134
Item Response Theory Modeling of the Philadelphia Naming Test.
Fergadiotis, Gerasimos; Kellough, Stacey; Hula, William D
2015-06-01
In this study, we investigated the fit of the Philadelphia Naming Test (PNT; Roach, Schwartz, Martin, Grewal, & Brecher, 1996) to an item-response-theory measurement model, estimated the precision of the resulting scores and item parameters, and provided a theoretical rationale for the interpretation of PNT overall scores by relating explanatory variables to item difficulty. This article describes the statistical model underlying the computer adaptive PNT presented in a companion article (Hula, Kellough, & Fergadiotis, 2015). Using archival data, we evaluated the fit of the PNT to 1- and 2-parameter logistic models and examined the precision of the resulting parameter estimates. We regressed the item difficulty estimates on three predictor variables: word length, age of acquisition, and contextual diversity. The 2-parameter logistic model demonstrated marginally better fit, but the fit of the 1-parameter logistic model was adequate. Precision was excellent for both person ability and item difficulty estimates. Word length, age of acquisition, and contextual diversity all independently contributed to variance in item difficulty. Item-response-theory methods can be productively used to analyze and quantify anomia severity in aphasia. Regression of item difficulty on lexical variables supported the validity of the PNT and interpretation of anomia severity scores in the context of current word-finding models.
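The 1- and 2-parameter logistic models mentioned in this abstract share one standard response function, sketched below. The parameter values in the demonstration are hypothetical and are not estimates from the PNT; the function name `p_correct` is likewise an illustrative choice.

```python
import math


def p_correct(theta: float, b: float, a: float = 1.0) -> float:
    """Probability of a correct response under a logistic IRT model.

    theta: person ability; b: item difficulty; a: item discrimination.
    Holding `a` at a common value (here 1.0) gives the 1-parameter model;
    letting `a` vary by item gives the 2-parameter model.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


if __name__ == "__main__":
    # Hypothetical item difficulties, for illustration only.
    for b in (-1.0, 0.0, 1.0):
        print(f"b = {b:+.1f}: P(correct | theta = 0) = {p_correct(0.0, b):.3f}")
```

When ability equals difficulty, the probability of a correct response is 0.5 regardless of the discrimination value, which is why item difficulty can be read on the same scale as person ability.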
Vartanian, L R; Powlishta, K K
2001-06-01
Self-consciousness during early adolescence has been explained as an outcome of adolescent egocentrism, in which adolescents create an imaginary audience (IA) of attentive, critical peers. The possibility that such self-consciousness might result from contact with peers who are more attentive and critical than those encountered during childhood or adulthood has not been considered. Study 1 tested whether young adults, who are not theoretically susceptible to IA, could be made to receive high scores on IA and self-consciousness measures by having them complete a procedure in 1 of 3 laboratory conditions-a critical audience, a noncritical audience, or no audience. However, participants in the critical-audience condition received significantly lower IA and self-consciousness scores than participants in the no-audience condition did. Study 2 tested whether the directions given to Study 1 participants might have been responsible for the unexpected findings. Results indicated that participants instructed to give mature-sounding responses received lower IA/self-consciousness scores than did those asked to report their honest opinions. Together, the results of Studies 1 and 2 indicated that survey measures of IA are subject to demand characteristics and highlighted the need to interpret with caution age differences in IA as traditionally assessed.
2014-01-01
Background: Objective physical assessment of patients with lumbar spondylosis involves plain film radiograph (PFR) viewing and interpretation by radiologists. Physiotherapists also routinely assess PFR within the scope of their practice. However, studies appraising the level of agreement of physiotherapists’ PFR interpretation with radiologists are not common in Ghana. Method: Forty-one (41) physiotherapists took part in the cross-sectional survey. An assessment guide was developed from findings of the interpretation of three PFR of patients with lumbar spondylosis by a radiologist. The three PFR were selected from a pool of different radiographs based on clarity, common visible pathological features, coverage of body segments and short post-production period. Physiotherapists were required to view the same PFR, after which they were assessed with the assessment guide according to the number of features identified correctly or incorrectly. The score range on the assessment form was 0–24, interpreted as follows: 0–8 points (low), 9–16 points (moderate) and 17–24 points (high) levels of agreement. Data were analyzed using a one-sample t-test and Fisher’s exact test at α = 0.05. Results: The mean score of interpretation for the physiotherapists was 12.7 ± 2.6 points compared to the radiologist’s interpretation of 24 points (assessment guide). The physiotherapists’ levels of agreement were found to be significantly associated with their academic qualification (p = 0.006) and sex (p = 0.001). However, their levels of agreement were not significantly associated with their age group (p = 0.098), work settings (p = 0.171), experience (p = 0.666), preferred PFR view (p = 0.088) and continuing education (p = 0.069). Conclusions: The physiotherapists’ skills fall short of expectation for interpreting PFR of patients with lumbar spondylosis. The levels of agreement with the radiologist’s interpretation have no link with years of clinical practice, age, work settings and continuing education. 
Thus, routine PFR viewing techniques should be made a priority in physiotherapists’ continuing professional education. PMID:24678695
Wolf, Timothy J; Dahl, Abigail; Auen, Colleen; Doherty, Meghan
2017-07-01
The objective of this study was to evaluate the inter-rater reliability, test-retest reliability, concurrent validity, and discriminant validity of the Complex Task Performance Assessment (CTPA): an ecologically valid performance-based assessment of executive function. Community control participants (n = 20) and individuals with mild stroke (n = 14) participated in this study. All participants completed the CTPA and a battery of cognitive assessments at initial testing. The control participants completed the CTPA at two different times one week apart. The intra-class correlation coefficient (ICC) for inter-rater reliability for the total score on the CTPA was .991. The ICCs for all of the sub-scores of the CTPA were also high (.889-.977). The CTPA total score was significantly correlated to Condition 4 of the DKEFS Color-Word Interference Test (p = -.425), and the Wechsler Test of Adult Reading (p = -.493). Finally, there were significant differences between control subjects and individuals with mild stroke on the total score of the CTPA (p = .007) and all sub-scores except interpretation failures and total items incorrect. These results are also consistent with other current executive function performance-based assessments and indicate that the CTPA is a reliable and valid performance-based measure of executive function.
Character pathology and neuropsychological test performance in remitted opiate dependence
Prosser, James M; Eisenberg, Daniel; Davey, Emily E; Steinfeld, Matthew; Cohen, Lisa J; London, Edythe D; Galynker, Igor I
2008-01-01
Background: Cognitive deficits and personality pathology are prevalent in opiate dependence, even during periods of remission, and likely contribute to relapse. Understanding the relationship between the two in vulnerable, opiate-addicted patients may contribute to the design of better treatment and relapse prevention strategies. Methods: The Millon Clinical Multiaxial Inventory (MCMI) and a series of neuropsychological tests were administered to three subject groups: 29 subjects receiving methadone maintenance treatment (MM), 27 subjects in protracted abstinence from methadone maintenance treatment (PA), and 29 healthy non-dependent comparison subjects. Relationships between MCMI scores, neuropsychological test results, and measures of substance use and treatment were examined using bivariate correlation and regression analysis. Results: MCMI scores were greater in subjects with a history of opiate dependence than in comparison subjects. A significant negative correlation between MCMI scores and neuropsychological test performance was identified in all subjects. MCMI scores were stronger predictors of neuropsychological test performance than measures of drug use. Conclusion: Formerly methadone-treated opiate-dependent individuals in protracted opiate abstinence demonstrate a strong relationship between personality pathology and cognitive deficits. The cause of these deficits is unclear and most likely multi-factorial. This finding may be important in understanding and interpreting neuropsychological testing deficiencies in opiate-dependent subjects. PMID:19019247
[Cognitive Reserve Scale: testing the theoretical model and norms].
Leon-Estrada, I; Garcia-Garcia, J; Roldan-Tapia, L
2017-01-01
The cognitive reserve theory may help to explain differences in cognitive performance among individuals with similar cognitive decline and among healthy individuals. However, more psychometric analyses are needed to support the use of tests for assessing cognitive reserve. To study validity evidence in relation to the structure of the Cognitive Reserve Scale (CRS) and to create reference norms to interpret the scores. A total of 172 participants completed the scale and were classified into two age groups: 36-64 years (n = 110) and 65-88 years (n = 62). The exploratory factor analysis using ESEM revealed that the data fitted the proposed model. Overall, the discriminative indices were acceptable (between 0.21 and 0.50) and congruence was observed in the periods of young adulthood, adulthood and late adulthood, in both age groups. In addition, the index of reliability (Cronbach's alpha: 0.80) and the typical mean error test (mean: 51.40 ± 11.11) showed adequate values for this type of instrument. The CRS appeared to fit the hypothesized theoretical model, and the scores may be interpreted using the norms presented. This study supports the use of the CRS in research.
Hong, Hye Jeong; Kim, Jin Sung; Seo, Wan Seok; Koo, Bon Hoon; Bai, Dai Seg; Jeong, Jin Young
2010-01-01
Objective: We investigated executive functions (EFs), as evaluated by the Wisconsin Card Sorting Test (WCST), and other EF between lower grades (LG) and higher grades (HG) in elementary-school-age attention deficit hyperactivity disorder (ADHD) children. Methods: We classified a sample of 112 ADHD children into 4 groups (composed of 28 each) based on age (LG vs. HG) and WCST performance [lower vs. higher performance on WCST, defined by the number of completed categories (CC)]. Participants in each group were matched according to age, gender, ADHD subtype, and intelligence. We used the Wechsler Intelligence Scale for Children 3rd edition to test intelligence and the Computerized Neurocognitive Function Test-IV, which included the WCST, to test EF. Results: Comparisons of EF scores in LG ADHD children showed statistically significant differences in performing digit spans backward, some verbal learning scores, including all memory scores, and Stroop test scores. However, comparisons of EF scores in HG ADHD children did not show any statistically significant differences. Correlation analyses of the CC and EF variables and stepwise multiple regression analysis in LG ADHD children showed that a combination of the backward form of the Digit span test and Visual span test in lower-performance ADHD participants significantly predicted the number of CC (R2=0.273, p<0.001). Conclusion: This study suggests that the design of any battery of neuropsychological tests for measuring EF in ADHD children should first consider age before interpreting developmental variations and neuropsychological test results. Researchers should consider the dynamics of relationships within EF, as measured by neuropsychological tests. PMID:20927306
O'Grady, Anthony; Allen, David; Happerfield, Lisa; Johnson, Nicola; Provenzano, Elena; Pinder, Sarah E; Tee, Lilian; Gu, Mai; Kay, Elaine W
2010-12-01
Immunohistochemistry (IHC) is used as the frontline assay to determine HER2 status in invasive breast cancer patients. The aim of the study was to compare the performance of the Leica Oracle HER2 Bond IHC System (Oracle) with the current most readily accepted Dako HercepTest (HercepTest), using both commercially validated and modified ASCO/CAP and UK HER2 IHC scoring guidelines. A total of 445 breast cancer samples from 3 international clinical HER2 referral centers were stained with the 2 test systems and scored in a blinded fashion by experienced pathologists. The overall agreement between the 2 tests in a 3×3 (negative, equivocal and positive) analysis shows a concordance of 86.7% and 86.3%, respectively when analyzed using commercially validated and modified ASCO/CAP and UK HER2 IHC scoring guidelines. There is a good concordance between the Oracle and the HercepTest. The advantages of a complete fully automated test such as the Oracle include standardization of key analytical factors and improved turn around time. The implementation of the modified ASCO/CAP and UK HER2 IHC scoring guidelines has minimal effect on either assay interpretation, showing that Oracle can be used as a methodology for accurately determining HER2 IHC status in formalin fixed, paraffin-embedded breast cancer tissue.
NASA Astrophysics Data System (ADS)
Clem, Douglas Wayne
Spatial ability refers to an individual's capacity to visualize and mentally manipulate three dimensional objects. Since sonographers manually manipulate 2D and 3D sonographic images to generate multi-viewed, logical, sequential renderings of an anatomical structure, it can be assumed that spatial ability is central to the perception and interpretation of these medical images. Using Ackerman's theory of ability determinants of skilled performance as a conceptual framework, this study explored the relationship of spatial ability and learning sonographic scanning. Beginning first year sonography students from four different educational institutions were administered a spatial abilities test prior to their initial scanning lab coursework. The students' spatial test scores were compared with their scanning competency performance scores. A significant relationship between the students' spatial ability scores and their scanning performance scores was found. This result suggests that the use of spatial ability tests for admission to sonography programs may improve candidate selection, as well as assist programs in adjusting instruction and curriculum for students who demonstrate low spatial ability.
Schuhbäck, A; Kolwelter, J; Achenbach, S
2016-08-01
Apart from the Diamond-Forrester classification, which is widely used particularly in the USA for the pretest probability of coronary artery disease, other scores also exist, such as an updated version of the classification table by Genders et al., the Morise score and the Duke clinical risk score. These scores estimate the probability of coronary artery disease, defined as the presence of at least one high-grade stenosis, based on symptom characteristics, age, gender and other parameters. All of the scores were derived from patient cohorts in which invasive coronary angiography had been performed for clinical reasons. It has subsequently been shown that these scores, especially those developed several decades ago, substantially overestimate the pretest probability of coronary artery disease. When these risk scores are applied to patients for whom a non-invasive work-up of suspected coronary artery disease is planned, for example by coronary computed tomography (CT) angiography, the expected prevalence of significant coronary stenosis will be overestimated. This, in turn, influences the test characteristics and the significance of the non-invasive examination (positive and negative predictive values) and needs to be taken into account when interpreting test results.
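The dependence of predictive values on pretest probability follows from Bayes' rule and can be sketched as below. The sensitivity and specificity figures in the demonstration are hypothetical placeholders, not values reported for coronary CT angiography or any other test; the function name `predictive_values` is also an illustrative choice.

```python
def predictive_values(prevalence: float, sensitivity: float, specificity: float):
    """Return (PPV, NPV) for a test applied at a given pretest probability."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    if not (0.0 < sensitivity <= 1.0 and 0.0 < specificity <= 1.0):
        raise ValueError("sensitivity and specificity must lie in (0, 1]")
    true_pos = prevalence * sensitivity
    false_neg = prevalence * (1.0 - sensitivity)
    true_neg = (1.0 - prevalence) * specificity
    false_pos = (1.0 - prevalence) * (1.0 - specificity)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


if __name__ == "__main__":
    # Hypothetical test characteristics, for illustration only.
    for prevalence in (0.10, 0.30, 0.50):
        ppv, npv = predictive_values(prevalence, sensitivity=0.95, specificity=0.85)
        print(f"prevalence {prevalence:.2f}: PPV {ppv:.3f}, NPV {npv:.3f}")
```

With fixed sensitivity and specificity, a lower true prevalence lowers the positive predictive value and raises the negative predictive value. An overestimated pretest probability therefore makes a positive result look more conclusive than it is.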
Wanat, Matthew; Fitousis, Kalliopi; Hall, Jeff; Rice, Lawrence
2013-06-01
The diagnosis of heparin-induced thrombocytopenia (HIT) may be challenging in critically ill patients, as heparin exposures are ubiquitous, and thrombocytopenia is common. Unwarranted ordering and incorrect interpretation of heparin antibody tests can expose a patient to adverse drug events and impose a significant economic burden on our health care system. A prospective, observational study was performed over 4 months on all adult patients located in 5 intensive care units, with a heparin antibody test ordered. A platelet factor 4/heparin enzyme-linked immunosorbent assay (ELISA) test was ordered in 131 patients. In total, 110 patients had a low 4Ts score (0-3), and of these 103 had a negative ELISA result. In patients with a low 4Ts score, 0 (0%) of 110 had an optical density value >1.0. One hundred twenty-nine patients (98%) had another possible cause of thrombocytopenia identified. In critically ill patients, low 4Ts scores indicate a low probability of HIT, and heparin antibody testing in these patients is not useful.
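The counts in this abstract can be turned into proportions with a short sketch. The one-sided exact upper bound for a zero-event count is an addition made here for illustration; the abstract reports no confidence interval, and both function names are hypothetical.

```python
def proportion(count: int, total: int) -> float:
    """Simple proportion, e.g. negative ELISA results among low-4Ts patients."""
    if total <= 0 or not 0 <= count <= total:
        raise ValueError("need 0 <= count <= total and total > 0")
    return count / total


def upper_bound_zero_events(total: int, confidence: float = 0.95) -> float:
    """One-sided exact upper bound on the true rate when 0 of `total` events occur.

    Solves (1 - p) ** total = 1 - confidence for p.
    """
    if total <= 0 or not 0.0 < confidence < 1.0:
        raise ValueError("total must be > 0 and confidence must lie in (0, 1)")
    return 1.0 - (1.0 - confidence) ** (1.0 / total)


if __name__ == "__main__":
    print(f"negative ELISA among low 4Ts: {proportion(103, 110):.3f}")
    print(f"upper bound for 0 of 110:     {upper_bound_zero_events(110):.4f}")
```

For the figures reported above, 103 of 110 is about 93.6%, and the one-sided 95% upper bound for 0 of 110 is about 2.7%.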
Role of test motivation in intelligence testing.
Duckworth, Angela Lee; Quinn, Patrick D; Lynam, Donald R; Loeber, Rolf; Stouthamer-Loeber, Magda
2011-05-10
Intelligence tests are widely assumed to measure maximal intellectual performance, and predictive associations between intelligence quotient (IQ) scores and later-life outcomes are typically interpreted as unbiased estimates of the effect of intellectual ability on academic, professional, and social life outcomes. The current investigation critically examines these assumptions and finds evidence against both. First, we examined whether motivation is less than maximal on intelligence tests administered in the context of low-stakes research situations. Specifically, we completed a meta-analysis of random-assignment experiments testing the effects of material incentives on intelligence-test performance on a collective 2,008 participants. Incentives increased IQ scores by an average of 0.64 SD, with larger effects for individuals with lower baseline IQ scores. Second, we tested whether individual differences in motivation during IQ testing can spuriously inflate the predictive validity of intelligence for life outcomes. Trained observers rated test motivation among 251 adolescent boys completing intelligence tests using a 15-min "thin-slice" video sample. IQ score predicted life outcomes, including academic performance in adolescence and criminal convictions, employment, and years of education in early adulthood. After adjusting for the influence of test motivation, however, the predictive validity of intelligence for life outcomes was significantly diminished, particularly for nonacademic outcomes. Collectively, our findings suggest that, under low-stakes research conditions, some individuals try harder than others, and, in this context, test motivation can act as a third-variable confound that inflates estimates of the predictive validity of intelligence for life outcomes.
Role of test motivation in intelligence testing
Duckworth, Angela Lee; Quinn, Patrick D.; Lynam, Donald R.; Loeber, Rolf; Stouthamer-Loeber, Magda
2011-01-01
Intelligence tests are widely assumed to measure maximal intellectual performance, and predictive associations between intelligence quotient (IQ) scores and later-life outcomes are typically interpreted as unbiased estimates of the effect of intellectual ability on academic, professional, and social life outcomes. The current investigation critically examines these assumptions and finds evidence against both. First, we examined whether motivation is less than maximal on intelligence tests administered in the context of low-stakes research situations. Specifically, we completed a meta-analysis of random-assignment experiments testing the effects of material incentives on intelligence-test performance on a collective 2,008 participants. Incentives increased IQ scores by an average of 0.64 SD, with larger effects for individuals with lower baseline IQ scores. Second, we tested whether individual differences in motivation during IQ testing can spuriously inflate the predictive validity of intelligence for life outcomes. Trained observers rated test motivation among 251 adolescent boys completing intelligence tests using a 15-min “thin-slice” video sample. IQ score predicted life outcomes, including academic performance in adolescence and criminal convictions, employment, and years of education in early adulthood. After adjusting for the influence of test motivation, however, the predictive validity of intelligence for life outcomes was significantly diminished, particularly for nonacademic outcomes. Collectively, our findings suggest that, under low-stakes research conditions, some individuals try harder than others, and, in this context, test motivation can act as a third-variable confound that inflates estimates of the predictive validity of intelligence for life outcomes. PMID:21518867
From MCAT to M.D.: Predicting Success in Medical School.
ERIC Educational Resources Information Center
Jones, Robert F.
The effectiveness of Medical College Admission Test (MCAT) scores in predicting success during the first phase of medical education is investigated. The process by which medical students are educated and evaluated, the nature and purpose of the MCAT, and the MCAT Interpretive Studies Program developed by the Association of American Medical…
Test Review: A Review of the Five Factor Personality Inventory-Children
ERIC Educational Resources Information Center
Klingbeil, David A.
2009-01-01
This article presents a review of the Five Factor Personality Inventory-Children (FFPI-C), a quick and easily administered personality assessment for children and adolescents with clear and straightforward scoring and interpretation procedures. The FFPI-C is based on a theoretical model of personality developed through the work of Allport (Allport…
Harvey: The Impact of a Cardiovascular Teaching Simulator on Student Skill Acquisition.
ERIC Educational Resources Information Center
Woolliscroft, James O.; And Others
1987-01-01
A life-sized cardiovascular patient simulator was used in medical education in a standard sophomore physical skills test. Significant gains were found in overall student scores and in assessment of interpretation of carotid pulses and precordial auscultation. Students did not make significant gains in jugular venous pulse or precordial motion…
Black-White Summer Learning Gaps: Interpreting the Variability of Estimates across Representations
ERIC Educational Resources Information Center
Quinn, David M.
2015-01-01
The estimation of racial test score gap trends plays an important role in monitoring educational equality. Documenting gap trends is complex, however, and estimates can differ depending on the metric, modeling strategy, and psychometric assumptions. The sensitivity of summer learning gap estimates to these factors has been under-examined. Using…
ERIC Educational Resources Information Center
Quenk, Naomi L.
This book provides step-by-step guidance on the administration, scoring, and interpretation of the Myers-Briggs Type Indicator[R] (MBTI). The book also contains assessment of the test's strengths and weaknesses, advice on its clinical applications, and several case reports. The chapters are: (1) "Overview"; (2) "How To Administer…
Children's Human Figure Drawings: Clinical and Cultural Considerations.
ERIC Educational Resources Information Center
Thakur, P. S.
This paper considers the psychological aspects of children's drawings. The utility of the Draw a Person Test (DAPT) for different types of psychological research is discussed, and the non-intellectual and cultural factors of the DAPT are described. Suggestions on the administration, scoring, and interpretation of drawings are given. The next two…
Agreeing on Validity Arguments
ERIC Educational Resources Information Center
Sireci, Stephen G.
2013-01-01
Kane (this issue) presents a comprehensive review of validity theory and reminds us that the focus of validation is on test score interpretations and use. In reacting to his article, I support the argument-based approach to validity and all of the major points regarding validation made by Dr. Kane. In addition, I call for a simpler, three-step…
Some Innovative Methods to Improve Profiles Derivation
ERIC Educational Resources Information Center
Pei, Lai Kwan
2008-01-01
As the government aimed to provide appropriate education to all children (No Child Left Behind Act), it is important that the education providers can assess the performance of the students correctly so that they can provide the appropriate education for the students. Profile analysis is a very useful tool to interpret test scores and measure…
Do collaborative practical tests encourage student-centered active learning of gross anatomy?
Green, Rodney A; Cates, Tanya; White, Lloyd; Farchione, Davide
2016-05-06
Benefits of collaborative testing have been identified in many disciplines. This study sought to determine whether collaborative practical tests encouraged active learning of anatomy. A gross anatomy course included a collaborative component in four practical tests. Two hundred and seven students initially completed the test as individuals and then worked as a team to complete the same test again immediately afterwards. The relationship between mean individual, team, and difference (between team and individual) test scores and overall performance on the final examination (representing overall learning in the course) was examined using regression analysis. The overall mark in the course increased by 9% with a decreased failure rate. There was a strong relationship between individual score and final examination mark (P < 0.001) but no relationship for team score (P = 0.095). A longitudinal analysis showed that the test difference scores increased after Test 1, which may be indicative of social loafing, and this was confirmed by a significant negative relationship between difference score on Test 4 (indicating a weaker student) and final examination mark (P < 0.001). It appeared that for this cohort, there was little peer-to-peer learning occurring during the collaborative testing and that weaker students gained the benefit from team marks without significant active learning taking place. This negative outcome may be due to insufficient encouragement of the active learning strategies that were expected to occur during the collaborative testing process. An improved understanding of the efficacy of collaborative assessment could be achieved through the inclusion of questionnaire-based data to allow a better interpretation of learning outcomes. Anat Sci Educ 9: 231-237. © 2015 American Association of Anatomists.
Woloshin, Steven; Schwartz, Lisa M; Welch, H Gilbert
2007-02-20
People need basic data interpretation skills to understand health risks and to weigh the harms and benefits of actions meant to reduce those risks. Although many studies document problems with understanding risk information, few assess ways to teach interpretation skills. To see whether a general education primer improves patients' medical data interpretation skills. Two randomized, controlled trials done in populations with high and low socioeconomic status (SES). The high SES trial included persons who attended a public lecture series at Dartmouth Medical School, Hanover, New Hampshire; and the low SES trial included veterans and their families from the waiting areas at the White River Junction Veterans Affairs Medical Center, White River Junction, Vermont. 334 adults in the high SES trial and 221 veterans and their families in the low SES trial were enrolled from October 2004 to August 2005. Completion rates for the primer and control groups in each trial were 95% versus 98% (high SES) and 85% versus 96% (low SES). The intervention in the primer groups was an educational booklet specifically developed to teach people the skills needed to understand risk. The control groups received a general health booklet developed by the U.S. Department of Health and Human Services Agency for Health Care Research and Quality. Score on a medical data interpretation test, a previously validated 100-point scale, in which 75 points or more is considered "passing." Secondary outcomes included 2 other 100-point validated scores (interest and confidence in interpreting medical statistics) and participants' ratings of the booklet's usefulness. In the high SES trial, 74% of participants in the primer group received a "passing grade" on the medical data interpretation test versus 56% in the control group (P = 0.001). Mean scores were 81 and 75, respectively (P = 0.0006). 
In the low SES trial, 44% versus 26% "passed" (P = 0.010); mean scores were 69 and 62 in the primer and control groups, respectively (P = 0.008). The primer also significantly increased interest in medical statistics by 6 points in the high SES trial (a 4-point increase vs. a 2-point decrease from baseline) (P = 0.004) and by 8 points in the low SES trial (a 6-point increase vs. a 2-point decrease from baseline) (P = 0.004) compared with the control booklet. The primer, however, did not improve participants' confidence in interpreting medical statistics beyond the control booklet (a 2-point vs. a 4-point increase in the high SES trial [P = 0.36] and a 2-point versus a 6-point increase in the low SES trial [P = 0.166]). The primer was rated highly: 91% of participants in the high SES trial found it "helpful" or "very helpful," as did 95% of participants in the low SES trial. The primarily male low SES sample and the primarily female high SES sample limit generalizability. The authors did not assess whether better data interpretation skills improved decision-making. The primer improved medical data interpretation skills in people with high and low SES. ClinicalTrials.gov registration number: NCT00380432.
Mausbach, Brent T; Tiznado, Denisse; Cardenas, Veronica; Jeste, Dilip V; Patterson, Thomas L
2016-10-30
The UCSD Performance-based Skills Assessment (UPSA) is a widely used measure of functional capacity with strong reliability and validity. However, there is a lack of psychometric data on Hispanics. The purpose of this study was to determine the impact of acculturation and education on UPSA performance among 62 Hispanic participants with schizophrenia or schizoaffective disorder and 46 healthy comparison subjects. Functional capacity was measured using the UPSA. Acculturation was measured using the Acculturation Rating Scale for Mexican Americans (ARSMA). Independent t-tests indicated that participants with schizophrenia had significantly lower UPSA total scores and scored lower on all UPSA sub-scales relative to the comparison group. Multiple regression also indicated that education and acculturation were significant predictors of UPSA total scores. These data provide a better understanding of UPSA scores in Hispanics with and without schizophrenia, and suggest that education and acculturation adjustments may be required to improve interpretation of test results. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
MMPI-2 and MMPI-A Computerized Interpretation: An Adjunct to Quality Mental Health Service.
ERIC Educational Resources Information Center
Phelps, LeAdelle
1994-01-01
Provides reviews of computerized scoring and interpretive systems for the Minnesota Multiphasic Personality Inventory (MMPI-2 and MMPI-A): Caldwell Report, the Psychological Assessment Resources MMPI-2 Interpretive System, and the National Computer Systems Programs. Concludes that when used appropriately, such scoring systems enhance a counselor's…
Effective use of interpreters by family nurse practitioner students: is didactic curriculum enough?
Phillips, Susanne J; Lie, Desiree; Encinas, Jennifer; Ahearn, Carol Sue; Tiso, Susan
2011-05-01
Nurse practitioners (NPs) care for patients with limited English proficiency (LEP). However, NP education for improving communication in interpreted encounters is not well reported. We report a single-school study using standardized encounters within a clinical practice examination (CPX) to assess the adequacy of current curriculum. Entering family NP (FNP) students (n=26) participated in a baseline CPX case. They were assessed by standardized patients using the validated Interpreter Impact Rating Scale (IIRS) and Physician-Patient Interaction (PPI) scale, and by interpreters using the Interpreter Scale (IS). The case was re-administered to 31 graduating students following completion of existing curriculum. Primary outcome was aggregate change in skills comprising global IIRS, PPI and IS scores. Pre- and post-performance data were available for one class of 10 students. Secondary outcome was change in skill scores for this class. Mean aggregate global scores showed no significant improvement between scores at entry and graduation. For 10 students with pre- and post-performance data, there was no improvement in skill scores for any measure. Skill assessed on one measure worsened. FNP students show no improvement in skills in working with interpreters with the current curriculum. An enhanced curriculum is needed. ©2011 The Author(s) Journal compilation ©2011 American Academy of Nurse Practitioners.
A twin study of spatial and non-spatial delayed response performance in middle age.
Kremen, William S; Mai, Tuan; Panizzon, Matthew S; Franz, Carol E; Blankfeld, Howard M; Xian, Hong; Eisen, Seth A; Tsuang, Ming T; Lyons, Michael J
2011-06-01
Delayed alternation and object alternation are classic spatial and non-spatial delayed response tasks. We tested 632 middle-aged male veteran twins on variants of these tasks in order to compare test difficulty, measure their inter-correlation, test order effects, and estimate heritabilities (proportion of observed variance due to genetic influences). Non-spatial alternation (NSA), which may involve greater reliance on processing of subgoals, was significantly more difficult than spatial alternation (SA). Despite their similarities, NSA and SA scores were uncorrelated. NSA performance was worse when administered second; there was no SA order effect. NSA scores were modestly heritable (h²=.25; .26); SA was not. There was shared genetic variance between NSA scores and general intellectual ability (r(g)=.55; .67), but this also suggests genetic influences specific to NSA. Compared with findings from small, selected control samples, high "failure" rates in this community-based sample raise concerns about interpretation of brain dysfunction in elderly or patient samples. Copyright © 2011 Elsevier Inc. All rights reserved.
A Twin Study of Spatial and Non-Spatial Delayed Response Performance in Middle Age
Kremen, William S.; Mai, Tuan; Panizzon, Matthew S.; Franz, Carol E.; Blankfeld, Howard M.; Xian, Hong; Eisen, Seth A.; Tsuang, Ming T.; Lyons, Michael J.
2011-01-01
Delayed alternation and object alternation are classic spatial and non-spatial delayed response tasks. We tested 632 middle-aged male veteran twins on variants of these tasks in order to compare test difficulty, measure their inter-correlation, test order effects, and estimate heritabilities (proportion of observed variance due to genetic influences). Non-spatial alternation (NSA), which may involve greater reliance on processing of subgoals, was significantly more difficult than spatial alternation (SA). Despite their similarities, NSA and SA scores were uncorrelated. NSA performance was worse when administered second; there was no SA order effect. NSA scores were modestly heritable (h²=.25; .26); SA was not. There was shared genetic variance between NSA scores and general intellectual ability (r(g)=.55; .67), but this also suggests genetic influences specific to NSA. Compared with findings from small, selected control samples, high “failure” rates in this community-based sample raise concerns about interpretation of brain dysfunction in elderly or patient samples. PMID:21477911
ERIC Educational Resources Information Center
Ginther, April; Elder, Catherine
2014-01-01
In line with expanded conceptualizations of validity that encompass the interpretations and uses of test scores in particular policy contexts, this report presents results of a comparative analysis of institutional understandings and uses of 3 international English proficiency tests widely used for tertiary selection--the "TOEFL iBT"®…
ERIC Educational Resources Information Center
Ford, Jeremy W.; Missall, Kristen N.; Hosp, John L.; Kuhle, Jennifer L.
2016-01-01
Advances in maze selection curriculum-based measurement have led to several published tools with technical information for interpretation (e.g., norms, benchmarks, cut-scores, classification accuracy) that have increased their usefulness for universal screening. A range of scoring practices have emerged for evaluating student performance on maze…
Williams, Stacey L.; Polaha, Jodi
2014-01-01
The purpose of this paper was to examine the validity of score interpretations of an instrument developed to measure parents’ perceptions of stigma about seeking mental health services for their children. The validity of the score interpretations of the instrument was tested in two studies. Study 1 examined confirmatory factor analysis (CFA) employing a split half approach, and construct and criterion validity using the entire sample of parents in rural Appalachia whose children were experiencing psychosocial concerns (N=347), while Study 2 further examined CFA, construct and criterion validity, as well as predictive validity of the scores on the new scale using a general sample of parents in rural Appalachia (N=184). Results of exploratory and confirmatory factor analyses revealed support for a two factor model of parents’ perceived stigma, which represented both self and public forms of stigma associated with seeking mental health services for their children, and correlated with existing measures of stigma and other psychosocial variables. Further, the new self and public stigma scale significantly predicted parents’ willingness to seek services for children. PMID:24749752
Chen, Yi-Miau; Huang, Yi-Jing; Huang, Chien-Yu; Lin, Gong-Hong; Liaw, Lih-Jiun; Lee, Shih-Chieh; Hsieh, Ching-Lin
2017-10-01
The 3-point Berg Balance Scale (BBS-3P) and 3-point Postural Assessment Scale for Stroke Patients (PASS-3P) were simplified from the BBS and PASS to overcome the complex scoring systems. The BBS-3P and PASS-3P were more feasible in busy clinical practice and showed similarly sound validity and responsiveness to the original measures. However, the reliability of the BBS-3P and PASS-3P is unknown, limiting their utility and the interpretability of scores. We aimed to examine the test-retest reliability and minimal detectable change (MDC) of the BBS-3P and PASS-3P in patients with stroke. Cross-sectional study. The rehabilitation departments of a medical center and a community hospital. A total of 51 chronic stroke patients (64.7% male). Both balance measures were administered twice 7 days apart. The test-retest reliability of both the BBS-3P and PASS-3P was examined by intraclass correlation coefficients (ICC). The MDC and its percentage over the total score (MDC%) of each measure were calculated for examining the random measurement errors. The ICC values of the BBS-3P and PASS-3P were 0.99 and 0.97, respectively. The MDC% (MDC) of the BBS-3P and PASS-3P were 9.1% (5.1 points) and 8.4% (3.0 points), respectively, indicating that both measures had small and acceptable random measurement errors. Our results showed that both the BBS-3P and the PASS-3P had good test-retest reliability, with small and acceptable random measurement error. These two simplified 3-level balance measures can provide reliable results over time. Our findings support the repeated administration of the BBS-3P and PASS-3P to monitor the balance of patients with stroke. The MDC values can help clinicians and researchers interpret the change scores more precisely.
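The abstract above reports MDC and MDC% but does not state the formula used. As a rough illustration only, the sketch below uses the conventional definition (SEM = SD × sqrt(1 − ICC), MDC95 = 1.96 × sqrt(2) × SEM) and expresses MDC as a percentage of the maximum total score. The function names, the standard deviation passed in the usage note, and the maximum score of 56 assumed for the BBS-3P are illustrative assumptions, not values taken from the study.

```python
import math


def sem_from_icc(sd: float, icc: float) -> float:
    """Standard error of measurement from a between-subject SD and an ICC."""
    return sd * math.sqrt(1.0 - icc)


def mdc95(sd: float, icc: float) -> float:
    """Minimal detectable change at the 95% confidence level (conventional formula)."""
    return 1.96 * math.sqrt(2.0) * sem_from_icc(sd, icc)


def mdc_percent(mdc: float, max_score: float) -> float:
    """MDC expressed as a percentage of the maximum total score."""
    return 100.0 * mdc / max_score


if __name__ == "__main__":
    # Hypothetical SD of 10 points; the abstract does not report the sample SD.
    print(round(mdc95(sd=10.0, icc=0.99), 2))
    # Reported MDC of 5.1 points against an assumed maximum score of 56.
    print(round(mdc_percent(5.1, 56), 1))
```

Under the assumed maximum of 56, the reported 5.1-point MDC corresponds to about 9.1%, which agrees with the figure in the abstract. The same calculation on the rounded 3.0-point PASS-3P value with an assumed maximum of 36 gives roughly 8.3% rather than the reported 8.4%, presumably because the authors worked from unrounded values.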
What is the evidence for retrieval problems in the elderly?
White, N; Cunningham, W R
1982-01-01
To determine whether older adults experience particular problems with retrieval, groups of young and elderly adults were given free recall and recognition tests of supraspan lists of unrelated words. Analysis of number of words correctly recalled and recognized yielded a significant age by retention test interaction: greater age differences were observed for recall than for recognition. In a second analysis of words recalled and recognized, corrected for guessing, the interaction disappeared. It was concluded that previous interpretations that age by retention test interactions are indicative of retrieval problems of the elderly may have been confounded by methodological problems. Furthermore, it was suggested that researchers in aging and memory need to be explicit in identifying their underlying models of error processes when analyzing recognition scores: different error models may lead to different results and interpretations.
Van Damme, Benedicte; Stevens, Veerle; Van Tiggelen, Damien; Perneel, Christiaan; Crombez, Geert; Danneels, Lieven
2014-10-01
The influence of psychosocial components on back and abdominal endurance tests in patients with persistent non-specific low back pain should be investigated to ensure the correct interpretation of these measures. Three hundred and thirty-two patients (291 men and 41 women) aged 19 to 63 years performed an abdominal and back muscle endurance test after completing some psychosocial questionnaires. During the endurance tests, surface electromyography signals of the internal obliques, the external obliques, the lumbar multifidus and the iliocostalis were recorded. Patients were dichotomized as underperformers and good performers by comparing their real endurance time to the expected time of endurance derived from the normalized median frequency slope. Independent t-tests were performed to examine the differences on the outcome of the questionnaires. In the back muscle endurance test, the underperformers had significantly lower (p<0.05) scores on some of the physical subscales of the SF-36. The underperformers group of the AE test scored significantly higher on the DRAM MZDI (p=0.018) and on the PCS scale (p=0.020) and also showed significantly lower scores on the SF-36 (p<0.05). Back muscle endurance tests are influenced by physical components, while abdominal endurance tests seem influenced by psychosocial components. Copyright © 2014 Elsevier Ltd. All rights reserved.
RELIABILITY CONCERNS IN THE REPEATED COMPUTERIZED ASSESSMENT OF ATTENTION IN CHILDREN
Zabel, T. Andrew; von Thomsen, Christian; Cole, Carolyn; Martin, Rebecca; Mahone, E. Mark
2010-01-01
Assessment of attentional processes via computerized assessment is frequently used to quantify intra-individual cognitive improvement or decline in response to treatment. However, assessment of intra-individual change is highly dependent on sufficient test reliability. We examined the test–retest reliability of selected variables from one popular computerized continuous performance test (CPT)—i.e., the Conners’ CPT – Second Edition (CPT-II). Participants were 39 healthy children (20 girls) ages 6–18 without intellectual impairment (mean PPVT-III SS = 102.6), LD, or psychiatric disorders (DICA-IV). Test–retest reliability over the 3–8 month interval (mean = 6 months) was acceptable (Intraclass Correlations [ICC] = .82 to .92) on comparison measures (Beery Test of Visual Perception, WISC-IV Block Design, PPVT-III). In contrast, test–retest reliability was only modest for CPT-II raw scores (ICCs ranging from .62 to .82) and T-scores (ICCs ranging from .33 to .65) for variables of interest (Omissions, Commissions, Variability, Hit Reaction Time, and Attentiveness). Using test–retest reliability information published in the CPT-II manual, 90% confidence intervals based on reliable change index (RCI) methodology were constructed to examine the significance of test–retest difference/change scores. Of the participants in this sample of typically developing youth, 30% generated intra-individual changes in T-scores on the Omissions and Attentiveness variables that exceeded the 90% confidence intervals and qualified as “statistically rare” changes in score. These results suggest a considerable degree of normal variability in CPT-II test scores over extended test–retest intervals, and suggest a need for caution when interpreting test score changes in neurologically unstable clinical populations. PMID:19452302
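The reliable change index (RCI) approach mentioned in the abstract above can be sketched as follows. This is a minimal illustration of the commonly used Jacobson-Truax style calculation, not necessarily the exact variant the authors applied. The T-score standard deviation of 10 is standard for T-scores, but the reliability value used in the usage note is a hypothetical placeholder rather than a figure from the CPT-II manual.

```python
import math


def se_difference(sd: float, reliability: float) -> float:
    """Standard error of the difference between two administrations."""
    sem = sd * math.sqrt(1.0 - reliability)
    return math.sqrt(2.0) * sem


def rci_interval(sd: float, reliability: float, z: float = 1.645) -> tuple:
    """Symmetric confidence band for a test-retest difference (z=1.645 gives 90%)."""
    half_width = z * se_difference(sd, reliability)
    return (-half_width, half_width)


def is_rare_change(score1: float, score2: float, sd: float,
                   reliability: float, z: float = 1.645) -> bool:
    """True when the observed change falls outside the confidence band."""
    return abs(score2 - score1) > z * se_difference(sd, reliability)


if __name__ == "__main__":
    # Hypothetical test-retest reliability of 0.75 on a T-score metric (SD = 10).
    low, high = rci_interval(sd=10.0, reliability=0.75)
    print(round(low, 2), round(high, 2))
    print(is_rare_change(50, 62, sd=10.0, reliability=0.75))
```

With lower reliability the band widens, so a larger change is needed before it counts as statistically rare. That is one way to read the abstract's call for caution when interpreting score changes.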
NASA Astrophysics Data System (ADS)
Jensen-Ruopp, Helga Spitko
A comparison of hands-on inquiry instruction with lecture instruction was presented to 134 Patterns and Process Biology students. Students participated in seven biology lessons that were selected from Biology Survey of Living Things (1992). A pre- and post- paper-and-pencil assessment was used as the data-collecting instrument. The treatment group was taught using hands-on inquiry strategies while the non-treatment group was taught by the lecture method of instruction. The team teaching model was used as the mode of presentation to the treatment group and the non-treatment group. Achievement levels based on specific criteria, namely novice (0% to 50%), developing proficiency (51% to 69%), accomplished (70% to 84%) and exceptional or mastery level (85% to 100%), were used as a guideline to tabulate the results of the pre- and post-assessment. Rubric tabulation was done to interpret the testing results. The raw data were plotted using percentage change in test score totals versus reading level score by gender as well as percentage change in test score totals versus auditory vocabulary score by gender. Box-and-whisker plots were produced to compare and describe individual pre- and post-test scores for the treatment and non-treatment groups. Analysis of covariance (ANCOVA) using MINITAB Statistical Software version 14.11 was run on data of the seven lessons, as well as on results by gender (male results individual and combined, and female results individual and combined). Normal probability plots for total scores as well as individual test scores were produced. The results suggest that hands-on inquiry-based instruction, when presented to special needs students (including at-risk students, students with English as a second language or limited English proficiency, and special education inclusion students), may enhance individual student achievement.
Cuddy, Monica M; Winward, Marcia L; Johnston, Mary M; Lipner, Rebecca S; Clauser, Brian E
2016-01-01
To add to the small body of validity research addressing whether scores from performance assessments of clinical skills are related to performance in supervised patient settings, the authors examined relationships between United States Medical Licensing Examination (USMLE) Step 2 Clinical Skills (CS) data gathering and data interpretation scores and subsequent performance in history taking and physical examination in internal medicine residency training. The sample included 6,306 examinees from 238 internal medicine residency programs who completed Step 2 CS for the first time in 2005 and whose performance ratings from their first year of residency training were available. Hierarchical linear modeling techniques were used to examine the relationships among Step 2 CS data gathering and data interpretation scores and history-taking and physical examination ratings. Step 2 CS data interpretation scores were positively related to both history-taking and physical examination ratings. Step 2 CS data gathering scores were not related to either history-taking or physical examination ratings after other USMLE scores were taken into account. Step 2 CS data interpretation scores provide useful information for predicting subsequent performance in history taking and physical examination in supervised practice and thus provide validity evidence for their intended use as an indication of readiness to enter supervised practice. The results show that there is less evidence to support the usefulness of Step 2 CS data gathering scores. This study provides important information for practitioners interested in Step 2 CS specifically or in performance assessments of medical students' clinical skills more generally.
Age-related invariance of abilities measured with the Wechsler Adult Intelligence Scale-IV.
Sudarshan, Navaneetham J; Bowden, Stephen C; Saklofske, Donald H; Weiss, Lawrence G
2016-11-01
Assessment of measurement invariance across populations is essential for meaningful comparison of test scores, and is especially relevant where repeated measurements are required for educational assessment or clinical diagnosis. Establishing measurement invariance legitimizes the assumption that test scores reflect the same psychological trait in different populations or across different occasions. Examination of Wechsler Adult Intelligence Scale-Fourth Edition (WAIS-IV) U.S. standardization samples revealed that a first-order 5-factor measurement model was best fitting across 9 age groups from 16 years to 69 years. Strong metric invariance was found for 3 of 5 factors and partial intercept invariance for the remaining 2. Pairwise comparisons of adjacent age groups supported the inference that cognitive-trait group differences are manifested by group differences in the test scores. In educational and clinical settings these findings provide theoretical and empirical support to interpret changes in the index or subtest scores as reflecting changes in the corresponding cognitive abilities. Further, where clinically relevant, the subtest score composites can be used to compare changes in respective cognitive abilities. The model was supported in the Canadian standardization data with pooled age groups but the sample sizes were not adequate for detailed examination of separate age groups in the Canadian sample. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Assessment issues in the testing of children at school entry.
Rock, Donald A; Stenner, A Jackson
2005-01-01
The authors introduce readers to the research documenting racial and ethnic gaps in school readiness. They describe the key tests, including the Peabody Picture Vocabulary Test (PPVT), the Early Childhood Longitudinal Study (ECLS), and several intelligence tests, and describe how they have been administered to several important national samples of children. Next, the authors review the different estimates of the gaps and discuss how to interpret these differences. In interpreting test results, researchers use the statistical term "standard deviation" to compare scores across the tests. On average, the tests find a gap of about 1 standard deviation. The ECLS-K estimate is the lowest, about half a standard deviation. The PPVT estimate is the highest, sometimes more than 1 standard deviation. When researchers adjust those gaps statistically to take into account different outside factors that might affect children's test scores, such as family income or home environment, the gap narrows but does not disappear. Why such different estimates of the gap? The authors consider explanations such as differences in the samples, racial or ethnic bias in the tests, and whether the tests reflect different aspects of school "readiness," and conclude that none is likely to explain the varying estimates. Another possible explanation is the Spearman Hypothesis: that all tests are imperfect measures of a general ability construct, g; the more highly a given test correlates with g, the larger the gap will be. But the Spearman Hypothesis, too, leaves questions to be investigated. A gap of 1 standard deviation may not seem large, but the authors show clearly how it results in striking disparities in the performance of black and white students and why it should be of serious concern to policymakers.
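To make the size of a gap expressed in standard deviations more concrete, the sketch below converts it into percentile terms. It assumes, purely for illustration, that both groups' scores are normally distributed with equal spread; the abstract makes no such claim, and the function names and the cutoff in the usage note are hypothetical.

```python
from statistics import NormalDist


def lower_mean_percentile(gap_in_sd: float) -> float:
    """Percentile rank of the lower group's mean within the higher group's distribution."""
    return NormalDist().cdf(-gap_in_sd)


def share_above_cutoff(group_mean_sd: float, cutoff_sd: float) -> float:
    """Share of a group (mean given in SD units, SD = 1) scoring above a cutoff."""
    return 1.0 - NormalDist(mu=group_mean_sd, sigma=1.0).cdf(cutoff_sd)


if __name__ == "__main__":
    for gap in (0.5, 1.0):
        print(gap, round(lower_mean_percentile(gap), 3))
    # Hypothetical cutoff one SD above the higher group's mean.
    print(round(share_above_cutoff(0.0, 1.0), 3), round(share_above_cutoff(-1.0, 1.0), 3))
```

Under these assumptions a 1 standard deviation gap places the lower group's average at roughly the 16th percentile of the higher group, and the disparity grows further at cutoffs in the tails, which is consistent with the abstract's point that such a gap is larger than it may sound.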
ERIC Educational Resources Information Center
Dunn, Thomas G.; And Others
The feasibility of completely automating the Minnesota Multiphasic Personality Inventory (MMPI) was tested, and item response latencies were compared with other MMPI item characteristics. A total of 26 scales were successfully scored automatically for 165 subjects. The program also typed a Mayo Clinic interpretive report on a computer terminal,…
ERIC Educational Resources Information Center
Rabbitt, Patrick
2011-01-01
Salthouse (2011) argued that (a) variance between individuals on cognitive test scores remains constant between 20 and 90 years of age and (b) widely recognized problems of deducing functional relationships from patterns of correlations between measurements become especially severe for neuropsychological indices, especially for gross indices of…
ERIC Educational Resources Information Center
Gebril, Atta; Plakans, Lia
2013-01-01
As a growing number of testing programs use integrated writing tasks, more validation research is needed to inform stakeholders about score use and interpretation. The current study investigates the relationship between writing proficiency and discourse features in an integrated reading-writing task. At a Middle Eastern university, 136…
Characteristics of EEG Interpreters Associated With Higher Interrater Agreement.
Halford, Jonathan J; Arain, Amir; Kalamangalam, Giridhar P; LaRoche, Suzette M; Leonardo, Bonilha; Basha, Maysaa; Azar, Nabil J; Kutluay, Ekrem; Martz, Gabriel U; Bethany, Wolf J; Waters, Chad G; Dean, Brian C
2017-03-01
The goal of the project is to determine characteristics of academic neurophysiologist EEG interpreters (EEGers) that predict good interrater agreement (IRA) and to determine the number of EEGers needed to develop an ideal standardized testing and training data set for epileptiform transient (ET) detection algorithms. A three-phase scoring method was used. In phase 1, 19 EEGers marked the location of ETs in two hundred 30-second segments of EEG from 200 different patients. In phase 2, EEG events marked by at least 2 EEGers were annotated by 18 EEGers on a 5-point scale to indicate whether they were ETs. In phase 3, a third opinion was obtained from EEGers on any inconsistencies between phase 1 and phase 2 scoring. The IRA for the 18 EEGers was only fair. A select group of the EEGers had good IRA and the other EEGers had low IRA. Board certification by the American Board of Clinical Neurophysiology was associated with better IRA performance but other board certifications, years of fellowship training, and years of practice were not. As the number of EEGers used for scoring is increased, the amount of change in the consensus opinion decreases steadily and is quite low as the group size approaches 10. The IRA among EEGers varies considerably. The EEGers must be tested before use as scorers for ET annotation research projects. The American Board of Clinical Neurophysiology certification is associated with improved performance. The optimal size for a group of experts scoring ETs in EEG is probably in the 6 to 10 range.
Rantz, Marilyn J; Aud, Myra A; Zwygart-Stauffacher, Mary; Mehr, David R; Petroski, Gregory F; Owen, Steven V; Madsen, Richard W; Flesner, Marcia; Conn, Vicki; Maas, Meridean
2008-01-01
Field test results are reported for the Observable Indicators of Nursing Home Care Quality Instrument-Assisted Living Version (OIQ-AL), an instrument designed to measure the quality of care in assisted living facilities after a brief 30-minute walk-through. The OIQ-AL was tested in 207 assisted-living facilities in two states using classical test theory, generalizability theory, and exploratory factor analysis. The 34-item scale has a coherent six-factor structure that conceptually describes the multidimensional concept of care quality in assisted living. The six factors can be logically clustered into process (Homelike and Caring, 21 items) and structure (Access and Choice; Lighting; Plants and Pets; Outdoor Spaces) subscales and for a total quality score. Classical test theory results indicate most subscales and the total quality score from the OIQ-AL have acceptable interrater, test-retest, and strong internal consistency reliabilities. Generalizability theory analyses reveal that dependability of scores from the instrument is strong, particularly by including a second observer who conducts a site visit and independently completes an instrument, or by a single observer conducting two site visits and completing instruments during each visit. Scoring guidelines based on the total sample of observations (N = 358) help guide those who want to use the measure to interpret both subscale and total scores. Content validity was supported by two expert panels of people experienced in the assisted-living field, and a content validity index calculated for the first version of the scale is high (3.43 on a four-point scale). The OIQ-AL gives reliable and valid scores for researchers, and may be useful for consumers, providers, and others interested in measuring quality of care in assisted-living facilities.
Interpreting patient decisional conflict scores: behavior and emotions in decisions about treatment.
Knops, Anouk M; Goossens, Astrid; Ubbink, Dirk T; Legemate, Dink A; Stalpers, Lukas J; Bossuyt, Patrick M
2013-01-01
Patient decision aids facilitate treatment decisions. They are often evaluated in terms of their effect on decisional conflict, as measured by the Decisional Conflict Scale (DCS). It is unclear to what extent lower DCS scores are accompanied by observable patient behavior or emotions. To help interpret DCS scores. In a Dutch university hospital, statements on behaviors or emotions during decision making were collected from asymptomatic aneurysm patients and healthy employees. Subsequently, they rated the intensity of decisional conflict that each statement expresses on a 1 to 10 scale. Selected statements were prospectively tested in aneurysm patients and cancer patients facing treatment dilemmas. Associations between patients' DCS scores and reported behavior and emotions were analyzed using logistic regression analysis. Participants provided 363 statements on behaviors and emotions during decision making, of which 28 were mentioned more than 4 times. Nine forms of behavior and emotions were selected as they were graded with the least variable median ratings of intensity of decisional conflict. Among 100 patients facing a treatment dilemma, each point increase in DCS lowered their odds for "immediately making the decision" (odds ratio [OR], 0.96; 95% confidence interval [CI], 0.93-0.98), whereas the odds of "fretting regularly" (OR, 1.05; 95% CI, 1.02-1.08) and "feeling nervous when thinking of the decision" (OR, 1.04; 95% CI, 1.01-1.06) were higher. A decrease in decisional conflict scores leads to less decision postponing behavior, fretting, and nervousness. Research should focus on which DCS scores are needed to make deliberate decisions and which scores hinder patients in decision making.
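The per-point odds ratios reported above can look negligible, but in a logistic regression they compound multiplicatively over larger score differences. The sketch below shows that arithmetic. The 10-point difference and the 50% baseline probability are hypothetical values chosen for illustration, and the function names are not from the study.

```python
def odds_ratio_over(or_per_point: float, points: float) -> float:
    """Odds ratio implied by a multi-point difference, given a per-point odds ratio."""
    return or_per_point ** points


def shift_probability(baseline_prob: float, odds_ratio: float) -> float:
    """Probability obtained after multiplying the baseline odds by an odds ratio."""
    odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = odds * odds_ratio
    return new_odds / (1.0 + new_odds)


if __name__ == "__main__":
    # Reported per-point OR for "immediately making the decision" is 0.96.
    or_10 = odds_ratio_over(0.96, 10)
    print(round(or_10, 3))
    # Hypothetical baseline probability of 0.5 for deciding immediately.
    print(round(shift_probability(0.5, or_10), 3))
```

On this reading, a 10-point higher DCS score corresponds to roughly two-thirds of the odds of deciding immediately, assuming the log-odds relationship is linear across that range.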
Errors in radiographic interpretation made by veterinary students.
Lamb, C R; Pfeiffer, D U; Mantis, P
2007-01-01
As a means of identifying student weaknesses in radiographic interpretation that could be used as foci for teaching, a cohort of 96 students joining the final-year radiology rotation were randomly allocated to one of three radiographic interpretation quizzes, each based on radiographs of small-animal patients together with the signalment and a brief, relevant history. Students' quiz scores were analyzed by multiple logistic regression, using an outcome variable with the score for each item as numerator and maximum possible mark as denominator. Students' median quiz score was 49% of the maximum (range 23-80%). Students were more likely to gain a mark for items based on abnormal radiographs than for those based on normal radiographs (odds ratio 3.4, p < 0.001). Skeletal radiographs were associated with lower scores (OR 0.75, p = 0.03). The fewest marks were awarded for interpretation of a radiograph of a normal canine stifle and interpretation of a radiograph of a normal canine pelvis; these items were misinterpreted as abnormal by 86% and 80% of the students, respectively. Students' tendency to over-interpret normal radiographs may reflect a lack of knowledge of radiographic anatomy or an unrealistically high expectation that the radiographs are abnormal.
Validation of educational assessments: a primer for simulation and beyond.
Cook, David A; Hatala, Rose
2016-01-01
Simulation plays a vital role in health professions assessment. This review provides a primer on assessment validation for educators and education researchers. We focus on simulation-based assessment of health professionals, but the principles apply broadly to other assessment approaches and topics. Validation refers to the process of collecting validity evidence to evaluate the appropriateness of the interpretations, uses, and decisions based on assessment results. Contemporary frameworks view validity as a hypothesis, and validity evidence is collected to support or refute the validity hypothesis (i.e., that the proposed interpretations and decisions are defensible). In validation, the educator or researcher defines the proposed interpretations and decisions, identifies and prioritizes the most questionable assumptions in making these interpretations and decisions (the "interpretation-use argument"), empirically tests those assumptions using existing or newly-collected evidence, and then summarizes the evidence as a coherent "validity argument." A framework proposed by Messick identifies potential evidence sources: content, response process, internal structure, relationships with other variables, and consequences. Another framework proposed by Kane identifies key inferences in generating useful interpretations: scoring, generalization, extrapolation, and implications/decision. We propose an eight-step approach to validation that applies to either framework: Define the construct and proposed interpretation, make explicit the intended decision(s), define the interpretation-use argument and prioritize needed validity evidence, identify candidate instruments and/or create/adapt a new instrument, appraise existing evidence and collect new evidence as needed, keep track of practical issues, formulate the validity argument, and make a judgment: does the evidence support the intended use? 
Rigorous validation first prioritizes and then empirically evaluates key assumptions in the interpretation and use of assessment scores. Validation science would be improved by more explicit articulation and prioritization of the interpretation-use argument, greater use of formal validation frameworks, and more evidence informing the consequences and implications of assessment.
Ross, Thomas P
2014-12-01
The reliability and validity of standard and qualitative scores for the Ruff Figural Fluency Test (RFFT; Ruff, 1988) were examined in 102 healthy undergraduates. Participants (M age = 21.79; SD = 3.7; 80% Caucasian) were administered the RFFT and measures assessing executive functions (EF) and other cognitive domains. Inter-scorer reliability was excellent (0.9 range) for most RFFT indices. Test-retest coefficients (M interval = 7 weeks) ranged from 0.64 for the error ratio score to 0.87 for unique designs. RFFT indices correlated with Block Design performance and nonverbal measures of working memory, but were unrelated to measures of verbal fluency, verbal learning, or working memory for verbal material. RFFT novel design output correlated with most measures of EF, supporting the convergent validity of this measure. In contrast, correlations between measures of EF and qualitative scores were absent or weak. RFFT score interpretation is discussed in light of relevant models of EF and directions for future research are presented.
Sattler, J M
1979-05-01
Hardy, Welcher, Mellitis, and Kagan altered standard WISC administrative and scoring procedures and, from the resulting higher subtest scores, concluded that IQs based on standardized tests are inappropriate measures for inner-city children. Careful examination of their study reveals many methodological inadequacies and problematic interpretations. Three of these are as follows: (a) failure to use any external criterion to evaluate the validity of their testing-of-limits procedures; (b) the possibility of examiner and investigator bias; and (c) lack of any comparison group that might demonstrate that poor children would be helped more than others by the probes recommended. Their report creates misleading doubts about existing intelligence tests and does a disservice to inner-city children who need the benefits of the judicious use of diagnostic procedures, which include standardized intelligence tests. Consequently, their assertion concerning the inappropriateness of standardized test results for inner-city children is not only premature and misleading, but it is unwarranted as well.
Reaffirming normal: the high risk of pathologizing healthy adults when interpreting the MMPI-2-RF.
Odland, Anthony P; Lammy, Andrew B; Perle, Jonathan G; Martin, Phillip K; Grote, Christopher L
2015-01-01
Monte Carlo simulations were utilized to determine the proportion of the normal population expected to have scale elevations on the MMPI-2-RF when multiple scores are interpreted. Results showed that when all 40 MMPI-2-RF scales are simultaneously considered, approximately 70% of normal adults are likely to have at least one scale elevation at or above 65 T, and as many as 20% will have five or more elevated scales. When the Restructured Clinical (RC) Scales are under consideration, 34% of normal adults have at least one elevated score. Interpretation of the Specific Problem Scales and Personality Psychopathology Five Scales--Revised also yielded higher than expected rates of significant scores, with as many as one in four normal adults possibly being miscategorized as having features of a personality disorder by the latter scales. These findings are consistent with the growing literature on rates of apparently abnormal scores in the normal population due to multiple score interpretation. Findings are discussed in relation to clinical assessment, as well as in response to recent work suggesting that the MMPI-2-RF's multiscale composition does not contribute to high rates of elevated scores.
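The multiple-score effect described above can be sketched with a small simulation. This is a minimal sketch under stated assumptions, not the authors' procedure: it assumes normally distributed T scores (mean 50, SD 10) and a hypothetical equal correlation `rho` between all scales generated from a single shared factor, whereas the study would have relied on the instrument's actual intercorrelations. All function names are illustrative.

```python
import random
from statistics import NormalDist


def prob_any_elevation_independent(n_scales: int, cutoff_t: float = 65.0) -> float:
    """Closed-form P(at least one scale >= cutoff) if scales were independent."""
    z = (cutoff_t - 50.0) / 10.0            # T scores: mean 50, SD 10
    p_single = 1.0 - NormalDist().cdf(z)    # chance that one scale is elevated
    return 1.0 - (1.0 - p_single) ** n_scales


def simulate_any_elevation(n_scales: int, rho: float, n_people: int = 20000,
                           cutoff_t: float = 65.0, seed: int = 0) -> float:
    """Monte Carlo estimate of P(at least one elevation) for equicorrelated scales.

    Each standardized score is sqrt(rho) * shared + sqrt(1 - rho) * unique,
    so every pair of scales has correlation rho (a simplifying assumption).
    """
    rng = random.Random(seed)
    z_cut = (cutoff_t - 50.0) / 10.0
    a, b = rho ** 0.5, (1.0 - rho) ** 0.5
    hits = 0
    for _ in range(n_people):
        shared = rng.gauss(0.0, 1.0)
        if any(a * shared + b * rng.gauss(0.0, 1.0) >= z_cut
               for _ in range(n_scales)):
            hits += 1
    return hits / n_people


if __name__ == "__main__":
    print("independent, 40 scales:", round(prob_any_elevation_independent(40), 3))
    print("rho = 0.5, 40 scales:  ", round(simulate_any_elevation(40, 0.5), 3))
```

Under independence, the chance of at least one elevation across 40 scales is very high; positive correlation between scales pulls the figure down, which is why a simulation that respects the correlation structure is needed to obtain a realistic base rate.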
Heyanka, Daniel J; Holster, Jessica L; Golden, Charles J
2013-08-01
Knowledge of patterns of neuropsychological performance among normal, healthy individuals is integral to the practice of clinical neuropsychology, because clinicians may not always account for intraindividual variability (IIV) before coming to diagnostic conclusions. The IIV was assessed among a sample of 46 healthy individuals with high average intelligence and educational attainment, utilizing a battery of neuropsychological tests, including the Wechsler Adult Intelligence Scale, Fourth Edition (WAIS-IV) and Wechsler Memory Scale, Fourth Edition (WMS-IV). The data indicated substantial variability in neurocognitive abilities. All participants were found to demonstrate scores considered impaired by at least 2 standard deviations (SDs). Despite adjusting for outliers, no participant produced a "normal" testing profile with an intraindividual maximum discrepancy (MD) of less than 1 SD in either direction. When WAIS-IV Full Scale IQ (FSIQ) was considered, participants generally demonstrated cognitive test scores ranging from 2 SDs less than to 1.5 SDs greater than their FSIQ. Furthermore, after demographic corrections, the majority (59%) of participants demonstrated at least 1 impaired cognitive test score, as defined by being 1 to 1.5 SDs below the mean. Overall, results substantiate the need for clinicians to consider FSIQ and educational attainment in interpretation of neuropsychological testing results, given the relevant commonality of "abnormal" test scores within this population. This may ultimately reduce the likelihood of making false-positive conclusions of impairment when educational attainment and intelligence are high, thus improving diagnostic accuracy.
Wachholz, Thalita Bianchi de Oliveira; Yassuda, Mônica Sanches
2011-01-01
It is now known that cognitive functions tend to decline with age. Executive functions (EF) are among the first abilities to decline with aging. A subcomponent of the EF is abstract reasoning. The Test of Proverbs is an instrument that can be used to evaluate the capacity of abstract reasoning. Objective: To examine the association of performance in interpretation of proverbs with education and with episodic memory and EF tasks. Methods: A total of 67 individuals aged between 60 and 75 years were evaluated, and divided into three categories of education: 1-4 years, 5-8 years, and 9 or more years of schooling. The instruments used were a sociodemographic questionnaire (gender, age, marital status, education, income, previous occupation, current occupation and health perception), the Mini Mental State Examination, Brief Cognitive Screening Battery; Geriatric Depression Scale; Forward and Backward Digit Span (WAIS-III), and the Test of Proverbs. Results: A high impact of education was seen on the interpretation of proverbs, with lower performance among the elderly with less education. A significant association between performance on the Test of Proverbs and scores on the MMSE, GDS, and verbal fluency tests was found. There was a modest association with incidental memory. Conclusions: The capacity to interpret proverbs is strongly associated with education and with performance on other EF tasks. PMID:29213717
Díaz-Orueta, Unai; Blanco-Campal, Alberto; Burke, Teresa
2018-05-01
Background: A detailed neuropsychological assessment plays an important role in the diagnostic process of Mild Cognitive Impairment (MCI). However, available brief cognitive screening tests for this clinical population are administered and interpreted based mainly, or exclusively, on total achievement scores. This score-based approach can lead to erroneous clinical interpretations unless we also pay attention to the test taking behavior or to the type of errors committed during test performance. The goal of the current study is to perform a rapid review of the literature regarding cognitive screening tools for dementia in primary and secondary care; this will include revisiting previously published systematic reviews on screening tools for dementia, extensive database search, and analysis of individual references cited in selected studies. A subset of representative screening tools for dementia was identified that covers as many cognitive functions as possible. How these screening tools overlap with each other (in terms of the cognitive domains being measured and the method used to assess them) was examined and a series of process-based approach (PBA) modifications for these overlapping features was proposed, so that the changes recommended in relation to one particular cognitive task could be extrapolated to other screening tools. It is expected that future versions of cognitive screening tests, modified using a PBA, will highlight the benefits of attending to qualitative features of test performance when trying to identify subtle features suggestive of MCI and/or dementia.
Souillard-Mandar, William; Davis, Randall; Rudin, Cynthia; Au, Rhoda; Libon, David J.; Swenson, Rodney; Price, Catherine C.; Lamar, Melissa; Penney, Dana L.
2015-01-01
The Clock Drawing Test – a simple pencil and paper test – has been used for more than 50 years as a screening tool to differentiate normal individuals from those with cognitive impairment, and has proven useful in helping to diagnose cognitive dysfunction associated with neurological disorders such as Alzheimer’s disease, Parkinson’s disease, and other dementias and conditions. We have been administering the test using a digitizing ballpoint pen that reports its position with considerable spatial and temporal precision, making available far more detailed data about the subject’s performance. Using pen stroke data from these drawings categorized by our software, we designed and computed a large collection of features, then explored the tradeoffs in performance and interpretability in classifiers built using a number of different subsets of these features and a variety of different machine learning techniques. We used traditional machine learning methods to build prediction models that achieve high accuracy. We operationalized widely used manual scoring systems so that we could use them as benchmarks for our models. We worked with clinicians to define guidelines for model interpretability, and constructed sparse linear models and rule lists designed to be as easy to use as scoring systems currently used by clinicians, but more accurate. While our models will require additional testing for validation, they offer the possibility of substantial improvement in detecting cognitive impairment earlier than currently possible, a development with considerable potential impact in practice. PMID:27057085
Souillard-Mandar, William; Davis, Randall; Rudin, Cynthia; Au, Rhoda; Libon, David J; Swenson, Rodney; Price, Catherine C; Lamar, Melissa; Penney, Dana L
2016-03-01
The Clock Drawing Test - a simple pencil and paper test - has been used for more than 50 years as a screening tool to differentiate normal individuals from those with cognitive impairment, and has proven useful in helping to diagnose cognitive dysfunction associated with neurological disorders such as Alzheimer's disease, Parkinson's disease, and other dementias and conditions. We have been administering the test using a digitizing ballpoint pen that reports its position with considerable spatial and temporal precision, making available far more detailed data about the subject's performance. Using pen stroke data from these drawings categorized by our software, we designed and computed a large collection of features, then explored the tradeoffs in performance and interpretability in classifiers built using a number of different subsets of these features and a variety of different machine learning techniques. We used traditional machine learning methods to build prediction models that achieve high accuracy. We operationalized widely used manual scoring systems so that we could use them as benchmarks for our models. We worked with clinicians to define guidelines for model interpretability, and constructed sparse linear models and rule lists designed to be as easy to use as scoring systems currently used by clinicians, but more accurate. While our models will require additional testing for validation, they offer the possibility of substantial improvement in detecting cognitive impairment earlier than currently possible, a development with considerable potential impact in practice.
Interpreting studies of cognitive function following cardiac surgery: a guide for surgical teams.
Rubens, Fraser D; Boodhwani, Munir; Nathan, Howard
2007-05-01
Patients with coronary disease and related health care providers are faced with confusing and often conflicting information with regard to the neurocognitive impact of different strategies for coronary revascularization. Studies involving the measurement of postoperative cognitive deficit (POCD) have significant limitations that may ultimately impact on their interpretation and clinical relevance. In this review, we have described the origin of these tests and delineated the rationale for the design of testing that is commonly used in cardiac surgery patients. In general, neurocognitive tests assess domains of memory/new learning, psychomotor speed/dexterity and attentional capacity/mental control. Pre- and post-intervention tests in each domain can be evaluated either by the measurement of mean change scores (Group Comparison Model) for the entire group as continuous data, or by using categorical or continuous data to examine patterns of individual decline (Individual Comparison Model). This latter approach requires a specific definition of what constitutes a decline, which can be criticized as being arbitrary. There are limitations to each of these approaches that necessitate that critical information in trial design is available to the reviewer to facilitate interpretation. For example, the impact of factors such as test/re-test reliability and practice effect can be mitigated by the use of an appropriately chosen control population. Liberal parlance of neurocognitive outcome as a rationale for therapeutic choice must be tempered by wise interpretation of these tests. It is only through the understanding of their limitations and the implications of trial design that we can translate these results to provide the best therapeutic options for our patients in an unbiased manner.
Developing an oropharyngeal cancer (OPC) knowledge and behaviors survey.
Dodd, Virginia J; Riley Iii, Joseph L; Logan, Henrietta L
2012-09-01
To use the community participation research model to (1) develop a survey assessing knowledge about mouth and throat cancer and (2) field test and establish test-retest reliability with the newly developed instrument. Cognitive interviews were conducted with primarily rural African American adults to assess their perception and interpretation of survey items. Test-retest reliability was established with a racially diverse rural population. Test-retest reliabilities ranged from .79 to .40 for screening awareness and .74 to .19 for knowledge. Coefficients increased for composite scores. Community participation methodology provided a culturally appropriate survey instrument that demonstrated acceptable levels of reliability.
Dennis, Eslie; Banks, Peter; Murata, Lauren B; Sanchez, Stephanie A; Pennington, Christie; Hockersmith, Linda; Miller, Rachel; Lambe, Jess; Feng, Janine; Kapadia, Monesh; Clements, June; Loftin, Isabell; Singh, Shalini; Das-Gupta, Ashis; Lloyd, William; Bloom, Kenneth
2016-10-01
Companion diagnostics assay interpretation can select patients with the greatest targeted therapy benefits. We present the results from a prospective study demonstrating that pathologists can effectively learn immunohistochemical assay-interpretation skills from digital image-based electronic training (e-training). In this study, e-training was used to train board-certified pathologists to evaluate non-small cell lung carcinoma for eligibility for treatment with onartuzumab, a MET-inhibiting agent. The training program mimicked the live training that was previously validated in clinical trials for onartuzumab. A digital interface was developed for pathologists to review high-resolution, static images of stained slides. Sixty-four pathologists practicing in the United States enrolled while blinded to the type of training. After training, both groups completed a mandatory final test using glass slides. The results indicated both training modalities to be effective. Overall, 80.6% of e-trainees and 72.7% of live trainees achieved passing scores (at least 85%) on the final test. All study participants reported that their training experience was "good" and that they had received sufficient information to determine the adequacy of case slide staining to score each case. This study established that an e-training program conducted under highly controlled conditions can provide pathologists with the skills necessary to interpret a complex assay and that these skills can be equivalent to those achieved with face-to-face training using conventional microscopy. Programs of this type are scalable for global distribution and offer pathologists the potential for readily accessible and robust training in new companion diagnostic assays linked to novel, targeted, adjuvant therapies for cancer patients.
A New Lease of Life for Thomson's Bonds Model of Intelligence
ERIC Educational Resources Information Center
Bartholomew, David J.; Deary, Ian J.; Lawn, Martin
2009-01-01
Modern factor analysis is the outgrowth of Spearman's original "2-factor" model of intelligence, according to which a mental test score is regarded as the sum of a general factor and a specific factor. As early as 1914, Godfrey Thomson realized that the data did not require this interpretation and he demonstrated this by proposing what became…
Findings vs. Interpretation in "The Long-Term Impacts of Teachers" by Chetty et al.
ERIC Educational Resources Information Center
Adler, Moshe
2013-01-01
The authors of the study "The Long-Term Impact of Teachers" claim that their study shows that increases in teacher value-added lead to significant and lasting increases in test scores and significant increases in income that will last throughout adulthood. Instead, I show that these claims are false because they are contradicted by the…
ERIC Educational Resources Information Center
Han, Kyung T.; Wells, Craig S.; Sireci, Stephen G.
2012-01-01
Item parameter drift (IPD) occurs when item parameter values change from their original value over time. IPD may pose a serious threat to the fairness and validity of test score interpretations, especially when the goal of the assessment is to measure growth or improvement. In this study, we examined the effect of multidirectional IPD (i.e., some…
The Two- and Three-Dimensional Models of the HK-WISC: A Confirmatory Factor Analysis.
ERIC Educational Resources Information Center
Chan, David W.; Lin, Wen-Ying
1996-01-01
Confirmatory analyses on the Hong Kong Wechsler Intelligence Scale for Children (HK-WISC) provided support for composite score interpretation based on the two- and three-dimensional models across age levels. Test sample was comprised of 1,100 children, ranging in age from 5 to 15 years at all 11 age levels specified by the HK-WISC. (KW)
ERIC Educational Resources Information Center
LaFlair, Geoffrey T.; Staples, Shelley
2017-01-01
Investigations of the validity of a number of high-stakes language assessments are conducted using an argument-based approach, which requires evidence for inferences that are critical to score interpretation (Chapelle, Enright, & Jamieson, 2008b; Kane, 2013). The current study investigates the extrapolation inference for a high-stakes test of…
[Measurement properties of self-report questionnaires published in Korean nursing journals].
Lee, Eun-Hyun; Kim, Chun-Ja; Kim, Eun Jung; Chae, Hyun-Ju; Cho, Soo-Yeon
2013-02-01
The purpose of this study was to evaluate measurement properties of self-report questionnaires for studies published in Korean nursing journals. Of 424 Korean nursing articles initially identified, 168 articles met the inclusion criteria. The methodological quality of the measurements used in the studies and interpretability were assessed using the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) checklist. It consists of items on internal consistency, reliability, measurement error, content validity, construct validity including structural validity, hypothesis testing, cross-cultural validity, and criterion validity, and responsiveness. For each item of the COSMIN checklist, measurement properties are rated on a four-point scale: excellent, good, fair, and poor. Each measurement property is scored with worst score counts. All articles used the classical test theory for measurement properties. Internal consistency (72.6%), construct validity (56.5%), and content validity (38.2%) were most frequently reported properties being rated as 'excellent' by COSMIN checklist, whereas other measurement properties were rarely reported. A systematic review of measurement properties including interpretability of most instruments warrants further research and nursing-focused checklists assessing measurement properties should be developed to facilitate intervention outcomes across Korean studies.
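The "worst score counts" rule mentioned above can be sketched in a few lines: the overall rating for a measurement property is the lowest rating given to any of its checklist items. This is a minimal sketch; the function name and the example ratings are hypothetical, and only the four rating labels come from the abstract.

```python
# Four-point scale from the abstract, ordered from worst to best.
RATING_ORDER = ["poor", "fair", "good", "excellent"]


def worst_score_counts(item_ratings):
    """Overall rating for one measurement property: the lowest item rating."""
    if not item_ratings:
        raise ValueError("at least one item rating is required")
    for rating in item_ratings:
        if rating not in RATING_ORDER:
            raise ValueError(f"unknown rating: {rating!r}")
    # The position in RATING_ORDER defines which rating is worse.
    return min(item_ratings, key=RATING_ORDER.index)


if __name__ == "__main__":
    # Hypothetical item ratings for a single property in one study.
    print(worst_score_counts(["excellent", "good", "fair", "excellent"]))
```

One consequence of this rule is that a single weak item caps the rating of the whole property, no matter how well the remaining items are rated.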
The Flynn effect and memory function.
Baxendale, Sallie
2010-08-01
The Flynn effect refers to the steady increase in IQ that appears to date back at least to the inception of modern-day IQ tests. This study examined the possible Flynn effects on clinical memory tests involving the learning and recall of verbal and nonverbal material. Comparisons of the age-related norms on the list learning and design learning tasks from the Adult Memory and Information Processing Battery (AMIPB), published in 1985, and its successor, the BIRT (Brain Injury Rehabilitation Trust) Memory and Information Processing Battery (BMIPB) published in 2007, indicate that there is a significant Flynn effect on tests of memory function. This effect appears to be material specific with statistically significant improvements in all scores on tests involving the learning and recall of visual material in every age range evident over a 22-year period. Verbal memory abilities appear to be relatively stable with no significant differences between the scores in the majority of age ranges. The ramifications for the clinical interpretation of these tests are discussed.
de Vroege, Lars; Emons, Wilco H M; Sijtsma, Klaas; van der Feltz-Cornelis, Christina M
2018-01-01
The Bermond-Vorst Alexithymia Questionnaire (BVAQ) has been validated in student samples and small clinical samples, but not in the general population; thus, representative general-population norms are lacking. We examined the factor structure of the BVAQ in Longitudinal Internet Studies for the Social Sciences panel data from the Dutch general population (N = 974). Factor analyses revealed a first-order five-factor model and a second-order two-factor model. However, in the second-order model, the factor interpreted as analyzing ability loaded on both the affective factor and the cognitive factor. Further analyses showed that the first-order test scores are more reliable than the second-order test scores. External and construct validity were addressed by comparing BVAQ scores with a clinical sample of patients suffering from somatic symptom and related disorder (SSRD) (N = 235). BVAQ scores differed significantly between the general population and patients suffering from SSRD, suggesting acceptable construct validity. Age was positively associated with alexithymia. Males showed higher levels of alexithymia. The BVAQ is a reliable alternative measure for measuring alexithymia.
Methods for Constructing and Assessing Propensity Scores
Garrido, Melissa M; Kelley, Amy S; Paris, Julia; Roza, Katherine; Meier, Diane E; Morrison, R Sean; Aldridge, Melissa D
2014-01-01
Objectives: To model the steps involved in preparing for and carrying out propensity score analyses by providing step-by-step guidance and Stata code applied to an empirical dataset. Study Design: Guidance, Stata code, and empirical examples are given to illustrate (1) the process of choosing variables to include in the propensity score; (2) balance of propensity score across treatment and comparison groups; (3) balance of covariates across treatment and comparison groups within blocks of the propensity score; (4) choice of matching and weighting strategies; (5) balance of covariates after matching or weighting the sample; and (6) interpretation of treatment effect estimates. Empirical Application: We use data from the Palliative Care for Cancer Patients (PC4C) study, a multisite observational study of the effect of inpatient palliative care on patient health outcomes and health services use, to illustrate the development and use of a propensity score. Conclusions: Propensity scores are one useful tool for accounting for observed differences between treated and comparison groups. Careful testing of propensity scores is required before using them to estimate treatment effects. PMID:24779867
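One common way to check covariate balance between treated and comparison groups, as in steps (3) and (5) above, is the standardized difference in means. The sketch below is in Python rather than the Stata used by the authors, and it is not their code; the function name and the sample values are hypothetical, and the pooled standard deviation used here (square root of the average of the two group variances) is one conventional choice among several.

```python
from statistics import mean, variance


def standardized_difference(treated, comparison):
    """Standardized difference in means for one covariate.

    d = (mean_treated - mean_comparison) / sqrt((var_treated + var_comparison) / 2)
    Uses sample variances, so each group needs at least two observations.
    """
    pooled_sd = ((variance(treated) + variance(comparison)) / 2.0) ** 0.5
    if pooled_sd == 0.0:
        raise ValueError("covariate has no variation in either group")
    return (mean(treated) - mean(comparison)) / pooled_sd


if __name__ == "__main__":
    # Hypothetical ages in a treated and a comparison group.
    age_treated = [71, 68, 75, 80, 66]
    age_comparison = [64, 70, 61, 73, 59]
    print(f"standardized difference: "
          f"{standardized_difference(age_treated, age_comparison):.2f}")
```

Because the statistic is scaled by the spread of the covariate rather than by sample size, it can be compared before and after matching or weighting; values closer to zero indicate better balance.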
McGaghie, William C; Cohen, Elaine R; Wayne, Diane B
2011-01-01
United States Medical Licensing Examination (USMLE) scores are frequently used by residency program directors when evaluating applicants. The objectives of this report are to study the chain of reasoning and evidence that underlies the use of USMLE Step 1 and 2 scores for postgraduate medical resident selection decisions and to evaluate the validity argument about the utility of USMLE scores for this purpose. This is a research synthesis using the critical review approach. The study first describes the chain of reasoning that underlies a validity argument about using test scores for a specific purpose. It continues by summarizing correlations of USMLE Step 1 and 2 scores and reliable measures of clinical skill acquisition drawn from nine studies involving 393 medical learners from 2005 to 2010. The integrity of the validity argument about using USMLE Step 1 and 2 scores for postgraduate residency selection decisions is tested. The research synthesis shows that USMLE Step 1 and 2 scores are not correlated with reliable measures of medical students', residents', and fellows' clinical skill acquisition. The validity argument about using USMLE Step 1 and 2 scores for postgraduate residency selection decisions is neither structured, coherent, nor evidence based. The USMLE score validity argument breaks down on grounds of extrapolation and decision/interpretation because the scores are not associated with measures of clinical skill acquisition among advanced medical students, residents, and subspecialty fellows. Continued use of USMLE Step 1 and 2 scores for postgraduate medical residency selection decisions is discouraged.
Kuijpers, Rowella C. W. M.; Otten, Roy; Vermulst, Ad A.; Bitfoi, Adina; Goelitz, Dietmar; Koç, Ceren; Mihova, Zlatka; Pez, Ondine; Carta, Mauro; Keyes, Katherine; Lesinskiene, Sigita; Engels, Rutger C. M. E.; Kovess, Viviane
2015-01-01
Large-scale international surveys are important to globally evaluate, monitor, and promote children's mental health. However, use of young children's self-reports in these studies is still controversial. The Dominic Interactive, a computerized DSM-IV–based child mental health self-report questionnaire, has unique characteristics that may make it preeminently appropriate for usage in cross-country comparisons. This study aimed to determine scale score reliabilities (omega) of the Dominic Interactive in a sample of 8,135 primary school children, ages 6–11 years old, in 7 European countries, to confirm the proposed 7-scale factor structure, and to test for measurement invariance of scale and item scores across countries. Omega reliability values for scale scores were good to high in every country, and the factor structure was confirmed for all countries. A thorough examination of measurement invariance provided evidence for cross-country test score comparability of 5 of the 7 scales and partial scale score invariance of 2 anxiety scales. Possible explanations for this partial invariance include cross-country differences in conceptualizing items and defining what is socially and culturally acceptable anxiety. The convincing evidence for validity of score interpretation makes the Dominic Interactive an indispensable tool for cross-country screening purposes. PMID:26237209
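For readers unfamiliar with the omega coefficient mentioned above, the sketch below computes it for a single scale from standardized factor loadings. It assumes a one-factor model with uncorrelated errors, so that each item's error variance is one minus its squared loading; the loadings are hypothetical and are not taken from the study, which may have estimated omega differently.

```python
def omega_total(loadings):
    """Omega for a unidimensional scale from standardized loadings.

    omega = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances),
    with error variance 1 - loading^2 for each standardized item.
    """
    if not loadings:
        raise ValueError("at least one loading is required")
    if any(abs(l) > 1.0 for l in loadings):
        raise ValueError("standardized loadings must lie in [-1, 1]")
    common = sum(loadings) ** 2
    error = sum(1.0 - l ** 2 for l in loadings)
    return common / (common + error)


if __name__ == "__main__":
    hypothetical_loadings = [0.70, 0.65, 0.80, 0.55, 0.60]
    print(f"omega = {omega_total(hypothetical_loadings):.3f}")
```

Unlike coefficient alpha, this formula does not require the items to have equal loadings, which is one reason omega is often reported alongside factor-analytic results.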
Maity, Arnab; Carroll, Raymond J; Mammen, Enno; Chatterjee, Nilanjan
2009-01-01
Motivated by the problem of testing for genetic effects on complex traits in the presence of gene-environment interaction, we develop score tests in general semiparametric regression problems that involve a Tukey-style 1 degree-of-freedom form of interaction between parametrically and non-parametrically modelled covariates. We find that the score test in this type of model, as recently developed by Chatterjee and co-workers in the fully parametric setting, is biased and requires undersmoothing to be valid in the presence of non-parametric components. Moreover, in the presence of repeated outcomes, the asymptotic distribution of the score test depends on the estimation of functions which are defined as solutions of integral equations, making implementation difficult and computationally taxing. We develop profiled score statistics which are unbiased and asymptotically efficient and can be performed by using standard bandwidth selection methods. In addition, to overcome the difficulty of solving functional equations, we give easy interpretations of the target functions, which in turn allow us to develop estimation procedures that can be easily implemented by using standard computational methods. We present simulation studies to evaluate type I error and power of the method proposed compared with a naive test that does not consider interaction. Finally, we illustrate our methodology by analysing data from a case-control study of colorectal adenoma that was designed to investigate the association between colorectal adenoma and the candidate gene NAT2 in relation to smoking history.
NASA Astrophysics Data System (ADS)
Lindseth, Frank; Nordrik Hallan, Marte; Schiller Tønnessen, Martin; Smistad, Erik; Vâpenstad, Cecilie
2017-03-01
Introduction: Medical imaging technology has revolutionized health care over the past 30 years. This is especially true for ultrasound, a modality that an increasing number of medical personnel are starting to use. Purpose: The purpose of this study was to develop and evaluate a platform for improving medical image interpretation skills regardless of time and space and without the need for expensive imaging equipment or a patient to scan. Methods, results and conclusions: A stable web application with the needed functionality for image interpretation training and evaluation has been implemented. The system has been extensively tested internally and used during an international course in ultrasound-guided neurosurgery. The web application was well received and got very good System Usability Scale (SUS) scores.
Machida, Haruhiko; Lin, Xiao-Zhu; Fukui, Rika; Shen, Yun; Suzuki, Shigeru; Tanaka, Isao; Ishikawa, Takuya; Tate, Etsuko; Ueno, Eiko
2015-02-01
We retrospectively investigated the effect of the motion correction algorithm (MCA) on image quality and interpretability by heart rate (HR) in coronary CT angiography (CCTA). For 105 patients (6 HR groups) undergoing CCTA, 2 readers independently graded the image quality of the 4 major coronary arteries reconstructed with and without MCA at diastole with HR ≤64 bpm and at systole and diastole ≥65 bpm using a 5-point scale. For each HR group and cardiac phase, we compared per-vessel and per-segment image quality using Wilcoxon signed rank test and percentages of interpretable image quality (scores 3-5) among without MCA at diastole with HR ≤64 bpm, as a reference, with MCA at diastole ≤69 bpm and at systole 70-79 bpm using the chi-square test. The motion correction algorithm reconstruction provided similar or better image quality and interpretability in all groups, with 96-100 % per-vessel (P = 0.008 for the right coronary artery; otherwise, P > 0.05) and 99 % per-segment interpretable image quality (P = 0.0002) at diastole with HR ≤69 bpm and at systole 70-79 bpm compared to the reference (88-100 and 97 %, respectively). MCA reconstruction preserved image quality and interpretability of CCTA with HR ≤79 bpm.
Williams, Stacey L; Polaha, Jodi
2014-09-01
The purpose of our research was to examine the validity of score interpretations of an instrument developed to measure parents' perceptions of stigma about seeking mental health services for their children. The validity of the score interpretations of the instrument was tested in 2 studies. Study 1 employed confirmatory factor analysis (CFA), using a split half approach, and construct and criterion validity on data from the entire sample of parents in rural Appalachia whose children were experiencing psychosocial concerns (N = 347), while Study 2 employed CFA, construct and criterion validity, and predictive validity of the scores on data from a general sample of parents in rural Appalachia (N = 184). Results of exploratory and confirmatory factor analyses revealed support for a 2-factor model of parents' perceived stigma, which represented both self and public forms of stigma associated with seeking mental health services for their children, and correlated with existing measures of stigma and other psychosocial variables. Further, the new self and public stigma scale significantly predicted parents' willingness to seek services for children. PsycINFO Database Record (c) 2014 APA, all rights reserved.
Im, Sun; Suntrup-Krueger, Sonja; Colbow, Sigrid; Sauer, Sonja; Claus, Inga; Meuth, Sven G; Dziewas, Rainer; Warnecke, Tobias
2018-05-26
Diagnosis of pharyngeal dysphagia caused by myasthenia gravis (MG) based on clinical examination alone is often challenging. Flexible endoscopic evaluation of swallowing (FEES) combined with Tensilon (edrophonium) application, referred to as the FEES-Tensilon Test, was developed to improve diagnostic accuracy and to detect the main symptoms of pharyngeal dysphagia in MG. Here we investigated inter- and intra-rater reliability of the FEES-Tensilon Test and analyzed the main endoscopic findings. Four experienced raters reviewed a total of 20 FEES-Tensilon-Test videos in randomized order. Residue severity was graded at 4 different pharyngeal spaces before and after Tensilon administration. All interpretations were performed twice per rater, 4 weeks apart (a total of 160 scorings). Intra-rater test-retest reliability and inter-rater reliability levels were calculated. The most frequent FEES findings in MG patients before Tensilon application were prominent residues of semi solids spread all over the hypopharynx in varying locations. The reliability level in the interpretation of the FEES-Tensilon test was excellent regardless of the raters' profession or years of experience with FEES. All 4 raters showed high inter- and intra- reliability levels in interpreting the FEES-Tensilon Test based on residue clearance (kappa=0.922, 0.981). Degree of residue normalization in the vallecular space after Tensilon application showed the highest inter- and intra-rater reliability level (kappa=0.863, 0.957) followed by the epiglottis (kappa=0.813, 0.946) and pyriform sinuses (kappa=0.836, 0.929). Interpretation of the FEES-Tensilon Test based on residue severity and degree of Tensilon clearance, especially in the vallecular space, is consistent and reliable. This article is protected by copyright. All rights reserved.
School Quality and the Development of Cognitive Skills between Age Four and Six
Borghans, Lex; Golsteyn, Bart H. H.; Zölitz, Ulf
2015-01-01
This paper studies the extent to which young children develop their cognitive ability in high and low quality schools. We use a representative panel data set containing cognitive test scores of 4-6 year olds in Dutch schools. School quality is measured by the school’s average achievement test score at age 12. Our results indicate that children in high-quality schools develop their skills substantially faster than those in low-quality schools. The results remain robust to the inclusion of initial ability, parental background, and neighborhood controls. Moreover, using proximity to higher-achieving schools as an instrument for school choice corroborates the results. The robustness of the results points toward a causal interpretation, although it is not possible to erase all doubt about unobserved confounding factors. PMID:26182123
Pourmand, Ali; Tanski, Mary; Davis, Steven; Shokoohi, Hamid; Lucas, Raymond; Zaver, Fareen
2015-01-01
Asynchronous online training has become an increasingly popular educational format in the new era of technology-based professional development. We sought to evaluate the impact of an online asynchronous training module on the ability of medical students and emergency medicine (EM) residents to detect electrocardiogram (ECG) abnormalities of an acute myocardial infarction (AMI). We developed an online ECG training and testing module on AMI, with emphasis on recognizing ST elevation myocardial infarction (MI) and early activation of cardiac catheterization resources. Study participants included senior medical students and EM residents at all post-graduate levels rotating in our emergency department (ED). Participants were given a baseline set of ECGs for interpretation. This was followed by a brief interactive online training module on normal ECGs as well as abnormal ECGs representing an acute MI. Participants then underwent a post-test with a set of ECGs in which they had to interpret and decide appropriate intervention including catheterization lab activation. 148 students and 35 EM residents participated in this training in the 2012-2013 academic year. Students and EM residents showed significant improvements in recognizing ECG abnormalities after taking the asynchronous online training module. The mean score on the testing module for students improved from 5.9 (95% CI [5.7-6.1]) to 7.3 (95% CI [7.1-7.5]), with a mean difference of 1.4 (95% CI [1.12-1.68]) (p<0.0001). The mean score for residents improved significantly from 6.5 (95% CI [6.2-6.9]) to 7.8 (95% CI [7.4-8.2]) (p<0.0001). An online interactive module of training improved the ability of medical students and EM residents to correctly recognize the ECG evidence of an acute MI.
Inui, Yoshitaka; Ito, Kengo; Kato, Takashi
2017-01-01
The value of fluorine-18-fluorodeoxyglucose positron emission tomography (18F-FDG-PET) and magnetic resonance imaging (MRI) for predicting conversion of mild cognitive impairment (MCI) to Alzheimer's disease (AD) in the longer term is unclear. To evaluate longer-term prediction of MCI to AD conversion using 18F-FDG-PET and MRI in a multicenter study. One-hundred and fourteen patients with MCI were followed for 5 years. They underwent clinical and neuropsychological examinations, 18F-FDG-PET, and MRI at baseline. PET images were visually classified into predefined dementia patterns. PET scores were calculated as a semi quantitative index. For structural MRI, z-scores in medial temporal area were calculated by automated volume-based morphometry (VBM). Overall, 72% of patients with amnestic MCI progressed to AD during the 5-year follow-up. The diagnostic accuracy of PET scores over 5 years was 60% with 53% sensitivity and 84% specificity. Visual interpretation of PET images predicted conversion to AD with an overall 82% diagnostic accuracy, 94% sensitivity, and 53% specificity. The accuracy of VBM analysis presented little fluctuation through 5 years and it was highest (73%) at the 5-year follow-up, with 79% sensitivity and 63% specificity. The best performance (87.9% diagnostic accuracy, 89.8% sensitivity, and 82.4% specificity) was with a combination identified using multivariate logistic regression analysis that included PET visual interpretation, educational level, and neuropsychological tests as predictors. 18F-FDG-PET visual assessment showed high performance for predicting conversion to AD from MCI, particularly in combination with neuropsychological tests. PET scores showed high diagnostic specificity. Structural MRI focused on the medial temporal area showed stable predictive value throughout the 5-year course.
Interpretation of hip fracture patterns using areal bone mineral density in the proximal femur.
Hey, Hwee Weng Dennis; Sng, Weizhong Jonathan; Lim, Joel Louis Zongwei; Tan, Chuen Seng; Gan, Alfred Tau Liang; Ng, Jun Han Charles; Kagda, Fareed H Y
2015-12-01
Bone mineral density scans are currently interpreted based on an average score of the entire proximal femur. Improvements in technology now allow us to measure bone density in specific regions of the proximal femur. The study attempts to explain the pathophysiology of neck of femur (NOF) and intertrochanteric/basi-cervical (IT) fractures by correlating areal BMD (aBMD) scores with fracture patterns, and explore possible predictors for these fracture patterns. This is a single institution retrospective study on all patients who underwent hip surgeries from June 2010 to August 2012. A total of 106 patients (44 IT/basi-cervical, 62 NOF fractures) were studied. The data retrieved include patient characteristics and aBMD scores measured at different regions of the contralateral hip within 1 month of the injury. Demographic and clinical characteristic differences between IT and NOF fractures were analyzed using Fisher's Exact test and two-sample t test. Relationship between aBMD scores and fracture patterns was assessed using multivariable regression modeling. After adjusted multivariable analysis, T-Troc and T-inter scores were significantly lower in intertrochanteric/basi-cervical fractures compared to neck of femur fractures (P = 0.022 and P = 0.026, respectively). Both intertrochanteric/basi-cervical fractures (mean T.Tot -1.99) and neck of femur fractures (mean T.Tot -1.64) were not found to be associated with a mean T.tot less than -2.5. However, the mean aBMD scores were consistently less than -2.5 for both intertrochanteric/basi-cervical fractures and neck of femur fractures. Gender and calcium intake at the time of injury were associated with specific hip fracture patterns (P = 0.002 and P = 0.011, respectively). Hip fracture patterns following low energy trauma may be influenced by the pattern of reduced bone density in different areas of the hip. 
Intertrochanteric/basi-cervical fractures were associated with significantly lower T-Troc and T-Inter scores compared to neck of femur fractures, suggesting that the fracture traversed through the areas with the lowest bone density in the proximal femur. In the absence of reduced T.Troc and T.Inter, neck of femur fractures occurred more commonly. T-Total scores may underestimate the severity of osteoporosis/osteopenia and measuring T-score at the neck of femur may better reflect the severity of osteoporosis and likelihood of a fragility fracture.
A web-based normative calculator for the uniform data set (UDS) neuropsychological test battery.
Shirk, Steven D; Mitchell, Meghan B; Shaughnessy, Lynn W; Sherman, Janet C; Locascio, Joseph J; Weintraub, Sandra; Atri, Alireza
2011-11-11
With the recent publication of new criteria for the diagnosis of preclinical Alzheimer's disease (AD), there is a need for neuropsychological tools that take premorbid functioning into account in order to detect subtle cognitive decline. Using demographic adjustments is one method for increasing the sensitivity of commonly used measures. We sought to provide a useful online z-score calculator that yields estimates of percentile ranges and adjusts individual performance based on sex, age and/or education for each of the neuropsychological tests of the National Alzheimer's Coordinating Center Uniform Data Set (NACC, UDS). In addition, we aimed to provide an easily accessible method of creating norms for other clinical researchers for their own, unique data sets. Data from 3,268 clinically cognitively-normal older UDS subjects from a cohort reported by Weintraub and colleagues (2009) were included. For all neuropsychological tests, z-scores were estimated by subtracting the raw score from the predicted mean and then dividing this difference score by the root mean squared error term (RMSE) for a given linear regression model. For each neuropsychological test, an estimated z-score was calculated for any raw score based on five different models that adjust for the demographic predictors of SEX, AGE and EDUCATION, either concurrently, individually or without covariates. The interactive online calculator allows the entry of a raw score and provides five corresponding estimated z-scores based on predictions from each corresponding linear regression model. The calculator produces percentile ranks and graphical output. 
An interactive, regression-based, normative score online calculator was created to serve as an additional resource for UDS clinical researchers, especially in guiding interpretation of individual performances that appear to fall in borderline realms and may be of particular utility for operationalizing subtle cognitive impairment present according to the newly proposed criteria for Stage 3 preclinical Alzheimer's disease.
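The regression-based z-score described in this abstract can be sketched in a few lines of Python. This is only an illustration, not the calculator's actual code: the function names, the coefficient values, and the covariate coding are hypothetical, and the sign convention is an assumption (raw minus predicted, so that scores above the demographic expectation come out positive; flip the difference if the opposite reading of the abstract is intended). The percentile step assumes normally distributed residuals.

```python
from statistics import NormalDist


def predicted_mean(intercept, coefficients, covariates):
    """Predicted score from a linear regression model.

    `coefficients` and `covariates` are dicts keyed by predictor name
    (e.g. "sex", "age", "education"); all values here are hypothetical.
    """
    return intercept + sum(coefficients[name] * covariates[name]
                           for name in coefficients)


def regression_z_score(raw_score, predicted, rmse):
    """Difference between raw and predicted score, scaled by the model RMSE."""
    if rmse <= 0:
        raise ValueError("rmse must be positive")
    return (raw_score - predicted) / rmse


def percentile_rank(z):
    """Percentile rank of a z-score under a standard normal distribution."""
    return 100.0 * NormalDist().cdf(z)


# Hypothetical model: the intercept, slopes, and RMSE are made up for illustration.
coefs = {"sex": 1.0, "age": -0.1, "education": 0.5}
person = {"sex": 1, "age": 70, "education": 16}
expected = predicted_mean(24.0, coefs, person)
z = regression_z_score(20.0, expected, rmse=4.0)
print(round(z, 2), round(percentile_rank(z), 1))
```

Running the same raw score through several such models (all covariates, one covariate at a time, or none) would mirror the five z-scores the abstract says the calculator reports.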
Evaluation of abnormal liver function tests.
Agrawal, Swastik; Dhiman, Radha K; Limdi, Jimmy K
2016-04-01
Incidentally detected abnormality in liver function tests is a common situation encountered by physicians across all disciplines. Many of these patients do not have primary liver disease as most of the commonly performed markers are not specific for the liver and are affected by myriad factors unrelated to liver disease. Also, many of these tests like liver enzyme levels do not measure the function of the liver, but are markers of liver injury, which is broadly of two types: hepatocellular and cholestatic. A combination of a careful history and clinical examination along with interpretation of pattern of liver test abnormalities can often identify type and aetiology of liver disease, allowing for a targeted investigation approach. Severity of liver injury is best assessed by composite scores like the Model for End Stage Liver Disease rather than any single parameter. In this review, we discuss the interpretation of the routinely performed liver tests along with the indications and utility of quantitative tests. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/
Interpreting Assessment Scores of Nonliterate Learners with Ethnographic Data.
ERIC Educational Resources Information Center
Griffin, Suzanne M.
Research findings are reported that suggest that valid interpretation of assessment scores on illiterate and preliterate learners requires the use of ethnographic data. Data from observation notes, photos, and audiotapes indicated that learners' understanding of their tasks affected their performance in assessment situations. Previous findings…
Effects of arginine vasopressin on musical working memory.
Granot, Roni Y; Uzefovsky, Florina; Bogopolsky, Helena; Ebstein, Richard P
2013-01-01
Previous genetic studies showed an association between variations in the gene coding for the 1a receptor of the neuro-hormone arginine vasopressin (AVP) and musical working memory (WM). The current study set out to test the influence of intranasal administration (INA) of AVP on musical as compared to verbal WM using a double blind crossover (AVP-placebo) design. Two groups of 25 males were exposed to 20 IU of AVP in one session, and 20 IU of saline water (placebo) in a second session, 1 week apart. In each session subjects completed the tonal subtest from Gordon's "Musical Aptitude Profile," the interval subtest from the "Montreal Battery for Evaluation of Amusias (MBEA)," and the forward and backward digit span tests. Scores in the digit span tests were not influenced by AVP. In contrast, in the music tests there was an AVP effect. In the MBEA test, scores for the group receiving placebo in the first session (PV) were higher than for the group receiving vasopressin in the first session (VP) (p < 0.05) with no main Session effect nor Group × Session interaction. In the Gordon test there was a main Session effect (p < 0.05) with scores higher in the second as compared to the first session, a marginal main Group effect (p = 0.093) and a marginal Group × Session interaction (p = 0.88). In addition we found that the group that received AVP in the first session scored higher on scales indicative of happiness, and alertness on the positive and negative affect scale, (PANAS). Only in this group and only in the music test these scores were significantly correlated with memory scores. Together the results reflect a complex interaction between AVP, musical memory, arousal, and contextual effects such as session, and base levels of memory. The results are interpreted in light of music's universal use as a means to modulate arousal on the one hand, and AVP's influence on mood, arousal, and social interactions on the other.
Batty, G David; Der, Geoff; Deary, Ian J
2006-09-01
Numerous studies have reported that maternal cigarette smoking during pregnancy is related to lower IQ scores in the offspring. Confounding is a crucial issue in interpreting this association. In the US National Longitudinal Survey of Youth 1979, IQ was ascertained serially during childhood using the Peabody Individual Achievement Test, the total score for which comprises results on 3 subtests: mathematics, reading comprehension, and reading recognition. Maternal IQ was assessed by using the Armed Forces Qualification Test. There were 5578 offspring (born to 3145 mothers) with complete information for maternal smoking habits, total Peabody Individual Achievement Test score, and covariates. The offspring of mothers who smoked ≥1 pack of cigarettes per day during pregnancy had an IQ score (Peabody Individual Achievement Test total) that was, on average, 2.87 points lower than children born to nonsmoking mothers. Separate control for maternal education (0.27-IQ-point decrement) and, to a lesser degree, maternal IQ (1.51-IQ-point decrement) led to marked attenuation of the maternal-smoking-offspring-IQ relation. A similar pattern of results was seen when Peabody Individual Achievement Test subtest results were the outcomes of interest. The only exception was the Peabody Individual Achievement Test mathematics score, in which adjusting for maternal IQ essentially led to complete attenuation of the maternal-smoking-offspring-IQ gradient (0.66-IQ-point decrement). The impact of controlling for physical, behavioral, and other social indices was much less pronounced than for maternal education or IQ. These findings suggest that previous studies that did not adjust for maternal education and/or IQ may have overestimated the association of maternal smoking with offspring cognitive ability.
Assessment of numeracy in sports and exercise science students at an Australian university
NASA Astrophysics Data System (ADS)
Green, Simon; McGlynn, Susan; Stuart, Deidre; Fahey, Paul; Pettigrew, Jim; Clothier, Peter
2018-05-01
The effect of high school study of mathematics on numeracy performance of sports and exercise science (SES) students is not clear. To investigate this further, we tested the numeracy skills of 401 students enrolled in a Bachelor of Health Sciences degree in SES using a multiple-choice survey consisting of four background questions and 39 numeracy test questions. Background questions (5-point scale) focused on highest level of mathematics studied at high school, self-perception of mathematics proficiency, perceived importance of mathematics to SES and likelihood of seeking help with mathematics. Numeracy questions focused on rational number, ratios and rates, basic algebra and graph interpretation. Numeracy performance was based on answers to these questions (1 mark each) and represented by the total score (maximum = 39). Students from first (n = 212), second (n = 78) and third (n = 111) years of the SES degree completed the test. The distribution of numeracy test scores for the entire cohort was negatively skewed with a median (IQR) score of 27(11). We observed statistically significant associations between test scores and the highest level of mathematics studied (P < 0.05), being lowest in students who studied Year 10 Mathematics (20 (9)), intermediate in students who studied Year 12 General Mathematics (26 (8)) and highest in two groups of students who studied higher-level Year 12 Mathematics (31 (9), 31 (6)). There were statistically significant associations between test scores and level of self-perception of mathematics proficiency and also likelihood of seeking help with mathematics (P < 0.05) but not with perceived importance of mathematics to SES. These findings reveal that the level of mathematics studied in high school is a critical factor determining the level of numeracy performance in SES students.
Tarescavage, Anthony M; Wygant, Dustin B; Gervais, Roger O; Ben-Porath, Yossef S
2013-01-01
The current study examined the over-reporting Validity Scales of the MMPI-2 Restructured Form (MMPI-2-RF; Ben-Porath & Tellegen, 2008/2011) in relation to the Slick, Sherman, and Iverson (1999) criteria for the diagnosis of Malingered Neurocognitive Dysfunction in a sample of 916 consecutive non-head injury disability claimants. The classification of Malingered Neurocognitive Dysfunction was based on scores from several cognitive symptom validity tests and response bias indicators built into traditional neuropsychological tests. Higher scores on MMPI-2-RF Validity Scales, particularly the Response Bias Scale (Gervais, Ben-Porath, Wygant, & Green, 2007), were associated with probable and definite Malingered Neurocognitive Dysfunction. The MMPI-2-RF's Validity Scales classification accuracy of Malingered Neurocognitive Dysfunction improved when multiple scales were interpreted. Additionally, higher scores on MMPI-2-RF substantive scales measuring distress, internalizing dysfunction, thought dysfunction, and social avoidance were associated with probable and definite Malingered Neurocognitive Dysfunction. Implications for clinical practice and future directions are noted.
Martin, Phillip K; Schroeder, Ryan W
2014-06-01
The Designs subtest allows for accumulation of raw score points by chance alone, creating the potential for artificially inflated performances, especially in older patients. A random number generator was used to simulate the random selection and placement of cards by 100 test naive participants, resulting in a mean raw score of 36.26 (SD = 3.86). This resulted in relatively high-scaled scores in the 45-54, 55-64, and 65-69 age groups on Designs II. In the latter age group, in particular, the mean simulated performance resulted in a scaled score of 7, with scores 1 SD below and above the performance mean translating to scaled scores of 5 and 8, respectively. The findings indicate that clinicians should use caution when interpreting Designs II performance in these age groups, as our simulations demonstrated that low average to average range scores occur frequently when patients are relying solely on chance performance. © The Author 2014. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Glassmire, David M; Toofanian Ross, Parnian; Kinney, Dominique I; Nitch, Stephen R
2016-06-01
Two studies were conducted to identify and cross-validate cutoff scores on the Wechsler Adult Intelligence Scale-Fourth Edition Digit Span-based embedded performance validity (PV) measures for individuals with schizophrenia spectrum disorders. In Study 1, normative scores were identified on Digit Span-embedded PV measures among a sample of patients (n = 84) with schizophrenia spectrum diagnoses who had no known incentive to perform poorly and who put forth valid effort on external PV tests. Previously identified cutoff scores resulted in unacceptable false positive rates and lower cutoff scores were adopted to maintain specificity levels ≥90%. In Study 2, the revised cutoff scores were cross-validated within a sample of schizophrenia spectrum patients (n = 96) committed as incompetent to stand trial. Performance on Digit Span PV measures was significantly related to Full Scale IQ in both studies, indicating the need to consider the intellectual functioning of examinees with psychotic spectrum disorders when interpreting scores on Digit Span PV measures. © The Author(s) 2015.
Naeger, D M; Chang, S D; Kolli, P; Shah, V; Huang, W; Thoeni, R F
2011-01-01
Objective The study compared the sensitivity, specificity, confidence and interpretation time of readers of differing experience in diagnosing acute appendicitis with contrast-enhanced CT using neutral vs positive oral contrast agents. Methods Contrast-enhanced CT for right lower quadrant or right flank pain was performed in 200 patients with neutral and 200 with positive oral contrast including 199 with proven acute appendicitis and 201 with other diagnoses. Test set disease prevalence was 50%. Two experienced gastrointestinal radiologists, one fellow and two first-year residents blindly assessed all studies for appendicitis (2000 readings) and assigned confidence scores (1=poor to 4=excellent). Receiver operating characteristic (ROC) curves were generated. Total interpretation time was recorded. Each reader's interpretation with the two agents was compared using standard statistical methods. Results Average reader sensitivity was found to be 96% (range 91–99%) with positive and 95% (89–98%) with neutral oral contrast; specificity was 96% (92–98%) and 94% (90–97%). For each reader, no statistically significant difference was found between the two agents (sensitivities p-values >0.6; specificities p-values>0.08), in the area under the ROC curve (range 0.95–0.99) or in average interpretation times. In cases without appendicitis, positive oral contrast demonstrated improved appendix identification (average 90% vs 78%) and higher confidence scores for three readers. Average interpretation times showed no statistically significant differences between the agents. Conclusion Neutral vs positive oral contrast does not affect the accuracy of contrast-enhanced CT for diagnosing acute appendicitis. Although positive oral contrast might help to identify normal appendices, we continue to use neutral oral contrast given its other potential benefits. PMID:20959365
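For readers less familiar with the accuracy measures reported in this abstract, a minimal Python sketch of sensitivity and specificity follows. The counts in the usage lines are hypothetical and chosen only to match the group sizes mentioned above (199 with appendicitis, 201 without); they are not the study's per-reader data.

```python
def sensitivity(true_pos, false_neg):
    """Proportion of diseased cases that the reader correctly called positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Proportion of non-diseased cases that the reader correctly called negative."""
    return true_neg / (true_neg + false_pos)


# Hypothetical reader: 191 of 199 appendicitis cases detected,
# 193 of 201 non-appendicitis cases correctly ruled out.
print(f"sensitivity = {sensitivity(191, 8):.3f}")
print(f"specificity = {specificity(193, 8):.3f}")
```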
Garcia, Isabel Fialho Fontenele; Tiuganji, Carina Tiemi; Simões, Maria do Socorro Morais Pereira; Lunardi, Adriana Claudia
2018-06-01
To test the measurement properties (reliability, interpretability, and validity) of the Life-Space Assessment questionnaire for older adults with chronic obstructive pulmonary disease. Clinimetric study. Pneumology service, ambulatory care, São Paulo, SP, Brazil. Consecutive sample of older adults (n = 62; 38 (61%) men, 24 (39%) women) with chronic obstructive pulmonary disease. Not applicable. Life-Space Assessment questionnaire assesses five space levels visited by the older adult in four weeks prior to the assessment. We tested the following measurement properties of this questionnaire: reliability (reproducibility assessed by a type-2,1 intraclass correlation coefficient (ICC(2,1)); internal consistency assessed by the Cronbach's alpha; measurement error by determining the standard error of measurement (SEM)), interpretability (minimum detectable change with 90% confidence (MDC90); ceiling and floor effects by calculating the proportion of participants who achieved the minimum and maximum scores), and validity by Pearson's correlation test between the Life-Space Assessment questionnaire scores and number of daily steps assessed by accelerometry. Reproducibility (ICC(2,1)) was 0.90 (95% confidence interval (CI): 0.84-0.94), and internal consistency (Cronbach's α) was 0.80 (range = 0.76-0.80 for each item deleted). SEM was 3.65 points (3%), the MDC90 was 0.20 points, and we observed no ceiling (2%) or floor (6%) effects. We observed an association between the score of the Life-Space Assessment questionnaire and daily steps (r = 0.43; P = 0.01). Life-Space Assessment questionnaire shows adequate measurement properties for the assessment of life-space mobility in older adults with chronic obstructive pulmonary disease.
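The measurement-error quantities named in this abstract are commonly derived with the textbook formulas SEM = SD × √(1 − ICC) and MDC = z × √2 × SEM (z ≈ 1.645 for 90% confidence). The Python sketch below uses those formulas as an assumption; the abstract does not state which formulas the authors applied, and the input values are hypothetical.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement from the sample SD and a reliability ICC."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie in [0, 1]")
    return sd * math.sqrt(1.0 - icc)


def minimum_detectable_change(sem, z=1.645):
    """Minimum detectable change; z = 1.645 corresponds to 90% confidence."""
    return z * math.sqrt(2.0) * sem


# Hypothetical inputs: SD of 10 points and ICC of 0.75.
sem = sem_from_icc(10.0, 0.75)
print(sem, round(minimum_detectable_change(sem), 2))
```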
ERIC Educational Resources Information Center
Nelis, Sharon M.; Rae, Gordon; Liddell, Christine
2006-01-01
The factor structure of the Family Emotional Involvement and Criticism Scale (FEICS) is tested in a sample of Irish adolescents. Participants were 661 adolescents with a mean age of 15.9 years (SD = 1.26). Interpretation of both the exploratory and confirmatory factor analyses of the FEICS shows support for the two-factor structure of the FEICS…
da Silva, Vinicius Zacarias Maldaner; de Araújo Neto, Jose Aires; Cipriano Jr., Gerson; Pinedo, Mariela; Needham, Dale M.; Zanni, Jennifer M.; Guimarães, Fernando Silva
2017-01-01
Objective The aim of the present study was to translate and cross-culturally adapt the Functional Status Score for the intensive care unit (FSS-ICU) into Brazilian Portuguese. Methods This study consisted of the following steps: translation (performed by two independent translators), synthesis of the initial translation, back-translation (by two independent translators who were unaware of the original FSS-ICU), and testing to evaluate the target audience's understanding. An Expert Committee supervised all steps and was responsible for the modifications made throughout the process and the final translated version. Results The testing phase included two experienced physiotherapists who assessed a total of 30 critical care patients (mean FSS-ICU score = 25 ± 6). As the physiotherapists did not report any uncertainties or problems with interpretation affecting their performance, no additional adjustments were made to the Brazilian Portuguese version after the testing phase. Good interobserver reliability between the two assessors was obtained for each of the 5 FSS-ICU tasks and for the total FSS-ICU score (intraclass correlation coefficients ranged from 0.88 to 0.91). Conclusion The adapted version of the FSS-ICU in Brazilian Portuguese was easy to understand and apply in an intensive care unit environment. PMID:28444070
Electrocardiogram reading: a randomized study comparing 2 e‑learning methods for medical students.
Kopeć, Grzegorz; Waligóra, Marcin; Pacia, Michał; Chmielak, Wojciech; Stępień, Agnieszka; Janiec, Sebastian; Magoń, Wojciech; Jonas, Kamil; Podolec, Piotr
2018-02-28
INTRODUCTION Interpretation of the electrocardiogram (ECG) is an essential skill in most medical specialties; however, the best method of teaching how to read ECGs has not been determined. OBJECTIVES The aim of the study was to compare the effectiveness of collaborative (C‑eL) and self (S‑eL) e‑learning of ECG reading among medical students. PATIENTS AND METHODS A total of 60 fifth‑year medical students were randomly assigned to the C‑eL and S‑eL groups. S‑eL students received 15 ECG recordings with a comprehensive description by email (one every 48 hours), while C‑eL students received the same ECG recordings without description. C‑eL students were expected to analyze each ECG together within the subgroups using an internet platform and to submit the interpretation within 48 hours. Afterwards, they received a description of each ECG. C‑eL students' activity was assessed based on the number of words written on the internet platform during discussion. A final test consisted of 10 theoretical questions and 10 ECG recordings. The final score was a sum of points obtained for the interpretation of ECG recordings. The main endpoint of the study was the number of students whose final score was 56% or higher. RESULTS The final test was completed by 53 students (88.3%). The main endpoint was achieved in 20 C‑eL students (77%) and in 13 S‑eL students (48.1%), P = 0.03. The final score was 6.4 (interquartile range [IQR], 5.8-7.6) in the C‑eL group and 5.6 (IQR, 4.2-7.2) in the S‑eL group, P = 0.04. It correlated with the results of the theoretical test and students' activity during C‑eL (r = 0.42, P = 0.002 and r = 0.4, P = 0.04, respectively). CONCLUSIONS C‑eL of ECG reading among fifth‑year medical students is superior to S‑eL.
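The main-endpoint comparison can be checked with a small sketch. Two things here are assumptions: the group sizes (26 and 27) are inferred from the reported percentages and the 53 completers, and the abstract does not name the test it used, so an uncorrected Pearson chi-square test on the 2×2 table stands in for it.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square for the table [[a, b], [c, d]].

    Returns the statistic and its p-value (1 degree of freedom), using
    the identity P(X > x) = erfc(sqrt(x / 2)) for a chi-square with 1 df.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Rows: C-eL, S-eL. Columns: reached endpoint, did not.
# 20 of an inferred 26 C-eL students; 13 of an inferred 27 S-eL students.
stat, p_value = chi_square_2x2(20, 6, 13, 14)
```

Under these assumptions the p-value lands close to the reported P = 0.03, which suggests the inferred group sizes are at least consistent with the abstract.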
Hurks, P P M; Hendriksen, J G M; Dek, J E; Kooij, A P
2013-01-01
Intelligence tests are included in millions of assessments of children and adults each year (Watkins, Glutting, & Lei, 2007a, Applied Neuropsychology, 14, 13). Clinicians often interpret large amounts of subtest scatter, or large differences between the highest and lowest scaled subtest scores, on an intelligence test battery as an index for abnormality or cognitive impairment. The purpose of the present study is to characterize "normal" patterns of variability among subtests of the Dutch Wechsler Preschool and Primary Scale of Intelligence - Third Edition (WPPSI-III-NL; Wechsler, 2010). Therefore, the frequencies of WPPSI-III-NL scaled subtest scatter were reported for 1039 healthy children aged 4:0-7:11 years. Results indicated that large differences between highest and lowest scaled subtest scores (or subtest scatter) were common in this sample. Furthermore, degree of subtest scatter was related to: (a) the magnitude of the highest scaled subtest score, i.e., more scatter was seen in children with the highest WPPSI-III-NL scaled subtest scores, (b) Full Scale IQ (FSIQ) scores, i.e., higher FSIQ scores were associated with an increase in subtest scatter, and (c) sex differences, with boys showing a tendency to display more scatter than girls. In conclusion, viewing subtest scatter as an index for abnormality in WPPSI-III-NL scores is an oversimplification as this fails to recognize disparate subtest heterogeneity that occurs within a population of healthy children aged 4:0-7:11 years.
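Subtest scatter, as defined in this abstract, is simply the highest scaled subtest score minus the lowest. A minimal sketch follows; the profile values are hypothetical and the subtest labels are used only as illustrative keys.

```python
def subtest_scatter(scaled_scores):
    """Range of a child's scaled subtest scores (highest minus lowest)."""
    scores = list(scaled_scores)
    return max(scores) - min(scores)


# Hypothetical scaled scores for one child.
profile = {
    "Block Design": 13,
    "Information": 8,
    "Matrix Reasoning": 11,
    "Vocabulary": 9,
}
scatter = subtest_scatter(profile.values())
```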
AlMoghrabi, Nouran; Huijding, Jorg; Franken, Ingmar H A
2018-03-01
Cognitive theories of aggression propose that biased information processing is causally related to aggression. To test these ideas, the current study investigated the effects of a novel cognitive bias modification paradigm (CBM-I) designed to target interpretations associated with aggressive behavior. Participants aged 18-33 years old were randomly assigned to either a single session of positive training (n = 40) aimed at increasing prosocial interpretations or negative training (n = 40) aimed at increasing hostile interpretations. The results revealed that the positive training resulted in an increase in prosocial interpretations while the negative training seemed to have no effect on interpretations. Importantly, in the positive condition, a positive change in interpretations was related to lower anger and verbal aggression scores after the training. In this condition, participants also reported an increase in happiness. In the negative training no such effects were found. However, the better participants performed on the negative training, the more their interpretations were changed in a negative direction and the more aggression they showed on the behavioral aggression task. Participants were healthy university students. Therefore, results should be confirmed within a clinical population. These findings provide support for the idea that this novel CBM-I paradigm can be used to modify interpretations, and suggest that these interpretations are related to mood and aggressive behavior. Copyright © 2017 Elsevier Ltd. All rights reserved.
An Automated, High-Throughput Method for Interpreting the Tandem Mass Spectra of Glycosaminoglycans
NASA Astrophysics Data System (ADS)
Duan, Jiana; Jonathan Amster, I.
2018-05-01
The biological interactions between glycosaminoglycans (GAGs) and other biomolecules are heavily influenced by structural features of the glycan. The structure of GAGs can be assigned using tandem mass spectrometry (MS2), but analysis of these data, to date, requires manual interpretation, a slow process that presents a bottleneck to the broader deployment of this approach to solving biologically relevant problems. Automated interpretation remains a challenge, as GAG biosynthesis is not template-driven, and therefore, one cannot predict structures from genomic data, as is done with proteins. The lack of a structure database, a consequence of the non-template biosynthesis, requires a de novo approach to interpretation of the mass spectral data. We propose a model for rapid, high-throughput GAG analysis by using an approach in which candidate structures are scored for the likelihood that they would produce the features observed in the mass spectrum. To make this approach tractable, a genetic algorithm is used to greatly reduce the search-space of isomeric structures that are considered. The time required for analysis is significantly reduced compared to an approach in which every possible isomer is considered and scored. The model is coded in a software package using the MATLAB environment. This approach was tested on tandem mass spectrometry data for long-chain, moderately sulfated chondroitin sulfate oligomers that were derived from the proteoglycan bikunin. The bikunin data was previously interpreted manually. Our approach examines glycosidic fragments to localize SO3 modifications to specific residues and yields the same structures reported in literature, only much more quickly.
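The general idea of using a genetic algorithm to search candidate structures instead of scoring every isomer can be illustrated with a toy sketch. Everything below is an assumption made for illustration: candidates are bit strings marking sulfated positions, and the scoring function is a simple agreement count standing in for the likelihood score the abstract mentions. It is written in Python rather than the authors' MATLAB and does not reproduce their software.

```python
import random


def score(candidate, observed):
    """Toy stand-in for a likelihood score: positions that agree."""
    return sum(c == o for c, o in zip(candidate, observed))


def evolve(observed, pop_size=30, generations=40, mutation_rate=0.05, seed=0):
    """Search bit-string candidates with selection, crossover and mutation."""
    rng = random.Random(seed)
    n = len(observed)
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the fitter half unchanged (elitism), refill with offspring.
        population.sort(key=lambda c: score(c, observed), reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            first, second = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # single-point crossover
            child = first[:cut] + second[cut:]
            # Flip each bit with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = parents + children
    return max(population, key=lambda c: score(c, observed))


# Hypothetical pattern of sulfated (1) and unsulfated (0) positions.
observed = [1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1]
best = evolve(observed)
```

The sketch evaluates at most pop_size × generations candidates, whereas exhaustive enumeration of a 12-position pattern would score 4096; the gap widens quickly for longer chains, which is the tractability argument the abstract makes.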
Diagnostic grand rounds: a new teaching concept to train diagnostic reasoning.
Stieger, Stefan; Praschinger, Andrea; Kletter, Kurt; Kainberger, Franz
2011-06-01
Diagnostic reasoning is a core skill in teaching and learning in undergraduate curricula. Diagnostic grand rounds (DGRs) as a subform of grand rounds are intended to train the students' skills in the selection of appropriate tests and in the interpretation of test results. The aim of this study was to test DGRs for their ability to improve diagnostic reasoning by using a pre-post-test design. During one winter term, all 398 fifth-year students (36.1% male, 63.9% female) solved 23 clinical cases presented in 8 DGRs. In an online questionnaire, a Diagnostic Thinking Inventory (DTI) with 41 items was evaluated for flexibility in thinking and structure of knowledge in memory. Results were correlated with those from a summative multiple-choice knowledge test and of the learning objectives in a logbook. The students' DTI scores in the post-test were significantly higher than those reported in the pre-test. DTI scores at either testing time did not correlate with medical knowledge as assessed by a multiple-choice knowledge test. Abilities acquired during clinical clerkships as documented in a logbook could only account for a small proportion of the increase in the flexibility subscale score. This effect still remained significant after accounting for potential confounders. Establishing DGRs proved to be an effective way of successfully improving both students' diagnostic reasoning and the ability to select the appropriate test method in routine clinical practice. Copyright © 2009 Elsevier Ireland Ltd. All rights reserved.
McQueen, Peter; Gates, Lucy; Marshall, Michelle; Doherty, Michael; Arden, Nigel; Bowen, Catherine
2017-01-01
The prevalence of foot osteoarthritis (OA) is much less understood than hip, knee and hand OA. The foot is anatomically complex and different researchers have investigated different joints with lack of methodological standardisation across studies. The La Trobe Foot Atlas (LFA) is the first to address these issues in providing quantitative assessment of radiographic foot OA, but has not been tested externally. The aim of this study was to evaluate three different interpretive approaches to using the LFA for grading OA when scoring is difficult due to indistinct views of interosseous space and joint contour. Foot radiographs of all remaining participants (n = 218) assessed in the Chingford Women Study 23 year visit (mean (SD) for age: 75.5 years (5.1)) were scored using the LFA defined protocol (Technique 1). Two revised scoring strategies were applied to the radiographs in addition to the standard LFA analyses. Technique 2 categorised joints that were difficult to grade as 'missing'. Technique 3 included joints that were difficult to grade as an over estimated score. Radiographic OA prevalence was defined for the foot both collectively and separately for individual joints. When radiographs were scored using the LFA (Technique 1), radiographic foot OA was present in 89.9%. For Technique 2 the presence of radiographic foot OA was 83.5% and for Technique 3 it was 97.2%. At the individual joint level, using Technique 1, the presence of radiographic foot OA was higher with a wider range (18.3-74.3%) than Technique 2 (17.9-46.3%) and lower with a wider range (18.3-74.3%) than Technique 3 (39.9-79.4%). The three different ways of interpreting the LFA scoring system when grading of individual joints is technically difficult result in very different estimates of foot OA prevalence at both the individual joint and global foot level. Agreement on the best strategy is required to improve comparability between studies.
ERIC Educational Resources Information Center
Villafañe, Sachel M.; Lewis, Jennifer E.
2016-01-01
Decisions about instruction, research, or policy often require the interpretation of student assessment scores. Increasingly, attitudinal variables are included in an assessment strategy, and it is important to ensure that interpretations of students' attitudinal status are based on instrument scores that apply similarly for diverse students. In…
ERIC Educational Resources Information Center
Kane, Michael T.
2016-01-01
How we choose to use a term depends on what we want to do with it. If "validity" is to be used to support a score interpretation, validation would require an analysis of the plausibility of that interpretation. If validity is to be used to support score uses, validation would require an analysis of the appropriateness of the proposed…
Wrzus, Cornelia; Egloff, Boris; Riediger, Michaela
2017-08-01
Implicit association tests (IATs) are increasingly used to indirectly assess people's traits, attitudes, or other characteristics. In addition to measuring traits or attitudes, IAT scores also reflect differences in cognitive abilities because scores are based on reaction times (RTs) and errors. As cognitive abilities change with age, questions arise concerning the usage and interpretation of IATs for people of different age. To address these questions, the current study examined how cognitive abilities and cognitive processes (i.e., quad model parameters) contribute to IAT results in a large age-heterogeneous sample. Participants (N = 549; 51% female) in an age-stratified sample (range = 12-88 years) completed different IATs and 2 tasks to assess cognitive processing speed and verbal ability. From the IAT data, D2-scores were computed based on RTs, and quad process parameters (activation of associations, overcoming bias, detection, guessing) were estimated from individual error rates. Substantial IAT scores and quad processes except guessing varied with age. Quad processes AC and D predicted D2-scores of the content-specific IAT. Importantly, the effects of cognitive abilities and quad processes on IAT scores were not significantly moderated by participants' age. These findings suggest that IATs seem suitable for age-heterogeneous studies from adolescence to old age when IATs are constructed and analyzed appropriately, for example with D-scores and process parameters. We offer further insight into how D-scoring controls for method effects in IATs and what IAT scores capture in addition to implicit representations of characteristics. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
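The core of a D-score can be sketched as the difference between mean reaction times in the two critical blocks divided by the standard deviation of all trials from both blocks. This is a simplified illustration under stated assumptions: it omits the trial-exclusion and error-handling rules that distinguish specific variants such as the D2-score used in the study, and the reaction times are hypothetical.

```python
from statistics import mean, stdev


def d_score(compatible_rts, incompatible_rts):
    """Simplified IAT D-score: mean RT difference over the pooled SD."""
    pooled_sd = stdev(list(compatible_rts) + list(incompatible_rts))
    return (mean(incompatible_rts) - mean(compatible_rts)) / pooled_sd


# Hypothetical reaction times in milliseconds for one participant.
compatible = [600, 650, 700]
incompatible = [800, 850, 900]
d = d_score(compatible, incompatible)
```

Dividing by the participant's own variability is what lets the score partly absorb individual differences in overall response speed, which is the kind of method effect the abstract says D-scoring controls for.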
Self, D J; Schrader, D E; Baldwin, D C; Wolinsky, F D
1993-01-01
Medicine endorses a code of ethics and encourages a high moral character among doctors. This study examines the influence of medical education on the moral reasoning and development of medical students. Kohlberg's Moral Judgment Interview was given to a sample of 20 medical students (41.7% of students in that class). The students were tested at the beginning and at the end of their medical course to determine whether their moral reasoning scores had increased to the same extent as other people who extend their formal education. It was found that normally expected increases in moral reasoning scores did not occur over the 4 years of medical education for these students, suggesting that their educational experience somehow inhibited their moral reasoning ability rather than facilitating it. With a range of moral reasoning scores between 315 and 482, the finding of a mean increase from first year to fourth year of 18.5 points was not statistically significant at the P ≤ 0.05 level. Statistical analysis revealed no significant correlations at the P ≤ 0.05 level between the moral reasoning scores and age, gender, Medical College Admission Test scores, or grade point average scores. Along with a brief description of Kohlberg's cognitive moral development theory, some interpretations and explanations are given for the findings of the study.
Validity and reliability of Nintendo Wii Fit balance scores.
Wikstrom, Erik A
2012-01-01
Interactive gaming systems have the potential to help rehabilitate patients with musculoskeletal conditions. The Nintendo Wii Balance Board, which is part of the Wii Fit game, could be an effective tool to monitor progress during rehabilitation because the board and game can provide objective measures of balance. However, the validity and reliability of Wii Fit balance scores remain unknown. To determine the concurrent validity of balance scores produced by the Wii Fit game and the intrasession and intersession reliability of Wii Fit balance scores. Descriptive laboratory study. Sports medicine research laboratory. Forty-five recreationally active participants (age = 27.0 ± 9.8 years, height = 170.9 ± 9.2 cm, mass = 72.4 ± 11.8 kg) with a heterogeneous history of lower extremity injury. Participants completed a single-limb-stance task on a force plate and the Star Excursion Balance Test (SEBT) during the first test session. Twelve Wii Fit balance activities were completed during 2 test sessions separated by 1 week. Postural sway in the anteroposterior (AP) and mediolateral (ML) directions and the AP, ML, and resultant center-of-pressure (COP) excursions were calculated from the single-limb stance. The normalized reach distance was recorded for the anterior, posteromedial, and posterolateral directions of the SEBT. Wii Fit balance scores that the game software generated also were recorded. All 96 of the calculated correlation coefficients among Wii Fit activity outcomes and established balance outcomes were interpreted as poor (r < 0.50). Intrasession reliability for Wii Fit balance activity scores ranged from good (intraclass correlation coefficient [ICC] = 0.80) to poor (ICC = 0.39), with 8 activities having poor intrasession reliability. Similarly, 11 of the 12 Wii Fit balance activity scores demonstrated poor intersession reliability, with scores ranging from fair (ICC = 0.74) to poor (ICC = 0.29). 
Wii Fit balance activity scores had poor concurrent validity relative to COP outcomes and SEBT reach distances. In addition, the included Wii Fit balance activity scores generally had poor intrasession and intersession reliability.
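The correlation analysis in this abstract can be illustrated with a small sketch. Two points are assumptions: the count of 8 established outcomes (2 sway measures, 3 COP excursions, 3 SEBT directions) is one reading of the abstract that happens to give 12 × 8 = 96 coefficients, and the "poor" label is applied here to the absolute value of r, whereas the abstract states only r < 0.50.

```python
import math

N_WII_ACTIVITIES = 12
N_ESTABLISHED_OUTCOMES = 8  # assumed: 2 sway + 3 COP excursion + 3 SEBT reach


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def is_poor(r, threshold=0.50):
    """Label a coefficient as poor when its magnitude is below the threshold."""
    return abs(r) < threshold


n_coefficients = N_WII_ACTIVITIES * N_ESTABLISHED_OUTCOMES
```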
Lupus anticoagulant: a multicenter study for a standardized and harmonized reporting.
Poz, Alessandra; Pradella, Paola; Azzarini, Gabriella; Santarossa, Liliana; Bardin, Cristina; Zardo, Lorena; Giacomello, Roberta
2016-03-01
Laboratory assessment of Lupus anticoagulant (LAC) is very challenging because of inter- and intralaboratory variability, which makes it difficult to standardize and harmonize results expression. Five hospital laboratories in North-eastern Italy shared their efforts and their experience in a cross-laboratory study, conducting the diagnostic process as homogeneously as possible and providing a better interpretation for LAC positivity. One hundred normal samples from healthy subjects (20 from each center) were processed to confirm negative upper limits and calculate positivity cutoffs of LAC integrated assays, that is dilute Russell's viper venom time (dRVVT) and silica clotting time (SCT). Moreover, 311 samples previously diagnosed by the laboratories as positive for LAC were analyzed to characterize different positivity levels for each assay. As far as the analysis of healthy subjects is concerned, negative upper limits are set at 1.17 and 1.19 for dRVVT and SCT screen ratio, respectively. Positivity cutoffs are set at 1.20 for dRVVT and 1.23 for SCT, expressed as Test Ratio calculated on screen and confirm integrated tests. Positive results for each integrated assay are subsequently divided into three subgroups: weak, moderate and strong; the results obtained are presented as a score proposal that can provide LAC interpretation. The combined use of both dRVVT and SCT assays and the definition of different positivity levels may lead to clearer, more objective LAC reporting. An interpretative table for LAC-proposed score provides LAC-positive results and it is now adopted by all centers involved in the study.
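The positivity cutoffs reported in this abstract can be applied as in the sketch below. The function name is hypothetical, and treating a ratio strictly above the cutoff as positive is an assumption, since the abstract does not say whether the boundary value itself counts. The weak, moderate and strong subgroup boundaries are not given in the abstract, so they are not modelled.

```python
DRVVT_CUTOFF = 1.20  # Test Ratio cutoff reported for dRVVT
SCT_CUTOFF = 1.23    # Test Ratio cutoff reported for SCT


def lac_flags(drvvt_ratio, sct_ratio):
    """Per-assay positivity flags; strict '>' comparison is an assumption."""
    return {
        "dRVVT": drvvt_ratio > DRVVT_CUTOFF,
        "SCT": sct_ratio > SCT_CUTOFF,
    }


# Hypothetical sample: dRVVT ratio above its cutoff, SCT ratio below.
flags = lac_flags(1.35, 1.10)
```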
Radiologist Uncertainty and the Interpretation of Screening
Carney, Patricia A.; Elmore, Joann G.; Abraham, Linn A.; Gerrity, Martha S.; Hendrick, R. Edward; Taplin, Stephen H.; Barlow, William E.; Cutter, Gary R.; Poplack, Steven P.; D’Orsi, Carl J.
2011-01-01
Objective To determine radiologists’ reactions to uncertainty when interpreting mammography and the extent to which radiologist uncertainty explains variability in interpretive performance. Methods The authors used a mailed survey to assess demographic and clinical characteristics of radiologists and reactions to uncertainty associated with practice. Responses were linked to radiologists’ actual interpretive performance data obtained from 3 regionally located mammography registries. Results More than 180 radiologists were eligible to participate, and 139 consented for a response rate of 76.8%. Radiologist gender, more years interpreting, and higher volume were associated with lower uncertainty scores. Positive predictive value, recall rates, and specificity were more affected by reactions to uncertainty than sensitivity or negative predictive value; however, none of these relationships was statistically significant. Conclusion Certain practice factors, such as gender and years of interpretive experience, affect uncertainty scores. Radiologists’ reactions to uncertainty do not appear to affect interpretive performance. PMID:15155014
Analytic study of the Tadoma method: background and preliminary results.
Norton, S J; Schultz, M C; Reed, C M; Braida, L D; Durlach, N I; Rabinowitz, W M; Chomsky, C
1977-09-01
Certain deaf-blind persons have been taught, through the Tadoma method of speechreading, to use vibrotactile cues from the face and neck to understand speech. This paper reports the results of preliminary tests of the speechreading ability of one adult Tadoma user. The tests were of four major types: (1) discrimination of speech stimuli; (2) recognition of words in isolation and in sentences; (3) interpretation of prosodic and syntactic features in sentences; and (4) comprehension of written (Braille) and oral speech. Words in highly contextual environments were much better perceived than were words in low-context environments. Many of the word errors involved phonemic substitutions which shared articulatory features with the target phonemes, with a higher error rate for vowels than consonants. Relative to performance on word-recognition tests, performance on some of the discrimination tests was worse than expected. Perception of sentences appeared to be mildly sensitive to rate of talking and to speaker differences. Results of the tests on perception of prosodic and syntactic features, while inconclusive, indicate that many of the features tested were not used in interpreting sentences. On an English comprehension test, a higher score was obtained for items administered in Braille than through oral presentation.
Neuropsychological assessment of refugees: Methodological and cross-cultural barriers.
Veliu, Bahrie; Leathem, Janet
2017-01-01
Cross-cultural research in neuropsychological assessment has primarily focused on Hispanic and African American populations. Less is known about the impact of language, culture, education, socioeconomic factors, and life experiences on assessment for other cultural groups. We highlight the methodological and cross-cultural barriers encountered at each stage of the neuropsychological assessment of Arabic- and Burmese-speaking refugees, who were culturally and linguistically diverse (CALD). A total of 18 refugees (13 men/five women; in their 20-50s) who were victims of torture in their countries of origin, some with post-traumatic stress disorder (PTSD) and now residents in New Zealand, were seen for neuropsychological assessment. Measures were officially translated, back translated, and administered with the assistance of professional interpreters. Multiple challenges arose in terms of administration (e.g., use of interpreters, interactions with the tester, assessment environment, assessment experience, and motivation), scoring, and interpretation (e.g., age appropriate scoring, estimation of prior function, estimation of brain injury severity, obtaining collateral information), the tests themselves, and ecological validity. There are more challenges in the neuropsychological assessment of people who are CALD than can be managed by adhering to current guidelines. The best approach is to find a balance between maintaining assessment integrity and working creatively and sensitively with this group.
Salajegheh, Ali; Jahangiri, Alborz; Dolan-Evans, Elliot; Pakneshan, Sahar
2016-02-03
The ability to interpret an X-Ray is a vital skill for graduating medical students which guides clinicians towards accurate diagnosis and treatment of the patient. However, research has suggested that radiological interpretation skills are less than satisfactory not only in medical students but also in residents and consultants. This study investigated the effectiveness of e-learning for the development of X-ray interpretation skills in pre-clinical medical students. Competencies in clinical X-Ray interpretation were assessed by comparison of pre- and post-intervention scores and one year follow up assessment, where the e-learning course was the 'intervention'. Our results demonstrate improved knowledge and skills in X-ray interpretation in students. Assessment of the post training students showed significantly higher scores than the scores of control group of students undertaking the same assessment at the same time. The development of the Internet and advances in multimedia technologies have paved the way for computer-assisted education. As more rural clinical schools are established the electronic delivery of radiology teaching through websites will become a necessity. The use of e-learning to deliver radiology tuition to medical students represents an exciting alternative and is an effective method of developing competency in radiological interpretation for medical students.
NASA Astrophysics Data System (ADS)
Keown, Sandra L.
This study was devised to determine effects of the use of interactive thematic organizers and concept maps in middle school science classes during a unit study on minerals. The design, a pretest-posttest control group, consisted of matched groups (three experimental groups and one comparison group). It also included a student survey assessing qualitative aspects of the investigation. The 67 6th-grade students and one science teacher who participated in the study were from an independent K-12 school. Students represented a normal, well-distributed range of abilities. Group I (control) proceeded with their usual method of studying a unit: reading aloud the text and answering workbook questions. Group II worked with interactive thematic organizers, designed to activate prior knowledge and help students make inferences about target concepts in three treatments. Group III created three interactive concept maps, which represented both understandings and misconceptions. Concept maps were reviewed and repaired as students completed each treatment. Group IV participated in both thematic organizer and concept map treatments. Statistical analyses were determined through a pretest and a delayed recall posttest essay for all four groups. Two scores were assigned: one quantitative raw score of correct explicit answers and one rubric score based on the quality of interpretive responses. Group II also received scores for thematic organizer responses. Group III received rubric scores for concept maps. Group IV received all possible scores. Paired t-tests reported comparisons of scores across the treatment groups. A linear regression indicated whether or not concept map misconceptions affected posttest scores. Finally, an ANCOVA reported statistical significance across the four treatment groups. Findings of data analysis indicated statistically significant improvement in posttest scores among students in the three experimental groups. 
Students who participated in both treatments represented the highest scores among the four groups. Results of the ANCOVA indicated there was statistically significant difference in scores among the four treatments. Recommendations were made to further investigate development of interactive thematic organizers with student-chosen hyperlinks to concepts, as well as a recommendation that researchers investigate teacher understandings of interpretive purpose and form in the creation of thematic organizers.
Roberts, William L; McKinley, Danette W; Boulet, John R
2010-05-01
Due to the high-stakes nature of medical exams it is prudent for test agencies to critically evaluate test data and control for potential threats to validity. For the typical multiple station performance assessments used in medicine, it may take time for examinees to become comfortable with the test format and administrative protocol. Since each examinee in the rotational sequence starts with a different task (e.g., simulated clinical encounter), those who are administered non-scored pretest material on their first station may have an advantage compared to those who are not. The purpose of this study is to investigate whether pass/fail rates are different across the sequence of pretest encounters administered during the testing day. First-time takers were grouped by the sequential order in which they were administered the pretest encounter. No statistically significant difference in fail rates was found between examinees who started with the pretest encounter and those who encountered the pretest encounter later in the sequence. Results indicate that current examination administration protocols do not present a threat to the validity of test score interpretations.
van Dijk, Jacqueline F M; van Wijck, Albert J M; Kappen, Teus H; Peelen, Linda M; Kalkman, Cor J; Schuurmans, Marieke J
2012-01-01
Numeric pain scores have become important in clinical practice to assess postoperative pain and to help develop guidelines for treating pain. Professionals need the patients' pain scores to administer analgesic medication. However, do professionals interpret the pain scores in line with the actual perception of pain by the patients? The study aim was to assess which Numerical Rating Scale (NRS) pain score was considered bearable on a Verbal Rating Scale (VRS) by patients and professionals. This prospective study examined the relationship between the Numerical Rating Scale and a Verbal Rating Scale. The patients (n=10,434) rated their pain the day after surgery on the 11-point NRS (0=no pain and 10=worst imaginable pain) and a VRS comprising five descriptors: "no pain"; "little pain"; "painful but bearable"; "considerable pain"; and "terrible pain". The first three categories together ("no pain", "little pain" and "painful but bearable") were considered "bearable" and the last two categories ("considerable pain" and "terrible pain") were deemed as "unbearable" pain. The professionals (n=303) were asked to relate the numbers of the NRS to the words of the VRS. Most patients considered NRS 4-6 as "bearable" pain. Among professionals, anesthesiologists, Post Anaesthesia Care nurses, and ward nurses interpreted NRS scores in the same way as the patients. Only the Acute Pain Nurses interpreted the scores differently; they considered NRS of 5 and higher to be not bearable. Some care providers and patients differ in their interpretation of the postoperative NRS scores. A risk of overtreatment might arise when health care providers rigidly follow guidelines that prescribe strong analgesics for pain scores above 3 or 4 without probing the patient's preference for pharmacological treatment. Copyright © 2011 Elsevier Ltd. All rights reserved.
Risk scores for outcome in bacterial meningitis: Systematic review and external validation study.
Bijlsma, Merijn W; Brouwer, Matthijs C; Bossuyt, Patrick M; Heymans, Martijn W; van der Ende, Arie; Tanck, Michael W T; van de Beek, Diederik
2016-11-01
To perform an external validation study of risk scores, identified through a systematic review, predicting outcome in community-acquired bacterial meningitis. MEDLINE and EMBASE were searched for articles published between January 1960 and August 2014. Performance was evaluated in 2108 episodes of adult community-acquired bacterial meningitis from two nationwide prospective cohort studies by the area under the receiver operating characteristic curve (AUC), the calibration curve, calibration slope or Hosmer-Lemeshow test, and the distribution of calculated risks. Nine risk scores were identified predicting death, neurological deficit or death, or unfavorable outcome at discharge in bacterial meningitis, pneumococcal meningitis and invasive meningococcal disease. Most studies had shortcomings in design, analyses, and reporting. Evaluation showed AUCs of 0.59 (0.57-0.61) and 0.74 (0.71-0.76) in bacterial meningitis, 0.67 (0.64-0.70) in pneumococcal meningitis, and 0.81 (0.73-0.90), 0.82 (0.74-0.91), 0.84 (0.75-0.93), 0.84 (0.76-0.93), 0.85 (0.75-0.95), and 0.90 (0.83-0.98) in meningococcal meningitis. Calibration curves showed adequate agreement between predicted and observed outcomes for four scores, but statistical tests indicated poor calibration of all risk scores. One score could be recommended for the interpretation and design of bacterial meningitis studies. None of the existing scores performed well enough to recommend routine use in individual patient management. Copyright © 2016 The British Infection Association. Published by Elsevier Ltd. All rights reserved.
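The discrimination measure reported above, the area under the ROC curve, can be read as the probability that a randomly chosen patient with the outcome received a higher predicted risk than a randomly chosen patient without it. A minimal Python sketch of that pairwise definition follows; the function name and the toy risks are hypothetical and are not data from the validation cohorts.

```python
def auc(risks, outcomes):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    risks: predicted risks, one per episode.
    outcomes: 1 if the unfavorable outcome occurred, else 0.
    Ties between a positive and a negative case count as half a win.
    """
    pos = [r for r, o in zip(risks, outcomes) if o]
    neg = [r for r, o in zip(risks, outcomes) if not o]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: two episodes with the outcome, two without.
toy_auc = auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0])
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the values quoted in the abstract are reported.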
Goos, Matthias; Schubach, Fabian; Seifert, Gabriel; Boeker, Martin
2016-08-17
Health professionals often manage medical problems in critical situations under time pressure and on the basis of vague information. In recent years, dual process theory has provided a framework of cognitive processes to assist students in developing clinical reasoning skills, which are especially critical in surgery due to the high workload and the elevated stress levels. However, clinical reasoning skills can be observed only indirectly and the corresponding constructs are difficult to measure in order to assess student performance. The script concordance test has been established in this field. A number of studies suggest that the test delivers a valid assessment of clinical reasoning. However, different scoring methods have been suggested. They reflect different interpretations of the underlying construct. In this work we want to shed light on the theoretical framework of script theory and give an idea of script concordance testing. We constructed a script concordance test in the clinical context of "acute abdomen" and compared previously proposed scores with regard to their validity. A test comprising 52 items in 18 clinical scenarios was developed, revised along the guidelines and administered to 56 4th- and 5th-year medical students at the end of a blended-learning seminar. We scored the answers using five different scoring methods (distance (2×), aggregate (2×), single best answer) and compared the scoring keys, the resulting final scores and Cronbach's α after normalization of the raw scores. All scores except the single best answer calculation achieved acceptable reliability scores (≥ 0.75), as measured by Cronbach's α. Students were clearly distinguishable from the experts, whose results were set to a mean of 80 and SD of 5 by the normalization process.
With the two aggregate scoring methods, the students' mean values were between 62.5 (AGGPEN) and 63.9 (AGG), equivalent to about three expert SD below the experts' mean value (Cronbach's α: 0.76 (AGGPEN) and 0.75 (AGG)). With the two distance scoring methods the students' mean was between 62.8 (DMODE) and 66.8 (DMEAN), equivalent to about two expert SD below the experts' mean value (Cronbach's α: 0.77 (DMODE) and 0.79 (DMEAN)). In this study the single best answer (SBA) scoring key yielded the worst psychometric results (Cronbach's α: 0.68). Assuming the psychometric properties of the script concordance test scores are valid, clinical reasoning skills can be measured reliably with different scoring keys in the SCT presented here. Psychometrically, the distance methods seem to be superior, wherein inherent statistical properties of the scales might play a significant role. For methodological reasons, the aggregate methods can also be used. Despite the limitations and complexity of the underlying scoring process and the calculation of reliability, we advocate for SCT because it allows a new perspective on the measurement and teaching of cognitive skills.
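The two computations named in the abstract above, rescaling raw scores so that the expert panel has a mean of 80 and an SD of 5, and estimating reliability with Cronbach's α, can be sketched in Python as follows. This is an illustration under assumptions: the function names and toy numbers are hypothetical, the use of the population SD for the expert panel is a guess, and the study's actual scoring keys are not reproduced.

```python
from statistics import mean, pstdev, pvariance

def normalize_to_experts(raw_scores, expert_scores, target_mean=80.0, target_sd=5.0):
    """Linearly rescale raw scores so the expert panel has the target mean and SD.

    Assumption: the population SD of the expert panel is used.
    """
    m = mean(expert_scores)
    s = pstdev(expert_scores)
    return [target_mean + target_sd * (x - m) / s for x in raw_scores]

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a matrix with one row per examinee and one column per item."""
    k = len(item_scores[0])
    item_vars = [pvariance([row[j] for row in item_scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical toy data, not taken from the study.
experts = [10.0, 12.0, 14.0]
students = [6.0, 8.0]
student_norm = normalize_to_experts(students, experts)
```

On the normalized scale a student score can be read directly in expert SD units: a value of 65, for example, lies three expert SD below the expert mean of 80.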
Using needs-based frameworks for evaluating new technologies: an application to genetic tests.
Rogowski, Wolf H; Schleidgen, Sebastian
2015-02-01
Given the multitude of newly available genetic tests in the face of limited healthcare budgets, the European Society of Human Genetics assessed how genetic services can be prioritized fairly. Using (health) benefit maximizing frameworks for this purpose has been criticized on the grounds that rather than maximization, fairness requires meeting claims (e.g. based on medical need) equitably. This study develops a prioritization score for genetic tests to facilitate equitable allocation based on need-based claims. It includes attributes representing health need associated with hereditary conditions (severity and progression), a genetic service's suitability to alleviate need (evidence of benefit and likelihood of positive result) and costs to meet the needs. A case study for measuring the attributes is provided and a suggestion is made how need-based claims can be quantified in a priority function. Attribute weights can be informed by data from discrete-choice experiments. Further work is needed to measure the attributes across the multitude of genetic tests and to determine appropriate weights. The priority score is most likely to be considered acceptable if developed within a decision process which meets criteria of procedural fairness and if the priority score is interpreted as "strength of recommendation" rather than a fixed cut-off value. Copyright © 2014. Published by Elsevier Ireland Ltd.
Roorda, Leo D; Green, John R; Houwink, Annemieke; Bagley, Pam J; Smith, Jane; Molenaar, Ivo W; Geurts, Alexander C
2012-06-01
To enable improved interpretation of the total score and faster scoring of the Rivermead Mobility Index (RMI) by studying item ordering or hierarchy and formulating start-and-stop rules in patients after stroke. Cohort study. Rehabilitation center in the Netherlands; stroke rehabilitation units and the community in the United Kingdom. Item hierarchy of the RMI was studied in an initial group of patients (n=620; mean age ± SD, 69.2±12.5y; 297 [48%] men; 304 [49%] left hemisphere lesion, and 269 [43%] right hemisphere lesion), and the adequacy of the item hierarchy-based start-and-stop rules was checked in a second group of patients (n=237; mean age ± SD, 60.0±11.3y; 139 [59%] men; 103 [44%] left hemisphere lesion, and 93 [39%] right hemisphere lesion) undergoing rehabilitation after stroke. Not applicable. Mokken scale analysis was used to investigate the fit of the double monotonicity model, indicating hierarchical item ordering. The percentages of patients with a difference between the RMI total score and the scores based on the start-and-stop rules were calculated to check the adequacy of these rules. The RMI had good fit of the double monotonicity model (coefficient H(T)=.87). The interpretation of the total score improved. Item hierarchy-based start-and-stop rules were formulated. The percentages of patients with a difference between the RMI total score and the score based on the recommended start-and-stop rules were 3% and 5%, respectively. Ten of the original 15 items had to be scored after applying the start-and-stop rules. Item hierarchy was established, enabling improved interpretation and faster scoring of the RMI. Copyright © 2012 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Cheville, Andrea L; Wang, Chun; Ni, Pengsheng; Jette, Alan M; Basford, Jeffrey R
2014-11-01
Item response theory-based patient-reported outcomes such as the Activity Measure for Post Acute Care Computerized Adaptive Test are gaining use because of their flexibility and ease of administration. Their psychometric properties are being explored, but little is known about how respondent characteristics may impact precision. The goal of this study was, therefore, to assess the effects of age, sex, and symptom intensity on respondents' test taking behaviors and scores. Three hundred eleven adults with late-stage lung cancer were consecutively enrolled between April 2008 and April 2009. Demographics and comorbidities were abstracted from their electronic medical records. The participants were followed on a 3- to 4-wk basis by telephonic interviews that involved administration of the Activity Measure for Post Acute Care Computerized Adaptive Test, followed by numerical rating scales scoring of their pain, fatigue, and dyspnea. In more than 2538 computerized adaptive test (CAT) sessions, three findings were prominent. First, the women and the older patients took longer to complete CAT sessions, were more likely to skip items, and produced scores with larger standard errors. Second, the respondents with higher levels of dyspnea and fatigue, but not pain, completed their CAT sessions more rapidly and were less likely to skip items. Third, fatigue and dyspnea interact with age but not sex to influence CAT duration and skip count. The findings of this study suggest that certain common clinical populations, for example, women, geriatric patients, and patients with intense symptoms, differ systematically in the time they are willing to devote to testing and the precision of their responses. The latter finding, unstable precision, is unlikely to be CAT specific and has implications for the interpretation of the scores of the Activity Measure for Post Acute Care Computerized Adaptive Test and other patient-reported outcomes.
Assessing Student Understanding of Physical Hydrology
NASA Astrophysics Data System (ADS)
Castillo, A. J.; Marshall, J.; Cardenas, M. B.
2012-12-01
Our objective is to characterize and assess upper division and graduate student thinking by developing and testing an assessment tool for a physical hydrology class. The class' learning goals are: (1) Quantitative process-based understanding of hydrologic processes, (2) Experience with different methods in hydrology, (3) Learning, problem solving, communication skills. These goals were translated into two measurable tasks asked of students in a questionnaire: (1) Describe the significant processes in the hydrological cycle and (2) Describe laws governing these processes. A third question below assessed the students' ability to apply their knowledge: You have been hired as a consultant by __ to (1) assess how urbanization and the current drought have affected a local spring and (2) predict what the effects will be in the future if the drought continues. What information would you need to gather? What measurements would you make? What analyses would you perform? Student and expert responses to the questions were then used to develop a rubric to score responses. Using the rubric, 3 researchers independently blind-coded the full set of pre and post artifacts, resulting in 89% inter-rater agreement on the pre-tests and 83% agreement on the post-tests. We present student scores to illustrate the use of the rubric and to characterize student thinking prior to and following a traditional course. Most students interpreted Q1 in terms of physical processes affecting the water cycle, the primary organizing framework for hydrology, as intended. On the pre-test, one student scored 0, indicating no response, on this question. Twenty students scored 1, indicating rudimentary understanding, 2 students scored a 2, indicating a basic understanding, and no student scored a 3. Student scores on this question improved on the post-test. 
On the 22 post-tests that were blind scored, 11 students demonstrated some recognition of concepts, 9 students showed a basic understanding, and 2 students had a full understanding of the processes linked to hydrology. Half the students had provided evidence of the desired understanding; however, half still demonstrated only a rudimentary understanding. Results on Q2 were similar. On the pre-test, 2 students scored 0, 21 students scored 1, indicating rudimentary understanding, 2 students scored a 2, and no student scored a 3. On the post-test, again approximately half the students achieved the desired understanding: 9 students showed some recognition of concepts, 12 students demonstrated a basic understanding; only one student exhibited full understanding. On Q3, no student scored 0, 9 scored 1, 15 scored 2 and 1 student scored 3. On the post-test, one student scored 1, 16 students scored 2, and 5 students scored 3. Students were significantly better at responding to Q3 (the application) as opposed to Q1 and Q2, which were more abstract. Research has shown that students are often better able to solve contextualized problems when they are unable to deal with more abstract tasks. This result has limitations including the small number of participants, all from one institution, and the fact that the rubric was still under development. Nevertheless, the high inter-rater agreement by a group of experts is significant; the rubric we developed is a potentially useful tool for assessment of learning and understanding physical hydrology. Supported by NSF CAREER grant (EAR-0955750).
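The abstract above reports inter-rater agreement for three independent coders but does not say how agreement was computed. One simple possibility, shown here purely as an assumption, is the share of artifacts on which all raters assigned the same rubric score. The function name and the toy ratings are hypothetical.

```python
def percent_agreement(ratings):
    """Percentage of artifacts on which every rater gave the same rubric score.

    ratings: list of tuples, one tuple per artifact, one score per rater.
    Assumption: 'agreement' means exact agreement among all raters.
    """
    agreed = sum(1 for scores in ratings if len(set(scores)) == 1)
    return 100.0 * agreed / len(ratings)

# Hypothetical rubric scores (0-3) from three raters on four artifacts.
toy_ratings = [(1, 1, 1), (2, 2, 1), (0, 0, 0), (3, 3, 3)]
toy_agreement = percent_agreement(toy_ratings)
```

Exact agreement is a strict criterion and does not correct for chance; chance-corrected statistics would generally give lower values on the same data.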
Electrocardiographic interpretation skills of cardiology residents: are they competent?
Sibbald, Matthew; Davies, Edward G; Dorian, Paul; Yu, Eric H C
2014-12-01
Achieving competency at electrocardiogram (ECG) interpretation among cardiology subspecialty residents has traditionally focused on interpreting a target number of ECGs during training. However, there is little evidence to support this approach. Further, there are no data documenting the competency of ECG interpretation skills among cardiology residents, who become de facto the gold standard in their practice communities. We tested 29 Cardiology residents from all 3 years in a large training program using a set of 20 ECGs collected from a community cardiology practice over a 1-month period. Residents interpreted half of the ECGs using a standard analytic framework, and half using their own approach. Residents were scored on the number of correct and incorrect diagnoses listed. Overall diagnostic accuracy was 58%. Of 6 potentially life-threatening diagnoses, residents missed 36% (123 of 348) including hyperkalemia (81%), long QT (52%), complete heart block (35%), and ventricular tachycardia (19%). Residents provided additional inappropriate diagnoses on 238 ECGs (41%). Diagnostic accuracy was similar between ECGs interpreted using an analytic framework vs ECGs interpreted without an analytic framework (59% vs 58%; F(1,1333) = 0.26; P = 0.61). Cardiology resident proficiency at ECG interpretation is suboptimal. Despite the use of an analytic framework, there remain significant deficiencies in ECG interpretation among Cardiology residents. A more systematic method of addressing these important learning gaps is urgently needed. Copyright © 2014 Canadian Cardiovascular Society. Published by Elsevier Inc. All rights reserved.
Improved therapy-success prediction with GSS estimated from clinical HIV-1 sequences.
Pironti, Alejandro; Pfeifer, Nico; Kaiser, Rolf; Walter, Hauke; Lengauer, Thomas
2014-01-01
Rules-based HIV-1 drug-resistance interpretation (DRI) systems disregard many amino-acid positions of the drug's target protein. The aims of this study are (1) the development of a drug-resistance interpretation system that is based on HIV-1 sequences from clinical practice rather than hard-to-get phenotypes, and (2) the assessment of the benefit of taking all available amino-acid positions into account for DRI. A dataset containing 34,934 therapy-naïve and 30,520 drug-exposed HIV-1 pol sequences with treatment history was extracted from the EuResist database and the Los Alamos National Laboratory database. 2,550 therapy-change-episode baseline sequences (TCEB) were assigned to test set A. Test set B contains 1,084 TCEB from the HIVdb TCE repository. Sequences from patients absent in the test sets were used to train three linear support vector machines to produce scores that predict drug exposure pertaining to each of 20 antiretrovirals: the first one uses the full amino-acid sequences (DEfull), the second one only considers IAS drug-resistance positions (DEonlyIAS), and the third one disregards IAS drug-resistance positions (DEnoIAS). For performance comparison, test sets A and B were evaluated with DEfull, DEnoIAS, DEonlyIAS, geno2pheno[resistance], HIVdb, ANRS, HIV-GRADE, and REGA. Clinically-validated cut-offs were used to convert the continuous output of the first four methods into susceptible-intermediate-resistant (SIR) predictions. With each method, a genetic susceptibility score (GSS) was calculated for each therapy episode in each test set by converting the SIR prediction for its compounds to integer: S=2, I=1, and R=0. The GSS were used to predict therapy success as defined by the EuResist standard datum definition. Statistical significance was assessed using a Wilcoxon signed-rank test. 
A comparison of the therapy-success prediction performances among the different interpretation systems for test set A can be found in Table 1, while those for test set B are found in Figure 1. Therapy-success prediction of first-line therapies with DEnoIAS performed better than DEonlyIAS (p < 10^-16). Therapy success prediction benefits from the consideration of all available mutations. The increase in performance was largest in first-line therapies with transmitted drug-resistance mutations.
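The GSS construction described in the abstract above (continuous output converted to S/I/R by cut-offs, then coded S=2, I=1, R=0) can be sketched in Python. Several details are assumptions: the abstract does not state that the coded values are summed over a regimen's compounds, the direction of the continuous score is a guess, and the cut-off values below are placeholders, not the clinically validated ones.

```python
# Integer coding of SIR predictions as stated in the abstract: S=2, I=1, R=0.
SIR_POINTS = {"S": 2, "I": 1, "R": 0}

def to_sir(score, lower_cutoff, upper_cutoff):
    """Map a continuous score to a susceptible/intermediate/resistant call.

    Assumption: higher scores indicate more resistance. The cut-offs are
    hypothetical placeholders.
    """
    if score < lower_cutoff:
        return "S"
    if score < upper_cutoff:
        return "I"
    return "R"

def genetic_susceptibility_score(sir_predictions):
    """Sum the coded SIR calls over the compounds of one regimen (assumed aggregation)."""
    return sum(SIR_POINTS[p] for p in sir_predictions)

# Hypothetical three-drug regimen with made-up continuous scores.
regimen = [to_sir(s, 0.3, 0.7) for s in (0.1, 0.5, 0.9)]
gss = genetic_susceptibility_score(regimen)
```

Under this coding a three-drug regimen scores between 0 (all compounds predicted resistant) and 6 (all predicted susceptible).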
Anterior Chest Wall in Axial Spondyloarthritis: Imaging, Interpretation, and Differential Diagnosis.
Rennie, Winston J; Jans, Lennart; Jurik, Anne Grethe; Sudoł-Szopińska, Iwona; Schueller-Weidekamm, Claudia; Eshed, Iris
2018-04-01
Anterior chest wall (ACW) inflammation is not an uncommon finding in patients with axial spondyloarthritis (ax-SpA) and reportedly occurs in 26% of these patients. Radiologists may only be familiar with spinal and peripheral joint imaging, possibly due to the inherent challenges of ACW imaging on some cross-sectional imaging modalities. Knowledge of relevant joint anatomy and the location of sites of inflammation allows the interpreting radiologist to better plan appropriate imaging tests and imaging planes. Accurate assessment of disease burden, sometimes in the absence of clinical findings, may alert the treating rheumatologist, allowing a better estimation of disease burden, increased accuracy of potential imaging scoring systems, and optimize assessment and response to treatment. This article reviews salient anatomy and various imaging modalities to optimize diagnosis, important differential diagnoses, and the interpretation of ACW imaging findings in ax-SpA. Thieme Medical Publishers 333 Seventh Avenue, New York, NY 10001, USA.
Albonico, Andrea; Malaspina, Manuela; Daini, Roberta
2017-09-01
The Benton Facial Recognition Test (BFRT) and Cambridge Face Memory Test (CFMT) are two of the most common tests used to assess face discrimination and recognition abilities and to identify individuals with prosopagnosia. However, recent studies highlighted that participant-stimulus match ethnicity, as much as gender, has to be taken into account in interpreting results from these tests. Here, in order to obtain more appropriate normative data for an Italian sample, the CFMT and BFRT were administered to a large cohort of young adults. We found that scores from the BFRT are not affected by participants' gender and are only slightly affected by participant-stimulus ethnicity match, whereas both these factors seem to influence the scores of the CFMT. Moreover, the inclusion of a sample of individuals with suspected face recognition impairment allowed us to show that the use of more appropriate normative data can increase the BFRT efficacy in identifying individuals with face discrimination impairments; by contrast, the efficacy of the CFMT in classifying individuals with a face recognition deficit was confirmed. Finally, our data show that the lack of inversion effect (the difference between the total score of the upright and inverted versions of the CFMT) could be used as further index to assess congenital prosopagnosia. Overall, our results confirm the importance of having norms derived from controls with a similar experience of faces as the "potential" prosopagnosic individuals when assessing face recognition abilities.
Cox, Simon R; MacPherson, Sarah E; Ferguson, Karen J; Nissan, Jack; Royle, Natalie A; MacLullich, Alasdair M J; Wardlaw, Joanna M; Deary, Ian J
2014-09-01
Both general fluid intelligence (gf) and performance on some 'frontal tests' of cognition decline with age. Both types of ability are at least partially dependent on the integrity of the frontal lobes, which also deteriorate with age. Overlap between these two methods of assessing complex cognition in older age remains unclear. Such overlap could be investigated using inter-test correlations alone, as in previous studies, but this would be enhanced by ascertaining whether frontal test performance and gf share neurobiological variance. To this end, we examined relationships between gf and 6 frontal tests (Tower, Self-Ordered Pointing, Simon, Moral Dilemmas, Reversal Learning and Faux Pas tests) in 90 healthy males, aged ~73 years. We interpreted their correlational structure using principal component analysis, and in relation to MRI-derived regional frontal lobe volumes (relative to maximal healthy brain size). gf correlated significantly and positively (.24 ≤ r ≤ .53) with the majority of frontal test scores. Some frontal test scores also exhibited shared variance after controlling for gf. Principal component analysis of test scores identified units of gf-common and gf-independent variance. The former was associated with variance in the left dorsolateral (DL) and anterior cingulate (AC) regions, and the latter with variance in the right DL and AC regions. Thus, we identify two biologically-meaningful components of variance in complex cognitive performance in older age and suggest that age-related changes to DL and AC have the greatest cognitive impact.
Meijer, Rob R; Egberink, Iris J L; Emons, Wilco H M; Sijtsma, Klaas
2008-05-01
We illustrate the usefulness of person-fit methodology for personality assessment. For this purpose, we use person-fit methods from item response theory. First, we give a nontechnical introduction to existing person-fit statistics. Second, we analyze data from Harter's (1985) Self-Perception Profile for Children in a sample of children ranging from 8 to 12 years of age (N = 611) and argue that for some children, the scale scores should be interpreted with caution. Combined information from person-fit indexes and from observation, interviews, and self-concept theory showed that similar score profiles may have a different interpretation. For some children in the sample, item scores did not adequately reflect their trait level. Based on teacher interviews, this was found to be due most likely to a less developed self-concept and/or problems understanding the meaning of the questions. We recommend investigating the scalability of score patterns when using self-report inventories to help the researcher interpret respondents' behavior correctly.
ERIC Educational Resources Information Center
Dragon, Wendy R.; Ben-Porath, Yossef S.; Handel, Richard W.
2012-01-01
This article examined the impact of unscorable item responses on the psychometric validity and practical interpretability of scores on the Restructured Clinical (RC) Scales of the Minnesota Multiphasic Personality Inventory-2/Minnesota Multiphasic Personality Inventory-2-Restructured Form (MMPI-2/MMPI-2-RF). In analyses conducted with five…
Interpreting Quality of Life after Brain Injury Scores: Cross-Walk with the Short Form-36.
Wilson, Lindsay; Marsden-Loftus, Isaac; Koskinen, Sanna; Bakx, Wilbert; Bullinger, Monika; Formisano, Rita; Maas, Andrew; Neugebauer, Edmund; Powell, Jane; Sarajuuri, Jaana; Sasse, Nadine; von Steinbuechel, Nicole; von Wild, Klaus; Truelle, Jean-Luc
2017-01-01
The Quality of Life after Brain Injury (QOLIBRI) instruments are traumatic brain injury (TBI)-specific assessments of health-related quality of life (HRQoL), with established validity and reliability. The purpose of the study is to help improve the interpretability of the two QOLIBRI summary scores (the QOLIBRI Total score and the QOLIBRI Overall Scale [OS] score). An analysis was conducted of 761 patients with TBI who took part in the QOLIBRI validation studies. A cross-walk between QOLIBRI scores and the SF-36 Mental Component Summary norm-based scoring system was performed using geometric mean regression analysis. The exercise supports a previous suggestion that QOLIBRI Total scores <60 indicate low or impaired HRQoL and indicates that the corresponding score on the QOLIBRI-OS is <52. The percentage of cases in the sample that fell into the "impaired HRQoL" category was 36% for the Mental Component Summary, 38% for the QOLIBRI Total, and 39% for the QOLIBRI-OS. Relationships between the QOLIBRI scales and the Glasgow Outcome Scale-Extended (GOSE), as a measure of global function, are presented in the form of means and standard deviations that allow comparison with other studies, and data on age and sex are presented for the QOLIBRI-OS. While bearing in mind the potential imprecision of the comparison, the findings provide a framework for evaluating QOLIBRI summary scores in relation to generic HRQoL that improves their interpretability.
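Geometric mean regression, the cross-walk method named in the abstract above, treats both scales symmetrically: the slope is the ratio of the two standard deviations, signed by the direction of the association, and the line passes through the point of means. The Python sketch below illustrates the technique on made-up numbers; the function name and data are hypothetical and no coefficients from the study are reproduced.

```python
from statistics import mean, stdev

def geometric_mean_regression(x, y):
    """Return (slope, intercept) of the geometric mean (reduced major axis) line.

    slope = sign(cov(x, y)) * sd(y) / sd(x); the line passes through the means.
    """
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sign = 1.0 if cov >= 0 else -1.0
    slope = sign * stdev(y) / stdev(x)
    intercept = my - slope * mx
    return slope, intercept

# Hypothetical paired scores on two scales (not study data).
scale_a = [1.0, 2.0, 3.0, 4.0]
scale_b = [3.0, 5.0, 7.0, 9.0]
slope, intercept = geometric_mean_regression(scale_a, scale_b)
crosswalked = intercept + slope * 2.5  # scale_b equivalent of a scale_a score of 2.5
```

Unlike ordinary least squares, the fitted line is the same whichever scale is treated as the predictor, which is why it suits a two-way cross-walk between instruments.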
What dementia reveals about proverb interpretation and its neuroanatomical correlates.
Kaiser, Natalie C; Lee, Grace J; Lu, Po H; Mather, Michelle J; Shapira, Jill; Jimenez, Elvira; Thompson, Paul M; Mendez, Mario F
2013-08-01
Neuropsychologists frequently include proverb interpretation as a measure of executive abilities. A concrete interpretation of proverbs, however, may reflect semantic impairments from anterior temporal lobes, rather than executive dysfunction from frontal lobes. The investigation of proverb interpretation among patients with different dementias with varying degrees of temporal and frontal dysfunction may clarify the underlying brain-behavior mechanisms for abstraction from proverbs. We propose that patients with behavioral variant frontotemporal dementia (bvFTD), who are characteristically more impaired on proverb interpretation than those with Alzheimer's disease (AD), are disproportionately impaired because of anterior temporal-mediated semantic deficits. Eleven patients with bvFTD and 10 with AD completed the Delis-Kaplan Executive Function System (D-KEFS) Proverbs Test and a series of neuropsychological measures of executive and semantic functions. The analysis included both raw and age-adjusted normed data for multiple choice responses on the D-KEFS Proverbs Test using independent samples t-tests. Tensor-based morphometry (TBM) applied to 3D T1-weighted MRI scans mapped the association between regional brain volume and proverb performance. Computations of mean Jacobian values within select regions of interest provided a numeric summary of regional volume, and voxel-wise regression yielded 3D statistical maps of the association between tissue volume and proverb scores. The patients with bvFTD were significantly worse than those with AD in proverb interpretation. The worse performance of the bvFTD patients involved a greater number of concrete responses to common, familiar proverbs, but not to uncommon, unfamiliar ones. These concrete responses to common proverbs correlated with semantic measures, whereas concrete responses to uncommon proverbs correlated with executive functions. 
After controlling for dementia diagnosis, TBM analyses indicated significant correlations between impaired proverb interpretation and the anterior temporal lobe region (left>right). Of the two dementia groups, patients with bvFTD gave a greater number of concrete responses to common proverbs than those with AD, and this performance correlated with semantic deficits and the volume of the left anterior temporal lobe, the hub of semantic knowledge. The findings of this study suggest that common proverb interpretation is greatly influenced by semantic dysfunction and that the use of proverbs for testing executive functions needs to include the interpretation of unfamiliar proverbs. Published by Elsevier Ltd.
Engaging Immigrant and Refugee Women in Breast Health Education.
Gondek, Matthew; Shogan, May; Saad-Harfouche, Frances G; Rodriguez, Elisa M; Erwin, Deborah O; Griswold, Kim; Mahoney, Martin C
2015-09-01
This project assessed the impact of a community-based educational program on breast cancer knowledge and screening among Buffalo (NY) immigrant and refugee females. Program participants completed language-matched pre- and post-test assessments during a single session educational program; breast cancer screening information was obtained from the mobile mammography unit to which participants were referred. Pre- and post-test knowledge scores were compared to assess changes in responses to each of the six individual knowledge items, as well as overall. Mammogram records were reviewed to identify Breast Imaging Reporting and Data System (BI-RADS) scores. The proportion of correct responses to each of the six knowledge items increased significantly on the post-program assessments; 33% of women >40 years old completed mammograms. The findings suggest that a health education program for immigrant and refugee women, delivered in community-based settings and involving interpreters, can enhance breast cancer knowledge and lead to improvements in mammography completion.
Effects of arginine vasopressin on musical working memory
Granot, Roni Y.; Uzefovsky, Florina; Bogopolsky, Helena; Ebstein, Richard P.
2013-01-01
Previous genetic studies showed an association between variations in the gene coding for the 1a receptor of the neuro-hormone arginine vasopressin (AVP) and musical working memory (WM). The current study set out to test the influence of intranasal administration (INA) of AVP on musical as compared to verbal WM using a double blind crossover (AVP-placebo) design. Two groups of 25 males were exposed to 20 IU of AVP in one session, and 20 IU of saline water (placebo) in a second session, 1 week apart. In each session subjects completed the tonal subtest from Gordon's “Musical Aptitude Profile,” the interval subtest from the “Montreal Battery for Evaluation of Amusias (MBEA),” and the forward and backward digit span tests. Scores in the digit span tests were not influenced by AVP. In contrast, in the music tests there was an AVP effect. In the MBEA test, scores for the group receiving placebo in the first session (PV) were higher than for the group receiving vasopressin in the first session (VP) (p < 0.05), with no main Session effect or Group × Session interaction. In the Gordon test there was a main Session effect (p < 0.05) with scores higher in the second as compared to the first session, a marginal main Group effect (p = 0.093) and a marginal Group × Session interaction (p = 0.88). In addition, we found that the group that received AVP in the first session scored higher on scales indicative of happiness and alertness on the positive and negative affect scale (PANAS). Only in this group and only in the music test were these scores significantly correlated with memory scores. Together the results reflect a complex interaction between AVP, musical memory, arousal, and contextual effects such as session, and base levels of memory. The results are interpreted in light of music's universal use as a means to modulate arousal on the one hand, and AVP's influence on mood, arousal, and social interactions on the other. PMID:24151474
Gardner, Ryan M; Yengo-Kahn, Aaron; Bonfield, Christopher M; Solomon, Gary S
2017-02-01
Baseline and post-concussion neurocognitive testing is useful in managing concussed athletes. Attention deficit hyperactivity disorder (ADHD) and stimulant medications are recognized by the Concussion in Sport Group as potential modifiers of performance on neurocognitive testing. Our goal was to assess whether individuals with ADHD perform differently on post-concussion testing and whether this difference is related to the use of stimulants. This was a retrospective case-control study in which 4373 athletes underwent baseline and post-concussion testing using the ImPACT battery. 277 athletes self-reported a history of ADHD, of whom 206 reported no stimulant treatment and 69 reported stimulant treatment. Each group was matched with participants reporting no history of ADHD or stimulant use on several biopsychosocial characteristics. Non-parametric tests were used to assess ImPACT composite score differences between groups. Participants with ADHD had worse verbal memory, visual memory, visual motor speed, and reaction time scores than matched controls at baseline and post-concussion, all with p ≤ .001 and |r|≥ 0.100. Athletes without stimulant treatment had lower verbal memory, visual memory, visual motor speed, and reaction time scores than controls at baseline (p ≤ 0.01, |r|≥ 0.100 [except verbal memory, r = -0.088]) and post-concussion (p = 0.000, |r|> 0.100). Athletes with stimulant treatment had lower verbal memory (Baseline: p = 0.047, r = -0.108; Post-concussion: p = 0.023, r = -0.124) and visual memory scores (Baseline: p = 0.013, r = -0.134; Post-concussion: p = 0.003, r = -0.162) but equivalent visual motor speed and reaction time scores versus controls at baseline and post-concussion. ADHD-specific baseline and post-concussion neuropsychological profiles, as well as stimulant medication status, may need to be considered when interpreting ImPACT test results. Further investigation into the effects of ADHD and stimulant use on recovery from sport-related concussion (SRC) is warranted.
Daskivich, Timothy; Luu, Michael; Noah, Benjamin; Fuller, Garth; Anger, Jennifer; Spiegel, Brennan
2018-05-09
Health care consumers are increasingly using online ratings to select providers, but differences in the distribution of scores across specialties and skew of the data have the potential to mislead consumers about the interpretation of ratings. The objective of our study was to determine whether distributions of consumer ratings differ across specialties and to provide specialty-specific data to assist consumers and clinicians in interpreting ratings. We sampled 212,933 health care providers rated on the Healthgrades consumer ratings website, representing 29 medical specialties (n=128,678), 15 surgical specialties (n=72,531), and 6 allied health (nonmedical, nonnursing) professions (n=11,724) in the United States. We created boxplots depicting distributions and tested the normality of overall patient satisfaction scores. We then determined the specialty-specific percentile rank for scores across groupings of specialties and individual specialties. Allied health providers had higher median overall satisfaction scores (4.5, interquartile range [IQR] 4.0-5.0) than physicians in medical specialties (4.0, IQR 3.3-4.5) and surgical specialties (4.2, IQR 3.6-4.6, P<.001). Overall satisfaction scores were highly left skewed (normal between -0.5 and 0.5) for all specialties, but skewness was greatest among allied health providers (-1.23, 95% CI -1.280 to -1.181), followed by surgical (-0.77, 95% CI -0.787 to -0.755) and medical specialties (-0.64, 95% CI -0.648 to -0.628). As a result of the skewness, the percentages of overall satisfaction scores less than 4 were only 23% for allied health, 37% for surgical specialties, and 50% for medical specialties. Percentile ranks for overall satisfaction scores varied across specialties; percentile ranks for scores of 2 (0.7%, 2.9%, 0.8%), 3 (5.8%, 16.6%, 8.1%), 4 (23.0%, 50.3%, 37.3%), and 5 (63.9%, 89.5%, 86.8%) differed for allied health, medical specialties, and surgical specialties, respectively. Online consumer ratings of health care providers are highly left skewed, fall within narrow ranges, and differ by specialty, which precludes meaningful interpretation by health care consumers. Specialty-specific percentile ranks may help consumers to more meaningfully assess online physician ratings. ©Timothy Daskivich, Michael Luu, Benjamin Noah, Garth Fuller, Jennifer Anger, Brennan Spiegel. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 09.05.2018.
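The percentile ranks quoted in the abstract above appear to be the share of providers scoring strictly below a given rating (23% of allied health scores fall below 4, and the percentile rank of a score of 4 is 23.0%). A minimal Python sketch of that calculation follows. The function name and the sample ratings are hypothetical illustrations, not data from the study.

```python
from bisect import bisect_left


def percentile_rank(scores, value):
    """Percentage of scores strictly below `value`."""
    ordered = sorted(scores)
    # bisect_left returns the count of elements smaller than `value`.
    return 100.0 * bisect_left(ordered, value) / len(ordered)


# Hypothetical, left-skewed ratings on a 1-5 scale (not study data).
ratings = [5.0, 5.0, 4.8, 4.5, 4.5, 4.2, 4.0, 3.9, 3.0, 1.5]
print(percentile_rank(ratings, 4.0))  # 3 of 10 ratings are below 4.0
```

Because most ratings cluster near the top of the scale, a score that looks high in absolute terms can still sit at a low percentile, which is the interpretive problem the abstract describes.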
Liu, Ying-Buh; Yang, Stephen S; Hsieh, Cheng-Hsing; Lin, Chia-Da; Chang, Shang-Jen
2014-05-01
To evaluate the inter-observer, intra-observer and intra-individual reliability of uroflowmetry and post-void residual urine (PVR) tests in adult men. Healthy volunteers aged over 40 years were enrolled. Every participant underwent two sets of uroflowmetry and PVR tests with a 2-week interval between the tests. The uroflowmetry tests were interpreted by four urologists independently. Uroflowmetry curves were classified as bell-shaped, bell-shaped with tail, obstructive, restrictive, staccato, interrupted and tower-shaped and scored from 1 (highly abnormal) to 5 (absolutely normal). Agreement between observers, interpretations and tests within individuals was analyzed using kappa statistics and intraclass correlation coefficients. Generalizability theory with decision analysis was used to determine how many observers, tests, and interpretations were needed to obtain an acceptable reliability (> 0.80). Of 108 volunteers, we randomly selected the uroflowmetry results from 25 participants for the evaluation of reliability. The mean age of the studied adults was 55.3 years. The intra-individual and intra-observer reliability of uroflowmetry tests ranged from good to very good. However, the inter-observer reliability on normalcy and specific type of flow pattern was relatively lower. In generalizability theory, three observers were needed to obtain an acceptable reliability on normalcy of uroflow pattern if the patient underwent uroflowmetry tests twice with one observation. The intra-individual and intra-observer reliability of uroflowmetry tests was good, while the inter-observer reliability was relatively lower. To improve inter-observer reliability, the definition of uroflowmetry should be clarified by the International Continence Society. © 2013 Wiley Publishing Asia Pty Ltd.
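The abstract above reports agreement using kappa statistics. As an illustration only, the sketch below computes Cohen's kappa for two observers classifying the same curves. The study involved four urologists and does not state which kappa variant was used, so the pairwise form, the function name, and the example labels are all assumptions.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical flow-pattern labels from two observers (not study data).
observer_1 = ["normal", "normal", "obstructive", "normal"]
observer_2 = ["normal", "obstructive", "obstructive", "normal"]
print(cohens_kappa(observer_1, observer_2))
```

Kappa corrects raw percent agreement for chance, so two observers who agree on 75% of curves can still show only moderate agreement when one label dominates.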
Reporting quality of multivariable logistic regression in selected Indian medical journals.
Kumar, R; Indrayan, A; Chhabra, P
2012-01-01
Use of multivariable logistic regression (MLR) modeling has steeply increased in the medical literature over the past few years. Testing of model assumptions and adequate reporting of MLR allow the reader to interpret results more accurately. The aim was to review the fulfillment of assumptions and reporting quality of MLR in selected Indian medical journals using established criteria. The design was an analysis of published literature. Medknow.com publishes 68 Indian medical journals with open access. Eight of these journals had at least five articles using MLR between 1994 and 2008. Articles from each of these journals were evaluated according to the previously established 10-point quality criteria for reporting and to test the MLR model assumptions. SPSS 17 software and non-parametric tests (Kruskal-Wallis H, Mann-Whitney U, Spearman correlation) were used. One hundred and nine articles using MLR for data analysis were found in the eight selected journals. The number of such articles gradually increased after 2003, but the quality score remained almost unchanged over time. P value, odds ratio, and 95% confidence interval for coefficients in MLR were reported in 75.2% of the articles, and sufficient cases (>10) per covariate of limiting sample size were reported in 58.7% of the articles. No article reported the test for conformity of linear gradient for continuous covariates. Total score was not significantly different across the journals. However, involvement of a statistician or epidemiologist as a co-author improved the average quality score significantly (P=0.014). Reporting of MLR in many Indian journals is incomplete. Only one of the 109 articles under review scored 8 out of 10; all others scored lower. Appropriate guidelines in instructions to authors, and pre-publication review of articles using MLR by a qualified statistician, may improve the quality of reporting.
Cox, Simon R.; MacPherson, Sarah E.; Ferguson, Karen J.; Nissan, Jack; Royle, Natalie A.; MacLullich, Alasdair M.J.; Wardlaw, Joanna M.; Deary, Ian J.
2014-01-01
Both general fluid intelligence (gf) and performance on some ‘frontal tests’ of cognition decline with age. Both types of ability are at least partially dependent on the integrity of the frontal lobes, which also deteriorate with age. Overlap between these two methods of assessing complex cognition in older age remains unclear. Such overlap could be investigated using inter-test correlations alone, as in previous studies, but this would be enhanced by ascertaining whether frontal test performance and gf share neurobiological variance. To this end, we examined relationships between gf and 6 frontal tests (Tower, Self-Ordered Pointing, Simon, Moral Dilemmas, Reversal Learning and Faux Pas tests) in 90 healthy males, aged ~ 73 years. We interpreted their correlational structure using principal component analysis, and in relation to MRI-derived regional frontal lobe volumes (relative to maximal healthy brain size). gf correlated significantly and positively (.24 ≤ r ≤ .53) with the majority of frontal test scores. Some frontal test scores also exhibited shared variance after controlling for gf. Principal component analysis of test scores identified units of gf-common and gf-independent variance. The former was associated with variance in the left dorsolateral (DL) and anterior cingulate (AC) regions, and the latter with variance in the right DL and AC regions. Thus, we identify two biologically-meaningful components of variance in complex cognitive performance in older age and suggest that age-related changes to DL and AC have the greatest cognitive impact. PMID:25278641
Torres-Mejía, Gabriela; Smith, Robert A; Carranza-Flores, María de la Luz; Bogart, Andy; Martínez-Matsushita, Louis; Miglioretti, Diana L; Kerlikowske, Karla; Ortega-Olvera, Carolina; Montemayor-Varela, Ernesto; Angeles-Llerenas, Angélica; Bautista-Arredondo, Sergio; Sánchez-González, Gilberto; Martínez-Montañez, Olga G; Uscanga-Sánchez, Santos R; Lazcano-Ponce, Eduardo; Hernández-Ávila, Mauricio
2015-05-16
An alternative approach to the traditional model of radiologists interpreting screening mammography is necessary due to the shortage of radiologists to interpret screening mammograms in many countries. We evaluated the performance of 15 Mexican radiographers, also known as radiologic technologists, in the interpretation of screening mammography after a 6-month training period in a screening setting. Fifteen radiographers received 6 months of standardized training with radiologists in the interpretation of screening mammography using the Breast Imaging Reporting and Data System (BI-RADS). A challenging test set of 110 cases developed by the Breast Cancer Surveillance Consortium was used to evaluate their performance. We estimated sensitivity, specificity, false positive rates, likelihood ratio of a positive test (LR+) and the area under the subject-specific Receiver Operating Characteristic (ROC) curve (AUC) for diagnostic accuracy. A mathematical model simulating the consequences in costs and performance of two hypothetical scenarios compared to the status quo, in which a radiologist reads all screening mammograms, was also developed. Radiographers' sensitivity was comparable to the sensitivity scores achieved by U.S. radiologists who took the test, but their false-positive rate was higher. Median sensitivity was 73.3 % (interquartile range, IQR: 46.7-86.7 %) and the median false positive rate was 49.5 % (IQR: 34.7-57.9 %). The median LR+ was 1.4 (IQR: 1.3-1.7) and the median AUC was 0.6 (IQR: 0.6-0.7). A scenario in which a radiographer reads all mammograms first, and a radiologist reads only those that were difficult for the radiographer, was more cost-effective than a scenario in which either the radiographer or radiologist reads all mammograms. Given the comparable sensitivity achieved by Mexican radiographers and U.S. radiologists on a test set, screening mammography interpretation by radiographers appears to be a possible adjunct to radiologists in countries with shortages of radiologists. Further studies are required to assess the effectiveness of different training programs in order to obtain acceptable screening accuracy, as well as the best approaches for the use of non-physician readers to interpret screening mammography.
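The accuracy measures named in the abstract above are related by standard definitions: LR+ is sensitivity divided by the false positive rate (1 minus specificity). The Python sketch below shows the arithmetic for a single reader. The counts are hypothetical: a split of 15 cancer and 95 non-cancer cases is an assumption that happens to be consistent with the reported percentages, and the abstract does not state it.

```python
def diagnostic_summary(tp, fn, fp, tn):
    """Return sensitivity, specificity, false positive rate and LR+."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    false_positive_rate = fp / (fp + tn)
    # LR+ is undefined with zero false positives; report infinity instead.
    if false_positive_rate > 0:
        lr_positive = sensitivity / false_positive_rate
    else:
        lr_positive = float("inf")
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_positive_rate": false_positive_rate,
        "lr_positive": lr_positive,
    }


# Hypothetical single reader on a 110-case set (assumed 15 cancers).
summary = diagnostic_summary(tp=11, fn=4, fp=47, tn=48)
print(round(summary["sensitivity"], 3), round(summary["lr_positive"], 2))
```

The result for this one hypothetical reader need not equal the reported median LR+ of 1.4, because a median of per-reader ratios is generally not the ratio of the median sensitivity to the median false positive rate.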
Executive performance in older Portuguese adults with low education.
Pavão Martins, Isabel; Maruta, Carolina; Freitas, Vanda; Mares, Inês
2013-01-01
Evaluation of executive functions is essential in clinical diagnosis, yet there are limited data regarding the performance of participants with low education. We present results on several measures of executive functions obtained in community-dwelling adults with an overall low education and study the effect of this variable in each test. A sample of 479 adults (64% female, mean age 66.4 years) was assessed by a battery comprising 13 measures of executive function (Trail Making Test; Symbol Search; Matrix reasoning; Semantic and phonemic verbal fluencies; Stroop test; and digit spans). Tests' psychometric properties and the effects of age, gender, and education were studied across education levels within each age group. Tests showed good psychometric properties. Education explained more variance than age in the majority of measures, with lower educational levels being significantly associated with worse scores. Tables are presented with mean scores, standard deviation, and the value of extreme percentiles for younger (50-65, N = 232) and older (>65 years, N = 247) × education (0-3, 4, 5-9, and >9 years) subgroups. Education-adjusted norms are necessary for an adequate interpretation of test results. The present data may be useful for clinicians caring for populations with low literacy.
Christian, Josef; Kröll, Josef; Schwameder, Hermann
2017-06-01
Common summary measures of gait quality such as the Gait Profile Score (GPS) are based on the principle of measuring a distance from the mean pattern of a healthy reference group in a gait pattern vector space. The recently introduced Classifier Oriented Gait Score (COGS) is a pathology specific score that measures this distance in a unique direction, which is indicated by a linear classifier. This approach has potentially improved the discriminatory power to detect subtle changes in gait patterns but does not incorporate a profile of interpretable sub-scores like the GPS. The main aims of this study were to extend the COGS by decomposing it into interpretable sub-scores as realized in the GPS and to compare the discriminative power of the GPS and COGS. Two types of gait impairments were imitated to enable a high level of control of the gait patterns. Imitated impairments were realized by restricting knee extension and inducing leg length discrepancy. The results showed increased discriminatory power of the COGS for differentiating diverse levels of impairment. Comparison of the GPS and COGS sub-scores and their ability to indicate changes in specific variables supports the validity of both scores. The COGS is an overall measure of gait quality with increased power to detect subtle changes in gait patterns and might be well suited for tracing the effect of a therapeutic treatment over time. The newly introduced sub-scores improved the interpretability of the COGS, which is helpful for practical applications. Copyright © 2017 Elsevier B.V. All rights reserved.
Applications of "Integrated Data Viewer" (IDV) in the classroom
NASA Astrophysics Data System (ADS)
Nogueira, R.; Cutrim, E. M.
2006-06-01
Conventionally, weather products utilized in synoptic meteorology reduce phenomena occurring in four dimensions to a 2-dimensional form. This constitutes a roadblock for non-atmospheric-science majors who need to take meteorology as a non-mathematical and complementary course to their major programs. This research examines the use of the Integrated Data Viewer (IDV) as a teaching tool, as it allows a 4-dimensional representation of weather products. IDV was tested in the teaching of synoptic meteorology, weather analysis, and weather map interpretation to non-science students in the laboratory sessions of an introductory meteorology class at Western Michigan University. Student exam scores were compared by laboratory teaching technique (traditional lab manual versus IDV) for short- and long-term learning. Results of the statistical analysis show that the Fall 2004 students in the IDV-based lab session retained learning. However, in Spring 2005 the exam scores did not reflect retention in learning when compared with IDV-based and MANUAL-based lab scores (short-term learning, i.e., exam taken one week after the lab exercise). Tests of long-term learning, with seven weeks between the two exams in Spring 2005, showed no statistically significant difference between IDV-based group scores and MANUAL-based group scores. However, the IDV group obtained a slightly higher average exam score than the MANUAL group. Statistical testing of the principal hypothesis in this study leads to the conclusion that the IDV-based method did not prove to be a better teaching tool than the traditional paper-based method. Future studies could potentially find significant differences in the effectiveness of the manual and IDV methods if the conditions were more controlled. That is, students in the control group should not be exposed to weather analysis using IDV during lecture.
Gadbury-Amyot, Cynthia C; McCracken, Michael S; Woldt, Janet L; Brennan, Robert L
2014-05-01
The purpose of this study was to empirically investigate the validity and reliability of portfolio assessment in two U.S. dental schools using a unified framework for validity. In the process of validation, it is not the test that is validated but rather the claims (interpretations and uses) about test scores that are validated. Kane's argument-based validation framework provided the structure for reporting results where validity claims are followed by evidence to support the argument. This multivariate generalizability theory study found that the greatest source of variance was attributable to faculty raters, suggesting that portfolio assessment would benefit from two raters' evaluating each portfolio independently. The results are generally supportive of holistic scoring, but analytical scoring deserves further research. Correlational analyses between student portfolios and traditional measures of student competence and readiness for licensure resulted in significant correlations between portfolios and National Board Dental Examination Part I (r=0.323, p<0.01) and Part II scores (r=0.268, p<0.05) and small and non-significant correlations with grade point average and scores on the Western Regional Examining Board (WREB) exam. It is incumbent upon the users of portfolio assessment to determine if the claims and evidence arguments set forth in this study support the proposed claims for and decisions about portfolio assessment in their respective institutions.
Kopcinovic, Lara Milevoj; Vogrinc, Zeljka; Kocijan, Irena; Culej, Jelena; Aralica, Merica; Jokic, Anja; Antoncic, Dragana; Bozovic, Marija
2016-10-15
We hypothesized that extravascular body fluid (EBF) analysis in Croatia is not harmonized and aimed to investigate preanalytical, analytical and postanalytical procedures used in EBF analysis in order to identify key aspects that should be addressed in future harmonization attempts. An anonymous online survey created to explore laboratory testing of EBF was sent to secondary, tertiary and private health care Medical Biochemistry Laboratories (MBLs) in Croatia. Statements were designed to address preanalytical, analytical and postanalytical procedures of cerebrospinal, pleural, peritoneal (ascites), pericardial, seminal, synovial, amniotic fluid and sweat. Participants were asked to declare the strength of agreement with proposed statements using a Likert scale. Mean scores for corresponding separate statements divided according to health care setting were calculated and compared. The survey response rate was 0.64 (58 / 90). None of the participating private MBLs declared to analyse EBF. We report a mean score of 3.45 obtained for all statements evaluated. Deviations from desirable procedures were demonstrated in all EBF testing phases. Minor differences in procedures used for EBF analysis comparing secondary and tertiary health care MBLs were found. The lowest scores were obtained for statements regarding quality control procedures in EBF analysis, participation in proficiency testing programmes and provision of interpretative comments on EBF's test reports. Although good laboratory EBF practice is present in Croatia, procedures for EBF analysis should be further harmonized to improve the quality of EBF testing and patient safety.
Lucas, John A; Ivnik, Robert J; Smith, Glenn E; Ferman, Tanis J; Willis, Floyd B; Petersen, Ronald C; Graff-Radford, Neill R
2005-06-01
Normative data for older African Americans are presented for several clinical neuropsychological measures, including Boston Naming Test, Controlled Oral Word Association, Category Fluency, Token Test, WRAT-3 Reading, Trail Making Test, Stroop Color and Word Test, and Judgment of Line Orientation. Age-adjusted norms were derived from a sample of 309 cognitively normal, community-dwelling individuals, aged 56 through 94, participating in Mayo's Older African Americans Normative Studies (MOAANS). Years of education were modelled on age-scaled scores to derive regression equations that may be applied for further demographic correction. These data should enhance interpretation of individual test performances and facilitate analysis of neuropsychological profile patterns in older African American patients who present for dementia evaluations.
Weech-Maldonado, Robert; Dreachslin, Janice L; Brown, Julie; Pradhan, Rohit; Rubin, Kelly L; Schiller, Cameron; Hays, Ron D
2012-01-01
The U.S. national standards for culturally and linguistically appropriate services (CLAS) in health care provide guidelines on policies and practices aimed at developing culturally competent systems of care. The Cultural Competency Assessment Tool for Hospitals (CCATH) was developed as an organizational tool to assess adherence to the CLAS standards. First, we describe the development of the CCATH and estimate the reliability and validity of the CCATH measures. Second, we discuss the managerial implications of the CCATH as an organizational tool to assess cultural competency. We pilot tested an initial draft of the CCATH, revised it based on a focus group and cognitive interviews, and then administered it in a field test with a sample of California hospitals. The reliability and validity of the CCATH were evaluated using factor analysis, analysis of variance, and Cronbach's alphas. Exploratory and confirmatory factor analyses identified 12 CCATH composites: leadership and strategic planning, data collection on inpatient population, data collection on service area, performance management systems and quality improvement, human resources practices, diversity training, community representation, availability of interpreter services, interpreter services policies, quality of interpreter services, translation of written materials, and clinical cultural competency practices. All the CCATH scales had internal consistency reliability of .65 or above, and the reliability was .70 or above for 9 of the 12 scales. Analysis of variance results showed that not-for-profit hospitals have higher CCATH scores than for-profit hospitals in five CCATH scales and higher CCATH scores than government hospitals in two CCATH scales. The CCATH showed adequate psychometric properties. Managers and policy makers can use the CCATH as a tool to evaluate hospital performance in cultural competency and identify and target improvements in hospital policies and practices that undergird the provision of CLAS.
Arús, Nádia A; da Silva, Átila M; Duarte, Rogério; da Silveira, Priscila F; Vizzotto, Mariana B; da Silveira, Heraldo L D; da Silveira, Heloisa E D
2017-06-01
The aims of this study were to evaluate and compare the performance of dental students in interpreting the temporomandibular joint (TMJ) with magnetic resonance imaging (MRI) scans using two learning methods (conventional and digital interactive learning) and to examine the usability of the digital learning object (DLO). The DLO consisted of tutorials about MRI and anatomic and functional aspects of the TMJ. In 2014, dental students in their final year of study who were enrolled in the elective "MRI Interpretation of the TMJ" course comprised the study sample. After exclusions for nonattendance and other reasons, 29 of the initial 37 students participated in the study, for a participation rate of 78%. The participants were divided into two groups: a digital interactive learning group (n=14) and a conventional learning group (n=15). Both methods were assessed by an objective test applied before and after training and classes. Aspects such as support and training requirements, complexity, and consistency of the DLO were also evaluated using the System Usability Scale (SUS). A significant between-group difference in the posttest results was found, with the conventional learning group scoring better than the DLO group, indicated by mean scores of 9.20 and 8.11, respectively, out of 10. However, when the pretest and posttest results were compared, both groups showed significantly improved performance. The SUS score was 89, which represented a high acceptance of the DLO by the users. The students who used the conventional method of learning showed superior performance in interpreting the TMJ using MRI compared to the group that used digital interactive learning.
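The abstract above reports a System Usability Scale (SUS) score of 89. For readers unfamiliar with the scale, the sketch below applies the standard SUS scoring rule: ten items rated 1 to 5, odd-numbered items contribute the rating minus 1, even-numbered items contribute 5 minus the rating, and the sum is multiplied by 2.5 to give a 0-100 score. The function name and the example responses are hypothetical, and the abstract does not describe how the study aggregated individual scores.

```python
def sus_score(responses):
    """Score one respondent's ten SUS ratings (each 1-5) on a 0-100 scale."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly ten item responses")
    total = 0
    for item_number, rating in enumerate(responses, start=1):
        if item_number % 2 == 1:
            total += rating - 1   # positively worded (odd) items
        else:
            total += 5 - rating   # negatively worded (even) items
    return total * 2.5


# Hypothetical respondent (not study data).
print(sus_score([5, 1, 5, 2, 4, 1, 5, 1, 4, 2]))
```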
Gerard, James M; Scalzo, Anthony J; Borgman, Matthew A; Watson, Christopher M; Byrnes, Chelsie E; Chang, Todd P; Auerbach, Marc; Kessler, David O; Feldman, Brian L; Payne, Brian S; Nibras, Sohail; Chokshi, Riti K; Lopreiato, Joseph O
2018-06-01
We developed a first-person serious game, PediatricSim, to teach and assess performances on seven critical pediatric scenarios (anaphylaxis, bronchiolitis, diabetic ketoacidosis, respiratory failure, seizure, septic shock, and supraventricular tachycardia). In the game, players are placed in the role of a code leader and direct patient management by selecting from various assessment and treatment options. The objective of this study was to obtain supportive validity evidence for the PediatricSim game scores. Game content was developed by 11 subject matter experts and followed the American Heart Association's 2011 Pediatric Advanced Life Support Provider Manual and other authoritative references. Sixty subjects with three different levels of experience were enrolled to play the game. Before game play, subjects completed a 40-item written pretest of knowledge. Game scores were compared between subject groups using scoring rubrics developed for the scenarios. Validity evidence was established and interpreted according to Messick's framework. Content validity was supported by a game development process that involved expert experience, focused literature review, and pilot testing. Subjects rated the game favorably for engagement, realism, and educational value. Interrater agreement on game scoring was excellent (intraclass correlation coefficient = 0.91, 95% confidence interval = 0.89-0.9). Game scores were higher for attendings followed by residents then medical students (Pc < 0.01) with large effect sizes (1.6-4.4) for each comparison. There was a very strong, positive correlation between game and written test scores (r = 0.84, P < 0.01). These findings contribute validity evidence for PediatricSim game scores to assess knowledge of pediatric emergency medicine resuscitation.
Overnight shift work: factors contributing to diagnostic discrepancies.
Hanna, Tarek N; Loehfelm, Thomas; Khosa, Faisal; Rohatgi, Saurabh; Johnson, Jamlik-Omari
2016-02-01
The aims of the study are to identify factors contributing to preliminary interpretive discrepancies on overnight radiology resident shifts and apply this data in the context of known literature to draw parallels to attending overnight shift work schedules. Residents in one university-based training program provided preliminary interpretations of 18,488 overnight (11 pm–8 am) studies at a level 1 trauma center between July 1, 2013 and December 31, 2014. As part of their normal workflow and feedback, attendings scored the reports as major discrepancy, minor discrepancy, agree, and agree--good job. We retrospectively obtained the preliminary interpretation scores for each study. Total relative value units (RVUs) per shift were calculated as an indicator of overnight workload. The dataset was supplemented with information on trainee level, number of consecutive nights on night float, hour, modality, and per-shift RVU. The data were analyzed with proportional logistic regression and Fisher's exact test. There were 233 major discrepancies (1.26 %). Trainee level (senior vs. junior residents; 1.08 vs. 1.38 %; p < 0.05) and modality were significantly associated with performance. Increased workload affected more junior residents' performance, with R3 residents performing significantly worse on busier nights. Hour of the night was not significantly associated with performance, but there was a trend toward best performance at 2 am, with subsequent decreased accuracy throughout the remaining shift hours. Improved performance occurred after the first six night float shifts, presumably as residents acclimated to a night schedule. As overnight shift work schedules increase in popularity for residents and attendings, focused attention to factors impacting interpretative accuracy is warranted.
Beyond Academia - Interrogating Research Impact in the Research Excellence Framework.
Terama, Emma; Smallman, Melanie; Lock, Simon J; Johnson, Charlotte; Austwick, Martin Zaltz
2016-01-01
Big changes to the way in which research funding is allocated to UK universities were brought about in the Research Excellence Framework (REF), overseen by the Higher Education Funding Council, England. Replacing the earlier Research Assessment Exercise, the purpose of the REF was to assess the quality and reach of research in UK universities and allocate funding accordingly. For the first time, this included an assessment of research 'impact', accounting for 20% of the funding allocation. In this article we use a text mining technique to investigate the interpretations of impact put forward via impact case studies in the REF process. We find that institutions have developed a diverse interpretation of impact, ranging from commercial applications to public and cultural engagement activities. These interpretations of impact vary from discipline to discipline and between institutions, with more broad-based institutions depicting a greater variety of impacts. Comparing the interpretations with the score given by REF, we found no evidence of one particular interpretation being more highly rewarded than another. Importantly, we also found a positive correlation between impact score and [overall research] quality score, suggesting that impact is not being achieved at the expense of research excellence. PMID:27997599
Chen, Song; Li, Xuena; Chen, Meijie; Yin, Yafu; Li, Na; Li, Yaming
2016-10-01
This study aimed to compare the diagnostic power of quantitative analysis and visual analysis with single time point imaging (STPI) PET/CT and dual time point imaging (DTPI) PET/CT for the classification of solitary pulmonary nodule (SPN) lesions in granuloma-endemic regions. SPN patients who received early and delayed 18F-FDG PET/CT at 60 min and 180 min post-injection were retrospectively reviewed. Diagnoses were confirmed by pathological results or follow-up. Three quantitative metrics, early SUVmax, delayed SUVmax and retention index (RI; the percentage change between the early SUVmax and delayed SUVmax), were measured for each lesion. Three 5-point scale scores were given by blinded physician interpretations based on STPI PET/CT images, DTPI PET/CT images and CT images, respectively. ROC analysis was performed on the three quantitative metrics and the three visual interpretation scores. One hundred forty-nine patients were retrospectively included. The areas under the curve (AUC) of the ROC curves of early SUVmax, delayed SUVmax, RI, STPI PET/CT score, DTPI PET/CT score and CT score were 0.73, 0.74, 0.61, 0.77, 0.75 and 0.76, respectively. There were no significant differences between the AUCs for visual interpretation of STPI PET/CT images and DTPI PET/CT images, nor between those for early SUVmax and delayed SUVmax. Sensitivity, specificity and accuracy did not differ significantly between STPI PET/CT and DTPI PET/CT in either quantitative analysis or visual interpretation. In granuloma-endemic regions, DTPI PET/CT did not offer significant improvement over STPI PET/CT in differentiating malignant SPNs in either quantitative analysis or visual interpretation. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Morris, Roisin; MacNeela, Padraig; Scott, Anne; Treacy, Pearl; Hyde, Abbey; O'Brien, Julian; Lehwaldt, Daniella; Byrne, Anne; Drennan, Jonathan
2008-04-01
In a study to establish the interrater reliability of the Irish Nursing Minimum Data Set (I-NMDS) for mental health, difficulties relating to the choice of reliability test statistic were encountered. The objective of this paper is to highlight the difficulties associated with testing interrater reliability for an ordinal scale using a relatively homogeneous sample and the recommended weighted kappa (kw) statistic. One pair of mental health nurses completed the I-NMDS for mental health for a total of 30 clients attending a mental health day centre over a two-week period. Data were analysed using the kw and percentage agreement statistics. A total of 34 of the 38 I-NMDS for mental health variables with lower than acceptable levels of kw reliability scores achieved acceptable levels of reliability according to their percentage agreement scores. The study findings implied that, due to the homogeneity of the sample, low variability within the data resulted in the 'base rate problem' associated with the use of the kw statistic. Conclusions point to the interpretation of kw in tandem with percentage agreement scores. Suggestions that kw scores were low due to chance agreement and that one should strive to use a study sample with known variability are queried.
Interrater reliability: the kappa statistic.
McHugh, Mary L
2012-01-01
The kappa statistic is frequently used to test interrater reliability. The importance of rater reliability lies in the fact that it represents the extent to which the data collected in the study are correct representations of the variables measured. Measurement of the extent to which data collectors (raters) assign the same score to the same variable is called interrater reliability. While there have been a variety of methods to measure interrater reliability, traditionally it was measured as percent agreement, calculated as the number of agreement scores divided by the total number of scores. In 1960, Jacob Cohen critiqued the use of percent agreement due to its inability to account for chance agreement. He introduced Cohen's kappa, developed to account for the possibility that raters actually guess on at least some variables due to uncertainty. Like most correlation statistics, the kappa can range from -1 to +1. While the kappa is one of the most commonly used statistics to test interrater reliability, it has limitations. Judgments about what level of kappa should be acceptable for health research are questioned. Cohen's suggested interpretation may be too lenient for health-related studies because it implies that a score as low as 0.41 might be acceptable. Kappa and percent agreement are compared, and levels for both kappa and percent agreement that should be demanded in healthcare studies are suggested.
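The contrast between percent agreement and chance-corrected agreement can be made concrete with a minimal sketch. It assumes two raters scoring the same items on a nominal scale and uses the standard unweighted formula kappa = (p_o - p_e) / (1 - p_e). The function names and the ratings are hypothetical illustrations and are not taken from any of the studies listed here.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of items on which both raters gave the same score."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(rater_a)
    p_o = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Agreement expected by chance, from each rater's marginal category rates.
    categories = set(counts_a) | set(counts_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when expected agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical, homogeneous sample: almost every item is rated "no".
ratings_a = ["no"] * 19 + ["yes"]
ratings_b = ["no"] * 18 + ["yes", "no"]

print(percent_agreement(ratings_a, ratings_b))
print(cohen_kappa(ratings_a, ratings_b))
```

With these made-up ratings the two raters agree on 18 of 20 items, yet kappa falls slightly below zero because nearly all of that agreement is expected by chance when one category dominates. This is the kind of base-rate effect that motivates reading kappa alongside percent agreement.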
Brief Self-Efficacy Scales for use in Weight-Loss Trials: Preliminary Evidence of Validity
Wilson, Kathryn E.; Harden, Samantha M.; Almeida, Fabio A.; You, Wen; Hill, Jennie L.; Goessl, Cody; Estabrooks, Paul A.
2015-01-01
Self-efficacy is a commonly included cognitive variable in weight-loss trials, but there is little uniformity in its measurement. Weight-loss trials frequently focus on physical activity (PA) and eating behavior, as well as weight loss, but no survey is available that offers reliable measurement of self-efficacy as it relates to each of these targeted outcomes. The purpose of this study was to test the psychometric properties of brief, pragmatic self-efficacy scales specific to PA, healthful eating and weight-loss (4 items each). An adult sample (n=1790) from 28 worksites enrolled in a worksite weight-loss program completed the self-efficacy scale, as well as measures of PA, dietary fat intake, and weight, at baseline, 6-, and 12-months. The hypothesized factor structure was tested through confirmatory factor analysis, which supported the expected factor structure for three latent self-efficacy factors, specific to PA, healthful eating, and weight-loss. Measurement equivalence/invariance between relevant demographic groups, and over time was also supported. Parallel growth processes in self-efficacy factors and outcomes (PA, fat intake, and weight) support the predictive validity of score interpretations. Overall, this initial series of psychometric analyses supports the interpretation that scores on these scales reflect self-efficacy for PA, healthful eating, and weight-loss. The use of this instrument in large-scale weight-loss trials is encouraged. PMID:26619093
Clay, Ryan D; Lee, Elizabeth C; Kurtzman, Marc F; Dversdal, Renee K
2016-12-01
A growing body of evidence supports the use of bedside ultrasound for core Internal Medicine procedures and increasingly as augmentation of the physical exam. The literature also supports that trainees, both medical students and residents, can acquire these skills. However, there is no consensus on training approach. To implement and study the effectiveness of a high-yield and expedited curriculum to train internal medicine interns to use bedside ultrasound for physical examination and procedures. The study was conducted at a metropolitan, academic medical center and included 33 Internal Medicine interns. This was a prospective cohort study of a new educational intervention consisting of a single-day intensive bedside ultrasound workshop followed by two optional hour-long workshops later in the year. The investigation was conducted at Oregon Health & Science University in Portland, Oregon. The intensive day consisted of alternating didactic sessions with small group hands-on ultrasound practice sessions and ultrasound simulations. A 30-question assessment was used to assess ultrasound interpretation knowledge prior to, immediately post, and 6 months post intervention. Thirty-three interns served as their own historical controls. Assessment performance significantly increased after the intervention from a mean pre-test score of 18.3 (60.9 % correct) to a mean post-test score 25.5 (85.0 % correct), P value of <0.0001. This performance remained significantly better at 6 months with a mean score of 23.8 (79.3 % correct), P value <0.0001. There was significant knowledge attrition compared to the immediate post-assessment, P value 0.0099. A single-day ultrasound training session followed by two optional noon conference sessions yielded significantly improved ultrasound interpretation skills in internal medicine interns.
Edmonds, Lisa A; Donovan, Neila J
2014-06-01
Virtually no valid materials are available to evaluate confrontation naming in Spanish-English bilingual adults in the U.S. In a recent study, a large group of young Spanish-English bilingual adults were evaluated on An Object and Action Naming Battery (Edmonds & Donovan in Journal of Speech, Language, and Hearing Research 55:359-381, 2012). Rasch analyses of the responses resulted in evidence for the content and construct validity of the retained items. However, the scope of that study did not allow for extensive examination of individual item characteristics, group analyses of participants, or the provision of testing and scoring materials or raw data, thereby limiting the ability of researchers to administer the test to Spanish-English bilinguals and to score the items with confidence. In this study, we present the in-depth information described above on the basis of further analyses, including (1) online searchable spreadsheets with extensive empirical (e.g., accuracy and name agreeability) and psycholinguistic item statistics; (2) answer sheets and instructions for scoring and interpreting the responses to the Rasch items; (3) tables of alternative correct responses for English and Spanish; (4) ability strata determined for all naming conditions (English and Spanish nouns and verbs); and (5) comparisons of accuracy across proficiency groups (i.e., Spanish dominant, English dominant, and balanced). These data indicate that the Rasch items from An Object and Action Naming Battery are valid and sensitive for the evaluation of naming in young Spanish-English bilingual adults. Additional information based on participant responses for all of the items on the battery can provide researchers with valuable information to aid in stimulus development and response interpretation for experimental studies in this population.
Changiz, Tahereh; Haghani, Fariba; Nowroozi, Nasim
2013-01-01
Appropriate instructional design plays a crucial role in e-learning success, and learner analysis is the cornerstone of the instructional design process. Students' readiness for e-learning was assessed in the present study as an example of learner analysis for a distance course in a medical education master's program. A census sample of 23 students who had applied for the distance master's program in medical education completed the "Students' E-Learning Readiness Scale" developed by Watkins, via email. The reliability and validity of the scale have been confirmed previously. Average scores in total and for 6 subscales were calculated. The score range was 1-5, and scores above 3 indicated good readiness. Data were analyzed using descriptive statistics and non-parametric tests (Mann-Whitney U and Kruskal-Wallis). The response rate was 100%. The students' readiness scores in total and in all subscales ("technology access", "online skills and relationships", "motivation", "online audio/video", "readiness for online discussions", and "importance of e-learning to your success") were above 3. Comparing subscales, students' mean scores in the "motivation" and "readiness for online discussions" subscales were lower than the others, although the difference was not significant. There were no significant gender differences in the readiness scores. Students who were academic staff had significantly higher scores than others in total and in the "motivation" and "online skills and relationships" subscales. The good learner readiness observed in the present study may imply that the instructional designer can rely on e-learning strategies and build the course upon them. However, given the slightly lower scores in the "motivation" and "readiness for online discussions" subscales, greater emphasis on strategies that improve these two components is recommended. To generalize the results, students' readiness should be tested in a wider range of degree programs.
Prober, Charles G; Kolars, Joseph C; First, Lewis R; Melnick, Donald E
2016-01-01
The three-step United States Medical Licensing Examination (USMLE) was developed by the National Board of Medical Examiners and the Federation of State Medical Boards to provide medical licensing authorities a uniform evaluation system on which to base licensure. The test results appear to be a good measure of content knowledge and a reasonable predictor of performance on subsequent in-training and certification exams. Nonetheless, it is disconcerting that the test preoccupies so much of students' attention, with attendant substantial costs (in time and money) and mental and emotional anguish. There is an increasingly pervasive practice of using the USMLE score, especially the Step 1 component, to screen applicants for residency. This is despite the fact that the test was not designed to be a primary determinant of the likelihood of success in residency. Further, relying on Step 1 scores to filter large numbers of applications has unintended consequences for students and undergraduate medical education curricula. There are many other factors likely to be equally or more predictive of performance during residency. The authors strongly recommend a move away from using test scores alone in the applicant screening process and toward a more holistic evaluation of the skills, attributes, and behaviors sought in future health care providers. They urge more rigorous study of the characteristics of students that predict success in residency, better assessment tools for competencies beyond those assessed by Step 1 that are relevant to success, and nationally comparable measures from those assessments that are easy to interpret and apply.
Affirmative Psychological Testing and Neurocognitive Assessment with Transgender Adults.
Keo-Meier, Colton L; Fitzgerald, Kara M
2017-03-01
Neither consensus on best practice nor validated neuropsychological, intelligence, or personality testing batteries exist for assessment and psychological testing in the transgender population. Historically, assessment has been used in a gate-keeping fashion with transgender clients. There are no firm standards of care when considering the content and appropriateness of evaluations conducted presurgically. These evaluations are discussed in the setting of other presurgical evaluations, with a recommendation to move toward a competency-to-make-medical-decisions model. Additional considerations are discussed, such as effects of transition on mood and how to interpret scores in a field where normative data are often gender stratified. Copyright © 2016 Elsevier Inc. All rights reserved.
Vingerhoets, Johan; Nijs, Steven; Tambuyzer, Lotke; Hoogstoel, Annemie; Anderson, David; Picchio, Gaston
2012-01-01
The aims of this study were to compare various genotypic scoring systems commonly used to predict virological outcome to etravirine, and examine their concordance with etravirine phenotypic susceptibility. Six etravirine genotypic scoring systems were assessed: Tibotec 2010 (based on 20 mutations; TBT 20), Monogram, Stanford HIVdb, ANRS, Rega (based on 37, 30, 27 and 49 mutations, respectively) and virco®TYPE HIV-1 (predicted fold change based on genotype). Samples from treatment-experienced patients who participated in the DUET trials and with both genotypic and phenotypic data (n=403) were assessed using each scoring system. Results were retrospectively correlated with virological response in DUET. κ coefficients were calculated to estimate the degree of correlation between the different scoring systems. Correlation between the five scoring systems and the TBT 20 system was approximately 90%. Virological response by etravirine susceptibility was comparable regardless of which scoring system was utilized, with 70-74% of DUET patients determined as susceptible to etravirine by the different scoring systems achieving plasma viral load <50 HIV-1 RNA copies/ml. In samples classed as phenotypically susceptible to etravirine (fold change in 50% effective concentration ≤3), correlations with genotypic score were consistently high across scoring systems (≥70%). In general, the etravirine genotypic scoring systems produced similar results, and genotype-phenotype concordance was high. As such, phenotypic interpretations, and in their absence all genotypic scoring systems investigated, may be used to reliably predict the activity of etravirine.
The assessment and interpretation of Demirjian, Goldstein and Tanner's dental maturity.
Liversidge, Helen M
2012-09-01
A frequently reported advancement in dental maturity compared with the 50th percentile of Demirjian, Goldstein and Tanner (1973, Hum Biol 45:211-27) has been interpreted as a population difference. To review the assessment and interpretation of Demirjian et al.'s dental maturity. Dental maturity of boys from published reports was compared as maturity curves and difference to the 50th percentile in terms of chronological age and score. Dental maturity, as well as maturity of individual teeth, was compared in the fastest and slowest maturing groups of boys from the Chaillet database. Maturity curves from published reports by age category were broadly similar and differences occurred at the steepest part of the curve. These reduced when expressed as score rather than age. Many studies report a higher than expected score for chronological age and the database contained more children than expected with scores above the 97th percentile. Revised scores for chronological age from this database were calculated (4072 males, 3958 females, aged 2.1-17.9). Most published reports were similar to the database smoothed maturity curve. This method of dental maturity is designed to assess maturity for a single child and is unsuitable to compare groups.
Nolte, Sandra; Elsworth, Gerald R; Sinclair, Andrew J; Osborne, Richard H
2012-04-01
Program evaluations are frequently based on 'then-test' data, i.e., pre-test collected in retrospect. While the application of the then-test has practical advantages, little is known about the validity of then-test data. Because of the collection of then-test in close proximity to post-test questions, this study was aimed at exploring whether the presence of then-test questions in post-test questionnaires influenced subjects' responses to post-test. To test the influence of then-test questions, we designed a randomized three-group study in the context of chronic disease self-management programs. Interventions had comparable goals and philosophies, and all 949 study participants filled out identical Health Education Impact Questionnaires (heiQ) at pre-test. At post-test, participants were then randomized to one of the following three groups: Group A responded to post-test questions only (n = 331); Group B filled out transition questions in addition to post-test (n = 304); and Group C filled out then-test questions in addition to post-test (n = 314). Significant post-test differences were found in six of eight heiQ scales, with respondents who filled out then-test questions reporting significantly higher post-test scores than respondents of the other groups. This study provides evidence that the inclusion of then-test questions alters post-test responses, suggesting that change scores based on then-test data be interpreted with care.
Campbell, W. Scott; Lyden, Elizabeth; Van Schooneveld, Trevor C.
2017-01-01
Rapid pathogen identification can alter antibiotic prescribing practices if interpreted correctly. Microbiology reporting can be difficult to understand, and new technology has made it more challenging. Nebraska Medicine recently implemented the BioFire FilmArray blood culture identification panel (BCID) coupled with stewardship-based education on interpretation. Physician BCID result interpretation and prescribing were assessed via an electronic survey, with a response rate of 40.8% (156/382 surveys). Seven questions required respondents to interpret BCID results, identify the most likely pathogen, and then choose therapy based on the results. The tallied correct responses resulted in a knowledge score. General linear models evaluated the effect of role, specialty, and utilization of the BCID interpretation guide on the mean knowledge score. The specialties of the respondents included 55.7% internal medicine, 19.7% family medicine, and 24.6% other. Roles included 41.1% residents, 5.0% fellows, and 53.9% faculty. Most reported that they reviewed antimicrobial susceptibility results (89.4%) and adjusted therapy accordingly (81.6%), while only 60% stated that they adjusted therapy based on BCID results. The correct response rates ranged from 52 to 86% for the interpretation questions. The most common errors included misinterpretation of Enterobacteriaceae and Staphylococcus genus results. Neither role nor specialty was associated with total knowledge score in multivariate analysis (P = 0.13 and 0.47, respectively). In conclusion, physician interpretation of BCID results is suboptimal and can result in ineffective treatment or missed opportunity to narrow therapy. With the implementation of new technology, improved reporting practices of BCID results with clinical decision support tools providing interpretation guidance available at the point of care are recommended. PMID:28250000
Richman, Susan D; Fairley, Jennifer; Butler, Rachel; Deans, Zandra C
2017-12-01
Evidence strongly indicates that extended RAS testing should be undertaken in mCRC patients, prior to prescribing anti-EGFR therapies. With more laboratories implementing testing, the requirement for External Quality Assurance schemes increases, thus ensuring high standards of molecular analysis. Data was analysed from 15 United Kingdom National External Quality Assessment Service (UK NEQAS) for Molecular Genetics Colorectal cancer external quality assurance (EQA) schemes, delivered between 2009 and 2016. Laboratories were provided annually with nine colorectal tumour samples for genotyping. Information on methodology and extent of testing coverage was requested, and scores given for genotyping, interpretation and clerical accuracy. There has been a sixfold increase in laboratory participation (18 in 2009 to 108 in 2016). For RAS genotyping, fewer laboratories now use Roche cobas®, pyrosequencing and Sanger sequencing, with more moving to next generation sequencing (NGS). NGS is the most commonly employed technology for BRAF and PIK3CA mutation screening. KRAS genotyping errors were seen in ≤10% laboratories, until the 2014-2015 scheme, when there was an increase to 16.7%, corresponding to a large increase in scheme participants. NRAS genotyping errors peaked at 25.6% in the first 2015-2016 scheme but subsequently dropped to below 5%. Interpretation and clerical accuracy scores have been consistently good throughout. Within this EQA scheme, we have observed that the quality of molecular analysis for colorectal cancer has continued to improve, despite changes in the required targets, the volume of testing and the technologies employed. It is reassuring to know that laboratories clearly recognise the importance of participating in EQA schemes.
Monheit, Gary D; Gendler, Ellen C; Poff, Bradley; Fleming, Laura; Bachtell, Nathan; Garcia, Emily; Burkholder, David
2010-11-01
Various scoring techniques prone to subjective interpretation have been used to evaluate soft tissue augmentation of nasolabial folds (NLFs). To design and validate a reliable wrinkle assessment scoring scale. Six photographed wrinkles of varying severity were electronically copied onto the same facial image to become a 6-point grading scale (GGS). A pilot training program (13 investigators) determined reliability, and a 12-week multicenter survey study validated the GGS scoring method. Pilot study inter- and intrarater scoring reliability were high (weighted kappa scores of 0.85 and 0.86, respectively). Seventy-five percent of survey investigators and independent review panel (IRP) members considered a GGS score difference of 0.5 to be a minimally perceivable difference. Interrater weighted kappa scores were 0.91 for the IRP and 0.80 for investigators. Intrarater agreements after repeat testing were 0.91 and 0.89, respectively. The baseline "live" assessment GGS mean score was 3.34, and the baseline blinded photographic assessment GGS mean score was 2.00 for the IRP and 2.16 for the investigators. The GGS is a reproducible method of grading the severity of NLF wrinkles. Treatment effectiveness of a dermal filler can be reliably evaluated using the GGS by comparing "live" assessments with the standard GGS photographic panel. © 2010 by the American Society for Dermatologic Surgery, Inc.
Factor structure of the functional movement screen in marine officer candidates.
Kazman, Josh B; Galecki, Jeffrey M; Lisman, Peter; Deuster, Patricia A; OʼConnor, Francis G
2014-03-01
Functional movement screening (FMS) is a musculoskeletal assessment that is intended to fill a gap between preparticipation examinations and performance tests. Functional movement screening consists of 7 standardized movements involving multiple muscle groups that are rated 0-3 during performance; scores are combined into a final score, which is intended to predict injury risk. The use of a sum score in this manner assumes that the items are unidimensional and that scores are internally consistent, which are measures of internal reliability. Despite research into the FMS' predictive value and interrater reliability, research has not assessed its psychometric properties. The present study is a standard psychometric analysis of the FMS and is the first to assess the internal consistency and factor structure of the FMS, using Cronbach's alpha and exploratory factor analysis (EFA). Using a cohort of 877 male and 57 female Marine officer candidates who performed the FMS, EFA of polychoric correlations with varimax rotation was conducted to explore the structure of the FMS. Tests were repeated on the original scores, which integrated feelings of pain during movement (0-3), and then on scores discounting the pain instruction and based only on the performance (1-3), to determine whether pain ratings affected the factor structure. The average FMS score was 16.7 ± 1.8. Cronbach's alpha was 0.39. Exploratory factor analysis yielded 2 components accounting for 21 and 17% of the variance and consisting of separate individual movements (shoulder mobility and deep squat, respectively). Analysis on scores discounting pain showed similar results. The factor structures were not interpretable, and the low Cronbach's alpha suggests a lack of internal consistency in FMS sum scores. Results do not offer support for validity of the FMS sum score as a unidimensional construct. In the absence of additional psychometric research, caution is warranted when using the FMS sum score.
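Since the argument above rests on Cronbach's alpha, a short worked sketch may help. It assumes the usual formula alpha = k/(k-1) * (1 - sum of item variances / variance of the sum score), with one row of item scores per respondent. The function name and the tiny data sets are hypothetical and are not the FMS data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    if len(scores) < 2:
        raise ValueError("at least two respondents are needed")
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("every respondent needs the same number (>= 2) of items")
    # Sample variance of each item (column) and of the per-respondent sum score.
    item_variance_sum = sum(variance(column) for column in zip(*scores))
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("alpha is undefined when sum scores do not vary")
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical data: items that rise and fall together across respondents.
consistent = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
# Hypothetical data: two items that are unrelated across respondents.
unrelated = [[1, 3], [3, 1], [1, 1], [3, 3]]

print(cronbach_alpha(consistent))
print(cronbach_alpha(unrelated))
```

In the first made-up data set the items move in lockstep and alpha is 1 (up to floating-point rounding); in the second the items are uncorrelated and alpha is 0. A value such as the 0.39 reported above sits much closer to the second case, which is why a simple sum score is hard to defend as a single construct.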
Hawkins, Keith A; Cromer, Jennifer R; Piotrowski, Andrea S; Pearlson, Godfrey D
2011-11-01
The Mini-Mental State Exam (MMSE) is a clinically ubiquitous yet incompletely standardized instrument. Though the test offers considerable examiner leeway, little data exist on the normative consequences of common administration variations. We sought to: (a) determine the effects of education, age, gender, health status, and a common administration variation (serial 7s subtraction vs. "world" spelled backward) on MMSE score within a minority sample, (b) provide normative data stratified on the most empirically relevant bases, and (c) briefly address item failure rates. African American citizens (N = 298) aged 55-87 living independently in the community were recruited by advertisement, community recruitment, and word of mouth. Total score with "world" spelled backward exceeded total score with serial 7s subtraction across all levels of education, replicating findings in Caucasian samples. Education is the primary source of variance on MMSE score, followed by age. In this cohort, women out-performed men when "world" spelled backward was included, but there was no gender effect when serial 7s subtraction was included in MMSE total score. To ensure an appropriate interpretation of MMSE scores, reports, whether clinical or in publications of research findings, should be explicit regarding the administration method. Stratified normative data are provided.
Jenkinson, C; Mant, J; Carter, J; Wade, D; Winner, S
2000-03-01
To assess the validity of the London handicap scale (LHS) using a simple unweighted scoring system compared with traditional weighted scoring. A total of 323 patients admitted to hospital with acute stroke were followed up by interview 6 months after their stroke as part of a trial looking at the impact of a family support organiser. Outcome measures included the six-item LHS, the Dartmouth COOP charts, the Frenchay activities index, the Barthel index, and the hospital anxiety and depression scale. Patients' handicap score was calculated both using the standard procedure (with weighting) for the LHS, and using a simple summation procedure without weighting (U-LHS). Construct validity of both LHS and U-LHS was assessed by testing their correlations with the other outcome measures. Cronbach's alpha for the LHS was 0.83. The U-LHS was highly correlated with the LHS (r=0.98). Correlation of U-LHS with the other outcome measures gave very similar results to correlation of LHS with these measures. Simple summation scoring of the LHS does not lead to any change in the measurement properties of the instrument compared with standard weighted scoring. Unweighted scores are easier to calculate and interpret, so it is recommended that these are used.
Validation of the Danish Addenbrooke's Cognitive Examination as a screening test in a memory clinic.
Stokholm, Jette; Vogel, Asmus; Johannsen, Peter; Waldemar, Gunhild
2009-01-01
Addenbrooke's Cognitive Examination (ACE) is a cognitive screening test developed to detect dementia. It has been validated in several countries. Validation studies have predominantly included patients with various degrees of dementia and healthy controls. The aim of this study was to evaluate the Danish version of ACE as a screening test for early dementia in an outpatient memory clinic. Further, we wanted to investigate the ability of the ACE to discriminate patients with early Alzheimer's disease (AD) from patients with depression. 78 patients with mild AD (MMSE ≥ 20), 30 non-demented patients diagnosed with depression (originally referred for evaluation of cognitive symptoms), and 63 healthy volunteers, all between 60 and 85 years of age, were included. All patients were given the ACE as a supplement to the standard diagnostic work-up. The cut-off points for optimal trade-off between sensitivity and specificity for ACE were 85/86 (sensitivity 0.99, specificity 0.94). When these cut-off points were applied to the group of depressive patients, the specificity dropped to 0.64, indicating a great overlap in individual test scores for demented and depressed patients. The optimal cut-off points for ACE found in this Danish study were close to what is reported in most other European studies. The great overlap in ACE scores for demented and depressed patients emphasizes that test scores must be interpreted with great caution when used in diagnostic work-up.
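How a single cut-off yields different specificities for different comparison groups can be sketched as follows. The sketch assumes that a cut-off written as 85/86 means scores of 85 or below screen positive and scores of 86 or above screen negative. The function name and all scores are hypothetical and are not the study data.

```python
def sensitivity_specificity(case_scores, comparison_scores, cutoff):
    """Return (sensitivity, specificity), treating scores <= cutoff as screen-positive."""
    if not case_scores or not comparison_scores:
        raise ValueError("both groups must contain at least one score")
    true_positives = sum(score <= cutoff for score in case_scores)
    true_negatives = sum(score > cutoff for score in comparison_scores)
    return (true_positives / len(case_scores),
            true_negatives / len(comparison_scores))


# Hypothetical screening scores (higher = better cognition).
dementia = [70, 78, 82, 85, 88]
healthy = [84, 90, 92, 95]
depressed = [80, 83, 87, 91]

print(sensitivity_specificity(dementia, healthy, cutoff=85))
print(sensitivity_specificity(dementia, depressed, cutoff=85))
```

Sensitivity depends only on the case group, so it is identical in both calls; specificity falls when the comparison group's scores overlap more with the cases, which is the pattern the abstract describes for depressed patients.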
Munoz, Alexis R; Salsman, John M; Stein, Kevin D; Cella, David
2015-06-01
Health-related quality of life measures are common in oncology research, trials, and practice. Spiritual well-being has emerged as an important aspect of health-related quality of life and the Functional Assessment of Chronic Illness Therapy-Spiritual Well-Being; The 12-item Spiritual Well-Being Scale (FACIT-Sp-12) is the most widely used measure of spiritual well-being among those with cancer. However, there is an absence of reference values with which to facilitate the interpretation of scores in research and clinical practice. The objective of the current study was to provide FACIT-Sp-12 reference values from a representative sample of adult cancer survivors. As part of the American Cancer Society's Study of Cancer Survivors-II, a national cross-sectional study of cancer survivors, 8864 survivors completed questionnaires assessing demographic characteristics, clinical information, and the FACIT-Sp-12. Scores were calculated and summarized by FACIT-Sp-12 subscale and total scores across age, sex, race/ethnicity, time after treatment, and cancer type. Student t tests for independent samples found that women reported significantly higher FACIT-Sp-12 scores (P<.001). Analyses of variance found significant main effects for FACIT-Sp-12 scores by age (P<.01), race/ethnicity (P<.05), and cancer type (P<.001). Post hoc comparisons revealed that older adults (those aged 60-69 years and 70-79 years) and black non-Hispanic individuals reported the highest FACIT-Sp-12 scores compared with those aged 18 to 39 years (P<.05; Cohen d [an effect size used to indicate the standardized difference between 2 means], 0.20-0.50) and white non-Hispanic individuals (P<.05; Cohen d, 0.02-0.62), respectively. All other significant main effects were small in magnitude (effect size range, 0.001-0.032). These data will aid in the interpretation of the magnitude and meaning of FACIT-Sp-12 scores, and allow for comparisons of scores across studies. © 2015 American Cancer Society.
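The abstract above describes Cohen d as the standardized difference between 2 means. One common way to compute it, sketched here under the assumption of a pooled standard deviation (the abstract does not say which variant was used), is shown below; the function name and the sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d with a pooled standard deviation (an assumed variant)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical scores for two groups, for illustration only.
print(cohens_d([2, 4, 6], [1, 3, 5]))
```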
ERIC Educational Resources Information Center
Laird, Robert D.; Weems, Carl F.
2011-01-01
Research on informant discrepancies has increasingly utilized difference scores. This article demonstrates the statistical equivalence of regression models using difference scores (raw or standardized) and regression models using separate scores for each informant to show that interpretations should be consistent with both models. First,…
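The equivalence described above can be sketched algebraically. The symbols below ($Y$ for the outcome, $A$ and $B$ for the two informants' scores, $b$ and $c$ for coefficients) are illustrative notation, not taken from the article:

```latex
\begin{align}
  Y &= b_0 + b_1 (A - B) + \varepsilon \\
    &= b_0 + b_1 A + (-b_1) B + \varepsilon .
\end{align}
% Compared with the separate-scores model
%   Y = c_0 + c_1 A + c_2 B + \varepsilon ,
% the difference-score model is the special case c_2 = -c_1.
```

Under this reading, a difference-score coefficient carries the same information as a pair of equal and opposite coefficients on the separate informant scores.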
ERIC Educational Resources Information Center
Dorans, Neil J.
2002-01-01
The history of SAT® score scales is summarized, and the need for realigning SAT score scales is demonstrated. The process employed to produce the conversions that take scores from the original SAT scales to recentered scales in which reference group scores are centered near the midpoint of the score-reporting range is laid out. For the purposes of…
Nielsen, Anne Molgaard; Vach, Werner; Kent, Peter; Hestbaek, Lise; Kongsted, Alice
2016-01-01
Latent class analysis (LCA) is increasingly being used in health research, but optimal approaches to handling complex clinical data are unclear. One issue is that commonly used questionnaires are multidimensional, but expressed as summary scores. Using the example of low back pain (LBP), the aim of this study was to explore and descriptively compare the application of LCA when using questionnaire summary scores and when using single items to subgroup patients based on multidimensional data. Baseline data from 928 LBP patients in an observational study were classified into four health domains (psychology, pain, activity, and participation) using the World Health Organization's International Classification of Functioning, Disability, and Health framework. LCA was performed within each health domain using the strategies of summary-score and single-item analyses. The resulting subgroups were descriptively compared using statistical measures and clinical interpretability. For each health domain, the preferred model solution ranged from five to seven subgroups for the summary-score strategy and seven to eight subgroups for the single-item strategy. There was considerable overlap between the results of the two strategies, indicating that they were reflecting the same underlying data structure. However, in three of the four health domains, the single-item strategy resulted in a more nuanced description, in terms of more subgroups and more distinct clinical characteristics. In these data, application of both the summary-score strategy and the single-item strategy in the LCA subgrouping resulted in clinically interpretable subgroups, but the single-item strategy generally revealed more distinguishing characteristics.
These results 1) warrant further analyses in other data sets to determine the consistency of this finding, and 2) warrant investigation in longitudinal data to test whether the finer detail provided by the single-item strategy results in improved prediction of outcomes and treatment response.
Text-interpreter language for flexible generation of patient notes and instructions.
Forker, T S
1992-01-01
An interpreted computer language has been developed along with a windowed user interface and multi-printer-support formatter to allow preparation of documentation of patient visits, including progress notes, prescriptions, excuses for work/school, outpatient laboratory requisitions, and patient instructions. Input is by trackball or mouse with little or no keyboard skill required. For clinical problems with specific protocols, the clinician can be prompted with problem-specific items of history, exam, and lab data to be gathered and documented. The language implements a number of text-related commands as well as branching logic and arithmetic commands. In addition to generating text, it is simple to implement arithmetic calculations such as weight-specific drug dosages; multiple branching decision-support protocols for paramedical personnel (or physicians); and calculation of clinical scores (e.g., coma or trauma scores) while simultaneously documenting the status of each component of the score. ASCII text files produced by the interpreter are available for computerized quality audit. Interpreter instructions are contained in text files users can customize with any text editor.
The Effects of Using Different Procedures to Score Maze Measures
ERIC Educational Resources Information Center
Pierce, Rebecca L.; McMaster, Kristen L.; Deno, Stanley L.
2010-01-01
The purpose of this study was to examine how different scoring procedures affect interpretation of maze curriculum-based measurements. Fall and spring data were collected from 199 students receiving supplemental reading instruction. Maze probes were scored first by counting all correct maze choices, followed by four scoring variations designed to…
Horiuchi, Yuki; Tabe, Yoko; Idei, Mayumi; Bengtsson, Hans-Inge; Ishii, Kiyoshi; Horii, Takashi; Miyake, Kazunori; Satoh, Naotake; Miida, Takashi; Ohsaka, Akimichi
2011-07-01
Quality assessment of blood cell morphological testing, such as white blood cell (WBC) differential and its interpretation, is one of the most important and difficult assignments in haematology laboratories. A monthly survey was performed to assess the possible role of the proficiency testing program produced by CellaVision competency software (CCS) in external quality assessment (EQA) of the clinical laboratories of affiliated university hospitals and the effective utilisation of this program in continuing professional development (CPD). Four monthly proficiency surveys were conducted in collaboration with four clinical laboratories affiliated with the teaching hospitals of Juntendo University of Medicine in Japan. EQA results by the CCS proficiency testing program revealed a difference of performance levels of WBC differential and morphological interpretation and a discrepancy in the WBC differential criteria among laboratories. With regard to the utilisation of this proficiency program as a tool for CPD, this program successfully improved the performance of the low-scoring laboratories and less experienced individuals. The CCS proficiency testing program was useful for the quality assessment of laboratory performance, for education, and for the storage and distribution of cell images to be utilised for further standardisation and education.
[Interpreting change scores of the Behavioural Rating Scale for Geriatric Inpatients (GIP)].
Diesfeldt, H F A
2013-09-01
The Behavioural Rating Scale for Geriatric Inpatients (GIP) consists of fourteen, Rasch modelled subscales, each measuring different aspects of behavioural, cognitive and affective disturbances in elderly patients. Four additional measures are derived from the GIP: care dependency, apathy, cognition and affect. The objective of the study was to determine the reproducibility of the 18 measures. A convenience sample of 56 patients in psychogeriatric day care was assessed twice by the same observer (a professional caregiver). The median time interval between rating occasions was 45 days (interquartile range 34-58 days). Reproducibility was determined by calculating intraclass correlation coefficients (ICC agreement) for test-retest reliability. The minimal detectable difference (MDD) was calculated based on the standard error of measurement (SEM agreement). Test-retest reliability expressed by the ICCs varied from 0.57 (incoherent behaviour) to 0.93 (anxious behaviour). Standard errors of measurement varied from 0.28 (anxious behaviour) to 1.63 (care dependency). The results show how the GIP can be applied when interpreting individual change in psychogeriatric day care participants.
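The step from a standard error of measurement to a minimal detectable difference can be sketched as follows. This assumes the common convention MDD = z × √2 × SEM with z = 1.96 (95% confidence); the abstract does not state which confidence level the study used, and the function name is hypothetical.

```python
from math import sqrt


def minimal_detectable_difference(sem, z=1.96):
    """MDD = z * sqrt(2) * SEM; z = 1.96 is an assumed 95% confidence level."""
    return z * sqrt(2) * sem


# Applied to the smallest and largest SEM values reported in the abstract.
for sem in (0.28, 1.63):
    print(sem, round(minimal_detectable_difference(sem), 2))
```

A change between two ratings smaller than this value cannot be distinguished from measurement error under the stated assumption.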
2003-11-01
decreases in standardized test scores Better problem solving and planning More use of higher level reasoning strategies; improved non-verbal reasoning... data were somewhat controversial in terms of interpretation, using a longitudinal data set enabled them to show considerable stability as a result of...cognitive behavioral programs that actually increase the attractiveness of staying. I mean you can do that structurally , through relationships. You can do
Dingwall, Kylie M; Pinkerton, Jennifer; Lindeman, Melissa A
2013-01-31
Achieving culturally fair assessments of cognitive functioning for Aboriginal people is difficult due to a scarcity of appropriately validated tools for use with this group. As a result, some Aboriginal people with cognitive impairments may lack fair and equitable access to services. The objective of this study was to examine current clinical practice in the Northern Territory regarding cognitive assessment for Aboriginal people thereby providing some guidance for clinicians new to this practice setting. Qualitative enquiry was used to describe practice context, reasons for assessment, and current practices in assessing cognition for Aboriginal Australians. Semi-structured interviews were conducted with 22 clinicians working with Aboriginal clients in central and northern Australia. Results pertaining to assessment methods are reported. A range of standardised tests were utilised with little consistency across clinical practice. Nevertheless, it was recognised that such tests bear severe limitations, requiring some modification and significant caution in their interpretation. Clinicians relied heavily on informal assessment or observations, contextual information and clinical judgement. Cognitive tests developed specifically for Aboriginal people are urgently needed. In the absence of appropriate, validated tests, clinicians have relied on and modified a range of standardised and informal assessments, whilst recognising the severe limitations of these. Past clinical training has not prepared clinicians adequately for assessing Aboriginal clients, and experience and clinical judgment were considered crucial for fair interpretation of test scores. Interpretation guidelines may assist inexperienced clinicians to consider whether they are achieving fair assessments of cognition for Aboriginal clients.
Testing the Benefits of Neurofeedback on Selective Attention Measured Through Dichotic Listening.
Gadea, Marien; Aliño, Marta; Garijo, Evelio; Espert, Raul; Salvador, Alicia
2016-06-01
The electrophysiological changes after a single session of neurofeedback training (↑SMR/↓Theta) and its effects on executive attention during a dichotic listening test with forced attentional procedures were measured in a sample of 20 healthy women. A pre-post moment test double blind design, with the inclusion of a group receiving sham neurofeedback, allowed for minimization of alien influences. The interaction of Moment × Group was significant, indicating an enhancement of SMR band after the real neurofeedback. The dichotic listening scores were correlated with the amplitude of Beta band in baseline conditions. The performance on the forced left attentional condition in dichotic listening was significantly improved and correlated positively with the post-training enhancement of the SMR band. The sham neurofeedback group also improved DL scores, so a clear affirmation about the benefits of neurofeedback training over cognitive performance could not be unambiguously established. It is concluded that the protocol showed a good independence and acceptable trainability in modifying the EEG results, but there was limited interpretability regarding cognitive outcomes.
NASA Astrophysics Data System (ADS)
Ariffin, A.; Samsudin, M. A.; Zain, A. N. Md.; Hamzah, N.; Ismail, M. E.
2017-05-01
The Engineering Drawing subject develops skills in geometry drawing becoming more professional. For the concept in Engineering Drawing, students need to have good visualization skills. Visualization is needed to help students get a start before translating into a drawing. Therefore, Problem Based Learning (PBL) using animation mode (PBL-A) and graphics mode (PBL-G) was implemented in class. A repeated problem-solving process can help students interpret the steps of engineering drawings correctly and accurately. This study examined the effects of PBL-A online and PBL-G online on visualization skills of students in polytechnics. Sixty-eight mechanical engineering students were involved in this study. The visualization test adapted from Bennett, Seashore and Wesman was used in this study. Results showed significant differences in post-test mean visualization scores between students enrolled in PBL-G and students who attended PBL-A online, after the effect of pre-test mean scores was controlled. Therefore, the effects of animation modes have a positive impact on increasing students’ visualization skills.
Infant polysomnography: reliability and validity of infant arousal assessment.
Crowell, David H; Kulp, Thomas D; Kapuniai, Linda E; Hunt, Carl E; Brooks, Lee J; Weese-Mayer, Debra E; Silvestri, Jean; Ward, Sally Davidson; Corwin, Michael; Tinsley, Larry; Peucker, Mark
2002-10-01
Infant arousal scoring based on the Atlas Task Force definition of transient EEG arousal was evaluated to determine (1) whether transient arousals can be identified and assessed reliably in infants and (2) whether arousal and no-arousal epochs scored previously by trained raters can be validated reliably by independent sleep experts. Phase I for inter- and intrarater reliability scoring was based on two datasets of sleep epochs selected randomly from nocturnal polysomnograms of healthy full-term, preterm, idiopathic apparent life-threatening event cases, and siblings of Sudden Infant Death Syndrome infants of 35 to 64 weeks postconceptional age. After training, test set 1 reliability was assessed and discrepancies identified. After retraining, test set 2 was scored by the same raters to determine interrater reliability. Later, three raters from the trained group rescored test set 2 to assess inter- and intrarater reliabilities. Interrater and intrarater reliability kappas, with 95% confidence intervals, ranged from substantial to almost perfect levels of agreement. Interrater reliabilities for spontaneous arousals were initially moderate and then substantial. During the validation phase, 315 previously scored epochs were presented to four sleep experts to rate as containing arousal or no-arousal events. Interrater expert agreements were diverse and considered as noninterpretable. Concordance in sleep experts' agreements, based on identification of the previously sampled arousal and no-arousal epochs, was used as a secondary evaluative technique. Results showed agreement by two or more experts on 86% of the Collaborative Home Infant Monitoring Evaluation Study arousal scored events. Conversely, only 1% of the Collaborative Home Infant Monitoring Evaluation Study-scored no-arousal epochs were rated as an arousal.
In summary, this study presents an empirically tested model with procedures and criteria for attaining improved reliability in transient EEG arousal assessments in infants using the modified Atlas Task Force standards. With training based on specific criteria, substantial inter- and intrarater agreement in identifying infant arousals was demonstrated. Corroborative validation results were too disparate for meaningful interpretation. Alternate evaluation based on concordance agreements supports reliance on infant EEG criteria for assessment. Results mandate additional confirmatory validation studies with specific training on infant EEG arousal assessment criteria.
Cheating in OSCEs: The Impact of Simulated Security Breaches on OSCE Performance.
Gotzmann, Andrea; De Champlain, André; Homayra, Fahmida; Fotheringham, Alexa; de Vries, Ingrid; Forgie, Melissa; Pugh, Debra
2017-01-01
Construct: Valid score interpretation is important for constructs in performance assessments such as objective structured clinical examinations (OSCEs). An OSCE is a type of performance assessment in which a series of standardized patients interact with the student or candidate who is scored by either the standardized patient or a physician examiner. In high-stakes examinations, test security is an important issue. Students accessing unauthorized test materials can create an unfair advantage and lead to examination scores that do not reflect students' true ability level. The purpose of this study was to assess the impact of various simulated security breaches on OSCE scores. Seventy-six 3rd-year medical students participated in an 8-station OSCE and were randomized to either a control group or to 1 of 2 experimental conditions simulating test security breaches: station topic (i.e., providing a list of station topics prior to the examination) or egregious security breach (i.e., providing detailed content information prior to the examination). Overall total scores were compared for the 3 groups using both a one-way between-subjects analysis of variance and a repeated measure analysis of variance to compare the checklist, rating scales, and oral question subscores across the three conditions. Overall total scores were highest for the egregious security breach condition (81.8%), followed by the station topic condition (73.6%), and they were lowest for the control group (67.4%). This trend was also found with checklist subscores only (79.1%, 64.9%, and 60.3%, respectively for the security breach, station topic, and control conditions). Rating scale subscores were higher for both the station topic and egregious security breach conditions compared to the control group (82.6%, 83.1%, and 77.6%, respectively). 
Oral question subscores were significantly higher for the egregious security breach condition (88.8%) followed by the station topic condition (64.3%), and they were the lowest for the control group (48.6%). This simulation of different OSCE security breaches demonstrated that student performance is greatly advantaged by having prior access to test materials. This has important implications for medical educators as they develop policies and procedures regarding the safeguarding and reuse of test content.
Chin, Esther Y; Nelson, Lindsay D; Barr, William B; McCrory, Paul; McCrea, Michael A
2016-09-01
The Sport Concussion Assessment Tool-3 (SCAT3) facilitates sideline clinical assessments of concussed athletes. Yet, there is little published research on clinically relevant metrics for the SCAT3 as a whole. We documented the psychometric properties of the major SCAT3 components (symptoms, cognition, balance) and derived clinical decision criteria (ie, reliable change score cutoffs and normative conversion tables) for clinicians to apply to cases with and without available preinjury baseline data. Cohort study (diagnosis); Level of evidence, 2. High school and collegiate athletes (N = 2018) completed preseason baseline evaluations including the SCAT3. Re-evaluations of 166 injured athletes and 164 noninjured controls were performed within 24 hours of injury and at 8, 15, and 45 days after injury. Analyses focused on predictors of baseline performance, test-retest reliability, and sensitivity and specificity of the SCAT3 using either single postinjury cutoffs or reliable change index (RCI) criteria derived from this sample. Athlete sex, level of competition, attention-deficit/hyperactivity disorder (ADHD), learning disability (LD), and estimated verbal intellectual ability (but not concussion history) were associated with baseline scores on ≥1 SCAT3 components (small to moderate effect sizes). Female sex, high school level of competition (vs college), and ADHD were associated with higher baseline symptom ratings (d = 0.25-0.32). Male sex, ADHD, and LD were associated with lower baseline Standardized Assessment of Concussion (SAC) scores (d = 0.28-0.68). Male sex, high school level of competition, ADHD, and LD were associated with poorer baseline Balance Error Scoring System (BESS) performance (d = 0.14-0.26). After injury, the symptom checklist manifested the largest effect size at the 24-hour assessment (d = 1.52), with group differences diminished but statistically significant at day 8 (d = 0.39) and nonsignificant at day 15.
Effect sizes for the SAC and BESS were small to moderate at 24 hours (SAC: d = -0.36; modified BESS: d = 0.46; full BESS: d = 0.51) and became nonsignificant at day 8 (SAC) and day 15 (BESS). Receiver operating characteristic curve analyses demonstrated a stronger discrimination for symptoms (area under the curve [AUC] = 0.86) than cognitive and balance measures (AUCs = 0.58 and 0.62, respectively), with comparable discrimination of each SCAT3 component using postinjury scores alone versus baseline-adjusted scores (P = .71-.90). Normative conversion tables and RCI criteria were created to facilitate the use of the SCAT3 both with and without baseline test results. Individual predictors should be taken into account when interpreting the SCAT3. The normative conversion tables and RCIs presented can be used to help interpret concussed athletes' performance both with and without baseline data, given the comparability of the 2 interpretative approaches. © 2016 The Author(s).
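A reliable change index of the kind referred to above is often computed with the Jacobson-Truax formula. The sketch below assumes that basic form, with no practice-effect correction; the study's own RCI criteria may have been derived differently, and the function name and input values are hypothetical.

```python
from math import sqrt


def reliable_change_index(baseline, retest, sd_baseline, test_retest_r):
    """Jacobson-Truax RCI (assumed form, without practice-effect adjustment).

    SEM    = SD * sqrt(1 - r)
    S_diff = sqrt(2) * SEM
    RCI    = (retest - baseline) / S_diff
    """
    sem = sd_baseline * sqrt(1 - test_retest_r)
    s_diff = sqrt(2) * sem
    return (retest - baseline) / s_diff


# Hypothetical values for illustration only.
rci = reliable_change_index(baseline=10, retest=20, sd_baseline=5, test_retest_r=0.5)
print(rci, abs(rci) > 1.96)
```

Under this convention an absolute RCI above 1.96 is usually read as change beyond what measurement error alone would produce at the 95% level.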
Development of the Communication Complexity Scale
Brady, Nancy C.; Fleming, Kandace; Thiemann-Bourque, Kathy; Olswang, Lesley; Dowden, Patricia; Saunders, Muriel D.
2011-01-01
Accurate description of an individual's communication status is critical in both research and practice. Describing the communication status of individuals with severe intellectual and developmental disabilities is difficult because these individuals often communicate with presymbolic means that may not be readily recognized. Our goal was to design a communication scale and summary score for interpretation that could be applied across populations of children and adults with limited (often presymbolic) communication forms. Methods: The Communication Complexity Scale (CCS) was developed by a team of researchers and tested with 178 participants with varying levels of presymbolic and early symbolic communication skills. Correlations between standardized and informant measures were completed, and expert opinions were obtained regarding the CCS. Results: CCS scores were within expected ranges for the populations studied and inter-rater reliability was high. Comparison across other measures indicated significant correlations with standardized tests of language. Scores on informant report measures tended to place children at higher levels of communication. Expert opinions generally favored the development of the CCS. Clinical implications: The scale appears to be useful for describing a given individual's level of presymbolic or early symbolic communication. Further research is needed to determine if it is sensitive to developmental growth in communication. PMID:22049404
Zhu, Leina; Gonzalez, Jorge
2017-01-01
Researchers and practitioners often use standardized vocabulary tests such as the Peabody Picture Vocabulary Test-4 (PPVT-4; Dunn and Dunn, 2007) and its companion, the Expressive Vocabulary Test-2 (EVT-2; Williams, 2007), to assess English vocabulary skills as an indicator of children's school readiness. Despite their psychometric excellence in the norm sample, issues arise when standardized vocabulary tests are used to assess children who come from culturally, linguistically and ethnically diverse backgrounds (e.g., Spanish-speaking English language learners) or who are delayed in some manner. One of the biggest challenges is establishing the appropriateness of these measures with non-English or non-standard English speaking children, as they often score one to two standard deviations below expected levels (e.g., Lonigan et al., 2013). This study re-examines the issues in analyzing the PPVT-4 and EVT-2 scores in a sample of 4-to-5-year-old low SES Hispanic preschool children who were part of a larger randomized clinical trial on the effects of a supplemental English shared-reading vocabulary curriculum (Pollard-Durodola et al., 2016). It was found that the data exhibited strong floor effects, and the presence of floor effects made it difficult to differentiate the intervention group and the control group on their vocabulary growth in the intervention. A simulation study is then presented under the multilevel structural equation modeling (MSEM) framework, and results revealed that in regular multilevel data analysis, ignoring floor effects in the outcome variables led to biased results in parameter estimates, standard error estimates, and significance tests. Our findings suggest caution in analyzing and interpreting scores of ethnically and culturally diverse children on standardized vocabulary tests (e.g., floor effects). It is recommended that appropriate analytical methods that take into account floor effects in outcome variables be considered.
Kopcinovic, Lara Milevoj; Vogrinc, Zeljka; Kocijan, Irena; Culej, Jelena; Aralica, Merica; Jokic, Anja; Antoncic, Dragana; Bozovic, Marija
2016-01-01
Introduction: We hypothesized that extravascular body fluid (EBF) analysis in Croatia is not harmonized and aimed to investigate preanalytical, analytical and postanalytical procedures used in EBF analysis in order to identify key aspects that should be addressed in future harmonization attempts. Materials and methods: An anonymous online survey created to explore laboratory testing of EBF was sent to secondary, tertiary and private health care Medical Biochemistry Laboratories (MBLs) in Croatia. Statements were designed to address preanalytical, analytical and postanalytical procedures of cerebrospinal, pleural, peritoneal (ascites), pericardial, seminal, synovial, amniotic fluid and sweat. Participants were asked to declare the strength of agreement with proposed statements using a Likert scale. Mean scores for corresponding separate statements divided according to health care setting were calculated and compared. Results: The survey response rate was 0.64 (58 / 90). None of the participating private MBLs declared to analyse EBF. We report a mean score of 3.45 obtained for all statements evaluated. Deviations from desirable procedures were demonstrated in all EBF testing phases. Minor differences in procedures used for EBF analysis comparing secondary and tertiary health care MBLs were found. The lowest scores were obtained for statements regarding quality control procedures in EBF analysis, participation in proficiency testing programmes and provision of interpretative comments on EBF’s test reports. Conclusions: Although good laboratory EBF practice is present in Croatia, procedures for EBF analysis should be further harmonized to improve the quality of EBF testing and patient safety. PMID:27812307
Free digital image analysis software helps to resolve equivocal scores in HER2 immunohistochemistry.
Helin, Henrik O; Tuominen, Vilppu J; Ylinen, Onni; Helin, Heikki J; Isola, Jorma
2016-02-01
Evaluation of human epidermal growth factor receptor 2 (HER2) immunohistochemistry (IHC) is subject to interobserver variation and lack of reproducibility. Digital image analysis (DIA) has been shown to improve the consistency and accuracy of the evaluation and its use is encouraged in current testing guidelines. We studied whether digital image analysis using a free software application (ImmunoMembrane) can assist in interpreting HER2 IHC in equivocal 2+ cases. We also compared digital photomicrographs with whole-slide images (WSI) as material for ImmunoMembrane DIA. We stained 750 surgical resection specimens of invasive breast cancers immunohistochemically for HER2 and analysed staining with ImmunoMembrane. The ImmunoMembrane DIA scores were compared with the originally responsible pathologists' visual scores, a researcher's visual scores and in situ hybridisation (ISH) results. The originally responsible pathologists reported 9.1 % positive 3+ IHC scores, for the researcher this was 8.4 % and for ImmunoMembrane 9.5 %. Equivocal 2+ scores were 34 % for the pathologists, 43.7 % for the researcher and 10.1 % for ImmunoMembrane. Negative 0/1+ scores were 57.6 % for the pathologists, 46.8 % for the researcher and 80.8 % for ImmunoMembrane. There were six false positive cases, which were classified as 3+ by ImmunoMembrane and negative by ISH. Six cases were false negative defined as 0/1+ by IHC and positive by ISH. ImmunoMembrane DIA using digital photomicrographs and WSI showed almost perfect agreement. In conclusion, digital image analysis by ImmunoMembrane can help to resolve a majority of equivocal 2+ cases in HER2 IHC, which reduces the need for ISH testing.
Patients and medical statistics. Interest, confidence, and ability.
Woloshin, Steven; Schwartz, Lisa M; Welch, H Gilbert
2005-11-01
People are increasingly presented with medical statistics. There are no existing measures to assess their level of interest or confidence in using medical statistics. To develop 2 new measures, the STAT-interest and STAT-confidence scales, and assess their reliability and validity. Survey with retest after approximately 2 weeks. Two hundred and twenty-four people were recruited from advertisements in local newspapers, an outpatient clinic waiting area, and a hospital open house. We developed and revised 5 items on interest in medical statistics and 3 on confidence understanding statistics. Study participants were mostly college graduates (52%); 25% had a high school education or less. The mean age was 53 (range 20 to 84) years. Most paid attention to medical statistics (6% paid no attention). The mean (SD) STAT-interest score was 68 (17) and ranged from 15 to 100. Confidence in using statistics was also high: the mean (SD) STAT-confidence score was 65 (19) and ranged from 11 to 100. STAT-interest and STAT-confidence scores were moderately correlated (r=.36, P<.001). Both scales demonstrated good test-retest repeatability (r=.60, .62, respectively), internal consistency reliability (Cronbach's alpha=0.70 and 0.78), and usability (individual item nonresponse ranged from 0% to 1.3%). Scale scores correlated only weakly with scores on a medical data interpretation test (r=.15 and .26, respectively). The STAT-interest and STAT-confidence scales are usable and reliable. Interest and confidence were only weakly related to the ability to actually use data.
Testing non-inferiority of a new treatment in three-arm clinical trials with binary endpoints.
Tang, Nian-Sheng; Yu, Bin; Tang, Man-Lai
2014-12-18
A two-arm non-inferiority trial without a placebo is usually adopted to demonstrate that an experimental treatment is not worse than a reference treatment by a small pre-specified non-inferiority margin due to ethical concerns. Selection of the non-inferiority margin and establishment of assay sensitivity are two major issues in the design, analysis and interpretation for two-arm non-inferiority trials. Alternatively, a three-arm non-inferiority clinical trial including a placebo is usually conducted to assess the assay sensitivity and internal validity of a trial. Recently, some large-sample approaches have been developed to assess the non-inferiority of a new treatment based on the three-arm trial design. However, these methods behave badly with small sample sizes in the three arms. This manuscript aims to develop some reliable small-sample methods to test three-arm non-inferiority. Saddlepoint approximation, exact and approximate unconditional, and bootstrap-resampling methods are developed to calculate p-values of the Wald-type, score and likelihood ratio tests. Simulation studies are conducted to evaluate their performance in terms of type I error rate and power. Our empirical results show that the saddlepoint approximation method generally behaves better than the asymptotic method based on the Wald-type test statistic. For small sample sizes, approximate unconditional and bootstrap-resampling methods based on the score test statistic perform better in the sense that their corresponding type I error rates are generally closer to the prespecified nominal level than those of other test procedures. Both approximate unconditional and bootstrap-resampling test procedures based on the score test statistic are generally recommended for three-arm non-inferiority trials with binary outcomes.
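For orientation, the large-sample Wald-type test that the abstract treats as its asymptotic baseline can be sketched as follows. This is a minimal sketch under an assumed retention-of-effect formulation, H0: pE - pP <= theta * (pR - pP), with higher response rates taken to be better; the abstract itself does not state the hypothesis. The function name, the retention fraction theta, and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wald_three_arm_noninferiority(x_e, n_e, x_r, n_r, x_p, n_p, theta=0.8):
    """Asymptotic Wald-type test of H0: pE - pP <= theta * (pR - pP).

    Assumed formulation (not taken from the abstract): higher response
    rates are better, and theta is the fraction of the reference effect
    that the experimental arm must retain.
    Returns (z statistic, one-sided p-value).
    """
    p_e, p_r, p_p = x_e / n_e, x_r / n_r, x_p / n_p
    # Contrast pE - theta * pR - (1 - theta) * pP; H0 says it is <= 0.
    contrast = p_e - theta * p_r - (1 - theta) * p_p
    # Unrestricted (Wald) variance estimate of the contrast.
    variance = (
        p_e * (1 - p_e) / n_e
        + theta**2 * p_r * (1 - p_r) / n_r
        + (1 - theta) ** 2 * p_p * (1 - p_p) / n_p
    )
    z = contrast / sqrt(variance)
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


# Hypothetical counts: responders / arm size for experimental, reference, placebo.
z, p = wald_three_arm_noninferiority(42, 60, 40, 60, 15, 30, theta=0.8)
print(f"z = {z:.3f}, one-sided p = {p:.4f}")
```

The normal reference distribution used here is the step that the abstract reports as unreliable for small arms; the exact, approximate unconditional, and bootstrap procedures it recommends replace that step rather than the test statistic itself.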
Estimating chemical ecotoxicity in EU ecolabel and in EU product environmental footprint.
Saouter, Erwan; De Schryver, An; Pant, Rana; Sala, Serenella
2018-05-21
The EU Commission Ecolabel and the Product Environmental Footprint (PEF) aim at promoting the development and consumption of greener products. The product aquatic toxicity scores from these 2 methods may in some circumstances lead to opposite conclusions. Although this could be interpreted as an inconsistency, the scores should not be compared with each other but used in a complementary way. In short, the CDV provides a "full" product formula aquatic toxicity score, even if some chemicals may never reach or persist in freshwater ecosystems. The USEtox® score, by integrating fate and exposure, focuses on the potential toxicity of persistent, water-soluble chemicals at steady state. Since no risk or safety assessment can be conducted with either USEtox® or the CDV, both are hazard-based scoring systems. This short communication clarifies the difference between the approaches underpinning the toxicity scores used in Ecolabel and PEF, providing guidance on how to interpret the results. Copyright © 2018 The Authors. Published by Elsevier Ltd. All rights reserved.
ECG interpretation skills of South African Emergency Medicine residents
Wallis, Lee; Maritz, David
2010-01-01
Background The use and interpretation of electrocardiograms (ECGs) are widely accepted as an essential core skill in Emergency Medicine. It is imperative that emergency physicians are expert in ECG interpretation when they exit their training programme. Aim It is unknown whether South African Emergency Medicine trainees are getting the necessary skills in ECG interpretation during the training programme. Currently there are no clear criteria to assess emergency physicians’ competency in ECG interpretation in South Africa. Methods A prospective cross-sectional study of Emergency Medicine residents and recently qualified emergency physicians was conducted between August 2008 and February 2009 using a focused questionnaire. Results At the time of the study, there were 55 eligible trainees in South Africa. A total of 55 assessments were distributed; 50 were returned (91%) and 49 were fully completed (89%). In this study, we found the overall average score of ECG interpretation was 46.4% [95% confidence interval (CI) 41.5–51.2%]. The junior group had an overall average of 42.2% (95% CI 36.9–47.5%), whereas the senior group managed 52.5% (95% CI 43.4–61.5%). Conclusion In this prospective cross-sectional study of Emergency Medicine residents and recently qualified emergency physicians, we found that there was improvement in the interpretation of ECGs with increased seniority. There exists, however, a low level of accuracy for many of the critical ECG diagnoses. The average score of 46.4% obtained in this study is lower than the scores obtained by other international studies from countries where Emergency Medicine is a well-established speciality. PMID:21373298
A study of perceptual and verbal skills of disabled readers in grades 4, 5 and 6.
Solan, H A; Ficarra, A P
1990-08-01
This investigation addresses the role of the optometrist in diagnosing and treating children in grades 4, 5, and 6 who have been identified as reading disabled. Fifty-one subjects with average intelligence, but whose reading comprehension skills were below the 31st percentile (mean, 20th percentile), were evaluated using verbal and perceptual tests. When the performance of this experimental group was compared with the mean scores from standardized test norms for each of the various tasks, the disabled readers scored significantly lower in seven of the eight perceptual and five of the six verbal tasks. These results lend support to the hypothesis that both perceptual and verbal deficits are related to reading comprehension. In a step-wise multiple correlation analysis, three perceptual factors (eye movements, Auditory-Visual Integration Test [AVIT], and grooved pegboard) contributed 38 percent of the variance, whereas the addition of two verbal factors (digit span and token test) provided just 2 percent. That is, 38 percent of the variation in reading comprehension could be accounted for by variations in perceptual skills in the disabled readers. The results were interpreted in terms of spatial-simultaneous and verbal-successive processing skills.
Apprenticeship-based training in neurogastroenterology and motility.
Vasant, Dipesh H; Sharma, Amol; Bhagatwala, Jigar; Viswanathan, Lavanya; Rao, Satish S C
2018-03-01
Although neurogastroenterology and motility (NGM) disorders affect 50% of patients seen in clinics, many gastroenterologists receive limited NGM training. One-month apprenticeship-based NGM training has been provided at ten centers in the USA for a decade; however, the outcomes of this training are unclear. Our goal was to describe the effectiveness of this program from a trainee's perspective. Areas covered: We describe the training model, learning experiences, and outcomes of one-month apprenticeship-based training in NGM at a center of excellence, using a detailed individual observer account and data from 12 consecutive trainees who completed the program. During a one-month training period, 302 procedures, including breath tests (BT) n = 132, anorectal manometry (ARM) n = 29 and esophageal manometry (EM) n = 28, were performed. Post-training, all trainees (n = 12) knew the indications for motility tests, and the majority achieved independence in basic interpretation of BT, EM and ARM. Additionally, in a multiple-choice NGM written test paper, trainees achieved significant improvements in test scores post-training (P = 0.003). Expert commentary: One-month training at a high-volume center can facilitate rapid learning of NGM and the indications, basic interpretation and utility of motility tests. Trainees demonstrate significant independence, and this training model provides an ideal platform for those interested in sub-specialty NGM.
Francis, Claire E; Longmuir, Patricia E; Boyer, Charles; Andersen, Lars Bo; Barnes, Joel D; Boiarskaia, Elena; Cairney, John; Faigenbaum, Avery D; Faulkner, Guy; Hands, Beth P; Hay, John A; Janssen, Ian; Katzmarzyk, Peter T; Kemper, Han C; Knudson, Duane; Lloyd, Meghann; McKenzie, Thomas L; Olds, Tim S; Sacheck, Jennifer M; Shephard, Roy J; Zhu, Weimo; Tremblay, Mark S
2016-02-01
The Canadian Assessment of Physical Literacy (CAPL) was conceptualized as a tool to monitor children's physical literacy. The original model (fitness, activity behavior, knowledge, motor skill) required revision, and relative weights for calculating and interpreting scores were needed. Nineteen childhood physical activity/fitness experts completed a 3-round Delphi process. Round 1 consisted of open-ended questions. In subsequent rounds, participants rated statements using a 5-point Likert scale. Recommendations were sought regarding protocol inclusion, relative importance within composite scores and score interpretation. Delphi participant consensus was achieved for 64% (47/73) of statement topics, including a revised conceptual model, specific assessment protocols, the importance of longitudinal tracking, and the relative importance of individual protocols and composite scores. Divergent opinions remained regarding the inclusion of sleep time, assessment/scoring of the obstacle course assessment of motor skill, and the need for an overall physical literacy classification. The revised CAPL model (overlapping domains of physical competence, motivation, and knowledge, encompassed by daily behavior) is appropriate for monitoring the physical literacy of children aged 8 to 12 years. Objectively measured domains (daily behavior, physical competence) have higher relative importance. The interpretation of CAPL results should be reevaluated as more data become available.
Al-Ghatani, Ali M; Obonsawin, Marc C; Binshaig, Basmah A; Al-Moutaery, Khalaf R
2011-01-01
There are 2 aims for this study: first, to collect normative data for the Wisconsin Card Sorting Test (WCST), Stroop test, Test of Non-verbal Intelligence (TONI-3), and the Picture Completion (PC) and Vocabulary (VOC) subtests of the Wechsler Adult Intelligence Scale-Revised for use in a Saudi Arabian culture, and second, to use the normative data to generate regression equations. To collect the normative data and generate the regression equations, 198 healthy individuals were selected to provide a representative distribution for age, gender, years of education, and socioeconomic class. The WCST, Stroop test, TONI-3, PC, and VOC were administered to the healthy individuals. This study was carried out at the Department of Clinical Neurosciences, Riyadh Military Hospital, Riyadh, Kingdom of Saudi Arabia from January 2000 to July 2002. Normative data were obtained for all tests, and tables were constructed to interpret scores for different age groups. Regression equations to predict performance on the 3 tests of frontal function from scores on tests of fluid (TONI-3) and premorbid intelligence were generated from the data from the healthy individuals. The data collected in this study provide normative tables for 3 tests of frontal lobe function and for tests of general intellectual ability for use in Saudi Arabia. The data also provide a method to estimate pre-injury ability without the use of verbally based tests.
Hren, Darko; Marušić, Matko; Marušić, Ana
2011-01-01
Background Moral reasoning is important for developing medical professionalism, but current evidence for the relationship between education and moral reasoning does not clearly apply to medical students. We used a combined study design to test the effect of clinical teaching on moral reasoning. Methods We used the Defining Issues Test-2 as a measure of moral judgment, with 3 general moral schemas: Personal Interest, Maintaining Norms, and Postconventional Schema. The test was applied to 3 consecutive cohorts of second-year students in 2002 (n = 207), 2003 (n = 192), and 2004 (n = 139), and to 707 students of all 6 study years in a 2004 cross-sectional study. We also tested 298 age-matched controls without university education. Results In the cross-sectional study, there was a significant main effect of the study year for Postconventional (F(5,679) = 3.67, P = 0.003) and Personal Interest scores (F(5,679) = 3.38, P = 0.005). There was no effect of the study year for Maintaining Norms scores. Third-year medical students scored higher on the Postconventional schema than students in all other study years (p<0.001). There were no statistically significant differences among the 3 cohorts of second-year medical students, demonstrating the absence of cohort or point-of-measurement effects. The longitudinal study of 3 cohorts demonstrated that students regressed from Postconventional to Maintaining Norms schema-based reasoning after entering the clinical part of the curriculum. Interpretation Our study demonstrated a direct causative relationship between the regression in moral reasoning development and clinical teaching during the medical curriculum. The reasons may include the hierarchical organization of clinical practice, the specific nature of moral dilemmas faced by medical students, and the hidden medical curriculum. PMID:21479204
Xu, Ronghui; Li, Mofei; Johnson, Diana L; Luo, Yunjun; Chambers, Christina D
2017-05-01
Suboptimal asthma control during pregnancy may impact perinatal outcomes. U.S. guidelines recommend questionnaires to assess asthma control including the Asthma Control Test (ACT). It is unknown in a research setting to what extent recall differs by the time between symptom occurrence and the administration of the questionnaire. Between 2009-2014, 196 pregnant asthmatic women were recruited by the Organization of Teratology Information Specialists (OTIS) MotherToBaby Pregnancy Studies. Participants were administered the ACT at enrollment, gestational weeks 20 and 32, and shortly after delivery. The same women were also administered the ACT retrospectively at approximately 6 months postpartum. The Pearson correlation coefficients between the in-pregnancy and retrospective continuous ACT scores for the 1st, 2nd and 3rd trimesters were: 0.67 (95% CI: 0.58, 0.74), 0.61 (0.52, 0.70) and 0.65 (0.56, 0.72), respectively. When dichotomized into well-controlled asthma (ACT score ≥ 20) versus otherwise, the chi-square test for all three trimesters resulted in p values <0.0001. Cohen's Kappa statistics for the same dichotomized scores were 0.51, 0.45 and 0.40 for each trimester respectively. There was no evidence that adverse outcome of pregnancy (recall bias) influenced postpartum responses. The retrospectively recalled ACT score obtained postpartum was substantially different compared to in-pregnancy administration of the same questionnaire which could reflect test-retest variability as well as attenuation of recall. Documentation of the magnitude and direction of these differences could be useful in interpretation of the impact of asthma control when the ACT is used in retrospective case-control studies for pregnancy outcomes.
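The agreement statistic reported above can be illustrated with a short calculation. The sketch below dichotomizes paired ACT scores at the abstract's threshold (score ≥ 20 = well controlled) and computes Cohen's kappa from observed and chance-expected agreement. The function names and the eight score pairs are hypothetical and are not study data.

```python
def cohen_kappa_binary(first, second):
    """Cohen's kappa for two paired lists of booleans (same subjects, same order)."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty lists of equal length")
    n = len(first)
    # Observed proportion of pairs on which the two ratings agree.
    observed = sum(a == b for a, b in zip(first, second)) / n
    # Agreement expected by chance from the two marginal rates.
    p1 = sum(first) / n
    p2 = sum(second) / n
    expected = p1 * p2 + (1 - p1) * (1 - p2)
    if expected == 1:
        raise ValueError("kappa is undefined when neither rating varies")
    return (observed - expected) / (1 - expected)


def well_controlled(act_score):
    """Dichotomize an ACT score at the threshold given in the abstract."""
    return act_score >= 20


# Hypothetical paired scores for eight women (not study data).
in_pregnancy = [23, 18, 21, 15, 25, 19, 22, 17]
recalled = [22, 20, 21, 14, 24, 18, 19, 21]

kappa = cohen_kappa_binary(
    [well_controlled(s) for s in in_pregnancy],
    [well_controlled(s) for s in recalled],
)
print(f"kappa = {kappa:.2f}")
```

Because kappa discounts agreement expected by chance, it can be only moderate (0.40 to 0.51 in the abstract) even when the continuous scores correlate at 0.61 to 0.67.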
Coefficient Alpha and Reliability of Scale Scores
ERIC Educational Resources Information Center
Almehrizi, Rashid S.
2013-01-01
The majority of large-scale assessments develop various score scales that are either linear or nonlinear transformations of raw scores for better interpretations and uses of assessment results. The current formula for coefficient alpha (α; the commonly used reliability coefficient) only provides internal consistency reliability estimates of raw…
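As a point of reference, the conventional raw-score form of coefficient alpha is α = k/(k − 1) × (1 − Σ item variances / variance of total scores), where k is the number of items. The sketch below computes only this standard raw-score version; it does not attempt the scale-score extension that the truncated abstract goes on to discuss. The function name and the response matrix are hypothetical.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Raw-score coefficient alpha.

    item_scores: list of respondents, each a list of k item scores.
    """
    if len(item_scores) < 2 or len(item_scores[0]) < 2:
        raise ValueError("need at least 2 respondents and 2 items")
    k = len(item_scores[0])
    # Sample variance of each item (column) across respondents.
    item_var_sum = sum(variance(col) for col in zip(*item_scores))
    # Sample variance of each respondent's total raw score.
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical responses: 5 respondents x 3 items (not real data).
responses = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 4, 5],
    [3, 3, 4],
]
alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.3f}")
```

The function assumes that total scores vary across respondents; if every respondent had the same total, the denominator would be zero and alpha would be undefined.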
Prevalence and characteristics of orthorexia nervosa in a sample of university students in Italy.
Dell'Osso, L; Carpita, Barbara; Muti, D; Cremone, I M; Massimetti, G; Diadema, E; Gesi, C; Carmassi, C
2018-02-01
Orthorexia nervosa (ON) has been recently defined as a pathological approach to feeding related to healthiness concerns and purity of food and/or feeding habits. This condition recently showed an increasing prevalence, particularly among young adults. In order to investigate the prevalence of ON and its relationship with gender and nutritional style among young adults, we explored a sample of students from the University of Pisa, Italy. Assessments included the ORTO-15 questionnaire and a socio-demographic and eating habits form. Subjects were grouped by eating habits (i.e. standard vs vegetarian/vegan diet), gender, parents' educational level, type of high school attended, and BMI (low vs high vs normal BMI). Chi-square tests were performed to compare rates of subjects with overthreshold ORTO-15 scores, and Student's unpaired t test to compare mean scores between groups. Two classification tree analyses with the CHAID growing method were employed to identify the variables best predicting ON and ORTO-15 total score. More than one-third of the sample showed ON symptoms (ORTO-15 ≥ 35), with higher rates among females. Tree analyses showed diet type to predict ON and ORTO-15 total score more than gender. Our results seem to corroborate recent data highlighting similarities between ON and anorexia nervosa (AN). We propose an interpretation of ON as a phenotype of AN in the broader context of the Feeding and eating disorders (FEDs) spectrum.
Hughes, Alicia M; Hirsch, Colette R; Nikolaus, Stephanie; Chalder, Trudie; Knoop, Hans; Moss-Morris, Rona
2018-02-01
This study aims to replicate a UK study with a Dutch sample to explore whether attention and interpretation biases and general attentional control deficits in chronic fatigue syndrome (CFS) are similar across populations and cultures. Thirty-eight Dutch CFS participants were compared to 52 CFS and 51 healthy participants recruited from the UK. Participants completed self-report measures of symptoms, functioning, and mood, as well as three experimental tasks: (i) a visual-probe task measuring attentional bias to illness (somatic symptoms and disability) versus neutral words, (ii) an interpretive bias task measuring positive versus somatic interpretations of ambiguous information, and (iii) the Attention Network Test measuring general attentional control. Compared to controls, Dutch and UK participants with CFS showed a significant attentional bias for illness-related words and were significantly more likely to interpret ambiguous information in a somatic way. These effects were not moderated by attentional control. There were no significant differences between the Dutch and UK CFS groups on attentional bias, interpretation bias, or attentional control scores. This study replicated the main findings of the UK study with a Dutch CFS population, indicating that across these two cultures, people with CFS demonstrate biases in how somatic information is attended to and interpreted. These illness-specific biases appear to be unrelated to general attentional control deficits.
Rouselle, Serge D; Dillon, Krista N; Rousselle-Sabiac, Theo H; Brady, Dane A; Tunev, Stefan; Tellez, Armando
2016-08-01
The use of preclinical animal models is integral to the safety assessment, pathogenesis research, and testing of diagnostic technologies and therapeutic interventions. With inherent similarity to human anatomy and physiology, various porcine models have been the preferred preclinical model in some research areas such as medical devices, wound healing, and skin therapies. The porcine model has been the cornerstone for interventional cardiology for the evaluation and development of catheter-based renal denervation (RDN) therapy. The porcine model provides similar vascular access and renal neurovascular anatomy to humans. In these preclinical studies, the downstream kidneys from treated arteries are assessed for possible histopathological changes in the vessel-dependent territories. In assessing renal safety following RDN, it becomes critical to distinguish treatment-related changes from pre-existing background pathologies. The incidence of background pathological changes in porcine kidneys has not been previously established in normal, clinically healthy swine. Samples from the cranial, middle, and caudal portion of 331 naïve kidneys from 181 swine were processed histologically to slides and evaluated microscopically. The most commonly encountered spontaneous changes were chronic pyelonephritis found in nearly half of the evaluated naïve kidneys (∼40 %; score 1 = 91 %, score 2 = 8.4 %, score 3 = 0.76 %) followed by chronic interstitial inflammation in 9.7 % of the kidneys (score 1 = 90.6 %, score 2 = 9.4 %). Interestingly, there were a few rare spontaneous vascular changes that could potentially affect data interpretation in interventional and toxicology studies: arteritis and arteriolar dissection. The presence of pelvic cysts was a common occurrence (6.3 %) in the kidney. The domestic swine is a widely used preclinical species in interventional research, namely in the emerging field of transcatheter renal denervation.
This retrospective study presents the historical incidence of spontaneous lesions recorded in the kidneys from naïve pigs enrolled in renal denervation studies. There were commonly encountered changes of little pathological consequence, such as pyelonephritis or pelvic cysts, and rare vascular changes, such as arteritis and arteriolar dissection, that were of greater potential impact on study data interpretation. These results offer a benchmark by which to gauge the potential effect of a procedure or treatment on renal histopathology in swine and assist in data interpretation.
Hoch, Johanna M; Sinnott, Cori W; Robinson, Kendall P; Perkins, William O; Hartman, Jonathan W
2018-03-01
There is a lack of literature to support the diagnostic accuracy and cut-off scores of commonly used patient-reported outcome measures (PROMs) and clinician-oriented outcomes such as postural-control assessments (PCAs) when treating post-ACL reconstruction (ACLR) patients. These scores could help tailor treatments, enhance patient-centered care and may identify individuals in need of additional rehabilitation. To determine if differences in 4 PROMs and 3 PCAs exist between post-ACLR and healthy participants, and to determine the diagnostic accuracy and cut-off scores of these outcomes. Case control. Laboratory. A total of 20 post-ACLR and 40 healthy control participants. The participants completed 4 PROMs (the Disablement in the Physically Active Scale [DPA], the Fear-Avoidance Beliefs Questionnaire [FABQ], the Knee Injury and Osteoarthritis Outcome Score [KOOS] subscales, and the Tampa Scale of Kinesiophobia [TSK-11]) and 3 PCAs (the Balance Error Scoring System [BESS], the modified Star Excursion Balance Test [SEBT], and static balance on an instrumented force plate). Mann-Whitney U tests examined differences between groups. Receiver operating characteristic (ROC) curves were employed to determine sensitivity and specificity. The area under the curve (AUC) was calculated to determine the diagnostic accuracy of each instrument. The Youden Index was used to determine cut-off scores. Alpha was set a priori at P < 0.05. There were significant differences between groups for all PROMs (P < 0.05). There were no differences in PCAs between groups. The cut-off scores should be interpreted with caution for some instruments, as the scores may not be clinically applicable. Post-ACLR participants have decreased self-reported function and health-related quality of life. The PROMs are capable of discriminating between groups. Clinicians should consider using the cut-off scores in clinical practice.
Further use of the instruments to examine detriments after completion of standard rehabilitation may be warranted.
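The cut-off selection described in this abstract can be sketched in a few lines. The Youden Index is J = sensitivity + specificity − 1, and the chosen cut-off is the candidate threshold that maximizes J. The sketch assumes that higher scores indicate the condition (the abstract does not give the scoring direction for each instrument); the function name and the scores are hypothetical and are not study data.

```python
def youden_cutoff(scores_cases, scores_controls):
    """Return (cutoff, J, sensitivity, specificity) for the cutoff maximizing J.

    Assumption: a participant is classified as positive when score >= cutoff,
    i.e. higher scores indicate the condition.
    """
    best = None
    # Every observed score is a candidate threshold, tried in ascending order.
    for cutoff in sorted(set(scores_cases) | set(scores_controls)):
        sensitivity = sum(s >= cutoff for s in scores_cases) / len(scores_cases)
        specificity = sum(s < cutoff for s in scores_controls) / len(scores_controls)
        j = sensitivity + specificity - 1
        # Strict comparison keeps the lowest cutoff when several tie on J.
        if best is None or j > best[1]:
            best = (cutoff, j, sensitivity, specificity)
    return best


# Hypothetical questionnaire scores (not study data).
post_aclr = [12, 15, 9, 20, 17, 14]
healthy = [3, 5, 8, 10, 4, 6, 13, 2]

cutoff, j, sens, spec = youden_cutoff(post_aclr, healthy)
print(f"cutoff = {cutoff}, J = {j:.2f}, sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

With groups as small as these, the maximizing threshold can shift with a single participant, which is one reason cut-off scores of this kind warrant the cautious interpretation the abstract recommends.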
Evaluation of a New Scoring System for Retinal Nerve Fiber Layer Photography Using HRA1 in 964 Eyes
Hong, Samin; Moon, Jong Wook; Ha, Seung Joo; Kim, Chan Yun; Seong, Gong Je
2007-01-01
Purpose To evaluate retinal nerve fiber layer (RNFL) defects with a new scoring system for RNFL photography using the Heidelberg Retina Angiograph 1 (HRA1). Methods This retrospective study included 128 healthy eyes and 836 primary open-angle glaucoma eyes. The RNFL photography using HRA1 was interpreted using a new scoring system, and correlated with visual field indices of standard automated perimetry (SAP). Using the presence of RNFL defect, darkness, width, and location, we established the new scoring system of RNFL photos. Results The mean RNFL defect scores I in the early, moderate, severe, and control groups were 7.3, 9.2, 10.4, and 3.6, respectively. The mean RNFL defect scores II in the early, moderate, severe, and control groups were 14.5, 28.5, 43.4, and 3.4, respectively. The correlation between the RNFL defect score II and the mean deviation of SAP was the strongest of the various combinations (r=-0.675, P<.001). Conclusions Using a new scoring system, we propose a method for semi-quantitative interpretation of RNFL photographs. This scoring system may be helpful to distinguish between normal and glaucomatous eyes, and the score is associated with the severity of visual field loss. PMID:18063886
ARBOOK: Development and Assessment of a Tool Based on Augmented Reality for Anatomy
NASA Astrophysics Data System (ADS)
Ferrer-Torregrosa, J.; Torralba, J.; Jimenez, M. A.; García, S.; Barcia, J. M.
2015-02-01
The development of new technologies and tools for educational purposes is growing. This work presents the experience of a new tool based on augmented reality (AR) focusing on the anatomy of the lower limb. ARBOOK was constructed and developed based on computed tomography (CT) and magnetic resonance (MR) images, dissections and drawings. For ARBOOK evaluation, a specific questionnaire of three blocks was developed and validated according to the Delphi method. The questionnaire included motivation and attention tasks, autonomous work and three-dimensional interpretation tasks. A total of 211 students from 7 public and private Spanish universities were divided into two groups. The control group received standard teaching sessions supported by books and video. The ARBOOK group received the same standard sessions but additionally used the ARBOOK tool. At the end of the training, students took a written test on lower limb anatomy. Statistically significantly better scores for the ARBOOK group were found on attention-motivation, autonomous work and three-dimensional comprehension tasks. Additionally, significantly better scores were obtained by the ARBOOK group in the written test. The results strongly suggest that the use of AR is suitable for anatomy teaching. Specifically, the results indicate that this technology is helpful for student motivation, autonomous work and spatial interpretation. The use of this type of technology deserves even more consideration now that new technologies are naturally incorporated into daily life.
Green, Robin E A; Melo, Brenda; Christensen, Bruce; Ngo, Le-Anh; Monette, Georges; Bradbury, Cheryl
2008-02-01
Estimation of premorbid IQ in traumatic brain injury (TBI) is clinically and scientifically valuable because it permits the quantification of the cognitive impact of injury. This is achieved by comparing performances on tests of current ability to estimates of premorbid IQ, thereby enabling current capacity to be interpreted in light of preinjury ability. However, the validity of premorbid IQ tests that are commonly used for TBI has been questioned. In the present study, we examined the psychometric properties of a recently developed test, the Wechsler Test of Adult Reading (WTAR), which has yet to be examined for TBI. The cognitive performance of a group of 24 patients recovering from TBI (with a mean Glasgow Coma Scale score in the severely impaired range) was measured at 2 and 5 months postinjury. On both occasions, patients were administered three tests that have been used to measure premorbid IQ (the WTAR and the Vocabulary and Matrix Reasoning subtests of the Wechsler Adult Intelligence Scale 3rd Edition, WAIS-III) and three tests of current ability (Symbol Digit Modalities Test-Oral and the Similarities and Block Design subtests of the WAIS-III). We found that performance significantly improved on tests of current cognitive ability, confirming recovery. In contrast, stable performance was observed on the WTAR from Assessment 1 (M = 34.25/50) to Assessment 2 (M = 34.21/50; r = .970, p < .001). Mean improvement across assessments was negligible (t = -0.086, p = .47; Cohen's d = -.005), and minimal individual participant change was observed (modal scaled score change = 0). WTAR scores were also highly similar to scores on a demographic estimate of premorbid IQ. Thus, converging evidence (high stability during recovery from TBI and IQ estimates similar to those of a demographic equation) suggests that the WTAR is a valid measure of premorbid IQ for TBI.
Where word pronunciation tests are indicated (i.e., in patients who speak and read English fluently), these results endorse the use of the WTAR for patients with TBI.
Ngo, Carine; Laé, Marick; Ratour, Julia; Hamel, Frédérique; Taris, Corinne; Caly, Martial; Le Cunff, Annie; Reyal, Fabien; Kirova, Youlia; Pierga, Jean-Yves; Vincent-Salomon, Anne
The implementation of an internal quality control is mandatory to guarantee the accuracy of HER2 status in invasive breast cancers. To evaluate the impact of our quality control assurance on HER2 status results in invasive breast carcinomas from 2008 to 2014. HER2 status was determined by immunohistochemistry as the first-line indication, completed by fluorescence in situ hybridization (FISH) for scores 2+ by immunohistochemistry. Internal quality control of HER2 status relied on the standardization of pre-analytical phases, the use of external controls with a known number of HER2 gene copies determined by FISH and continued monitoring of concordance between immunohistochemistry and FISH. The proportion of HER2-positive cases corresponding to scores 3+ by immunohistochemistry and 2+ amplified by FISH varied from 10.6% to 13.8% (median of 11.3%). The proportion of scores 2+ amplified by FISH varied from 13.3% to 32.7% during the study period. The rate of concordance between FISH and immunohistochemistry for score 0/1+ and 3+ cases was ≥97%. Eight of 12 discordant cases were false positives resulting from errors in interpretation of immunohistochemistry (score 2+ instead of 3+). Calibration of immunohistochemistry on FISH for HER2 status helps to limit the variability of immunohistochemistry results due to technical issues or interpretation. The implementation of an external control of score 3+ on each slide enables accurate interpretation of scores 2+ and 3+ by immunohistochemistry. Copyright © 2017 Société Française du Cancer. Published by Elsevier Masson SAS. All rights reserved.
Barisoni, Laura; Troost, Jonathan P; Nast, Cynthia; Bagnasco, Serena; Avila-Casado, Carmen; Hodgin, Jeffrey; Palmer, Matthew; Rosenberg, Avi; Gasim, Adil; Liensziewski, Chrysta; Merlino, Lino; Chien, Hui-Ping; Chang, Anthony; Meehan, Shane M; Gaut, Joseph; Song, Peter; Holzman, Lawrence; Gibson, Debbie; Kretzler, Matthias; Gillespie, Brenda W; Hewitt, Stephen M
2016-07-01
The multicenter Nephrotic Syndrome Study Network (NEPTUNE) digital pathology scoring system employs a novel and comprehensive methodology to document pathologic features from whole-slide images, immunofluorescence and ultrastructural digital images. To estimate inter- and intra-reader concordance of this descriptor-based approach, data from 12 pathologists (eight NEPTUNE and four non-NEPTUNE) with experience from training to 30 years were collected. A descriptor reference manual was generated and a webinar-based protocol for consensus/cross-training implemented. Intra-reader concordance for 51 glomerular descriptors was evaluated on jpeg images by seven NEPTUNE pathologists scoring 131 glomeruli three times (Tests I, II, and III), each test following a consensus webinar review. Inter-reader concordance of glomerular descriptors was evaluated in 315 glomeruli by all pathologists; interstitial fibrosis and tubular atrophy (244 cases, whole-slide images) and four ultrastructural podocyte descriptors (178 cases, jpeg images) were evaluated once by six and five pathologists, respectively. Cohen's kappa for inter-reader concordance for 48/51 glomerular descriptors with sufficient observations was moderate (0.40
McDonald, Carrie R; Delis, Dean C; Kramer, Joel H; Tecoma, Evelyn S; Iragui, Vicente J
2008-05-01
The ability to interpret nonliteral, metaphoric language was explored in patients with frontal lobe epilepsy (FLE) and temporal lobe epilepsy (TLE), and matched control participants, to determine (1) if patients with FLE were impaired in their interpretations relative to those with TLE and controls, and (2) if disease-related variables (e.g., age of seizure onset) predicted performances in either patient group. A total of 22 patients with FLE, 20 patients with TLE, and 23 controls were administered a test of proverb interpretation to assess their ability to grasp the abstract meaning of nonliteral language. Participants were presented with a series of proverbs and asked to provide an oral interpretation of each. Responses to each proverb were scored according to their accuracy and level of abstractness. Patients with FLE, but not TLE, were impaired relative to controls in their overall interpretation of proverbs. However, a subgroup analysis revealed that only patients with left FLE showed impaired interpretation accuracy relative to the other groups, whereas patients with both left FLE and left TLE showed impaired abstraction. Patients with FLE were also impaired when they were asked to select the best interpretation of the proverb from response alternatives. In patients with FLE, only a left-sided seizure focus was associated with poorer performance. In patients with TLE, both an early age of onset and a left-sided seizure focus predicted poorer performance. Overall, FLE patients exhibit greater impairment than TLE patients in interpreting proverbs. However, the nature and disease-specific correlates of impaired performances in proverb interpretation differ between the groups.
McDonald, Carrie R.; Delis, Dean C.; Kramer, Joel H.; Tecoma, Evelyn S.; Iragui, Vicente J.
2017-01-01
The ability to interpret nonliteral, metaphoric language was explored in patients with frontal lobe epilepsy (FLE) and temporal lobe epilepsy (TLE), and matched control participants, to determine (1) if patients with FLE were impaired in their interpretations relative to those with TLE and controls, and (2) if disease-related variables (e.g., age of seizure onset) predicted performances in either patient group. A total of 22 patients with FLE, 20 patients with TLE, and 23 controls were administered a test of proverb interpretation to assess their ability to grasp the abstract meaning of nonliteral language. Participants were presented with a series of proverbs and asked to provide an oral interpretation of each. Responses to each proverb were scored according to their accuracy and level of abstractness. Patients with FLE, but not TLE, were impaired relative to controls in their overall interpretation of proverbs. However, a subgroup analysis revealed that only patients with left FLE showed impaired interpretation accuracy relative to the other groups, whereas patients with both left FLE and left TLE showed impaired abstraction. Patients with FLE were also impaired when they were asked to select the best interpretation of the proverb from response alternatives. In patients with FLE, only a left-sided seizure focus was associated with poorer performance. In patients with TLE, both an early age of onset and a left-sided seizure focus predicted poorer performance. Overall, FLE patients exhibit greater impairment than TLE patients in interpreting proverbs. However, the nature and disease-specific correlates of impaired performances in proverb interpretation differ between the groups. PMID:17853125
Measurement invariance, the lack thereof, and modeling change.
Edwards, Michael C; Houts, Carrie R; Wirth, R J
2017-08-17
Measurement invariance issues should be considered during test construction. In this paper, we provide a conceptual overview of measurement invariance and describe how the concept is implemented in several different statistical approaches. Typical applications look for invariance over things such as mode of administration (paper and pencil vs. computer based), language/translation, age, time, and gender, to cite just a few examples. To the extent that the relationships between items and constructs are stable/invariant, we can be more confident in score interpretations. A series of simulated examples are reported which highlight different kinds of non-invariance, the impact it can have, and the effect of appropriately modeling a lack of invariance. One example focuses on the longitudinal context, where measurement invariance is critical to understanding trends over time. Software syntax is provided to help researchers apply these models with their own data. The simulation studies demonstrate the negative impact an erroneous assumption of invariance may have on scores and substantive conclusions drawn from naively analyzing those scores. Measurement invariance implies that the links between the items and the construct of interest are invariant over some domain, grouping, or classification. Examining a new or existing test for measurement invariance should be part of any test construction/implementation plan. In addition to reviewing implications of the simulation study results, we also provide a discussion of the limitations of current approaches and areas in need of additional research.
Visuospatial ability correlates with performance in simulated gynecological laparoscopy.
Ahlborg, Liv; Hedman, Leif; Murkes, Daniel; Westman, Bo; Kjellin, Ann; Felländer-Tsai, Li; Enochsson, Lars
2011-07-01
To analyze the relationship between visuospatial ability and simulated laparoscopy performed by consultants in obstetrics and gynecology (OBGYN). This was a prospective cohort study carried out at two community hospitals in Sweden. Thirteen consultants in obstetrics and gynecology were included. They had previously independently performed 10-100 advanced laparoscopies. Participants were tested for visuospatial ability by the Mental Rotations Test version A (MRT-A). After a familiarization session and standardized instruction, all participants subsequently conducted three consecutive virtual tubal occlusions followed by three virtual salpingectomies. Performance in the simulator was measured by Total Time, Score and Ovarian Diathermy Damage. Linear regression was used to analyze the relationship between visuospatial ability and simulated laparoscopic performance. The learning curves in the simulator were assessed in order to interpret the relationship with the visuospatial ability. Visuospatial ability correlated with Total Time (r=-0.62; p=0.03) and Score (r=0.57; p=0.05) in the medium level of the virtual tubal occlusion. In the technically more advanced virtual salpingectomy the visuospatial ability correlated with Total Time (r=-0.64; p=0.02), Ovarian Diathermy Damage (r=-0.65; p=0.02) and with overall Score (r=0.64; p=0.02). Visuospatial ability appears to be related to the performance of gynecological laparoscopic procedures in a simulator. Testing visuospatial ability might be helpful when designing individual training programs. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
Assessing pharmacy residents' knowledge of biostatistics and research study design.
Bookstaver, P Brandon; Miller, April D; Felder, Tisha M; Tice, Danielle L; Norris, LeAnn B; Sutton, S Scott
2012-01-01
Historically, clinicians have demonstrated a lack of confidence and poor aptitude for biostatistics as a tool for medical literature interpretation. Evaluation of pharmacy residents' ability to interpret biostatistics commonly used in peer-reviewed literature has not been previously conducted. To evaluate the level of understanding and perception of biostatistics concepts among pharmacy residents. A survey of postgraduate year 1 (PGY1) residents in American Society of Health-System Pharmacists-accredited residency programs was conducted in May 2009. The survey instrument consisted of 27 items, including 10 knowledge-based questions, and was distributed to residency programs for anonymous reporting via SurveyMonkey. The primary outcome of interest was biostatistics knowledge, defined as the percent total score of correct knowledge items. Statistical attitude and confidence questions were rated on a 5-point Likert-type scale (1 = strongly disagree, 5 = strongly agree). The t-test or 1-way analysis of variance was conducted, as appropriate, to assess for differences in mean biostatistics knowledge scores by respondent characteristics. Forward stepwise regression was used to identify which characteristics were independently associated with biostatistics knowledge. A total of 214 PGY1 residents responded to the online survey assessment, and a subset of respondents (n = 166) answered 1 or more of the biostatistics knowledge questions. Of those who responded to at least 1 knowledge assessment, the overall mean (SD) biostatistics knowledge score was 47.3% (18.50%; range 0-90). Overall, respondents were predominantly female (74%) and younger than 30 years (81%). Residents scored highest in the recognition of the purpose of a double-blind study (92.6%; 95% CI 88.52 to 96.67), interpretation of relative risk (75.8%; 95% CI 69.02 to 82.57), and identification of the appropriate analytic method for a nominal variable (69.4%; 95% CI 62.16 to 76.59). 
Bivariate analyses showed that there were statistically significant mean differences in knowledge scores by attitude (p = 0.001) and confidence (p < 0.001). The multivariate model showed that above-average confidence ratings were associated with an absolute increase of 7.6% in biostatistics knowledge score (p < 0.019) compared to those whose confidence rating was at or below average. Overall, pharmacy residents' perception and understanding of biostatistics were poor in this assessment, which correlates with previous reports. Enhanced training in biostatistics and literature evaluation of both mentors and trainees should be incorporated in PharmD programs and residency training sites.
Takeuchi, K.; Togashi, Y.; Kamihara, Y.; Fukuyama, T.; Yoshioka, H.; Inoue, A.; Katsuki, H.; Kiura, K.; Nakagawa, K.; Seto, T.; Maemondo, M.; Hida, T.; Harada, M.; Ohe, Y.; Nogami, N.; Yamamoto, N.; Nishio, M.; Tamura, T.
2016-01-01
Background: Anaplastic lymphoma kinase (ALK) fusions need to be accurately and efficiently detected for ALK inhibitor therapy. Fluorescence in situ hybridization (FISH) remains the reference test. Although a growing body of data supports that ALK immunohistochemistry (IHC) is highly concordant with FISH, IHC screening needs to be clinically and prospectively validated. Patients and methods: In the AF-001JP trial for alectinib, 436 patients were screened for ALK fusions through IHC (n = 384) confirmed with FISH (n = 181), multiplex RT-PCR (n = 68), or both (n = 16). IHC results were scored with iScore. Results: ALK fusion was positive in 137 patients and negative in 250 patients. Since the presence of cancer cells in the samples for RT-PCR was not confirmed, ALK fusion negativity could not be ascertained in 49 patients. IHC interpreted with iScore showed a 99.4% (173/174) concordance with FISH. All 41 patients who had iScore 3 and were enrolled in phase II showed at least 30% tumor reduction with 92.7% overall response rate. Two IHC-positive patients with an atypical FISH pattern responded to ALK inhibitor therapy. The reduction rate was not correlated with IHC staining intensity. Conclusions: Our study showed (i) that when sufficiently sensitive and appropriately interpreted, IHC can be a stand-alone diagnostic for ALK inhibitor therapies; (ii) that when atypical FISH patterns are accompanied by IHC positivity, the patients should be considered as candidates for ALK inhibitor therapies, and (iii) that the expression level of ALK fusion is not related to the level of response to ALK inhibitors and is thus not required for patient selection. Registration number: JapicCTI-101264 (This study is registered with the Japan Pharmaceutical Information Center). PMID:26487585
Competency Assessment in Senior Emergency Medicine Residents for Core Ultrasound Skills.
Schmidt, Jessica N; Kendall, John; Smalley, Courtney
2015-11-01
Quality resident education in point-of-care ultrasound (POC US) is becoming increasingly important in emergency medicine (EM); however, the best methods to evaluate competency in graduating residents have not been established. We sought to design and implement a rigorous assessment of image acquisition and interpretation in POC US in a cohort of graduating residents at our institution. We evaluated nine senior residents in both image acquisition and image interpretation for five core US skills (focused assessment with sonography for trauma (FAST), aorta, echocardiogram (ECHO), pelvic, central line placement). Image acquisition was measured using an observed clinical skills exam (OSCE) directed assessment with a standardized patient model. Image interpretation was measured with a multiple-choice exam including normal and pathologic images. Residents performed well on image acquisition for core skills with an average score of 85.7% for core skills and 74% including advanced skills (ovaries, advanced ECHO, advanced aorta). Residents scored well but slightly lower on image interpretation with an average score of 76%. Senior residents performed well on core POC US skills as evaluated with a rigorous assessment tool. This tool may be developed further for other EM programs to use for graduating resident evaluation.
Tiemens, Annemiek; van Rijn, Rogier M; Wyon, Matthew A; Redding, Emma; Stubbe, Janine H
2018-06-01
To explore whether movement quality has influence on heart rate (HR) frequency during the dance-specific aerobic fitness test (DAFT). Thirteen contemporary university dance students (age 19 ± 1.46 yrs) underwent two trials performing the DAFT while wearing a Polar HR monitor (Kempele, Finland). During the first trial, dancers were asked to perform the movements as if they were performing on stage, whereas during the second trial, standardized verbal instructions were given to reduce the quality of movement (e.g., no need to perform technically correct pliés). The variables measured at each trial were HR for all five stages of the DAFT and HR recovery (1 and 2 min after finishing the DAFT), movement quality (MQ) score, and rate of perceived exertion score (RPE). There were significant differences in HR between Trial 1 and Trial 2. For all stages and the resting period, HR was lower during Trial 2 (p<0.001). Also, the RPE score was significantly lower and the MQ score was significantly higher, indicating a poorer performance, during Trial 2 (both p<0.001). The results suggest that DAFT performance with lower movement quality elicits lower HR frequency and RPE during the DAFT. We recommend that specific instructions be given to participants about executing the movement sequence during the DAFT before testing commences. Also, movement quality must be taken into account when interpreting HR results from the DAFT in order to distinguish if a dancer's low HR results from good aerobic fitness or from poor performance of the movement sequence.
SERE: single-parameter quality control and sample comparison for RNA-Seq.
Schulze, Stefan K; Kanwar, Rahul; Gölzenleuchter, Meike; Therneau, Terry M; Beutler, Andreas S
2012-10-03
Assessing the reliability of experimental replicates (or global alterations corresponding to different experimental conditions) is a critical step in analyzing RNA-Seq data. Pearson's correlation coefficient r has been widely used in the RNA-Seq field even though its statistical characteristics may be poorly suited to the task. Here we present a single-parameter test procedure for count data, the Simple Error Ratio Estimate (SERE), that can determine whether two RNA-Seq libraries are faithful replicates or globally different. Benchmarking shows that the interpretation of SERE is unambiguous regardless of the total read count or the range of expression differences among bins (exons or genes), a score of 1 indicating faithful replication (i.e., samples are affected only by Poisson variation of individual counts), a score of 0 indicating data duplication, and scores >1 corresponding to true global differences between RNA-Seq libraries. On the contrary the interpretation of Pearson's r is generally ambiguous and highly dependent on sequencing depth and the range of expression levels inherent to the sample (difference between lowest and highest bin count). Cohen's simple Kappa results are also ambiguous and are highly dependent on the choice of bins. For quantifying global sample differences SERE performs similarly to a measure based on the negative binomial distribution yet is simpler to compute. SERE can therefore serve as a straightforward and reliable statistical procedure for the global assessment of pairs or large groups of RNA-Seq datasets by a single statistical parameter.
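The interpretation described in the abstract above (0 for duplicated data, about 1 for replicates that differ only by Poisson noise, above 1 for global differences) can be illustrated with a small sketch. This is an assumption-laden reconstruction, not the authors' released code: it treats the statistic as the square root of a pooled chi-square-style ratio of observed to Poisson-expected variation, and the function name `sere` and the parameter `min_total` are hypothetical. The exact estimator should be checked against the paper.

```python
import math


def sere(counts, min_total=1):
    """Ratio of observed to Poisson-expected variation across libraries.

    counts: list of rows, one per bin (gene or exon); each row is a list of
    non-negative read counts, one per library.
    min_total: bins whose summed count is below this are skipped (keep >= 1,
    otherwise empty bins would cause a division by zero).
    Hypothetical helper written for illustration only.
    """
    n_libs = len(counts[0])
    lib_totals = [sum(row[j] for row in counts) for j in range(n_libs)]
    grand_total = sum(lib_totals)
    if n_libs < 2 or min(lib_totals) == 0:
        raise ValueError("need at least two non-empty libraries")

    chi_sq = 0.0
    n_bins = 0
    for row in counts:
        row_total = sum(row)
        if row_total < min_total:
            continue
        n_bins += 1
        for j in range(n_libs):
            # Expected count if the bin's reads were split across libraries
            # in proportion to library size.
            expected = row_total * lib_totals[j] / grand_total
            chi_sq += (row[j] - expected) ** 2 / expected
    if n_bins == 0:
        raise ValueError("no bins passed the min_total filter")
    # Normalise by bins and degrees of freedom per bin, then take the root.
    return math.sqrt(chi_sq / (n_bins * (n_libs - 1)))


duplicated = [[10, 10], [200, 200], [35, 35]]   # same library listed twice
different = [[100, 10], [10, 100]]              # strong global difference
print(sere(duplicated), sere(different))
```

With the made-up tables above, the duplicated pair gives 0 and the clearly different pair gives a value well above 1, in line with the scale the abstract describes.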
SERE: Single-parameter quality control and sample comparison for RNA-Seq
2012-01-01
Background: Assessing the reliability of experimental replicates (or global alterations corresponding to different experimental conditions) is a critical step in analyzing RNA-Seq data. Pearson’s correlation coefficient r has been widely used in the RNA-Seq field even though its statistical characteristics may be poorly suited to the task. Results: Here we present a single-parameter test procedure for count data, the Simple Error Ratio Estimate (SERE), that can determine whether two RNA-Seq libraries are faithful replicates or globally different. Benchmarking shows that the interpretation of SERE is unambiguous regardless of the total read count or the range of expression differences among bins (exons or genes), a score of 1 indicating faithful replication (i.e., samples are affected only by Poisson variation of individual counts), a score of 0 indicating data duplication, and scores >1 corresponding to true global differences between RNA-Seq libraries. On the contrary the interpretation of Pearson’s r is generally ambiguous and highly dependent on sequencing depth and the range of expression levels inherent to the sample (difference between lowest and highest bin count). Cohen’s simple Kappa results are also ambiguous and are highly dependent on the choice of bins. For quantifying global sample differences SERE performs similarly to a measure based on the negative binomial distribution yet is simpler to compute. Conclusions: SERE can therefore serve as a straightforward and reliable statistical procedure for the global assessment of pairs or large groups of RNA-Seq datasets by a single statistical parameter. PMID:23033915
Reassessing the "traditional background hypothesis" for elevated MMPI and MMPI-2 Lie-scale scores.
Rosen, Gerald M; Baldwin, Scott A; Smith, Ronald E
2016-10-01
The Lie (L) scale of the Minnesota Multiphasic Personality Inventory (MMPI) is widely regarded as a measure of conscious attempts to deny common human foibles and to present oneself in an unrealistically positive light. At the same time, the current MMPI-2 manual states that "traditional" and religious backgrounds can account for elevated L scale scores as high as 65T-79T, thereby tempering impression management interpretations for faith-based individuals. To assess the validity of the traditional background hypothesis, we reviewed 11 published studies that employed the original MMPI with religious samples and found that only 1 obtained an elevated mean L score. We then conducted a meta-analysis of 12 published MMPI-2 studies in which we compared L scores of religious samples to the test normative group. The meta-analysis revealed large between-study heterogeneity (I² = 87.1), L scale scores for religious samples that were somewhat higher but did not approach the upper limits specified in the MMPI-2 manual, and an overall moderate effect size (d̄ = 0.54, p < .001; 95% confidence interval [0.37, 0.70]). Our analyses indicated that religious-group membership accounts, on average, for elevations on L of about 5 T-score points. Whether these scores reflect conscious "fake good" impression management or religious-based virtuousness remains unanswered. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
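The between-study heterogeneity statistic reported above can be made concrete with a short sketch. It computes Cochran's Q and the I² percentage from per-study effect sizes and sampling variances using inverse-variance (fixed-effect) weights, which is the standard textbook definition; whether the authors used exactly this weighting is an assumption. The function name `heterogeneity` and the input numbers are hypothetical and are not the study's data.

```python
def heterogeneity(effects, variances):
    """Return (pooled effect, Cochran's Q, I-squared in percent).

    effects: per-study effect sizes (for example standardized mean differences).
    variances: matching sampling variances, all > 0.
    Uses inverse-variance weights; hypothetical helper for illustration.
    """
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    # Q: weighted squared deviations of study effects from the pooled effect.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    # I^2: share of total variation beyond what sampling error alone predicts.
    i_squared = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, q, i_squared


# Made-up example: two equally precise studies with very different effects.
pooled, q, i_squared = heterogeneity([0.0, 1.0], [0.01, 0.01])
print(pooled, q, i_squared)
```

An I² near 0 means the studies disagree no more than sampling error would predict; values approaching 100, like the 87.1 reported in the abstract, mean most of the observed variation reflects real between-study differences.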
Ziebolz, Dirk; Schmalz, Gerhard; Kauffels, Anne; Widmer, Florian; Widmer, Katja; Slotta, Jan E; Mausberg, Rainer F; Kollmar, Otto
2017-04-01
The aim of this single-center cross-sectional study was to determine the prevalence of selected periodontal pathogenic bacteria and the active matrix metalloproteinase-8 (aMMP-8) level in patients before (preLTx) and after liver transplantation (postLTx). Periodontal pocket depth (PPD) and clinical attachment loss (CAL) were assessed. Subgingival biofilm samples were analyzed using polymerase chain reaction (PCR) to detect 11 common periodontal pathogens. Gingival crevicular fluid (GCF) samples were analyzed with enzyme-linked immunosorbent assay (ELISA) to determine aMMP-8 level and assigned to a scoring system: score 0: 0-8 ng/ml, score 1: 8-20 ng/ml, and score 2: >20 ng/ml. The following were used for the statistical analysis: t test, Mann-Whitney U test, Fisher's test (α = 5%). In total, 110 patients (preLTx: n = 35, postLTx: n = 75) could be included in the study. Periodontal findings were not significantly different between groups. In microbiological analysis, a significantly higher prevalence of Campylobacter rectus in preLTx group was detected (p = 0.03). Significantly more patients with score 0 in postLTx group (p = 0.024) and significantly more patients with score 1 in preLTx group were found (p = 0.004). Furthermore, aMMP-8 concentrations for patients with moderate periodontitis were significantly lower in postLTx group compared to preLTx group (p = 0.045). Additionally, in postLTx group, aMMP-8 concentration was significantly higher in patients with severe periodontitis compared to those with no/mild periodontitis (p = 0.016). LTx appears to affect aMMP-8 level, but not bacterial findings in patients after LTx. Determination of aMMP-8 level in patients after LTx with immunosuppressive medication might lead to wrong interpretation of the results.
Ruaño, Gualberto; Kocherla, Mohan; Graydon, James S; Holford, Theodore R; Makowski, Gregory S; Goethe, John W
2016-05-01
We describe a population genetic approach to compare samples interpreted with expert calling (EC) versus automated calling (AC) for CYP2D6 haplotyping. The analysis represents 4812 haplotype calls based on signal data generated by the Luminex xMap analyzers from 2406 patients referred to a high-complexity molecular diagnostics laboratory for CYP450 testing. DNA was extracted from buccal swabs. We compared the results of EC and AC with regard to haplotype number and frequency. The ratio of EC to AC was 1:3. Haplotype frequencies from EC and AC samples were convergent across haplotypes, and their distribution was not statistically different between the groups. Most duplications required EC, as only expansions with homozygous or hemizygous haplotypes could be called automatically. High-complexity laboratories can offer equivalent interpretation to automated calling for non-expanded CYP2D6 loci, and superior interpretation for duplications. We have validated scientific expert calling specified by scoring rules as standard operating procedure integrated with an automated calling algorithm. The integration of EC with AC is a practical strategy for CYP2D6 clinical haplotyping. Copyright © 2016 Elsevier B.V. All rights reserved.
Fitzsimmons-Craft, Ellen E; Bardone-Cone, Anna M; Harney, Megan B
2012-09-01
We constructed and validated a measure of comparison dimensions associated with eating pathology, namely, the body, eating, and exercise comparison orientation measure (BEECOM). Participants were 441 undergraduate women. In Study 1, items were generated and refined via exploratory factor analysis, yielding three interpretable factors (i.e., body, eating, and exercise comparison orientation). Confirmatory factor analysis was then used to confirm the three-factor structure of the BEECOM and to investigate the potential presence of a higher-order factor. Given that the lower-order factors loaded strongly onto a higher-order factor, it is appropriate to use a total BEECOM score, in addition to subscale scores. Further, the BEECOM's scores yielded evidence of internal consistency and construct validity in this sample. Study 2 demonstrated two-week test-retest reliability of the BEECOM among college women. Overall, the BEECOM demonstrated good psychometric properties and may be useful for more comprehensively assessing eating disorder-related social comparison behavior. Copyright © 2012 Elsevier Ltd. All rights reserved.
Improving Abnormality Detection on Chest Radiography Using Game-Like Reinforcement Mechanics.
Chen, Po-Hao; Roth, Howard; Galperin-Aizenberg, Maya; Ruutiainen, Alexander T; Gefter, Warren; Cook, Tessa S
2017-11-01
Despite their increasing prevalence, online textbooks, question banks, and digital references focus primarily on explicit knowledge. Implicit skills such as abnormality detection require repeated practice on clinical service and have few digital substitutes. Using mechanics traditionally deployed in video games such as clearly defined goals, rapid-fire levels, and narrow time constraints may be an effective way to teach implicit skills. We created a freely available, online module to evaluate the ability of individuals to differentiate between normal and abnormal chest radiographs by implementing mechanics, including instantaneous feedback, rapid-fire cases, and 15-second timers. Volunteer subjects completed the modules and were separated based on formal experience with chest radiography. Performance on the training and testing sets was measured for each group, and a survey was administered after each session. The module contained 74 cases and took approximately 20 minutes to complete. Thirty-two cases were normal radiographs and 42 cases were abnormal. Of the 60 volunteers recruited, 25 were "never trained" and 35 were "previously trained." "Never trained" users scored 21.9 out of 37 during training and 24.0 out of 37 during testing (59.1% vs 64.9%, P value <.001). "Previously trained" users scored 28.0 out of 37 during training and 28.3 out of 37 during testing phases (75.6% vs 76.4%, P value = .56). Survey results showed that 87% of all subjects agreed the module is an efficient way of learning, and 83% agreed the rapid-fire module is valuable for medical students. A gamified online module may improve the abnormality detection rates of novice interpreters of chest radiography, although experienced interpreters are less likely to derive similar benefits. Users reviewed the educational module favorably. Copyright © 2017 The Association of University Radiologists. Published by Elsevier Inc. All rights reserved.
Validity and Reliability of Nintendo Wii Fit Balance Scores
Wikstrom, Erik A.
2012-01-01
Context: Interactive gaming systems have the potential to help rehabilitate patients with musculoskeletal conditions. The Nintendo Wii Balance Board, which is part of the Wii Fit game, could be an effective tool to monitor progress during rehabilitation because the board and game can provide objective measures of balance. However, the validity and reliability of Wii Fit balance scores remain unknown. Objective: To determine the concurrent validity of balance scores produced by the Wii Fit game and the intrasession and intersession reliability of Wii Fit balance scores. Design: Descriptive laboratory study. Setting: Sports medicine research laboratory. Patients or Other Participants: Forty-five recreationally active participants (age = 27.0 ± 9.8 years, height = 170.9 ± 9.2 cm, mass = 72.4 ± 11.8 kg) with a heterogeneous history of lower extremity injury. Intervention(s): Participants completed a single-limb–stance task on a force plate and the Star Excursion Balance Test (SEBT) during the first test session. Twelve Wii Fit balance activities were completed during 2 test sessions separated by 1 week. Main Outcome Measure(s): Postural sway in the anteroposterior (AP) and mediolateral (ML) directions and the AP, ML, and resultant center-of-pressure (COP) excursions were calculated from the single-limb stance. The normalized reach distance was recorded for the anterior, posteromedial, and posterolateral directions of the SEBT. Wii Fit balance scores that the game software generated also were recorded. Results: All 96 of the calculated correlation coefficients among Wii Fit activity outcomes and established balance outcomes were interpreted as poor (r < 0.50). Intrasession reliability for Wii Fit balance activity scores ranged from good (intraclass correlation coefficient [ICC] = 0.80) to poor (ICC = 0.39), with 8 activities having poor intrasession reliability. 
Similarly, 11 of the 12 Wii Fit balance activity scores demonstrated poor intersession reliability, with scores ranging from fair (ICC = 0.74) to poor (ICC = 0.29). Conclusions: Wii Fit balance activity scores had poor concurrent validity relative to COP outcomes and SEBT reach distances. In addition, the included Wii Fit balance activity scores generally had poor intrasession and intersession reliability. PMID:22892412
Vispoel, Walter P; Kim, Han Yi
2014-09-01
[Correction Notice: An Erratum for this article was reported in Vol 26(3) of Psychological Assessment (see record 2014-16017-001). The mean, standard deviation and alpha coefficient originally reported in Table 1 should be 74.317, 10.214 and .802, respectively. The validity coefficients in the last column of Table 4 are affected as well. Correcting this error did not change the substantive interpretations of the results, but did increase the mean, standard deviation, alpha coefficient, and validity coefficients reported for the Honesty subscale in the text and in Tables 1 and 4. The corrected versions of Tables 1 and Table 4 are shown in the erratum.] Item response theory (IRT) models were applied to dichotomous and polytomous scoring of the Self-Deceptive Enhancement and Impression Management subscales of the Balanced Inventory of Desirable Responding (Paulhus, 1991, 1999). Two dichotomous scoring methods reflecting exaggerated endorsement and exaggerated denial of socially desirable behaviors were examined. The 1- and 2-parameter logistic models (1PLM, 2PLM, respectively) were applied to dichotomous responses, and the partial credit model (PCM) and graded response model (GRM) were applied to polytomous responses. For both subscales, the 2PLM fit dichotomous responses better than did the 1PLM, and the GRM fit polytomous responses better than did the PCM. Polytomous GRM and raw scores for both subscales yielded higher test-retest and convergent validity coefficients than did PCM, 1PLM, 2PLM, and dichotomous raw scores. Information plots showed that the GRM provided consistently high measurement precision that was superior to that of all other IRT models over the full range of both construct continuums. Dichotomous scores reflecting exaggerated endorsement of socially desirable behaviors provided noticeably weak precision at low levels of the construct continuums, calling into question the use of such scores for detecting instances of "faking bad." 
Dichotomous models reflecting exaggerated denial of the same behaviors yielded much better precision at low levels of the constructs, but it was still less precision than that of the GRM. These results support polytomous over dichotomous scoring in general, alternative dichotomous scoring for detecting faking bad, and extension of GRM scoring to situations in which IRT offers additional practical advantages over classical test theory (adaptive testing, equating, linking, scaling, detecting differential item functioning, and so forth). PsycINFO Database Record (c) 2014 APA, all rights reserved.
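As a rough illustration of the models compared in the abstract above, the sketch below writes out the textbook forms of the 2-parameter logistic model (the 1PLM is the special case where all items share one discrimination value), its item information function, and the graded response model's category probabilities. The function names and the parameter values in the usage lines are hypothetical; they are not estimates from the study.

```python
import math


def p_2pl(theta, a, b):
    """2PL probability of endorsing an item: logistic in a * (theta - b)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def info_2pl(theta, a, b):
    """Fisher information of a 2PL item at trait level theta."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def grm_category_probs(theta, a, thresholds):
    """Graded response model: probability of each ordered response category.

    thresholds: ascending boundary locations b_1 < ... < b_m, giving m + 1
    categories. P(X >= k) follows a 2PL curve at b_k; a category probability
    is the difference between adjacent cumulative curves.
    """
    cumulative = [1.0] + [p_2pl(theta, a, b) for b in thresholds] + [0.0]
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]


# Hypothetical item: discrimination 1.5, difficulty 0, evaluated at theta = 0.
print(p_2pl(0.0, 1.5, 0.0), info_2pl(0.0, 1.5, 0.0))
# Hypothetical 5-category item with four ascending thresholds.
print(grm_category_probs(0.3, 1.2, [-1.5, -0.5, 0.5, 1.5]))
```

A dichotomous item's information peaks at theta = b and falls off on both sides, whereas a polytomous item spreads its thresholds along the trait continuum. That difference is one way to read the abstract's finding that GRM scoring gave more uniform precision than dichotomous scoring.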
Some Considerations on the Partial Credit Model
ERIC Educational Resources Information Center
Verhelst, N. D.; Verstralen, H. H. F. M.
2008-01-01
The Partial Credit Model (PCM) is sometimes interpreted as a model for stepwise solution of polytomously scored items, where the item parameters are interpreted as difficulties of the steps. It is argued that this interpretation is not justified. A model for stepwise solution is discussed. It is shown that the PCM is suited to model sums of binary…
Kent, Peter; Stochkendahl, Mette Jensen; Christensen, Henrik Wulff; Kongsted, Alice
2015-01-01
Recognition of homogeneous subgroups of patients can usefully improve prediction of their outcomes and the targeting of treatment. There are a number of research approaches that have been used to recognise homogeneity in such subgroups and to test their implications. One approach is to use statistical clustering techniques, such as Cluster Analysis or Latent Class Analysis, to detect latent relationships between patient characteristics. Influential patient characteristics can come from diverse domains of health, such as pain, activity limitation, physical impairment, social role participation, psychological factors, biomarkers and imaging. However, such 'whole person' research may result in data-driven subgroups that are complex, difficult to interpret and challenging to recognise clinically. This paper describes a novel approach to applying statistical clustering techniques that may improve the clinical interpretability of derived subgroups and reduce sample size requirements. This approach involves clustering in two sequential stages. The first stage involves clustering within health domains and therefore requires creating as many clustering models as there are health domains in the available data. This first stage produces scoring patterns within each domain. The second stage involves clustering using the scoring patterns from each health domain (from the first stage) to identify subgroups across all domains. We illustrate this using chest pain data from the baseline presentation of 580 patients. The new two-stage clustering resulted in two subgroups that approximated the classic textbook descriptions of musculoskeletal chest pain and atypical angina chest pain. The traditional single-stage clustering resulted in five clusters that were also clinically recognisable but displayed less distinct differences. In this paper, a new approach to using clustering techniques to identify clinically useful subgroups of patients is suggested. 
Research designs, statistical methods and outcome metrics suitable for performing that testing are also described. This approach has potential benefits but requires broad testing, in multiple patient samples, to determine its clinical value. The usefulness of the approach is likely to be context-specific, depending on the characteristics of the available data and the research question being asked of it.
Psychometrics of the neonatal oral motor assessment scale
Zarem, Cori S; Kidokoro, Hiroyuki; Neil, Jeffrey; Wallendorf, Michael; Inder, Terrie; Pineda, Roberta
2013-01-01
AIM To establish the psychometrics of the Neonatal Oral Motor Assessment Scale (NOMAS). METHOD In this prospective cohort study of 75 preterm infants (39 females, 36 males) born at 30 weeks' or less gestation (mean gestational age 26.56wk, SD 1.90, range 23–30wk; mean birthweight 967.33g, SD 288.54, range 480–2240), oral feeding was videotaped before discharge from the neonatal intensive care unit (NICU). The NOMAS was used to classify feeding as normal, disorganized, or dysfunctional. Neurobehavior was assessed at term equivalent, and infants underwent magnetic resonance imaging. Children returned for developmental testing at 2 years corrected age. Associations between NOMAS scores and (1) neurobehavior, (2) cerebral injury and metrics, and (3) developmental outcome were investigated using χ2-analyses, t-tests, and linear regression. For reliability, six certified NOMAS evaluators rated five randomly selected NOMAS recordings and re-scored them in a second randomized order. Reliability was calculated with Cohen’s kappa coefficient. RESULTS Dysfunctional NOMAS scores were associated with lower Dubowitz scores [t=–2.14; mean difference –2.32 (95% confidence interval [CI] –0.157 to –4.49); p=0.036], higher stress on the NICU Network Neurobehavioral Scale (t=2.61; mean difference 0.073 [95% CI 0.017 to 0.129]; p=0.011), and decreased transcerebellar diameter (t=–2.22; mean difference –2.04 [CI=–3.89 to –0.203]; p=0.03). No significant associations were found between NOMAS scores and 2 year outcome. INTERPRETATION Some concurrent validity was established with associations between NOMAS scores and measures of infant behavior and cerebral structure. The NOMAS did not show predictive validity in this study of preterm infants at high risk of developmental delay. Reliability was variable and suboptimal. PMID:23869958
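Cohen's kappa, the reliability statistic named in this abstract, corrects raw agreement for the agreement expected by chance. The sketch below computes it for two sets of categorical ratings. The function name and the five example ratings are hypothetical and do not reproduce the study's data or its six-rater design.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two equally long sequences of categorical ratings."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed proportion of recordings on which the two ratings agree.
    observed = sum(x == y for x, y in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from each rating's marginal frequencies.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # degenerate case: only one category was ever used
    return (observed - expected) / (1.0 - expected)


# Hypothetical first and second scoring of five recordings by one evaluator.
first_pass = ["normal", "disorganized", "dysfunctional", "normal", "disorganized"]
second_pass = ["normal", "disorganized", "disorganized", "normal", "dysfunctional"]
kappa = cohens_kappa(first_pass, second_pass)  # 0.375 for these made-up ratings
```

Here raw agreement is 3 of 5 (0.60) and chance agreement is 0.36, so kappa is 0.24 / 0.64 = 0.375.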
Fabbiani, Massimiliano; Grima, Pierfrancesco; Milanini, Benedetta; Mondi, Annalisa; Baldonero, Eleonora; Ciccarelli, Nicoletta; Cauda, Roberto; Silveri, Maria C; De Luca, Andrea; Di Giambenedetto, Simona
2015-01-01
The aim of the study was to explore how viral resistance and antiretroviral central nervous system (CNS) penetration could impact on cognitive performance of HIV-infected patients. We performed a multicentre cross-sectional study enrolling HIV-infected patients undergoing neuropsychological testing, with a previous genotypic resistance test on plasma samples. CNS penetration-effectiveness (CPE) scores and genotypic susceptibility scores (GSS) were calculated for each regimen. A composite score (CPE-GSS) was then constructed. Factors associated with cognitive impairment were investigated by logistic regression analysis. A total of 215 patients were included. Mean CPE was 7.1 (95% CI 6.9, 7.3) with 206 (95.8%) patients showing a CPE≥6. GSS correction decreased the CPE value in 21.4% (mean 6.5, 95% CI 6.3, 6.7), 26.5% (mean 6.4, 95% CI 6.1, 6.6) and 24.2% (mean 6.4, 95% CI 6.2, 6.6) of subjects using ANRS, HIVDB and REGA rules, respectively. Overall, 66 (30.7%) patients were considered cognitively impaired. No significant association could be demonstrated between CPE and cognitive impairment. However, higher GSS-CPE was associated with a lower risk of cognitive impairment (CPE-GSSANRS odds ratio 0.75, P=0.022; CPE-GSSHIVDB odds ratio 0.77, P=0.038; CPE-GSSREGA odds ratio 0.78, P=0.038). Overall, a cutoff of CPE-GSS≥5 seemed the most discriminatory according to each different interpretation system. GSS-corrected CPE score showed a better correlation with neurocognitive performance than the standard CPE score. These results suggest that antiretroviral drug susceptibility, besides drug CNS penetration, can play a role in the control of HIV-associated neurocognitive disorders.
Diagnostic accuracy of FEV1/forced vital capacity ratio z scores in asthmatic patients.
Lambert, Allison; Drummond, M Bradley; Wei, Christine; Irvin, Charles; Kaminsky, David; McCormack, Meredith; Wise, Robert
2015-09-01
The FEV1/forced vital capacity (FVC) ratio is used as a criterion for airflow obstruction; however, the test characteristics of spirometry in the diagnosis of asthma are not well established. The accuracy of a test depends on the pretest probability of disease. We wanted to estimate the FEV1/FVC ratio z score threshold with optimal accuracy for the diagnosis of asthma for different pretest probabilities. Asthmatic patients enrolled in 4 trials from the Asthma Clinical Research Centers were included in this analysis. Measured and predicted FEV1/FVC ratios were obtained, with calculation of z scores for each participant. Across a range of asthma prevalences and z score thresholds, the overall diagnostic accuracy was calculated. One thousand six hundred eight participants were included (mean age, 39 years; 71% female; 61% white). The mean FEV1 percent predicted value was 83% (SD, 15%). In a symptomatic population with 50% pretest probability of asthma, optimal accuracy (68%) is achieved with a z score threshold of -1.0 (16th percentile), corresponding to a 6 percentage point reduction from the predicted ratio. However, in a screening population with a 5% pretest probability of asthma, the optimum z score is -2.0 (second percentile), corresponding to a 12 percentage point reduction from the predicted ratio. These findings were not altered by markers of disease control. Reduction of the FEV1/FVC ratio can support the diagnosis of asthma; however, the ratio is neither sensitive nor specific enough for diagnostic accuracy. When interpreting spirometric results, the pretest probability is an important consideration in the diagnosis of asthma based on airflow limitation.
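The dependence of the optimal threshold on pretest probability can be sketched as follows. Overall accuracy is taken as pretest probability times sensitivity plus (1 minus pretest probability) times specificity. The reference population is assumed to have standard normal z scores, which is what makes -1.0 the 16th percentile and -2.0 the 2nd. The distribution assigned to asthmatic patients is a purely hypothetical assumption, so the thresholds this sketch finds are not expected to match the published ones.

```python
from statistics import NormalDist

REFERENCE = NormalDist(0.0, 1.0)  # z scores in a healthy reference population
ASTHMA = NormalDist(-1.5, 1.0)    # hypothetical assumption, NOT from the study


def percentile(z):
    """Percentile of a z score in the reference population."""
    return 100.0 * REFERENCE.cdf(z)


def accuracy(threshold, pretest_probability):
    """Overall accuracy when z <= threshold is called a positive test."""
    sensitivity = ASTHMA.cdf(threshold)
    specificity = 1.0 - REFERENCE.cdf(threshold)
    return (pretest_probability * sensitivity
            + (1.0 - pretest_probability) * specificity)


def best_threshold(pretest_probability):
    """Grid search over thresholds from -3.0 to 0.0 in steps of 0.1."""
    grid = [-3.0 + 0.1 * i for i in range(31)]
    return max(grid, key=lambda t: accuracy(t, pretest_probability))


symptomatic = best_threshold(0.50)  # higher pretest probability
screening = best_threshold(0.05)    # lower pretest probability
```

Under these assumptions the screening threshold comes out more negative than the symptomatic one, which is the same direction of effect the abstract reports.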
Bensalem-Owen, Meriem; Chau, Destiny F; Sardam, Sean C; Fahy, Brenda G
2011-08-23
Educational methods for residents are shifting toward greater learner independence aided by technological advances. A Web-based program using a podcast was created for resident EEG instruction, replacing conventional didactics. The EEG curriculum also consisted of EEG interpretations under the tutelage of a neurophysiologist. This pilot study aimed to objectively evaluate the effectiveness of the podcast as a new teaching tool. A podcast for resident EEG instruction was implemented on the Web, replacing the traditional lecture. After Institutional Review Board approval, consent was obtained from the participating residents. Using 25-question evaluation tools, participants were assessed at baseline before any EEG instruction, and reassessed after podcasting and after 10 clinical EEG exposures. Each 25-item evaluation tool contained tracings used for clinical EEG interpretations. Scores after podcast training were also compared to scores after traditional didactic training from a previous study among anesthesiology trainees. Ten anesthesiology residents completed the study. The mean scores with standard deviations are 9.50 ± 2.92 at baseline, 13.40 ± 3.31 (p = 0.034) after the podcast, and 16.20 ± 1.87 (p = 0.019) after interpreting 10 EEGs. No differences were noted between the mean educational tool scores for those who underwent podcasting training compared to those who had undergone traditional didactic training. In this pilot study, podcast training was as effective as the prior conventional lecture in meeting the curricular goals of increasing EEG knowledge after 10 EEG interpretations as measured by assessment tools.
Grigoriadis, Themos; Giannoulis, George; Zacharakis, Dimitris; Protopapas, Athanasios; Cardozo, Linda; Athanasiou, Stavros
2016-03-01
The purpose of the study was to examine whether a test performed during urodynamics, the "1-3-5 cough test", could determine the severity of urodynamic stress incontinence (USI). We included women referred for urodynamics who were diagnosed with USI. The "1-3-5 cough test" was performed to grade the severity of USI at the completion of filling cystometry. A diagnosis of "severe", "moderate" or "mild" USI was given if urine leakage was observed after one, three or five consecutive coughs respectively. We examined the associations between grades of USI severity and measures of subjective perception of stress urinary incontinence (SUI): International Consultation of Incontinence Modular Questionnaire-Female Lower Urinary Tract Symptom (ICIQ-FLUTS), King's Health Questionnaire (KHQ), Urinary Distress Inventory-6 (UDI-6), Urinary Impact Questionnaire-7 (UIQ-7). A total of 1,181 patients completed the ICIQ-FLUTS and KHQ and 612 completed the UDI-6 and UIQ-7 questionnaires. There was a statistically significant association of higher grades of USI severity with higher scores of the incontinence domain of the ICIQ-FLUTS. The scores of the UDI-6, UIQ-7 and of all KHQ domains (with the exception of general health perception and personal relationships) had statistically significant larger mean values for higher USI severity grade. Groups of higher USI severity had statistically significant associations with higher scores of most of the subjective measures of SUI. Severity of USI, as defined by the "1-3-5 cough test", was associated with the severity of subjective measures of SUI. This test may be a useful tool for the objective interpretation of patients with SUI who undergo urodynamics.
ERIC Educational Resources Information Center
Rupp, André A.
2018-01-01
This article discusses critical methodological design decisions for collecting, interpreting, and synthesizing empirical evidence during the design, deployment, and operational quality-control phases for automated scoring systems. The discussion is inspired by work on operational large-scale systems for automated essay scoring but many of the…
Apparently abnormal Wechsler Memory Scale index score patterns in the normal population.
Carrasco, Roman Marcus; Grups, Josefine; Evans, Brittney; Simco, Edward; Mittenberg, Wiley
2015-01-01
Interpretation of the Wechsler Memory Scale-Fourth Edition may involve examination of multiple memory index score contrasts and similar comparisons with Wechsler Adult Intelligence Scale-Fourth Edition ability indexes. Standardization sample data suggest that 15-point differences between any specific pair of index scores are relatively uncommon in normal individuals, but these base rates refer to a comparison between a single pair of indexes rather than multiple simultaneous comparisons among indexes. This study provides normative data for the occurrence of multiple index score differences calculated by using Monte Carlo simulations and validated against standardization data. Differences of 15 points between any two memory indexes or between memory and ability indexes occurred in 60% and 48% of the normative sample, respectively. Wechsler index score discrepancies are normally common and therefore not clinically meaningful when numerous such comparisons are made. Explicit prior interpretive hypotheses are necessary to reduce the number of index comparisons and associated false-positive conclusions. Monte Carlo simulation accurately predicts these false-positive rates.
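The multiple-comparison effect described here can be reproduced in outline with a small Monte Carlo simulation. The sketch assumes index scores with mean 100 and SD 15 that share one common factor, giving every pair the same correlation. The correlation value, the number of indexes, and the function name are hypothetical and are not the Wechsler standardization values.

```python
import random


def simulate_discrepancy_rate(n_indexes, correlation, n_people=20000,
                              cutoff=15.0, seed=0):
    """Share of simulated people with at least one index pair >= cutoff apart."""
    rng = random.Random(seed)
    shared = correlation ** 0.5          # loading on the common factor
    unique = (1.0 - correlation) ** 0.5  # loading on the index-specific part
    hits = 0
    for _ in range(n_people):
        general = rng.gauss(0.0, 1.0)
        scores = [100.0 + 15.0 * (shared * general + unique * rng.gauss(0.0, 1.0))
                  for _ in range(n_indexes)]
        # Some pair differs by >= cutoff exactly when the range does.
        if max(scores) - min(scores) >= cutoff:
            hits += 1
    return hits / n_people


single_pair_rate = simulate_discrepancy_rate(n_indexes=2, correlation=0.6)
five_index_rate = simulate_discrepancy_rate(n_indexes=5, correlation=0.6)
```

With more indexes there are more pairs to compare, so the share of people showing at least one large discrepancy rises even though nothing about the individuals has changed.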
Palomaki, Glenn E.; Deciu, Cosmin; Kloza, Edward M.; Lambert-Messerlian, Geralyn M.; Haddow, James E.; Neveux, Louis M.; Ehrich, Mathias; van den Boom, Dirk; Bombard, Allan T.; Grody, Wayne W.; Nelson, Stanley F.; Canick, Jacob A.
2012-01-01
Purpose: To determine whether maternal plasma cell–free DNA sequencing can effectively identify trisomy 18 and 13. Methods: Sixty-two pregnancies with trisomy 18 and 12 with trisomy 13 were selected from a cohort of 4,664 pregnancies along with matched euploid controls (including 212 additional Down syndrome and matched controls already reported), and their samples tested using a laboratory-developed, next-generation sequencing test. Interpretation of the results for chromosome 18 and 13 included adjustment for CG content bias. Results: Among the 99.1% of samples interpreted (1,971/1,988), observed trisomy 18 and 13 detection rates were 100% (59/59) and 91.7% (11/12) at false-positive rates of 0.28% and 0.97%, respectively. Among the 17 samples without an interpretation, three were trisomy 18. If z-score cutoffs for trisomy 18 and 13 were raised slightly, the overall false-positive rates for the three aneuploidies could be as low as 0.1% (2/1,688) at an overall detection rate of 98.9% (280/283) for common aneuploidies. An independent academic laboratory confirmed performance in a subset. Conclusion: Among high-risk pregnancies, sequencing circulating cell–free DNA detects nearly all cases of Down syndrome, trisomy 18, and trisomy 13, at a low false-positive rate. This can potentially reduce invasive diagnostic procedures and related fetal losses by 95%. Evidence supports clinical testing for these aneuploidies. PMID:22281937
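The percentages in this abstract follow directly from the reported counts. The short sketch below recomputes them as a check; the helper name and dictionary keys are hypothetical, while the counts are those quoted in the abstract.

```python
def rate(numerator, denominator):
    """Proportion expressed as a percentage."""
    return 100.0 * numerator / denominator


# Counts quoted in the abstract.
reported = {
    "samples interpreted": (1971, 1988),
    "trisomy 18 detected": (59, 59),
    "trisomy 13 detected": (11, 12),
    "overall detection at raised cutoffs": (280, 283),
    "overall false positives at raised cutoffs": (2, 1688),
}

percentages = {name: round(rate(n, d), 1) for name, (n, d) in reported.items()}
```

Rounded to one decimal place these give 99.1, 100.0, 91.7, 98.9 and 0.1 percent, matching the figures in the text.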
Brenneman, Lauren; Cash, Elizabeth; Chermak, Gail D; Guenette, Linda; Masters, Gay; Musiek, Frank E; Brown, Mallory; Ceruti, Julianne; Fitzegerald, Krista; Geissler, Kristin; Gonzalez, Jennifer; Weihing, Jeffrey
2017-09-01
Pediatric central auditory processing disorder (CAPD) is frequently comorbid with other childhood disorders. However, few studies have examined the relationship between commonly used CAPD, language, and cognition tests within the same sample. The present study examined the relationship between diagnostic CAPD tests and "gold standard" measures of language and cognitive ability, the Clinical Evaluation of Language Fundamentals (CELF) and the Wechsler Intelligence Scale for Children (WISC). A retrospective study. Twenty-seven patients referred for CAPD testing who scored average or better on the CELF and low average or better on the WISC were initially included. Seven children who scored below the CELF and/or WISC inclusion criteria were then added to the dataset for a second analysis, yielding a sample size of 34. Participants were administered a CAPD battery that included at least the following three CAPD tests: Frequency Patterns (FP), Dichotic Digits (DD), and Competing Sentences (CS). In addition, they were administered the CELF and WISC. Relationships between scores on CAPD, language (CELF), and cognition (WISC) tests were examined using correlation analysis. DD and FP showed significant correlations with Full Scale Intelligence Quotient, and the DD left ear and the DD interaural difference measures both showed significant correlations with working memory. However, ∼80% or more of the variance in these CAPD tests was unexplained by language and cognition measures. Language and cognition measures were more strongly correlated with each other than were the CAPD tests with any CELF or WISC scale. Additional correlations with the CAPD tests were revealed when patients who scored in the mild-moderate deficit range on the CELF and/or in the borderline low intellectual functioning range on the WISC were included in the analysis. 
While both the DD and FP tests showed significant correlations with one or more cognition measures, the majority of the variance in these CAPD measures went unexplained by cognition. Unlike DD and FP, the CS test was not correlated with cognition. Additionally, language measures were not significantly correlated with any of the CAPD tests. Our findings emphasize that the outcomes and interpretation of results vary as a function of the subject inclusion criteria that are applied for the CELF and WISC. Including participants with poorer cognition and/or language scores increased the number of significant correlations observed. For this reason, it is important that studies investigating the relationship between CAPD and other domains or disorders report the specific inclusion criteria used for all tests.
Changiz, Tahereh; Haghani, Fariba; Nowroozi, Nasim
2013-01-01
Introduction: Appropriate instructional design plays a crucial role in e-learning success, and analyzing learners is the cornerstone for instructional design process. Students’ readiness for e-learning was assessed in the present study as an example of learner analysis for a distance course in medical education master program. Materials and Methods: A census sample of 23 students applied for distance master program on medical education, completed the “Students’ E-Learning Readiness Scale” developed by Watkins, via email. The reliability and validity of the scale has been confirmed before. Average scores in total and 6 subscales were calculated. The score range was 1-5 and scores above 3 indicated good readiness. Data was interpreted using descriptive and non-parametric tests (Mann-Whitney U and Kruskal-Wallis). Results: Response rate was 100%. The students’ readiness scores in total and all subscales (“technology access”, “online skills and relationships”, “motivation”, “online audio/video”, “readiness for online discussions”, and “importance of e-learning to your success”) were above 3. Comparing different subscales, students’ mean scores in “motivation” and “internet discussion” subscales were less than others, although the difference was not significant. There were no significant gender differences in the readiness scores. Students who were academic staff had significantly higher scores than others in total and in “motivation” and “online skills and relationship” subscales. Conclusion: Good learners’ readiness, observed in the present study, may imply that the instructional designer can rely on e-learning strategies and build the course upon them. However, according to the slightly lower scores in “motivation” and “online discussion” subscales, it is recommended to stress more on strategies that improve these two components. To generalize the results, it is needed to test students’ readiness in more different degree programs. PMID:24524090
Theys, Kristof; Abecasis, Ana; Libin, Pieter; Gomes, Perpe Tua; Cabanas, Joaquim; Camacho, Ricardo J; Van Laethem, Kristel
2015-09-01
Dolutegravir is approved for the treatment of HIV-1 patients exposed to other integrase inhibitors, but the decision to use dolutegravir in this setting should be informed by drug resistance testing. This study determined the extent of disagreement in predicted residual dolutegravir activity after raltegravir use, and identified individual mutational patterns for which uncertainty exists among HIV-1 expert systems. Mutation patterns were classified in raltegravir signature pathways including positions 143, 148 and 155, and interpreted into clinically informative resistance levels using genotypic drug resistance interpretation systems ANRS v24, HIVdb v7.0 and Rega v9.1.0, and instructions of dolutegravir use as approved by the Food and Drug Administration and the European Medicines Agency. In 216 HIV-1 patients failing raltegravir therapy, 87% of patients displayed mutations associated with resistance towards integrase inhibitors. A total of 141 unique mutational patterns were observed, with N155H (25.4%), Q148H (16.2%) and Y143R (8.3%) the most prevalent signature mutations. The Q148 pathway occurred almost exclusively in HIV-1 subtype B viruses. Concordances in predicted dolutegravir susceptibility scores among 5 systems were obtained in 57.8% of patients, and concordant intermediate resistant and concordant resistant scores were only observed in 6.5% and 0.9% of patients, respectively. However, systems individually scored higher levels of dolutegravir intermediate resistance and resistance, ranging from 4.2% to 10.2% and from 14.8% to 22.7% of patients, respectively. A consensus on interpreting the extent of residual activity was lacking in 34.7% of patients and was highly resistance pathway-specific. Dolutegravir may potentially be effective in the majority of HIV-1 patients failing raltegravir, but concern over the uncertainty in predicted residual activity could deter clinicians from prescribing dolutegravir during its clinical assessment.
Vink, W D; Jones, G; Johnson, W O; Brown, J; Demirkan, I; Carter, S D; French, N P
2009-11-15
Bovine digital dermatitis (BDD) is an epidermitis which is a leading cause of infectious lameness. The only recognized diagnostic test is foot inspection, which is a labour-intensive procedure. There is no universally recognized, standardized lesion scoring system. As small lesions are easily missed, foot inspection has limited diagnostic sensitivity. Furthermore, interpretation is subjective, and prone to observer bias. Serology is more convenient to carry out and is potentially a more sensitive indicator of infection. By carrying out 20 serological assays using lesion-associated Treponema spp. isolates, three serogroups were identified. The reliability of the tests was established by assessing the level of agreement and the concordance correlation coefficient. Subsequently, an ELISA suitable for routine use was developed. The benchmark of diagnostic test validation is conventionally the determination of the key test parameters, sensitivity and specificity. This requires the imposition of a cut-off point. For serological assays with outcomes on a continuous scale, the degree by which the test result differs from this cut-off is disregarded. Bayesian statistical methodology has been developed which enables the assay result also to be interpreted on a continuous scale, thereby optimizing the information inherent in the test. Using a cross-sectional study dataset carried out on 8 representative dairy farms in the UK, the probability of infection, P(I), of each individual animal was estimated in the absence of a 'Gold Standard' by modelling I as a latent variable which was determined by lesion status, L as well as serology, S. Covariate data (foot hygiene score and age) were utilized to estimate P(L) when no lesion inspection was performed. Informative prior distributions were elicited where possible. The model was utilized for predictive inference, by computing estimates of P(I) and P(L) independently of the data. 
A more detailed and informative analysis of the farm-level distribution of infection could thus be performed. Also, biases associated with the subjective interpretation of lesion status were minimized. Model outputs showed that young stock were unlikely to be infected, whereas cows tended to have high or low probabilities of being infected. Estimates of probability of infection were considerably higher for animals with lesions than for those without. Associations were identified between both covariates and probability of infection in cows, but not in the young stock. Under the condition that the model assumptions are valid for the larger population, the results of this work can be generalized by predictive inference.
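The idea of interpreting an assay result on a continuous scale, rather than against a single cut-off, can be shown with a deliberately simplified Bayes' rule calculation. This is not the latent-variable model described in the abstract, which also uses lesion status and covariates. The two normal distributions for assay values, the prior, and the function name are all hypothetical.

```python
from statistics import NormalDist

# Hypothetical ELISA value distributions; equal SDs keep the posterior
# monotone in the assay value. These are NOT estimates from the study.
INFECTED = NormalDist(1.1, 0.3)
UNINFECTED = NormalDist(0.5, 0.3)


def posterior_infection_probability(assay_value, prior):
    """P(infected | assay value) from Bayes' rule with two normal likelihoods."""
    weighted_infected = prior * INFECTED.pdf(assay_value)
    weighted_uninfected = (1.0 - prior) * UNINFECTED.pdf(assay_value)
    return weighted_infected / (weighted_infected + weighted_uninfected)


# Two animals on the same side of a cut-off at 0.8 can still receive very
# different probabilities of infection.
just_above = posterior_infection_probability(0.85, prior=0.3)
far_above = posterior_infection_probability(1.40, prior=0.3)
```

A dichotomized test would report both animals as positive; the continuous interpretation keeps the information about how far each result lies from the cut-off.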
Weech-Maldonado, Robert; Dreachslin, Janice L.; Brown, Julie; Pradhan, Rohit; Rubin, Kelly L.; Schiller, Cameron; Hays, Ron D.
2016-01-01
Background The U.S. national standards for culturally and linguistically appropriate services (CLAS) in health care provide guidelines on policies and practices aimed at developing culturally competent systems of care. The Cultural Competency Assessment Tool for Hospitals (CCATH) was developed as an organizational tool to assess adherence to the CLAS standards. Purposes First, we describe the development of the CCATH and estimate the reliability and validity of the CCATH measures. Second, we discuss the managerial implications of the CCATH as an organizational tool to assess cultural competency. Methodology/Approach We pilot tested an initial draft of the CCATH, revised it based on a focus group and cognitive interviews, and then administered it in a field test with a sample of California hospitals. The reliability and validity of the CCATH were evaluated using factor analysis, analysis of variance, and Cronbach’s alphas. Findings Exploratory and confirmatory factor analyses identified 12 CCATH composites: leadership and strategic planning, data collection on inpatient population, data collection on service area, performance management systems and quality improvement, human resources practices, diversity training, community representation, availability of interpreter services, interpreter services policies, quality of interpreter services, translation of written materials, and clinical cultural competency practices. All the CCATH scales had internal consistency reliability of .65 or above, and the reliability was .70 or above for 9 of the 12 scales. Analysis of variance results showed that not-for-profit hospitals have higher CCATH scores than for-profit hospitals in five CCATH scales and higher CCATH scores than government hospitals in two CCATH scales. Practice Implications The CCATH showed adequate psychometric properties. 
Managers and policy makers can use the CCATH as a tool to evaluate hospital performance in cultural competency and identify and target improvements in hospital policies and practices that undergird the provision of CLAS. PMID:21934511
Prabhakar, Anand M; Gottumukkala, Ravi V; Wang, Wenyi; Hughes, Danny R; Duszak, Richard
2018-05-07
Nationally, nonradiologists interpret an increasing proportion of lower extremity venous duplex ultrasound (LEVDU) examinations. We aimed to study day of week, site of service, and patient complexity differences in LEVDU services interpreted by radiologists versus nonradiologists. Using carrier claims files for a 5% national sample of Medicare beneficiaries from 2012 to 2015, we retrospectively classified all LEVDU examinations by physician specialty (radiologist versus nonradiologist), day of week (weekday versus weekend), site of service, and patient Charlson Comorbidity Index (CCI) scores. Pearson's χ² was used to test statistical significance. Of 760,433 LEVDU examinations for which provider specialty could be determined, 439,964 (58%) were interpreted by radiologists and 320,469 (42%) by nonradiologists. On weekends, radiologists interpreted 75% (66,094 of 88,244) and nonradiologists 25% (22,150 of 88,244) (P < .0001). Of LEVDU examinations interpreted by radiologists, 57% were performed in the inpatient or emergency department settings, and 70% of LEVDU examinations interpreted by nonradiologists were performed in the private office or outpatient hospital setting. Radiologists interpreted a slightly larger proportion (17%) of their examinations on patients with more comorbidities (CCI of ≥3) than nonradiologists (15%) (P < .0001). Compared with nonradiologists, radiologists interpret a disproportionately larger share of weekend (versus weekday) LEVDU examinations and a considerably larger proportion in higher acuity settings. Additionally, the patients on whom they render services have more comorbidities. To optimize around-the-clock patient access to necessary imaging, emerging quality payment programs should consider the timing and sites of service, as well as patient complexity.
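The weekend comparison can be checked with a Pearson chi-square statistic on a 2x2 table. The weekend counts are quoted in the abstract. The weekday counts are derived here by subtraction, on the assumption that the weekend examinations are a subset of the 760,433 total. The function name is hypothetical, and no continuity correction is applied.

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic (no continuity correction) for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (table[i][j] - expected) ** 2 / expected
    return statistic


# Columns: radiologists, nonradiologists.
weekend = [66094, 22150]                    # quoted in the abstract
weekday = [439964 - 66094, 320469 - 22150]  # derived by subtraction (assumption)
statistic = chi_square_2x2([weekend, weekday])

# Critical value of chi-square with 1 degree of freedom at P = .001.
CRITICAL_P001_DF1 = 10.83
```

The statistic far exceeds the critical value, which is consistent with the reported P < .0001; with samples this large, even small differences in proportions would reach significance.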
Saunders, Gabrielle H; Forsline, Anna
2006-06-01
Results of objective clinical tests (e.g., measures of speech understanding in noise) often conflict with subjective reports of hearing aid benefit and satisfaction. The Performance-Perceptual Test (PPT) is an outcome measure in which objective and subjective evaluations are made by using the same test materials, testing format, and unit of measurement (signal-to-noise ratio, S/N), permitting a direct comparison between measured and perceived ability to hear. Two variables are measured: a Performance Speech Reception Threshold in Noise (SRTN) for 50% correct performance and a Perceptual SRTN, which is the S/N at which listeners perceive that they can understand the speech material. A third variable is computed: the Performance-Perceptual Discrepancy (PPDIS); it is the difference between the Performance and Perceptual SRTNs and measures the extent to which listeners "misjudge" their hearing ability. Saunders et al. in 2004 examined the relation between PPT scores and unaided hearing handicap. In this publication, the relations between the PPT, residual aided handicap, and hearing aid satisfaction are described. Ninety-four individuals between the ages of 47 and 86 yr participated. All had symmetrical sensorineural hearing loss and had worn binaural hearing aids for at least 6 wk before participating. All subjects underwent routine audiological examination and completed the PPT, the Hearing Handicap Inventory for the Elderly/Adults (HHIE/A), and the Satisfaction for Amplification in Daily Life questionnaire. Sixty-five subjects attended one research visit for participation in this study, and 29 attended a second visit to complete the PPT a second time. Performance and Perceptual SRTN and PPDIS scores were normally distributed and showed excellent test-retest reliability. Aided SRTNs were significantly better than unaided SRTNs; aided and unaided PPDIS values did not differ. 
Stepwise multiple linear regression showed that the PPDIS, the Performance SRTN, and age were significant predictors of scores on the HHIE/A such that greater reported handicap is associated with underestimating hearing ability, poorer aided ability to understand speech in noise, and being younger. Scores on the Satisfaction with Amplification in Daily Life were not well explained by the PPT, age, or audiometric thresholds. When individuals were grouped by their HHIE/A scores, it was seen that individuals who report more handicap than expected based on their audiometric thresholds, have a more negative PPDIS, i.e., underestimate their hearing ability, relative to individuals who report expected handicap, who in turn have a more negative PPDIS than individuals who report less handicap than expected. No such patterns were apparent for the Performance SRTN. The study showed the PPT to be a reliable outcome measure that can provide more information than a performance measure and/or a questionnaire measure alone, in that the PPDIS can provide the clinician with an explanation for discrepant objective and subjective reports of hearing difficulties. The finding that self-reported handicap is affected independently by both actual ability to hear and the (mis)perception of ability to hear underscores the difficulty clinicians encounter when trying to interpret outcomes questionnaires. We suggest that this variable should be measured and taken into account when interpreting questionnaires and counseling patients.
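The PPDIS definition in this abstract can be written out as a small sketch. The sign convention (Performance SRTN minus Perceptual SRTN) is inferred from the statement that a more negative PPDIS means underestimating hearing ability; the function names and the example thresholds are hypothetical.

```python
def ppdis(performance_srtn_db: float, perceptual_srtn_db: float) -> float:
    """Performance-Perceptual Discrepancy in dB S/N.

    Assumed sign convention: Performance SRTN minus Perceptual SRTN, so a
    listener who believes they need a more favorable S/N than they actually
    do gets a negative value.
    """
    return performance_srtn_db - perceptual_srtn_db


def describe(ppdis_db: float) -> str:
    """Label the direction of the misjudgment."""
    if ppdis_db < 0:
        return "underestimates hearing ability"
    if ppdis_db > 0:
        return "overestimates hearing ability"
    return "judges hearing ability accurately"


# Hypothetical listener: measured threshold -2 dB S/N, perceived threshold +1 dB S/N.
value = ppdis(-2.0, 1.0)
print(value, describe(value))
```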
Ivins, Brian J; Lange, Rael T; Cole, Wesley R; Kane, Robert; Schwab, Karen A; Iverson, Grant L
2015-02-01
Base rates of low ANAM4 TBI-MIL scores were calculated in a convenience sample of 733 healthy male active duty soldiers using available military reference values for the following cutoffs: ≤2nd percentile (2 SDs), ≤5th percentile, <10th percentile, and <16th percentile (1 SD). Rates of low scores were also calculated in 56 active duty male soldiers who sustained an mTBI an average of 23 days (SD = 36.1) prior. 22.0% of the healthy sample and 51.8% of the mTBI sample had two or more scores below 1 SD (i.e., 16th percentile). 18.8% of the healthy sample and 44.6% of the mTBI sample had one or more scores ≤5th percentile. Rates of low scores in the healthy sample were influenced by cutoffs and race/ethnicity. Importantly, some healthy soldiers obtain at least one low score on ANAM4. These base rate analyses can improve the methodology for interpreting ANAM4 performance in clinical practice and research. © The Author 2014. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
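The pairing of percentile cutoffs with standard deviations in this abstract follows from the normal distribution. The sketch below, which assumes normally distributed reference scores, shows where 1 SD and 2 SDs below the mean fall in percentile terms.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1

# Percentile rank of a score lying a given number of SDs below the mean,
# assuming the reference scores are normally distributed.
percentile_at_minus_1_sd = std_normal.cdf(-1.0) * 100
percentile_at_minus_2_sd = std_normal.cdf(-2.0) * 100

print(f"-1 SD is about the {percentile_at_minus_1_sd:.1f}th percentile")
print(f"-2 SD is about the {percentile_at_minus_2_sd:.1f}th percentile")
```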
Rios, Anthony; Kavuluru, Ramakanth
2017-11-01
The CEGS N-GRID 2016 Shared Task in Clinical Natural Language Processing (NLP) provided a set of 1000 neuropsychiatric notes to participants as part of a competition to predict psychiatric symptom severity scores. This paper summarizes our methods, results, and experiences based on our participation in the second track of the shared task. Classical methods of text classification usually fall into one of three problem types: binary, multi-class, and multi-label classification. In this effort, we study ordinal regression problems with text data where misclassifications are penalized differently based on how far apart the ground truth and model predictions are on the ordinal scale. Specifically, we present our entries (methods and results) in the N-GRID shared task in predicting research domain criteria (RDoC) positive valence ordinal symptom severity scores (absent, mild, moderate, and severe) from psychiatric notes. We propose a novel convolutional neural network (CNN) model designed to handle ordinal regression tasks on psychiatric notes. Broadly speaking, our model combines an ordinal loss function, a CNN, and conventional feature engineering (wide features) into a single model which is learned end-to-end. Given interpretability is an important concern with nonlinear models, we apply a recent approach called locally interpretable model-agnostic explanation (LIME) to identify important words that lead to instance specific predictions. Our best model entered into the shared task placed third among 24 teams and scored a macro mean absolute error (MMAE) based normalized score (100·(1-MMAE)) of 83.86. Since the competition, we improved our score (using basic ensembling) to 85.55, comparable with the winning shared task entry. Applying LIME to model predictions, we demonstrate the feasibility of instance specific prediction interpretation by identifying words that led to a particular decision. 
In this paper, we present a method that successfully uses wide features and an ordinal loss function applied to convolutional neural networks for ordinal text classification specifically in predicting psychiatric symptom severity scores. Our approach leads to excellent performance on the N-GRID shared task and is also amenable to interpretability using existing model-agnostic approaches. Copyright © 2017 Elsevier Inc. All rights reserved.
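The normalized score 100·(1-MMAE) quoted above can be sketched as follows. The abstract does not spell out how MMAE is normalized; this sketch assumes the per-class mean absolute error is divided by the largest error possible for that class on the 0 to 3 scale and then averaged over classes. The function names and the toy labels are hypothetical and are not data from the shared task.

```python
from collections import defaultdict


def normalized_mmae(y_true, y_pred, lowest=0, highest=3):
    """Macro-averaged mean absolute error, scaled to [0, 1].

    Assumption: each class's MAE is divided by the maximum error possible
    for that class, so that classes at the ends and middle of the ordinal
    scale contribute comparably.
    """
    errors = defaultdict(list)
    for truth, pred in zip(y_true, y_pred):
        errors[truth].append(abs(pred - truth))
    per_class = []
    for truth, errs in errors.items():
        max_error = max(truth - lowest, highest - truth)
        per_class.append((sum(errs) / len(errs)) / max_error)
    return sum(per_class) / len(per_class)


def normalized_score(y_true, y_pred):
    """Score of the form 100 * (1 - MMAE); higher is better."""
    return 100 * (1 - normalized_mmae(y_true, y_pred))


# Toy labels: 0 = absent, 1 = mild, 2 = moderate, 3 = severe.
y_true = [0, 0, 1, 2, 3, 3]
y_pred = [0, 1, 1, 2, 3, 1]
score = normalized_score(y_true, y_pred)
print(f"{score:.2f}")
```

Macro averaging keeps a rare class such as "severe" from being swamped by the more common classes, which is why a single large miss on that class lowers the score noticeably.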
Sakai, Hiromi; Nagano, Akinori; Seki, Keiko; Okahashi, Sayaka; Kojima, Maki; Luo, Zhiwei
2018-07-01
We developed a virtual reality test, the virtual shopping test (VST), to assess the cognitive function of Japanese people in a near-daily-life environment. In this test, participants were asked to execute shopping tasks using touch panel operations in a "virtual shopping mall." We examined differences in VST performances among healthy participants of different ages and correlations between VST and screening tests, such as the Mini-Mental State Examination (MMSE) and Everyday Memory Checklist (EMC). We included 285 healthy participants between 20 and 86 years of age in seven age groups. Each VST index tended to decrease with advancing age; differences among age groups were significant. Most VST indices had a significantly negative correlation with MMSE and a significantly positive correlation with EMC. VST may be useful for assessing general cognitive decline; effects of age must be considered for proper interpretation of the VST scores.
Pearson, J L; Ferguson, L R
1989-01-01
Relationships were explored among three measures of spatial ability--the Embedded Figures Test (EFT), the Mental Rotations Test (MRT), and the Differential Aptitude Spatial Relations subtest (DAT)--an environmental cognition task (MAP), American College Testing (ACT) math and English achievement, and gender in a sample of 282 undergraduates. Variance attributable to gender among the spatial tasks ranged from 0.5% in the EFT to 12% in the MRT. Gender accounted for only 1% of the variance in the MAP task. Gender differences were noted in regression analyses; women's math and English achievement scores were both predictive of spatial ability, while for men, only math achievement was predictive of spatial ability. The results were interpreted as substantiating sex role socialization theory of cognitive abilities.
Detectable changes in physical performance measures in elderly African Americans.
Mangione, Kathleen Kline; Craik, Rebecca L; McCormick, Alyson A; Blevins, Heather L; White, Meaghan B; Sullivan-Marx, Eileen M; Tomlinson, James D
2010-06-01
African American older adults have higher rates of self-reported disability and lower physical performance scores compared with white older adults. Measures of physical performance are used to predict future morbidity and to determine the effect of exercise. Characteristics of performance measures are not known for African American older adults. The purpose of this study was to estimate the standard error of measurement (SEM) and minimal detectable change (MDC) for the Short Physical Performance Battery (SPPB), Timed "Up & Go" Test (TUG) time, free gait speed, fast gait speed, and Six-Minute Walk Test (6MWT) distance in frail African American adults. This observational measurement study used a test-retest design. Individuals were tested 2 times over a 1-week period. Demographic data collected included height, weight, number of medications, assistive device use, and Mini-Mental State Examination (MMSE) scores. Participants then completed the 5 physical performance tests. Fifty-two participants (mean age=78 years) completed the study. The average MMSE score was 25 points, and the average body mass index was 29.4 kg/m². On average, participants took 7 medications, and the majority used assistive devices. Intraclass correlation coefficients (ICC [2,1]) were greater than .90, except for the SPPB score (ICC=.81). The SEMs were 1.2 points for the SPPB, 1.7 seconds for the TUG, 0.08 m/s for free gait speed, 0.09 m/s for fast gait speed, and 28 m for 6MWT distance. The MDC values were 2.9 points for the SPPB, 4 seconds for the TUG, 0.19 m/s for free gait speed, 0.21 m/s for fast gait speed, and 65 m for 6MWT distance. The entire sample was from an urban area. The SEMs were similar to previously reported values and can be used when working with African American and white older adults. Estimates of MDC were calculated to assist in clinical interpretation.
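The relationship between the SEM and MDC values above can be sketched with the common formula MDC = z × √2 × SEM. The abstract does not state the confidence level; a 90% level is assumed here because it reproduces most of the reported values, and the function name `mdc` is illustrative.

```python
import math
from statistics import NormalDist


def mdc(sem: float, confidence: float = 0.90) -> float:
    """Minimal detectable change from the standard error of measurement.

    The sqrt(2) term accounts for measurement error in both the test and
    the retest; z is the two-sided normal quantile for the chosen level.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * math.sqrt(2) * sem


# SEMs as reported in the abstract.
sems = {
    "SPPB (points)": 1.2,
    "TUG (s)": 1.7,
    "free gait speed (m/s)": 0.08,
    "fast gait speed (m/s)": 0.09,
    "6MWT (m)": 28,
}
for name, sem in sems.items():
    print(f"{name}: MDC90 = {mdc(sem):.2f}")
```

Under this assumption the TUG, gait speed, and 6MWT values match the abstract after rounding. The SPPB comes out near 2.8 rather than the reported 2.9, which would be consistent with the published SEM of 1.2 having been rounded before reporting.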
Lovett, Rosemary; Summerfield, Quentin; Vickers, Deborah
2013-06-01
The Toy Discrimination Test measures children's ability to discriminate spoken words. Previous assessments of reliability tested children with normal hearing or mild hearing impairment, and most studies used a version of the test without a masking sound. We assessed test-retest reliability for children with hearing impairment using maskers of broadband noise and two-talker babble. Stimuli were presented from a loudspeaker. The signal-to-noise ratio (SNR) was varied adaptively to estimate the speech-reception threshold (SRT) corresponding to 70.7% correct performance. Participants completed each masked condition twice. Fifty-five children with permanent hearing impairment participated, aged 3.0 to 6.3 years. Thirty-four children used acoustic hearing aids; 21 children used cochlear implants. For the noise masker, the within-subject standard deviation of SRTs was 2.4 dB, and the correlation between first and second SRT was + 0.73. For the babble masker, corresponding values were 2.7 dB and + 0.60. Reliability was similar for children with hearing aids and children with cochlear implants. The results can inform the interpretation of scores from individual children. If a child completes a condition twice in different listening situations (e.g. aided and unaided), a difference between scores ≥ 7.5 dB would be statistically significant (p <.05).
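The 7.5 dB criterion can be related to the reported within-subject standard deviations by a standard critical-difference calculation. The abstract does not say how the figure was obtained; the worked equation below assumes a two-sided 5% criterion for the difference between two independent SRT estimates, where $s_w$ is the within-subject standard deviation.

```latex
\begin{align*}
\Delta_{\text{crit}} &= z_{0.975}\,\sqrt{2}\,s_w \\
\text{babble masker: } \Delta_{\text{crit}} &= 1.96 \times \sqrt{2} \times 2.7\ \text{dB} \approx 7.5\ \text{dB} \\
\text{noise masker: } \Delta_{\text{crit}} &= 1.96 \times \sqrt{2} \times 2.4\ \text{dB} \approx 6.7\ \text{dB}
\end{align*}
```

On this reading, the quoted 7.5 dB corresponds to the larger of the two standard deviations, so it would be a conservative threshold for either masker.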
Murray, Aja Louise; Allison, Carrie; Smith, Paula L; Baron-Cohen, Simon; Booth, Tom; Auyeung, Bonnie
2017-05-01
Diagnostic bias is a concern in autism spectrum conditions (ASC) where prevalence and presentation differ by sex. To ensure that females with ASC are not under-identified, it is important that ASC screening tools do not systematically underestimate autistic traits in females relative to males. We evaluated whether the AQ-10, a brief screen for ASC recommended by the National Institute of Clinical Excellence in cases of suspected ASC, exhibits such a bias. Using an item response theory approach, we evaluated differential item functioning and differential test functioning. We found that although individual items showed some sex bias, these biases at times favored males and at other times favored females. Thus, at the level of test scores the item-level biases cancelled out to give an unbiased overall score. Results support the continued use of the AQ-10 sum score in its current form; however, they suggest that caution should be exercised when interpreting responses to individual items. The nature of the item-level biases could serve as a guide for future research into how ASC affects males and females differently. Autism Res 2017, 10: 790-800. © 2016 International Society for Autism Research, Wiley Periodicals, Inc.
The Level and Nature of Autistic Intelligence
Dawson, Michelle; Soulières, Isabelle; Ann Gernsbacher, Morton; Mottron, Laurent
2015-01-01
Autistics are presumed to be characterized by cognitive impairment, and their cognitive strengths (e.g., in Block Design performance) are frequently interpreted as low-level by-products of high-level deficits, not as direct manifestations of intelligence. Recent attempts to identify the neuroanatomical and neurofunctional signature of autism have been positioned on this universal, but untested, assumption. We therefore assessed a broad sample of 38 autistic children on the preeminent test of fluid intelligence, Raven’s Progressive Matrices. Their scores were, on average, 30 percentile points, and in some cases more than 70 percentile points, higher than their scores on the Wechsler scales of intelligence. Typically developing control children showed no such discrepancy, and a similar contrast was observed when a sample of autistic adults was compared with a sample of nonautistic adults. We conclude that intelligence has been underestimated in autistics. PMID:17680932
Lange, R; Thalbourne, M A; Houran, J; Storm, L
2000-12-01
The concept of transliminality ("a hypothesized tendency for psychological material to cross thresholds into or out of consciousness") was anticipated by William James (1902/1982), but it was only recently given an empirical definition by Thalbourne in terms of a 29-item Transliminality Scale. This article presents the 17-item Revised Transliminality Scale (or RTS) that corrects age and gender biases, is unidimensional by a Rasch criterion, and has a reliability of .82. The scale defines a probabilistic hierarchy of items that address magical ideation, mystical experience, absorption, hyperaesthesia, manic experience, dream interpretation, and fantasy proneness. These findings validate the suggestions by James and Thalbourne that some mental phenomena share a common underlying dimension with selected sensory experiences (such as being overwhelmed by smells, bright lights, sights, and sounds). Low scores on transliminality remain correlated with "tough mindedness" on the Cattell 16PF test, as well as "self-control" and "rule consciousness," whereas high scores are associated with "abstractedness" and an "openness to change" on that test. An independent validation study confirmed the predictions implied by our definition of transliminality. Implications for test construction are discussed. Copyright 2000 Academic Press.
Concordance of Motion Sensor and Clinician-Rated Fall Risk Scores in Older Adults.
Elledge, Julie
2017-12-01
As the older adult population in the United States continues to grow, developing reliable, valid, and practical methods for identifying fall risk is a high priority. Falls are prevalent in older adults and contribute significantly to morbidity and mortality rates and rising health costs. Identifying at-risk older adults and intervening in a timely manner can reduce falls. Conventional fall risk assessment tools require a health professional trained in the use of each tool for administration and interpretation. Motion sensor technology, which uses three-dimensional cameras to measure patient movements, is promising for assessing older adults' fall risk because it could eliminate or reduce the need for provider oversight. The purpose of this study was to assess the concordance of fall risk scores as measured by a motion sensor device, the OmniVR Virtual Rehabilitation System, with clinician-rated fall risk scores in older adult outpatients undergoing physical rehabilitation. Three standardized fall risk assessments were administered by the OmniVR and by a clinician. Validity of the OmniVR was assessed by measuring the concordance between the two assessment methods. Stability of the OmniVR fall risk ratings was assessed by measuring test-retest reliability. The OmniVR scores showed high concordance with the clinician-rated scores and high stability over time, demonstrating comparability with provider measurements.
Chronic viral hepatitis: the histology report.
Guido, Maria; Mangia, Alessandra; Faa, Gavino
2011-03-01
In chronic viral hepatitis, the role of liver biopsy as a diagnostic test has seen a decline, paralleled by its increasing importance for prognostic purposes. Nowadays, the main indication for liver biopsy in chronic viral hepatitis is to assess the severity of the disease, in terms of both necro-inflammation (grade) and fibrosis (stage), which is important for prognosis and therapeutic management. Several scoring systems have been proposed for grading and staging chronic viral hepatitis, and there is no general consensus on the best system to use in daily practice. All scoring systems have their drawbacks and all may be affected by sampling and observer variability. Whatever the system used, a histological score is a reductive approach since damage in chronic viral hepatitis is a complex biological process. Thus, scoring systems are not intended to replace the detailed, descriptive pathology report. In fact, lesions other than those scored for grading and staging may have clinical relevance and should be assessed and reported. This paper aims to provide a systematic approach to the interpretation of liver biopsies obtained in cases of chronic viral hepatitis, with the hope of helping general pathologists in their diagnostic practice. Copyright © 2011 Editrice Gastroenterologica Italiana S.r.l. Published by Elsevier Ltd. All rights reserved.
Crowther, Mark; Cook, Deborah; Guyatt, Gordon; Zytaruk, Nicole; McDonald, Ellen; Williamson, David; Albert, Martin; Dodek, Peter; Finfer, Simon; Vallance, Shirley; Heels-Ansdell, Diane; McIntyre, Lauralyn; Mehta, Sangeeta; Lamontagne, Francois; Muscedere, John; Jacka, Michael; Lesur, Olivier; Kutsiogiannis, Jim; Friedrich, Jan; Klinger, James R; Qushmaq, Ismael; Burry, Lisa; Khwaja, Kosar; Sheppard, Jo-Ann; Warkentin, Theodore E
2014-06-01
Thrombocytopenia occurs in 20% to 45% of critically ill medical-surgical patients. The 4Ts heparin-induced thrombocytopenia (HIT) score (with 4 domains: Thrombocytopenia, Timing of thrombocytopenia, Thrombosis and oTher reason[s] for thrombocytopenia) might reliably identify patients at low risk for HIT. Interobserver agreement on 4Ts scoring is uncertain in this setting. To evaluate whether a published clinical prediction rule (the "4Ts score") reliably rules out HIT in "low-risk" intensive care unit (ICU) patients as assessed by research coordinators (who prospectively scored) and 2 adjudicators (who scored retrospectively) during an international heparin thromboprophylaxis trial (PROTECT, NCT00182143). Of 3746 medical-surgical ICU patients in PROTECT, 794 met the enrollment criteria for this HIT substudy. Enrollment was predicated on one of the following occurring in ICU: platelets less than 50×10⁹/L, platelets decreased to 50% of ICU admission value (if admission value<100×10⁹/L), any venous thrombosis, or if HIT was otherwise clinically suspected. Independently, 4Ts scores were completed in real time by research coordinators blinded to study drug and laboratory HIT results, and retrospectively by 2 adjudicators blinded to study drug, laboratory HIT results, and research coordinators' scores; the adjudicators arrived at consensus in all cases. Of the 763 patients, 474 had a central or local laboratory HIT test performed and had 4Ts scoring by adjudicators; 432 were scored by trained research coordinators. Heparin-induced thrombocytopenia was defined by a centrally performed positive serotonin release assay (SRA). Of the 474 patients with central adjudication, 407 (85.9%) had a 4Ts score of 3 or lower, conferring a low pretest probability (PTP) of HIT; of these, 6 (1.5% [95% confidence interval, 0.7%-3.2%]) had a positive SRA. Fifty-nine (12.4%) had a moderate PTP (4Ts score of 4-5); of these, 4 (6.8%) had a positive SRA.
Eight patients had a high PTP (4Ts score of ≥6); of these, 1 (12.5%) had a positive SRA. Raw agreement between research coordinators and central adjudication on each domain of the 4Ts score and low, intermediate, and high PTP was good. However, chance-corrected agreement was variable between adjudicators (weighted κ values of 0.31-0.93) and between the adjudicator consensus and research coordinators (weighted κ values of 0.13 and 0.78). Post hoc review of the 6 SRA-positive cases with an adjudicated low PTP demonstrated that their scores would have been increased if the adjudicators had had additional information on heparin exposure prior to ICU admission. In general, the fourth domain of 4Ts (oTher causes of thrombocytopenia) generated the most disagreement. Real-time 4Ts scoring by research coordinators at the time of testing for HIT was not consistent with 4Ts scores obtained by central adjudicators. The results of this comprehensive HIT testing highlight the need for further research to improve the assessment of PTP scoring of HIT for critically ill patients. Copyright © 2014 Elsevier Inc. All rights reserved.
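The confidence interval quoted for the low-PTP group (6 of 407) can be reproduced with a Wilson score interval. The abstract does not name the interval method, so Wilson is an assumption chosen because it matches the reported bounds; the function name is illustrative.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.959964):
    """Wilson score interval for a binomial proportion (default 95%)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# SRA-positive patients among those with a low pretest probability.
low, high = wilson_interval(6, 407)
print(f"{6 / 407:.1%} (95% CI {low:.1%} to {high:.1%})")
```

The Wilson interval behaves better than the simple normal approximation when the proportion is close to zero, as it is here; the normal approximation would give a lower bound near 0.3% instead of 0.7%.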
ACT Reporting Category Interpretation Guide: Version 1.0. ACT Working Paper 2016 (05)
ERIC Educational Resources Information Center
Powers, Sonya; Li, Dongmei; Suh, Hongwook; Harris, Deborah J.
2016-01-01
ACT reporting categories and ACT Readiness Ranges are new features added to the ACT score reports starting in fall 2016. For each reporting category, the number correct score, the maximum points possible, the percent correct, and the ACT Readiness Range, along with an indicator of whether the reporting category score falls within the Readiness…
Validating Automated Essay Scoring: A (Modest) Refinement of the "Gold Standard"
ERIC Educational Resources Information Center
Powers, Donald E.; Escoffery, David S.; Duchnowski, Matthew P.
2015-01-01
By far, the most frequently used method of validating (the interpretation and use of) automated essay scores has been to compare them with scores awarded by human raters. Although this practice is questionable, human-machine agreement is still often regarded as the "gold standard." Our objective was to refine this model and apply it to…
Speed-Accuracy Response Models: Scoring Rules Based on Response Time and Accuracy
ERIC Educational Resources Information Center
Maris, Gunter; van der Maas, Han
2012-01-01
Starting from an explicit scoring rule for time limit tasks incorporating both response time and accuracy, and a definite trade-off between speed and accuracy, a response model is derived. Since the scoring rule is interpreted as a sufficient statistic, the model belongs to the exponential family. The various marginal and conditional distributions…
Methods for interpreting change over time in patient-reported outcome measures.
Wyrwich, K W; Norquist, J M; Lenderking, W R; Acaster, S
2013-04-01
Interpretation guidelines are needed for patient-reported outcome (PRO) measures' change scores to evaluate efficacy of an intervention and to communicate PRO results to regulators, patients, physicians, and providers. The 2009 Food and Drug Administration (FDA) Guidance for Industry Patient-Reported Outcomes (PRO) Measures: Use in Medical Product Development to Support Labeling Claims (hereafter referred to as the final FDA PRO Guidance) provides some recommendations for the interpretation of change in PRO scores as evidence of treatment efficacy. This article reviews the evolution of the methods and the terminology used to describe and aid in the communication of meaningful PRO change score thresholds. Anchor- and distribution-based methods have played important roles, and the FDA has recently stressed the importance of cross-sectional patient global assessments of concept as anchor-based methods for estimation of the responder definition, which describes an individual-level treatment benefit. The final FDA PRO Guidance proposes the cumulative distribution function (CDF) of responses as a useful method to depict the effect of treatments across the study population. While CDFs serve an important role, they should not be a replacement for the careful investigation of a PRO's relevant responder definition using anchor-based methods and providing stakeholders with a relevant threshold for the interpretation of change over time.
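The two ideas discussed above, a cumulative distribution function of change scores and a responder definition, can be sketched together. Everything numeric here is hypothetical: the change scores, the two study arms, and the threshold of 5 points are made up for illustration and do not come from the article or the FDA guidance.

```python
def ecdf(values):
    """Empirical CDF as (change score, cumulative proportion) pairs."""
    ordered = sorted(values)
    n = len(ordered)
    return [(x, (i + 1) / n) for i, x in enumerate(ordered)]


def responder_rate(changes, threshold):
    """Proportion of patients whose change meets or exceeds the threshold."""
    return sum(change >= threshold for change in changes) / len(changes)


# Hypothetical change scores (higher = more improvement).
treatment = [2, 5, 7, 8, 10, 12]
placebo = [-1, 0, 2, 3, 5, 6]
threshold = 5  # hypothetical responder definition

for arm, changes in (("treatment", treatment), ("placebo", placebo)):
    print(arm, ecdf(changes), f"responders: {responder_rate(changes, threshold):.0%}")
```

The CDF shows the responder proportion at every possible cutoff at once, while the responder rate reports it at a single cutoff; this is why the article treats a well-justified, anchor-based threshold as a complement to the CDF rather than something the CDF replaces.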
Health literacy, numeracy, and interpretation of graphical breast cancer risk estimates.
Brown, Sandra M; Culver, Julie O; Osann, Kathryn E; MacDonald, Deborah J; Sand, Sharon; Thornton, Andrea A; Grant, Marcia; Bowen, Deborah J; Metcalfe, Kelly A; Burke, Harry B; Robson, Mark E; Friedman, Susan; Weitzel, Jeffrey N
2011-04-01
Health literacy and numeracy are necessary to understand health information and to make informed medical decisions. This study explored the relationships among health literacy, numeracy, and ability to accurately interpret graphical representations of breast cancer risk. Participants (N=120) were recruited from the Facing Our Risk of Cancer Empowered (FORCE) membership. Health literacy and numeracy were assessed. Participants interpreted graphs depicting breast cancer risk, made hypothetical treatment decisions, and rated preference of graphs. Most participants were Caucasian (98%) and had completed at least one year of college (93%). Fifty-two percent had breast cancer, 86% had a family history of breast cancer, and 57% had a deleterious BRCA gene mutation. Mean health literacy score was 65/66; mean numeracy score was 4/6; and mean graphicacy score was 9/12. Education and numeracy were significantly associated with accurate graph interpretation (r=0.42, p<0.001 and r=0.65, p<0.001, respectively). However, after adjusting for numeracy in multivariate linear regression, education added little to the prediction of graphicacy (r²=0.41 versus 0.42, respectively). In our highly health-literate population, numeracy was predictive of graphicacy. Effective risk communication strategies should consider the impact of numeracy on graphicacy and patient understanding. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.
Höner, Oliver; Votteler, Andreas; Schmid, Markus; Schultz, Florian; Roth, Klaus
2015-01-01
The utilisation of motor performance tests for talent identification in youth sports is discussed intensively in talent research. This article examines the reliability, differential stability and validity of the motor diagnostics conducted nationwide by the German football talent identification and development programme and provides reference values for a standardised interpretation of the diagnostics results. Highly selected players (the top 4% of their age groups, U12-U15) took part in the diagnostics at 17 measurement points between spring 2004 and spring 2012 (N = 68,158). The heterogeneous test battery measured speed abilities and football-specific technical skills (sprint, agility, dribbling, ball control, shooting, juggling). For all measurement points, the overall score and the speed tests showed high internal consistency, high test-retest reliability and satisfying differential stability. The diagnostics demonstrated satisfying factorial-related validity with plausible and stable loadings on the two empirical factors "speed" and "technical skills". The score, and the technical skills dribbling and juggling, differentiated the most among players of different performance levels and thus showed the highest criterion-related validity. Satisfactory psychometric properties for the diagnostics are an important prerequisite for a scientifically sound rating of players' actual motor performance and for the future examination of the prognostic validity for success in adulthood.
[Critical reading aptitude of clinical research texts in teaching specialist doctors].
Carranza Lira, Sebastián; Varela, Alejandro
2007-11-01
Learning can be divided into two types: unconscious learning and significant learning. The aptitude for critical reading of clinical research articles is a learning experience that reflects the doctor's active participation in reading articles. The objective was to determine the degree of aptitude for critical reading of clinical research articles among specialists in training. A previously validated instrument for evaluating critical reading of clinical research studies was applied to all specialists in training in the different services of the hospital. The Kruskal-Wallis and Mann-Whitney U tests were used for statistical analysis. The global score had a median of 42.5 (12-89) points. By indicator, scores were higher for interpreting than for judging and for proposing. In the analysis of domain degrees, the greatest proportion of results for the interpreting indicator was at the low level, and most results for the judging and proposing indicators were at the level expected by chance. Critical reading aptitude is not developed in specialist physicians in training. Developing this aptitude will allow them to benefit more from their courses.
Home and Community Language Proficiency in Spanish-English Early Bilingual University Students.
Schmidtke, Jens
2017-10-17
This study assessed home and community language proficiency in Spanish-English bilingual university students to investigate whether the vocabulary gap reported in studies of bilingual children persists into adulthood. Sixty-five early bilinguals (mean age = 21 years) were assessed in English and Spanish vocabulary and verbal reasoning ability using subtests of the Woodcock-Muñoz Language Survey-Revised (Schrank & Woodcock, 2009). Their English scores were compared to 74 monolinguals matched in age and level of education. Participants also completed a background questionnaire. Bilinguals scored below the monolingual control group on both subtests, and the difference was larger for vocabulary compared to verbal reasoning. However, bilinguals were close to the population mean for verbal reasoning. Spanish scores were on average lower than English scores, but participants differed widely in their degree of balance. Participants with an earlier age of acquisition of English and more current exposure to English tended to be more dominant in English. Vocabulary tests in the home or community language may underestimate bilingual university students' true verbal ability and should be interpreted with caution in high-stakes situations. Verbal reasoning ability may be more indicative of a bilingual's verbal ability.
Cancer prevention knowledge of people with profound hearing loss.
Zazove, Philip; Meador, Helen E; Reed, Barbara D; Sen, Ananda; Gorenflo, Daniel W
2009-03-01
Deaf persons, a documented minority population, have low reading levels and difficulty communicating with physicians. The effect of these on their knowledge of cancer prevention recommendations is unknown. A cross-sectional study of 222 d/Deaf persons in Michigan, age 18 and older, chose one of four ways (voice, video of a certified American Sign Language interpreter, captions, or printed English) to complete a self-administered computer video questionnaire about demographics, hearing loss, language history, health-care utilization, and health-care information sources, as well as family and social variables. Twelve questions tested their knowledge of cancer prevention recommendations. The outcome measures were the percentage of correct answers to the questions and the association of multiple variables with these responses. Participants averaged 22.9% correct answers with no gender difference. Univariate analysis revealed that smoking history, types of medical problems, last physician visit, and women having previous cancer preventive tests did not affect scores. Improved scores occurred with computer use (p = 0.05), higher education (p < 0.01) and income (p = 0.01), hearing spouses (p < 0.01), speaking English in multiple situations (p < 0.001), and in men with previous prostate cancer testing (p = 0.04). Obtaining health information from books (p = 0.05), physicians (p = 0.008), nurses (p = 0.03) or the internet (p = 0.02), and believing that smoking is bad (p < 0.001) also improved scores. Multivariate analysis revealed that English use (p = 0.01) and believing that smoking was bad (p = 0.05) were associated with improved scores. Persons with profound hearing loss have poor knowledge of recommended cancer prevention interventions. English use in multiple settings was strongly associated with increased knowledge.
Damman, Olga C; De Jong, Anco; Hibbard, Judith H; Timmermans, Danielle R M
2016-01-01
Study objectives We aimed to investigate how different presentation formats influence comprehension and use of comparative performance information (CPI) among consumers. Methods An experimental between-subjects and within-subjects design with manipulations of CPI presentation formats. We enrolled both consumers with lower socioeconomic status (SES)/cognitive skills and consumers with higher SES/cognitive skills, recruited through an online access panel. Respondents received fictitious CPI and completed questions about interpretation and information use. Between subjects, we tested (1) displaying an overall performance score (yes/no); (2) displaying a small number of quality indicators (5 vs 9); and (3) displaying different types of evaluative symbols (star ratings, coloured dots and word icons vs numbers and bar graphs). Within subjects, we tested the effect of a reduced number of healthcare providers (5 vs 20). Data were analysed using descriptive analysis, analyses of variance and paired-samples t tests. Results A total of 902 (43%) respondents participated. Displaying an overall performance score and the use of coloured dots and word icons particularly enhanced consumer understanding. Importantly, respondents provided with coloured dots most often correctly selected the top three healthcare providers (84.3%), compared with word icons (76.6% correct), star ratings (70.6% correct), numbers (62.0%) and bars (54.2%) when viewing performance scores of 20 providers. Furthermore, a reduced number of healthcare providers appeared to support consumers: for example, when provided with 20 providers, 69.5% correctly selected the top three, compared with 80.2% with five providers. Discussion Particular presentation formats enhanced consumer understanding of CPI, most importantly the use of overall performance scores, word icons and coloured dots, and a reduced number of providers displayed. Public report efforts should use these formats to maximise impact on consumers.
PMID:26543066
Adrenal insufficiency in patients with stable non-cystic fibrosis bronchiectasis
Rajagopala, Srinivas; Ramakrishnan, Anantharaman; Bantwal, Ganapathi; Devaraj, Uma; Swamy, Smrita; Ayyar, S V; D’Souza, George
2014-01-01
Background & objectives: Suppressed adrenal responses associated with inhaled steroid use have been reported in patients with bronchiectasis and have been shown to be associated with poor quality of life. This study was undertaken to examine the prevalence of suppressed cortisol responses in stable bronchiectasis and determine their correlation with the use of inhaled corticosteroids, radiologic severity of bronchiectasis and quality of life (QOL) scores. Methods: In this case-control study, cases were patients with bronchiectasis and suppressed cortisol responses and controls were healthy volunteers, and patients with bronchiectasis without suppressed cortisol responses. Symptoms, lung function test values, exercise capacity, HRCT severity scores for bronchiectasis, exacerbations, inhaled corticosteroid use and quality of life scores were compared between patients with and without suppressed cortisol values. Results: Forty consecutive patients with bronchiectasis and 40 matched controls underwent 1-μg cosyntropin testing. Baseline cortisol (mean difference -2.0 μg/dl, P=0.04) and 30-minute stimulated cortisol (mean difference -3.73 μg/dl, P=0.001) were significantly lower in patients with bronchiectasis. One patient had absolute adrenal insufficiency and 39.5 per cent (15/38) of patients with bronchiectasis had impaired stimulated responses. Baseline and stimulated cortisol responses were unaffected by inhaled steroids (OR 1.03, P=0.96). SGRQ scores were negatively correlated with body mass (r= -0.51, P=0.001) and bronchiectasis severity (r=0.37, P=0.019), but not related to baseline or stimulated cortisol responses. Interpretation & conclusions: Our results showed that impaired adrenal responses to 1-μg cosyntropin were common in patients with bronchiectasis. This was not associated with the use of inhaled steroids or severity of bronchiectasis. Poor health status was associated with advanced disease and not with cortisol responses to the 1-μg cosyntropin test.
PMID:24820833
Cancer Prevention Knowledge of People with Profound Hearing Loss
Meador, Helen E.; Reed, Barbara D.; Sen, Ananda; Gorenflo, Daniel W.
2009-01-01
BACKGROUND Deaf persons, a documented minority population, have low reading levels and difficulty communicating with physicians. The effect of these on their knowledge of cancer prevention recommendations is unknown. METHODS A cross-sectional study of 222 d/Deaf persons in Michigan, age 18 and older, chose one of four ways (voice, video of a certified American Sign Language interpreter, captions, or printed English) to complete a self-administered computer video questionnaire about demographics, hearing loss, language history, health-care utilization, and health-care information sources, as well as family and social variables. Twelve questions tested their knowledge of cancer prevention recommendations. The outcome measures were the percentage of correct answers to the questions and the association of multiple variables with these responses. RESULTS Participants averaged 22.9% correct answers with no gender difference. Univariate analysis revealed that smoking history, types of medical problems, last physician visit, and women having previous cancer preventive tests did not affect scores. Improved scores occurred with computer use (p = 0.05), higher education (p < 0.01) and income (p = 0.01), hearing spouses (p < 0.01), speaking English in multiple situations (p < 0.001), and in men with previous prostate cancer testing (p = 0.04). Obtaining health information from books (p = 0.05), physicians (p = 0.008), nurses (p = 0.03) or the internet (p = 0.02), and believing that smoking is bad (p < 0.001) also improved scores. Multivariate analysis revealed that English use (p = 0.01) and believing that smoking was bad (p = 0.05) were associated with improved scores. CONCLUSION Persons with profound hearing loss have poor knowledge of recommended cancer prevention interventions. English use in multiple settings was strongly associated with increased knowledge. PMID:19132325
Comparison of ThinPrep and conventional preparations on fine needle aspiration cytology material.
Dey, P; Luthra, U K; George, J; Zuhairy, F; George, S S; Haji, B I
2000-01-01
To compare the various cytologic features on ThinPrep 2000 (TP) (Cytyc Corporation, Marlborough, Massachusetts, U.S.A.) and conventional preparation (CP) specimens from fine needle aspiration cytology (FNAC) material by a semiquantitative scoring system. In this prospective study a total of 71 consecutive cases were included. In each case, two passes were performed. The first pass was used for conventional preparations, with direct smears made and fixed immediately in 95% alcohol for Papanicolaou stain. For TP preparation a second pass produced material for processing in the ThinPrep 2000. The TP and CP slides were studied independently by two observers and representative slides of CP and TP compared for cellularity, background blood and necrotic cell debris, cell architecture, informative background, presence of monolayer cells, and nuclear and cytoplasmic details by a semiquantitative scoring system. Statistical analysis was performed by Wilcoxon's signed rank test on an SPSS program (Chicago, Illinois, U.S.A.). TP preparations contained adequate diagnostic cells in all cases and were tangibly superior to CP preparations concerning monolayer cells, absence of blood and necrosis, and preservation of nuclear and cytoplasmic detail (statistically significant, Wilcoxon's signed rank test, P < .000). TP preparations are superior to conventional preparations with regard to clear background, monolayer cell preparation and cell preservation. It is easier and less time consuming to screen and interpret TP preparations because the cells are limited to smaller areas on clear backgrounds, with excellent cellular preservation. However, TP preparations are more expensive than CP and require some experience for interpretation.
Allison, Kimberly H; Rendi, Mara H; Peacock, Sue; Morgan, Tom; Elmore, Joann G; Weaver, Donald L
2016-12-01
This study examined the case-specific characteristics associated with interobserver diagnostic agreement in atypical ductal hyperplasia (ADH) of the breast. Seventy-two test set cases with a consensus diagnosis of ADH from the B-Path study were evaluated. Cases were scored for 17 histological features, which were then correlated with the participant agreement with the consensus ADH diagnosis. Participating pathologists' perceptions of case difficulty, borderline features or whether they would obtain a second opinion were also examined for associations with agreement. Of the 2070 participant interpretations of the 72 consensus ADH cases, 48% were scored by participants as difficult and 45% as borderline between two diagnoses; the presence of both of these features was significantly associated with increased agreement (P < 0.001). A second opinion would have been obtained in 80% of interpretations, and this was associated with increased agreement (P < 0.001). Diagnostic agreement ranged from 10% to 89% on a case-by-case basis. Cases with papillary lesions, cribriform architecture and obvious cytological monotony were associated with higher agreement. Lower agreement rates were associated with solid or micropapillary architecture, borderline cytological monotony, or cases without a diagnostic area that was obvious on low power. The results of this study suggest that pathologists frequently recognize the challenge of ADH cases, with some cases being more prone to diagnostic variability. In addition, there are specific histological features associated with diagnostic agreement on ADH cases. Multiple example images from cases in this test set are provided to serve as educational illustrations of these challenges. © 2016 John Wiley & Sons Ltd.
Allison, Kimberly H.; Rendi, Mara H.; Peacock, Sue; Morgan, Tom; Elmore, Joann G.; Weaver, Donald L.
2016-01-01
Background Case specific characteristics associated with interobserver diagnostic agreement in atypical ductal hyperplasia (ADH) of the breast are poorly understood. Methods Seventy-two test set cases with a consensus diagnosis of ADH from the B-Path study were evaluated. Cases were scored for 17 histologic features which were then correlated with the participant agreement with the consensus ADH diagnosis. Participating pathologists’ perceptions of case difficulty, borderline features, or if they would obtain a second opinion were also examined for associations with agreement. Results Of the 2,070 participant interpretations on the 72 consensus ADH cases, 48% were scored by participants as difficult and 45% as borderline between two diagnoses; the presence of both of these features was significantly associated with increased agreement (p < 0.001). A second opinion would have been obtained in 80% of interpretations, and this was associated with increased agreement (p < 0.001). Diagnostic agreement ranged from 10–89% on a case-by-case basis. Cases with papillary lesions, cribriform architecture and obvious cytologic monotony were associated with higher agreement. Lower agreement rates were associated with solid or micro-papillary architecture, borderline cytologic monotony or cases without a diagnostic area that was obvious on low power. Conclusions The results of this study suggest that pathologists frequently recognize the challenge of ADH cases with some cases more prone to diagnostic variability. In addition, there are specific histologic features associated with diagnostic agreement on ADH cases. Multiple example images from cases in this test set are provided to serve as educational illustrations of these challenges. PMID:27398812
Søreide, Kjetil; Kørner, Hartwig; Søreide, Jon Arne
2011-01-01
In surgical research, the ability to correctly classify one type of condition or specific outcome from another is of great importance for variables influencing clinical decision making. Receiver-operating characteristic (ROC) curve analysis is a useful tool in assessing the diagnostic accuracy of any variable with a continuous spectrum of results. In order to rule a disease state in or out with a given test, the test results are usually binary, with arbitrarily chosen cut-offs for defining disease versus health, or for grading of disease severity. In the postgenomic era, the translation from bench-to-bedside of biomarkers in various tissues and body fluids requires appropriate tools for analysis. In contrast to predetermining a cut-off value to define disease, the advantages of applying ROC analysis include the ability to test diagnostic accuracy across the entire range of variable scores and test outcomes. In addition, ROC analysis makes it easy to compare tests or scores both visually and statistically. ROC is also favored because it is thought to be independent from the prevalence of the condition under investigation. ROC analysis is used in various surgical settings and across disciplines, including cancer research, biomarker assessment, imaging evaluation, and assessment of risk scores. With appropriate use, ROC curves may help identify the most appropriate cutoff value for clinical and surgical decision making and avoid confounding effects seen with subjective ratings. ROC curve results should always be put in perspective, because a good classifier does not guarantee the expected clinical outcome. In this review, we discuss the fundamental roles, suggested presentation, potential biases, and interpretation of ROC analysis in surgical research.
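The idea of evaluating accuracy across the whole range of cut-offs can be illustrated with a minimal Python sketch. It is not taken from the review: the function names, the toy biomarker values, and the use of Youden's J (sensitivity + specificity - 1) as the rule for picking a cut-off are all assumptions made for illustration, and other selection rules are equally defensible.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen diseased case scores higher than a randomly chosen healthy case
    (ties count as half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one diseased and one healthy case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def youden_cutoff(scores, labels):
    """Return (cutoff, sensitivity, specificity) maximizing Youden's J.
    A case is called positive when score >= cutoff (an assumed convention)."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one diseased and one healthy case")
    best = None
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
        sens = tp / n_pos
        spec = tn / n_neg
        j = sens + spec - 1.0
        # Keep the first (lowest) cutoff on ties.
        if best is None or j > best[0]:
            best = (j, cutoff, sens, spec)
    return best[1:]


# Hypothetical biomarker values; label 1 = disease present, 0 = absent.
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
auc = roc_auc(scores, labels)
cutoff, sensitivity, specificity = youden_cutoff(scores, labels)
print(auc, cutoff, sensitivity, specificity)
```

Because the AUC depends only on how diseased and healthy cases rank against each other, it does not change when the proportion of diseased cases in the sample changes, which is the prevalence-independence property the abstract mentions.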
RELIABILITY OF THE TUCK JUMP INJURY RISK SCREENING ASSESSMENT IN ELITE MALE YOUTH SOCCER PLAYERS
READ, PAUL; OLIVER, JON L.; DE STE CROIX, MARK B.A.; MYER, GREGORY D.; LLOYD, RHODRI S.
2015-01-01
Altered neuromuscular control has been suggested as a mechanism for injury in soccer players. Ligamentous injuries most often occur during dynamic movements, such as decelerations from jump-landing maneuvers where high risk movement patterns are present. The assessment of kinematic variables during jump-landing tasks as part of a pre-participation screen is useful in the identification of injury risk. An example of a field-based screening tool is the repeated tuck jump assessment. The purpose of this study was to analyze the within-subject variation of the tuck jump screening assessment in elite male youth soccer players. 25 pre and 25 post-peak height velocity (PHV) elite male youth soccer players from the academy of a professional English soccer club completed the assessment. A test, re-test design was used to explore the within-subject inter-session reliability. Technique was graded retrospectively against the 10-point criteria set out in the screening protocol using two-dimensional video cameras. The typical error range reported for tuck jump total score (0.90 – 1.01 in pre and post-PHV players respectively) was considered acceptable. When each criterion was analyzed individually, the Kappa coefficient determined that knee valgus was the only criterion to reach substantial agreement across the two test sessions for both groups. The results of this study suggest that although tuck jump total score may be reliably assessed in elite male youth soccer players, caution should be applied in solely interpreting the composite score due to the high within-subject variation in a number of the individual criteria. Knee valgus may be reliably used to screen elite youth male soccer players for this plyometric technique error and for test, re-test comparison. PMID:26562715
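The two reliability statistics named in this abstract can be sketched in a few lines of Python. This is an illustrative sketch only: the data are invented, the function names are hypothetical, and the typical error is computed with the common definition (standard deviation of the between-session differences divided by the square root of 2), which is assumed here rather than confirmed by the abstract.

```python
import math
import statistics


def cohen_kappa(first, second):
    """Cohen's kappa for one criterion rated in two sessions
    (e.g. 1 = error present, 0 = error absent)."""
    if len(first) != len(second) or not first:
        raise ValueError("sessions must be non-empty and equally long")
    n = len(first)
    observed = sum(a == b for a, b in zip(first, second)) / n
    categories = set(first) | set(second)
    # Agreement expected by chance from the marginal proportions.
    expected = sum((first.count(c) / n) * (second.count(c) / n)
                   for c in categories)
    if expected == 1.0:
        return 1.0  # only one category was ever used in both sessions
    return (observed - expected) / (1.0 - expected)


def typical_error(first, second):
    """Typical error of a total score: SD of session differences / sqrt(2)
    (assumed definition)."""
    diffs = [b - a for a, b in zip(first, second)]
    return statistics.stdev(diffs) / math.sqrt(2)


# Hypothetical knee-valgus ratings for ten players in two sessions.
session_1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
session_2 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
kappa = cohen_kappa(session_1, session_2)

# Hypothetical tuck jump total scores for four players in two sessions.
te = typical_error([3, 4, 5, 2], [4, 4, 6, 2])
print(round(kappa, 3), round(te, 3))
```

Kappa is computed per criterion, while the typical error summarizes the composite score, which is why a composite can look acceptable even when several individual criteria agree poorly across sessions.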
Exploring the dimensionality of digit span.
Bowden, Stephen C; Petrauskas, Vilija M; Bardenhagen, Fiona J; Meade, Catherine E; Simpson, Leonie C
2013-04-01
The Digit Span subtest from the Wechsler Scales is used to measure Freedom from Distractibility or Working Memory. Some published research suggests that Digit Span forward should be interpreted differently from Digit Span backward. The present study explored the dimensionality of the Wechsler Memory Scale-III Digit Span (forward and backward) items in a sample of heterogeneous neuroscience patients (n = 267) using confirmatory factor analysis (CFA) for dichotomous items. Results suggested that four correlated factors underlie Digit Span, reflecting easy and hard items in both forward and backward presentation orders. The model for Digit Span was then cross-validated in a seizure disorders sample (n = 223) by replication of the CFA and by examination of measurement invariance. Measurement invariance tests the precise numerical generalization of trait estimation across groups. Results supported measurement invariance and it was concluded that forward and backward digit span scores should be interpreted as measures of the same cognitive ability.
Effects of using the developing nurses' thinking model on nursing students' diagnostic accuracy.
Tesoro, Mary Gay
2012-08-01
This quasi-experimental study tested the effectiveness of an educational model, Developing Nurses' Thinking (DNT), on nursing students' clinical reasoning to achieve patient safety. Teaching nursing students to develop effective thinking habits that promote positive patient outcomes and patient safety is a challenging endeavor. Positive patient outcomes and safety are achieved when nurses accurately interpret data and subsequently implement appropriate plans of care. This study's pretest-posttest design determined whether use of the DNT model during 2 weeks of clinical postconferences improved nursing students' (N = 83) diagnostic accuracy. The DNT model helps students to integrate four constructs-patient safety, domain knowledge, critical thinking processes, and repeated practice-to guide their thinking when interpreting patient data and developing effective plans of care. The posttest scores of students from the intervention group showed statistically significant improvement in accuracy. Copyright 2012, SLACK Incorporated.
Snitz, Beth E; Unverzagt, Frederick W; Chang, Chung-Chou H; Bilt, Joni Vander; Gao, Sujuan; Saxton, Judith; Hall, Kathleen S; Ganguli, Mary
2009-12-01
Neuropsychological tests, including tests of language ability, are frequently used to differentiate normal from pathological cognitive aging. However, language can be particularly difficult to assess in a standardized manner in cross-cultural studies and in patients from different educational and cultural backgrounds. This study examined the effects of age, gender, education and race on performance of two language tests: the animal fluency task (AFT) and the Indiana University Token Test (IUTT). We report population-based normative data on these tests from two combined ethnically divergent, cognitively normal, representative population samples of older adults. Participants aged ≥65 years from the Monongahela-Youghiogheny Healthy Aging Team (MYHAT) and from the Indianapolis Study of Health and Aging (ISHA) were selected based on (1) a Clinical Dementia Rating (CDR) score of 0; (2) non-missing baseline language test data; and (3) race self-reported as African-American or white. The combined sample (n = 1885) was 28.1% African-American. Multivariate ordinal logistic regression was used to model the effects of demographic characteristics on test scores. On both language tests, better performance was significantly associated with higher education, younger age, and white race. On the IUTT, better performance was also associated with female gender. We found no significant interactions between age and sex, and between race and education. Age and education are more potent variables than are race and gender influencing performance on these language tests. Demographically stratified normative tables for these measures can be used to guide test interpretation and aid clinical diagnosis of impaired cognition.
Distinguishing body mass and activity level from the lower limb: can entheses diagnose obesity?
Godde, Kanya; Taylor, Rebecca Wilson
2013-03-10
The ability to estimate body size from the skeleton has broad applications, but is especially important to the forensic community when identifying unknown skeletal remains. This research investigates the utility of using entheses/muscle skeletal markers of the lower limb to estimate body size and to classify individuals into average, obese, and active categories, while using a biomechanical approach to interpret the results. Eighteen muscle attachment sites of the lower limb, known to be involved in the sit-to-stand transition, were scored for robusticity and stress in 105 white males (aged 31-81 years) from the William M. Bass Donated Skeletal Collection. Both logistic regression and log linear models were applied to the data to (1) test the utility of entheses as an indicator of body weight and activity level, and (2) to generate classification percentages that speak to the accuracy of the method. Thirteen robusticity scores differed significantly between the groups, but classification percentages were only slightly greater than chance. However, clear differences could be seen between the average and obese and the average and active groups. Stress scores showed no value in discriminating between groups. These results were interpreted in relation to biomechanical forces at the microscopic and macroscopic levels. Even though robusticity alone is not able to classify individuals well, its significance may show greater value when incorporated into a model that has multiple skeletal indicators. Further research needs to evaluate a larger sample and incorporate several lines of evidence to improve classification rates. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Jabez Christopher, J; Khanna Nehemiah, H; Kannan, A
2015-10-01
Allergic Rhinitis is a universal common disease, especially in populated cities and urban areas. Diagnosis and treatment of Allergic Rhinitis will improve the quality of life of allergic patients. Though skin tests remain the gold standard test for diagnosis of allergic disorders, clinical experts are required for accurate interpretation of test outcomes. This work presents a clinical decision support system (CDSS) to assist junior clinicians in the diagnosis of Allergic Rhinitis. Intradermal Skin tests were performed on patients who had plausible allergic symptoms. Based on patient's history, 40 clinically relevant allergens were tested. 872 patients who had allergic symptoms were considered for this study. The rule based classification approach and the clinical test results were used to develop and validate the CDSS. Clinical relevance of the CDSS was compared with the Score for Allergic Rhinitis (SFAR). Tests were conducted for junior clinicians to assess their diagnostic capability in the absence of an expert. The class based Association rule generation approach provides a concise set of rules that is further validated by clinical experts. The interpretations of the experts are considered as the gold standard. The CDSS diagnoses the presence or absence of rhinitis with an accuracy of 88.31%. The allergy specialist and the junior clinicians prefer the rule based approach for its comprehendible knowledge model. The Clinical Decision Support Systems with rule based classification approach assists junior doctors and clinicians in the diagnosis of Allergic Rhinitis to make reliable decisions based on the reports of intradermal skin tests. Copyright © 2015 Elsevier Ltd. All rights reserved.
Park, Subin; Lee, Jong-Min; Baik, Young; Kim, Kihyun; Yun, Hyuk Jin; Kwon, Hunki; Jung, Yeon-Kyung; Kim, Bung-Nyun
2015-11-01
The authors examined the effects of arts education on cognition, behavior, and brain of children. Twenty-nine nonclinical children participated in a 15-week arts education program that was composed of either creative movement or musical arts. Children completed the Wisconsin Card Sorting Test, clinical scales, and brain magnetic resonance imaging before and after the intervention. Following program completion, performances on the Wisconsin Card Sorting Test, the Children's Depression Inventory scores, and conduct disorder scores were significantly improved. Furthermore, cortical thickness in the left postcentral gyrus and superior parietal lobule was increased, and the mean diffusivity values in the right posterior corona radiata and superior longitudinal fasciculus were decreased. Positive correlations between changes in cognitive measurements and changes in cortical thickness were observed. This preliminary study suggests a positive effect of arts education on executive functions in association with brain changes. However, these findings must be interpreted with caution due to the noncomparative study design. © The Author(s) 2015.
Bendo, Cristiane B.; Shulman, Robert J.; Self, Mariella M.; Nurko, Samuel; Franciosi, James P.; Saps, Miguel; Saeed, Shehzad; Zacur, George M.; Vaughan Dark, Chelsea; Pohl, John F.
2015-01-01
Objective The present study investigates the clinical interpretability of the Pediatric Quality of Life Inventory™ (PedsQL™) Gastrointestinal Symptoms Scales and Worry Scales in pediatric patients with functional gastrointestinal disorders or organic gastrointestinal diseases in comparison with healthy controls. Methods The PedsQL™ Gastrointestinal Scales were completed by 587 patients with gastrointestinal disorders/diseases and 685 parents, and 513 healthy children and 337 parents. Minimal important difference (MID) scores were derived from the standard error of measurement (SEM). Cut-points were derived based on one and two standard deviations (SDs) from the healthy reference means. Results The percentages of patients below the scales’ cut-points were significantly greater than the healthy controls (most p values ≤ .001). Scale scores 2 SDs from the healthy reference means were within the range of scores for pediatric patients with a gastrointestinal disorder. MID values were generated using the SEM. Conclusions The findings support the clinical interpretability of the new PedsQL™ Gastrointestinal Symptoms Scales and Worry Scales. PMID:25682210
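The two computations described in this abstract, a standard error of measurement (SEM) used as the minimal important difference and cut-points placed one and two standard deviations from a healthy reference mean, can be sketched as follows. This is a minimal illustration, not the authors' code: the conventional formula SEM = SD × sqrt(1 − reliability) is assumed, the reliability coefficient and all numbers are invented, and cut-points are placed below the mean on the assumption that lower scores indicate worse functioning.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability); reliability is assumed to be a
    coefficient between 0 and 1 (for example an internal-consistency value)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def cut_points(healthy_mean, healthy_sd):
    """Scores one and two SDs below the healthy reference mean
    (assumes lower scores mean worse functioning)."""
    return healthy_mean - healthy_sd, healthy_mean - 2.0 * healthy_sd


# Hypothetical scale statistics, for illustration only.
mid = standard_error_of_measurement(sd=10.0, reliability=0.84)
one_sd_cut, two_sd_cut = cut_points(healthy_mean=80.0, healthy_sd=12.0)
print(round(mid, 2), one_sd_cut, two_sd_cut)
```

A patient whose score falls below the two-SD cut-point, or whose score changes by more than the SEM between visits, would then be flagged under these assumed rules.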
Lozano, Oscar M; Rojas, Antonio J; Pérez, Cristino; González-Sáiz, Francisco; Ballesta, Rosario; Izaskun, Bilbao
2008-05-01
The aim of this work is to show evidence of the validity of the Health-Related Quality of Life for Drug Abusers Test (HRQoLDA Test). This test was developed to measure specific HRQoL for drug abusers, within the theoretical addiction framework of the biaxial model. The sample comprised 138 patients diagnosed with opiate drug dependence. In this study, the following constructs and variables of the biaxial model were measured: severity of dependence, physical health status, psychological adjustment and substance consumption. Results indicate that the HRQoLDA Test scores are related to dependency and consumption-related problems. Multiple regression analysis reveals that HRQoL can be predicted from drug dependence, physical health status and psychological adjustment. These results contribute empirical evidence of the theoretical relationships established between HRQoL and the biaxial model, and they support the interpretation of the HRQoLDA Test to measure HRQoL in drug abusers, thus providing a test to measure this specific construct in this population.
Fernandez, Alicia; Wang, Frances; Braveman, Melissa; Finkas, Lindsay K; Hauer, Karen E
2007-08-01
Clinical performance examinations (CPX) with standardized patients (SPs) have become a preferred method to assess communication skills in US medical schools. Little is known about how trainees' backgrounds impact CPX performance. The objective of this paper is to examine the impact of student ethnicity, primary childhood language, and experience of diversity on the communication scores of a high-stakes CPX using SPs. This research was designed as an observational study. The participants of this study were third-year medical students at one US medical school. The measurements used in this study were CPX scores from a mandatory exam, student demographics and experience with diversity measured by self-report on a survey, and Medical College Admission Test (MCAT) and United States Medical Licensing Examination (USMLE) scores. A total of 135 students participated. Asian and black students scored lower than white students on the communication portion of the CPX by approximately half a standard deviation (Asian, 67.4%; black, 64.4%; white, 69.4%, p < .05). There were no differences by ethnicity on history/physical exam scores. Multivariate analysis controlling for MCAT verbal scores reduced ethnic differences in communication scores (Asian-white mean differences = 1.95, p = 0.02), but Asian-white differences were eliminated only after sequential models included primary childhood language (difference = 0.57, p = 0.6). Even after controlling for English language knowledge as measured in MCAT verbal scores, speaking a primary childhood language other than English is associated with lower CPX communication scores for Asian students. While poorer communication skills cannot be ruled out, SP exams may contain measurement bias associated with differences in childhood language or culture. Caution is indicated when interpreting CPX communication scores among diverse examinees.
Establishing Reliability and Validity of the Criterion Referenced Exam of GeoloGy Standards EGGS
NASA Astrophysics Data System (ADS)
Guffey, S. K.; Slater, S. J.; Slater, T. F.; Schleigh, S.; Burrows, A. C.
2016-12-01
Discipline-based geoscience education researchers have considerable need for a criterion-referenced, easy-to-administer and -score conceptual diagnostic survey for undergraduates taking introductory science survey courses, so that faculty can better monitor the learning impacts of various interactive teaching approaches. To support ongoing education research across the geosciences, we are continuing to work rigorously and systematically to firmly establish the reliability and validity of the recently released Exam of GeoloGy Standards, EGGS. In educational testing, reliability refers to the consistency or stability of test scores whereas validity refers to the accuracy of the inferences or interpretations one makes from test scores. There are several types of reliability measures being applied to the iterative refinement of the EGGS survey, including test-retest, alternate form, split-half, internal consistency, and interrater reliability measures. EGGS rates strongly on most measures of reliability. For one, Cronbach's alpha provides a quantitative index of the extent to which students answer items consistently throughout the test, and it reflects inter-item correlations. Traditional item analysis methods, including item difficulty and item discrimination, further quantify the degree to which a particular item reliably assesses students. Validity, on the other hand, is perhaps best described by the word accuracy. For example, content validity is the extent to which a measurement reflects the specific intended domain of the content, stemming from judgments of people who are either experts in the testing of that particular content area or are content experts.
Perhaps more importantly, face validity is a judgement of how well an instrument reflects the science "at face value" and refers to the extent to which a test appears to measure the targeted scientific domain as viewed by laypersons, examinees, test users, the public, and other invested stakeholders.
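As a rough illustration of the internal-consistency and item-analysis statistics mentioned in this abstract, the Python sketch below computes Cronbach's alpha and item difficulty (proportion correct) from a small matrix of scored responses. The data and function names are invented for the example and are not drawn from the EGGS project; item discrimination is left out for brevity.

```python
import statistics


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of students, each a list of item scores:
    alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two students")
    item_columns = list(zip(*responses))
    item_variance_sum = sum(statistics.variance(col) for col in item_columns)
    total_variance = statistics.variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1.0 - item_variance_sum / total_variance)


def item_difficulty(responses):
    """Proportion of students answering each item correctly (1 = correct)."""
    n = len(responses)
    return [sum(col) / n for col in zip(*responses)]


# Hypothetical scored answers: four students, three items.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
alpha = cronbach_alpha(responses)
difficulty = item_difficulty(responses)
print(round(alpha, 2), difficulty)
```

Alpha rises when items vary together across students, which is the sense in which it indexes consistent answering across the test.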
The reliability of multidimensional neuropsychological measures: from alpha to omega.
Watkins, Marley W
To demonstrate that coefficient omega, a model-based estimate, is a more appropriate index of reliability than coefficient alpha for the multidimensional scales that are commonly employed by neuropsychologists. As an illustration, a structural model of an overarching general factor and four first-order factors for the WAIS-IV based on the standardization sample of 2200 participants was identified and omega coefficients were subsequently computed for WAIS-IV composite scores. Alpha coefficients were ≥ .90 and omega coefficients ranged from .75 to .88 for WAIS-IV factor index scores, indicating that the blend of general and group factor variance in each index score created a reliable multidimensional composite. However, the amalgam of variance from general and group factors did not allow the precision of Full Scale IQ (FSIQ) and factor index scores to be disentangled. In contrast, omega hierarchical coefficients were low for all four factor index scores (.10-.41), indicating that most of the reliable variance of each factor index score was due to the general intelligence factor. In contrast, the omega hierarchical coefficient for the FSIQ score was .84. Meaningful interpretation of WAIS-IV factor index scores as unambiguous indicators of group factors is imprecise, thereby fostering unreliable identification of neurocognitive strengths and weaknesses, whereas the WAIS-IV FSIQ score can be interpreted as a reliable measure of general intelligence. It was concluded that neuropsychologists should base their clinical decisions on reliable scores as indexed by coefficient omega.
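The distinction between omega and omega hierarchical can be made concrete with a small Python sketch. It assumes an orthogonal bifactor structure with standardized loadings, in which each item loads on the general factor and on exactly one group factor; this is a simplification chosen for illustration and is not the model fitted in the study. The loadings and the function name are invented.

```python
def bifactor_omega(general_loadings, group_loadings):
    """Return (omega_total, omega_hierarchical) for a composite of all items.

    general_loadings: standardized loading of each item on the general factor.
    group_loadings: one list per group factor, holding its items' loadings,
    in the same item order as general_loadings (assumed layout).
    """
    flat_group = [l for factor in group_loadings for l in factor]
    if len(flat_group) != len(general_loadings):
        raise ValueError("each item needs one general and one group loading")
    # Unique variance of a standardized item in an orthogonal bifactor model.
    uniqueness = [1.0 - g * g - s * s
                  for g, s in zip(general_loadings, flat_group)]
    general_var = sum(general_loadings) ** 2
    group_var = sum(sum(factor) ** 2 for factor in group_loadings)
    total_var = general_var + group_var + sum(uniqueness)
    omega_total = (general_var + group_var) / total_var
    omega_hierarchical = general_var / total_var
    return omega_total, omega_hierarchical


# Hypothetical loadings: four items, two group factors of two items each.
omega_t, omega_h = bifactor_omega(
    general_loadings=[0.7, 0.7, 0.7, 0.7],
    group_loadings=[[0.4, 0.4], [0.4, 0.4]],
)
print(round(omega_t, 3), round(omega_h, 3))
```

Omega counts all common variance as reliable, whereas omega hierarchical credits only the general factor, so a composite can show a high omega while most of its reliable variance is attributable to general ability.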
Pearman, Timothy; Yanez, Betina; Peipert, John; Wortman, Katy; Beaumont, Jennifer; Cella, David
2014-09-15
Health-related quality of life (HRQOL) measures are commonly used in oncology research. Interest in their use for monitoring or screening is increasing. The Functional Assessment of Cancer Therapy (FACT) is one of the most widely used HRQOL instruments. Consequently, oncology researchers and practitioners have an increasing need for reference values for the Functional Assessment of Cancer Therapy-General (FACT-G) and its 7-item rapid version, the Functional Assessment of Cancer Therapy-General 7 (FACT-G7), to compare FACT scores across specific subgroups of patients in research trials and practice. The objectives of this study are to provide 1) reference values from a sample of the general US adult population and a sample of adults diagnosed with cancer and 2) cutoff scores for quality of life. A sample of the general US population (N = 1075) and a sample of patients with cancer from 12 studies (N = 5065) were analyzed. Cutoff scores were established using distribution- and anchor-based methods. Mean values for the cancer sample were analyzed by performance status, cancer type, and disease status. Also, t tests and established criteria for meaningful differences were used to compare values. FACT-G and FACT-G7 scores in the general US population sample and cancer sample were generally comparable. Among the sample of patients with cancer, FACT-G and FACT-G7 scores worsened with declining performance status and increasing disease status. These data will aid interpretation of the magnitude and meaning of FACT scores, and allow for comparisons of scores across studies. © 2014 American Cancer Society.
Dore, Kelly L; Reiter, Harold I; Kreuger, Sharyn; Norman, Geoffrey R
2017-12-01
In re-examining the paper "CASPer, an online pre-interview screen for personal/professional characteristics: prediction of national licensure scores" published in AHSE (22(2), 327-336), we recognized two errors of interpretation.
Matías-Guiu, Jordi A; Cabrera-Martín, María Nieves; Valles-Salgado, María; Pérez-Pérez, Alicia; Rognoni, Teresa; Moreno-Ramos, Teresa; Carreras, José Luis; Matías-Guiu, Jorge
2017-07-01
Interpreting cognitive tests is often challenging. The same test frequently examines multiple cognitive functions, and the functional and anatomical basis underlying test performance is unknown in many cases. This study analyses the correlation of different neuropsychological test results with brain metabolism in a series of patients evaluated for suspected Alzheimer disease. 20 healthy controls and 80 patients consulting for memory loss were included, all of whom underwent cognitive assessment and 18F-fluorodeoxyglucose PET. Patients were categorized according to Reisberg's Global Deterioration Scale. Voxel-based analysis was used to determine correlations between brain metabolism and performance on the following tests: Free and Cued Selective Reminding Test (FCSRT), Boston Naming Test (BNT), Trail Making Test, Rey-Osterrieth Complex Figure test, Visual Object and Space Perception Battery (VOSP), and Tower of London (ToL) test. Mean age in the patient group was 73.9 ± 10.6 years, and 47 patients were women (58.7%). FCSRT findings were positively correlated with metabolism in the medial and anterior temporal region bilaterally, the left precuneus, and posterior cingulate. BNT results were correlated with metabolism in the middle temporal, superior, fusiform, and frontal medial gyri bilaterally. VOSP results were related to the occipital and parietotemporal regions bilaterally. ToL scores were correlated to metabolism in the right temporoparietal and frontal regions. These results suggest that different areas of the brain are involved in the processes required to complete different cognitive tests. Ascertaining the functional basis underlying these tests may prove helpful for understanding and interpreting them. Copyright © 2017 American Association for Geriatric Psychiatry. Published by Elsevier Inc. All rights reserved.
Bhatia, Sujata K; Yetter, Ann B
2008-08-01
Medical devices and implanted biomaterials are often assessed for biological reactivity using visual scores of cell-material interactions. In such testing, biomaterials are assigned cytotoxicity ratings based on visual evidence of morphological cellular changes, including cell lysis, rounding, spreading, and proliferation. For example, ISO 10993 cytotoxicity testing of medical devices allows the use of a visual grading scale. The present study compared visual in vitro cytotoxicity ratings to quantitative in vitro cytotoxicity measurements for biomaterials to determine the level of correlation between visual scoring and a quantitative cell viability assay. Biomaterials representing a spectrum of biological reactivity levels were evaluated, including organo-tin polyvinylchloride (PVC; a known cytotoxic material), ultra-high molecular weight polyethylene (a known non-cytotoxic material), and implantable tissue adhesives. Each material was incubated in direct contact with mouse 3T3 fibroblast cell cultures for 24 h. Visual scores were assigned to the materials using a 5-point rating scale; the scorer was blinded to the material identities. Quantitative measurements of cell viability were performed using a 3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide (MTT) colorimetric assay; again, the assay operator was blinded to material identities. The investigation revealed a high degree of correlation between visual cytotoxicity ratings and quantitative cell viability measurements; a Pearson's correlation gave a correlation coefficient of 0.90 between the visual cytotoxicity score and the percent viable cells. An equation relating the visual cytotoxicity score and the percent viable cells was derived. The results of this study are significant for the design and interpretation of in vitro cytotoxicity studies of novel biomaterials.
Wodushek, Thomas R; Greher, Michael R
2017-05-01
In the first column in this 2-part series, Performance Validity Testing in Neuropsychology: Scientific Basis and Clinical Application-A Brief Review, the authors introduced performance validity tests (PVTs) and their function, provided a justification for why they are necessary, traced their ongoing endorsement by neuropsychological organizations, and described how they are used and interpreted by ever increasing numbers of clinical neuropsychologists. To enhance readers' understanding of these measures, this second column briefly describes common detection strategies used in PVTs as well as the typical methods used to validate new PVTs and determine cut scores for valid/invalid determinations. We provide a discussion of the latest research demonstrating how neuropsychologists can combine multiple PVTs in a single battery to improve sensitivity/specificity to invalid responding. Finally, we discuss future directions for the research and application of PVTs.
Interactive or static reports to guide clinical interpretation of cancer genomics.
Gray, Stacy W; Gagan, Jeffrey; Cerami, Ethan; Cronin, Angel M; Uno, Hajime; Oliver, Nelly; Lowenstein, Carol; Lederman, Ruth; Revette, Anna; Suarez, Aaron; Lee, Charlotte; Bryan, Jordan; Sholl, Lynette; Van Allen, Eliezer M
2018-05-01
Misinterpretation of complex genomic data presents a major challenge in the implementation of precision oncology. We sought to determine whether interactive genomic reports with embedded clinician education and optimized data visualization improved genomic data interpretation. We conducted a randomized, vignette-based survey study to determine whether exposure to interactive reports for a somatic gene panel, as compared to static reports, improves physicians' genomic comprehension and report-related satisfaction (overall scores calculated across 3 vignettes, range 0-18 and 1-4, respectively, higher score corresponding with improved endpoints). One hundred and five physicians at a tertiary cancer center participated (29% participation rate): 67% medical, 20% pediatric, 7% radiation, and 7% surgical oncology; 37% female. Prior to viewing the case-based vignettes, 34% of the physicians reported difficulty making treatment recommendations based on the standard static report. After vignette/report exposure, physicians' overall comprehension scores did not differ by report type (mean score: interactive 11.6 vs static 10.5, difference = 1.1, 95% CI, -0.3, 2.5, P = .13). However, physicians exposed to the interactive report were more likely to correctly assess sequencing quality (P < .001) and understand when reports needed to be interpreted with caution (eg, low tumor purity; P = .02). Overall satisfaction scores were higher in the interactive group (mean score 2.5 vs 2.1, difference = 0.4, 95% CI, 0.2-0.7, P = .001). Interactive genomic reports may improve physicians' ability to accurately assess genomic data and increase report-related satisfaction. Additional research in users' genomic needs and efforts to integrate interactive reports into electronic health records may facilitate the implementation of precision oncology.
Cappelleri, Joseph C; Zou, Kelly H; Bushmakin, Andrew G; Carlsson, Martin O; Symonds, Tara
2013-03-01
What's known on the subject? and What does the study add? Studies on erectile dysfunction (ED) therapies rely heavily on patient-reported outcomes (PROs) to measure efficacy on treatment response. A challenge when using PROs is interpretation of the clinical meaning of changes in scores. A responder analysis provides a threshold score to indicate whether a change in score qualifies a patient as a responder. However, a major consideration with responder analysis is the sometimes arbitrary nature of defining the threshold for a response. By contrast, cumulative response curves (CRCs) display patient response rates over a continuum of possible thresholds, thus eliminating problems with a rigid threshold definition, allowing for a variety of response thresholds to be examined simultaneously, and encompassing all data. With respect to the psychosocial factors addressed in the Self-Esteem And Relationship questionnaire in ED, CRCs clearly, distinctly, and meaningfully highlighted the favourable profiles of responses to sildenafil compared with placebo. CRCs for PROs in urology can provide a clear, transparent and meaningful visual depiction of efficacy data that can supplement and complement other analyses. To use cumulative response curves (CRCs) to enrich meaning and enhance interpretation of scores on the Self-Esteem And Relationship (SEAR) questionnaire with respect to treatment differences for men with erectile dysfunction (ED). This post hoc analysis used data from all patients who took at least one dose of study drug and had at least one post-baseline efficacy evaluation in a previously published 12-week, multicentre, randomized, double-blind, placebo-controlled trial of flexible-dose (25, 50, or 100 mg) sildenafil citrate (Viagra) in adult men with ED who had scored ≤ 75 out of 100 on the Self-Esteem subscale of the SEAR questionnaire. CRCs were used on the numeric change in transformed SEAR scores from baseline to end-of-study for each SEAR component. 
The horizontal axis of the CRC represented change from baseline on the SEAR score, and the vertical axis represented the percentage of patients experiencing that change or greater. The differences between CRCs for the sildenafil group vs the placebo group were assessed using the Kolmogorov-Smirnov test. For each of the SEAR components, there was essentially no overlap in the CRCs between the sildenafil group (n = 113) and placebo group (n = 115 or 116, depending on the component), showing that a greater percentage of sildenafil recipients compared with placebo recipients had a more favourable change across the spectrum of response thresholds (P ≤ 0.01). Previous research showed that a 10-point score increase is the minimal clinically meaningful improvement for most SEAR components. In the sildenafil vs placebo groups, a ≥10-point score increase occurred in 72 vs 37% of patients, respectively, on the Sexual Relationship Satisfaction domain, 71 vs 41% on the Confidence domain, 76 vs 49% on the Self-Esteem subscale, 60 vs 44% on the Overall Relationship Satisfaction subscale, and 75 vs 38% on the Overall score. With respect to the psychosocial factors addressed in the SEAR questionnaire, CRCs clearly, distinctly, and meaningfully highlighted the favourable profiles of responses to sildenafil compared with placebo. CRCs for patient-reported outcomes in urology can provide a clear, transparent, and meaningful visual depiction of efficacy data that can supplement and complement other analyses. © 2012 BJU INTERNATIONAL.
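The construction of a cumulative response curve described above (for each threshold, the percentage of patients whose change from baseline is at least that large) can be sketched as follows. This is an illustrative sketch only: the function name and the change scores are made up and are not taken from the trial.

```python
def cumulative_response_curve(changes, thresholds):
    """For each threshold, return the percentage of patients whose
    change from baseline is greater than or equal to that threshold."""
    n = len(changes)
    return [100.0 * sum(1 for c in changes if c >= t) / n for t in thresholds]


# Hypothetical change-from-baseline scores for two small groups.
treated = [0, 5, 10, 15, 20]
control = [-5, 0, 0, 5, 10]
thresholds = [0, 10, 20]

for label, group in (("treated", treated), ("control", control)):
    print(label, cumulative_response_curve(group, thresholds))
```

Reading the curve at a single threshold (for example, a 10-point increase) recovers an ordinary responder analysis, while the full curve shows every threshold at once.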
Lin, Chung-Ying; Hwang, Jing-Shiang; Wang, Wen-Chung; Lai, Wu-Wei; Su, Wu-Chou; Wu, Tzu-Yi; Yao, Grace; Wang, Jung-Der
2018-04-13
Quality of life (QoL) is important for clinicians to evaluate how cancer survivors judge their sense of well-being, and WHOQOL-BREF may be a good tool for clinical use. However, at least three issues remain unresolved: (1) evidence on the psychometric properties of the WHOQOL-BREF for cancer patients is insufficient; (2) the scoring method used for WHOQOL-BREF needs to be clarified; (3) it is unknown whether different types of cancer patients interpret the WHOQOL-BREF similarly. We recruited 1000 outpatients with head/neck cancer, 1000 with colorectal cancer, 965 with liver cancer, 1438 with lung cancer and 1299 with gynecologic cancers in a medical center. Data analyses included Rasch models, confirmatory factor analysis (CFA), and Pearson correlations. The mean WHOQOL-BREF domain scores were between 13.34 and 14.77 among all participants. CFA supported construct validity; Rasch models revealed that almost all items were embedded in their expected domains and were interpreted similarly across five types of cancer patients; all correlation coefficients between Rasch scores and original domain scores were above 0.9. The linear relationship between Rasch scores and domain scores suggested that the current calculations for domain scores were applicable and without serious bias. Clinical practitioners may regularly collect and record the WHOQOL-BREF domain scores into electronic health records. Copyright © 2018. Published by Elsevier B.V.
[Quality improvement in workers health surveillance: the spirometry training courses experience].
Innocenti, A; Quercia, A; Roscelli, F
2012-01-01
Spirometry performed during workers' health surveillance requires accurate and reproducible spirometric measurements, which should comply with the ATS/ERS guidelines. Low acceptability of spirometric manoeuvres has been reported in health surveillance. This may hamper the validity of the results and affect clinical decision making. Training and refresher courses may produce and maintain good-quality testing, promote the use of spirometric results in clinical practice and enhance the quality of interpretation. We evaluated (with the PLATINO score) 239 spirometries from 23 occupational physicians recorded before and after a spirometry refresher course (16 hours) and found that only 4 physicians showed a very good improvement in score and another 4 a good improvement, while 9 showed a very slight improvement and 6 showed no improvement. It is worthy of note that in 2012 some spirometers that did not conform to UNI EN 26782/2009 were still in use.
Student’s profile about science literacy in Surakarta
NASA Astrophysics Data System (ADS)
Nur’aini, D.; Rahardjo, S. B.; Elfi Susanti, V. H.
2018-05-01
This research was conducted to determine students' initial science literacy profile. The method used was descriptive, with 46 students as subjects. The instrument was a set of science literacy questions based on PISA 2015. Data processing consisted of scoring each question, converting the score values, grouping subjects into levels based on those values, and drawing conclusions. The competencies measured in this test are explaining phenomena scientifically, interpreting data and evidence scientifically, and evaluating and designing scientific inquiry. The results for the three competencies were 30.87%, 40.20% and 24.90%. The science literacy achievement levels reached by students were: level 1, 47.82%; level 2, 33.82%; level 3, 42.93%; level 4, 26.50%; level 5, 21.73%. Based on these results, it can be concluded that the science literacy of students in Surakarta is relatively low.
Petrillo, Jennifer; Bressler, Neil M; Lamoureux, Ecosse; Ferreira, Alberto; Cano, Stefan
2017-08-14
The NEI VFQ-25 has undergone psychometric evaluation in patients with varying ocular conditions and the general population. However, important limitations which may affect the interpretation of clinical trial results have been previously identified, such as concerns with reliability and validity. The purpose of this study was to evaluate the National Eye Institute Visual Functioning Questionnaire (NEI VFQ-25) and make recommendations for a revised scoring structure, with a view to improving its psychometric performance and interpretability. Rasch Measurement Theory analyses were conducted in two stages using pooled baseline NEI VFQ-25 data for 2487 participants with retinal diseases enrolled in six clinical trials. In stage 1, we examined: scale-to-sample targeting; thresholds for item response options; item fit statistics; stability; local dependence; and reliability. In stage 2, a post-hoc revision of the scoring structure (NEI VFQ-28-R) was created and psychometrically re-evaluated. In stage 1, we found that the NEI VFQ-25 was mis-targeted to the sample, and had disordered response thresholds (15/25 items) and mis-fitting items (8/25 items). However, items appeared to be stable (differential item functioning for three items), have minimal item dependency (one pair of items) and good reliability (person-separation index, 0.93). In stage 2, the modified Rasch-scored NEI VFQ-28-R was assessed. It comprised two broad domains: Activity Limitation (19 items) and Socio-Emotional Functioning (nine items). The NEI VFQ-28-R demonstrated improved performance with fewer disordered response thresholds (no items), less item misfit (three items) and improved population targeting (reduced ceiling effect) compared with the NEI VFQ-25.
Compared with the original version, the proposed NEI VFQ-28-R, with Rasch-based scoring and a two-domain structure, appears to offer improved psychometric performance and interpretability of the vision-related quality of life scale for the population analysed.
Linguistic diversity in a deaf prison population: implications for due process.
Miller, Katrina R
2004-01-01
The entire deaf prison population in the state of Texas formed the basis for this research. The linguistic skills of prison inmates were assessed using the following measures: (1) Kannapell's categories of bilingualism, (2) adaptation of the diagnostic criteria for Primitive Personality Disorder, (3) reading scores on the Test of Adult Basic Education, and (4) an evaluation of sign language use and skills by a certified sign language interpreter who had worked with deaf inmates for the past 17 years. Deaf inmates with reading scores below the federal standard for literacy (grade level 2.9) were the group most likely to demonstrate linguistic incompetence to stand trial, meaning that they probably lacked the ability to understand the charges against them and/or were unable to participate in their own defenses. Based on the language abilities and reading scores of this population, up to 50% of deaf state prison inmates may not have received due process throughout their arrest and adjudication. Despite their adjudicative and/or linguistic incompetence, these individuals were convicted in many cases, possibly violating their constitutional rights and their rights under the Americans with Disabilities Act.
Efficacy of computer-based video and simulation in ultrasound-guided regional anesthesia training.
Woodworth, Glenn E; Chen, Elliza M; Horn, Jean-Louis E; Aziz, Michael F
2014-05-01
To determine the effectiveness of a short educational video and simulation on improvement of ultrasound (US) image acquisition and interpretation skills. Prospective, randomized study. University medical center. 28 anesthesia residents and community anesthesiologists with varied ultrasound experience were randomized to teaching video with interactive simulation or sham video groups. Participants were assessed preintervention and postintervention on their ability to identify the sciatic nerve and other anatomic structures on static US images, as well as their ability to locate the sciatic nerve with US on live models. Pretest written test scores correlated with reported US block experience (Kendall tau rank r = 0.47) and with live US scanning scores (r = 0.64). The teaching video and simulation significantly improved scores on the written examination (P < 0.001); however, they did not significantly improve live US scanning skills. A short educational video with interactive simulation significantly improved knowledge of US anatomy, but failed to improve hands-on performance of US scanning to localize the nerve. Copyright © 2014 Elsevier Inc. All rights reserved.
Bowden, Stephen C; Lissner, Dianne; McCarthy, Kerri A L; Weiss, Lawrence G; Holdnack, James A
2007-10-01
Equivalence of the psychological model underlying Wechsler Adult Intelligence Scale-Third Edition (WAIS-III) scores obtained in the United States and Australia was examined in this study. Examination of metric invariance involves testing the hypothesis that all components of the measurement model relating observed scores to latent variables are numerically equal in different samples. The assumption of metric invariance is necessary for interpretation of scores derived from research studies that seek to generalize patterns of convergent and divergent validity and patterns of deficit or disability. An Australian community volunteer sample was compared to the US standardization data. A pattern of strict metric invariance was observed across samples. In addition, when the effects of different demographic characteristics of the US and Australian samples were included, structural parameters reflecting values of the latent cognitive variables were found not to differ. These results provide important evidence for the equivalence of measurement of core cognitive abilities with the WAIS-III and suggest that latent cognitive abilities in the US and Australia do not differ.
Haemophilia Joint Health Score in healthy adults playing sports.
Sluiter, D; Foppen, W; de Kleijn, P; Fischer, K
2014-03-01
To evaluate the outcome of prophylactic clotting factor replacement in children with haemophilia, the Haemophilia Joint Health Score (HJHS) was developed, aiming at scoring early joint changes in children aged 4-18. The HJHS has been used for adults on long-term prophylaxis but interpretation of small changes remains difficult. Some changes in these patients may be due to sports-related injuries. Evaluation of HJHS scores in healthy adults playing sports could improve the interpretation of this score in haemophilic patients. The aim of this study was to evaluate the HJHS scores in a cohort of young, healthy men participating in sports. Concomitant with a project collecting MRI images of ankles and knees in normal young adults, HJHS scores were assessed in 30 healthy men aged 18-26, participating in sports one to three times per week. One physiotherapist assessed their clinical function using the HJHS 2.1. History of joint injuries was documented. MRI images were scored by a single radiologist, using the International Prophylaxis Study Group additive MRI score. Median age of the study group was 24.3 years (range 19.0-26.4) and median frequency of sports activities was three times per week (range 1-4). Six joints (five knees, one ankle) had a history of sports-related injury. The median overall HJHS score was 0 out of 124 (range 0-3), with 60% of subjects showing no abnormalities on HJHS. All joints were normal on MRI. These results suggest that frequent sports participation and related injuries are not associated with abnormalities in HJHS scores. © 2013 John Wiley & Sons Ltd.
A cross-sectional study examining factors related to critical thinking in nursing.
Lang, Gary Morris; Beach, Nick Lee; Patrician, Patricia A; Martin, Cheryl
2013-01-01
The purpose of this study was to examine critical thinking skills among registered nurses who work in a military hospital. Sixty-five nurses were administered the Health Sciences Reasoning Test to obtain scores in inductive reasoning, deductive reasoning, interpretation, analysis, and evaluation skills. Results showed no significant association between critical thinking skills and years of experience; however, differences were identified among racial/ethnic groups. It is hoped that findings from this study create a platform for dialogue among staff development nurses who are best situated to develop strategies that address these issues.
Approximate string matching algorithms for limited-vocabulary OCR output correction
NASA Astrophysics Data System (ADS)
Lasko, Thomas A.; Hauser, Susan E.
2000-12-01
Five methods for matching words mistranslated by optical character recognition to their most likely match in a reference dictionary were tested on data from the archives of the National Library of Medicine. The methods, including an adaptation of the cross correlation algorithm, the generic edit distance algorithm, the edit distance algorithm with a probabilistic substitution matrix, Bayesian analysis, and Bayesian analysis on an actively thinned reference dictionary were implemented and their accuracy rates compared. Of the five, the Bayesian algorithm produced the most correct matches (87%), and had the advantage of producing scores that have a useful and practical interpretation.
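One of the five methods named above, the generic edit distance, can be sketched in a few lines. This is a textbook Levenshtein implementation with unit costs, offered only as an illustration; it is not the authors' code, it does not include their probabilistic substitution matrix or Bayesian scoring, and the function names and the tiny dictionary are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character insertions,
    deletions, and substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))        # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                        # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def best_match(word, dictionary):
    """Return the dictionary entry with the smallest edit distance to word."""
    return min(dictionary, key=lambda entry: edit_distance(word, entry))


# "rn" misread for "m" is used here as a made-up example of an OCR confusion.
print(best_match("rnedicine", ["medicine", "medical", "library"]))
```

With unit costs every substitution counts the same; a substitution matrix, as in the third method listed in the abstract, would instead make frequent OCR confusions cheaper than unlikely ones.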
Yuan, Chao; Wang, Xue-Min; Galzote, Carlos; Tan, Yi-Mei; Bhagat, Kamlesh V; Yuan, Zhi-Kang; Du, Jian-Fei; Tan, Yuan
2013-06-01
The human repeated insult patch test (HRIPT) is regarded as one of the confirmatory tests for determining the safety of skin sensitizers. A number of important factors should be considered when conducting and interpreting the results of the HRIPT. To investigate probable critical factors that influence the results of the HRIPT when the same protocol is used in Shanghai and Mumbai. Two HRIPTs were carried out in Shanghai and Mumbai in 2011. Six identical products and 1% sodium lauryl sulfate were tested. Two Chinese dermatologists performed the grading in the two cities. Climate conditions of Shanghai and Mumbai were also recorded. For the four products with lower reaction ratios, cumulative irritation scores in the induction phase were higher in individuals whose ethnicity was Indian rather than Chinese. The reaction ratio of the same four products was highly correlated with the climatic parameters. The other two products, with higher reaction ratios, and the positive control showed no difference between the two ethnicities. Greater attention ought to be paid to the impact of climate on the results of the HRIPT, especially when interpreting results for mildly irritating cosmetics. Greater emphasis also needs to be placed on the ethnicity of the subjects. Crown Copyright © 2013. Published by Elsevier Inc. All rights reserved.
Broad-Enrich: functional interpretation of large sets of broad genomic regions.
Cavalcante, Raymond G; Lee, Chee; Welch, Ryan P; Patil, Snehal; Weymouth, Terry; Scott, Laura J; Sartor, Maureen A
2014-09-01
Functional enrichment testing facilitates the interpretation of Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) data in terms of pathways and other biological contexts. Previous methods developed and used to test for key gene sets affected in ChIP-seq experiments treat peaks as points, and are based on the number of peaks associated with a gene or a binary score for each gene. These approaches work well for transcription factors, but histone modifications often occur over broad domains, and across multiple genes. To incorporate the unique properties of broad domains into functional enrichment testing, we developed Broad-Enrich, a method that uses the proportion of each gene's locus covered by a peak. We show that our method has a well-calibrated false-positive rate, performing well with ChIP-seq data having broad domains compared with alternative approaches. We illustrate Broad-Enrich with 55 ENCODE ChIP-seq datasets using different methods to define gene loci. Broad-Enrich can also be applied to other datasets consisting of broad genomic domains such as copy number variations. http://broad-enrich.med.umich.edu for Web version and R package. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.
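The quantity this abstract contrasts with peak counts, the proportion of a gene's locus covered by peaks, can be computed as in the sketch below. It covers only that coverage calculation, not the enrichment test itself, and it is not the Broad-Enrich implementation: the function name, the half-open interval convention, and the coordinates are assumptions made for illustration.

```python
def covered_proportion(locus, peaks):
    """Fraction of a gene locus covered by at least one peak.

    locus -- (start, end) half-open interval on one chromosome
    peaks -- iterable of (start, end) half-open intervals on the same chromosome
    Overlapping peaks are merged so no base is counted twice.
    """
    start, end = locus
    # Keep only peaks that overlap the locus, clipped to its boundaries.
    clipped = sorted((max(s, start), min(e, end))
                     for s, e in peaks if s < end and e > start)
    covered = 0
    cur_s = cur_e = None
    for s, e in clipped:
        if cur_e is None or s > cur_e:    # starts a new disjoint block
            if cur_e is not None:
                covered += cur_e - cur_s
            cur_s, cur_e = s, e
        else:                             # overlaps or touches the current block
            cur_e = max(cur_e, e)
    if cur_e is not None:
        covered += cur_e - cur_s
    return covered / (end - start)


# Hypothetical locus and peaks: two overlapping peaks plus one running off the end.
print(covered_proportion((100, 200), [(50, 120), (110, 150), (180, 260)]))
```

A point-based method would score this locus simply as "3 peaks"; the proportion instead reflects how much of the locus a broad domain actually spans.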
Relationship of self-reported mysticism with depression and anxiety in Iranian Muslims.
Ghorbani, Nima; Watson, P J; Rostami, Reza
2007-04-01
This study examined relationships of self-reported Mysticism with dispositional Depression and Anxiety in Iranian Muslims. The sample contained 80 women and 51 men undergraduates who volunteered to participate (M age=20.5 yr., SD= 2.0). Participants responded to the Hood Mysticism Scale and to the Costello and Comrey Depression and Anxiety Scales. Scores on the Religious Interpretation dimension of mystical experience correlated negatively with those on Depression, explained a similar relationship observed for Extrovertive Mysticism, and moderated the otherwise positive relationship between Introvertive Mysticism and Anxiety. Moderation occurred when Introvertive Mysticism correlated negatively rather than positively with Anxiety in those who scored high on Religious Interpretation and very high on the Introvertive factor. These data suggested possibilities for reconciling conflicts that have appeared between philosophical interpretations of Introvertive Mysticism and previous self-report data.
Factor Analysis of the Kuder Occupational Interest Survey
ERIC Educational Resources Information Center
Zytowski, Donald G.
1976-01-01
The scale scores of Kuder Occupational Interest Survey (KOIS) profiles of 1,084 males and females were converted to intraindividual form by means of a z-score transformation. The results are discussed in terms of their utility for interpretation of high ranking scales. (Author)
Yue, Lilly Q
2012-01-01
In the evaluation of medical products, including drugs, biological products, and medical devices, comparative observational studies could play an important role when properly conducted randomized, well-controlled clinical trials are infeasible due to ethical or practical reasons. However, various biases could be introduced at every stage and into every aspect of the observational study, and consequently the interpretation of the resulting statistical inference would be of concern. While there do exist statistical techniques for addressing some of the challenging issues, often based on propensity score methodology, these statistical tools probably have not been as widely employed in prospectively designing observational studies as they should be. There are also times when they are implemented in an unscientific manner, such as performing propensity score model selection for a dataset involving outcome data in the same dataset, so that the integrity of observational study design and the interpretability of outcome analysis results could be compromised. In this paper, regulatory considerations on prospective study design using propensity scores are shared and illustrated with hypothetical examples.
Thamboo, Andrew; Velasquez, Nathalia; Habib, Al-Rahim R; Zarabanda, David; Paknezhad, Hassan; Nayak, Jayakar V
2017-08-01
The validated Empty Nose Syndrome 6-Item Questionnaire (ENS6Q) identifies empty nose syndrome (ENS) patients. The unvalidated cotton test assesses improvement in ENS-related symptoms. By first validating the cotton test using the ENS6Q, we define the minimal clinically important difference (MCID) score for the ENS6Q. Individual case-control study. Fifteen patients diagnosed with ENS and 18 controls with non-ENS sinonasal conditions underwent office cotton placement. Both groups completed ENS6Q testing in three conditions-precotton, cotton in situ, and postcotton-to measure the reproducibility of ENS6Q scoring. Participants also completed a five-item transition scale ranging from "much better" to "much worse" to rate subjective changes in nasal breathing with and without cotton placement. Mean changes for each transition point, and the ENS6Q MCID, were then calculated. In the precotton condition, significant differences (P < .001) in all ENS6Q questions between ENS and controls were noted. With cotton in situ, nearly all prior ENS6Q differences normalized between ENS and control patients. For ENS patients, the changes in the mean differences between the precotton and cotton in situ conditions compared to postcotton versus cotton in situ conditions were insignificant among individuals. Including all 33 participants, the mean change in the ENS6Q between the parameters "a little better" and "about the same" was 4.25 (standard deviation [SD] = 5.79) and -2.00 (SD = 3.70), giving an MCID of 6.25. Cotton testing is a validated office test to assess for ENS patients. Cotton testing also helped to determine the MCID of the ENS6Q, which is a 7-point change from the baseline ENS6Q score. 3b. Laryngoscope, 127:1746-1752, 2017. © 2017 The American Laryngological, Rhinological and Otological Society, Inc.
[Streptococcal tonsillopharyngitis: clinical vs. microbiological diagnosis].
Boccazzi, A; Garotta, M; Pontari, S; Agostoni, C V
2011-06-01
This study aimed to evaluate the role of clinical diagnosis vs. rapid antigen detection tests (RADT) in identifying streptococcal vs. non-streptococcal cases of acute pharyngitis (AP) with respect to a scoring schedule. The Breese scoring system, modified by eliminating the count of peripheral WBC, was used in the study. At enrolment, cases of AP observed by office-based pediatricians were judged on a clinical basis as possibly of streptococcal or of non-streptococcal origin and a clinical score was recorded. At the end of the visit, following completion of the clinical score, a confirmatory RADT was performed to document the presence or absence of group A beta haemolytic streptococcus (GABHS). In RADT negative cases a standard throat swab and culture were performed. In all, 629 children presenting with AP were enrolled in the study. The clinical diagnosis based on clinical observation was correct in 74.2% of cases (with a sensitivity of 81.1% and specificity of 70.5%). In cases judged as "streptococcal", a mean score of 27.6 was recorded in patients with either a positive or a negative RADT/throat swab for GABHS. By contrast, among cases considered of non-streptococcal aetiology, those with a negative RADT/culture had a mean score of 24.3 compared to a mean score of 25 in those with a positive RADT/culture. Intragroup score differences were not significant, while intergroup differences were highly significant. Optimization of AP treatment requires careful identification of streptococcal cases, avoiding unnecessary antibiotic treatment, which would contribute to antibiotic resistance and increase medical treatment costs. We document that clinical observation alone, although performed by skilled pediatricians, will misdiagnose a sizeable percentage of cases. As indicated by this study, scores may suffer from a subjective interpretative bias in grading the severity of signs and symptoms.
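The relationship among the three reported figures (accuracy, sensitivity, specificity) can be illustrated with a small sketch. It assumes all three come from the same 2x2 table with RADT/culture as the reference standard; under that assumption, accuracy is a prevalence-weighted average of sensitivity and specificity, so the prevalence can be backed out. The function names and the example counts are hypothetical and are not taken from the study.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return (sensitivity, specificity, accuracy) from 2x2 table counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return sensitivity, specificity, accuracy


def implied_prevalence(accuracy, sensitivity, specificity):
    """Solve accuracy = p * sensitivity + (1 - p) * specificity for p."""
    if sensitivity == specificity:
        raise ValueError("prevalence is not identifiable when sensitivity equals specificity")
    return (accuracy - specificity) / (sensitivity - specificity)


# Hypothetical counts, for illustration only.
sens, spec, acc = diagnostic_metrics(tp=80, fp=30, fn=20, tn=70)

# Figures reported in the abstract: accuracy 74.2%, sensitivity 81.1%,
# specificity 70.5%. The result is the share of reference-positive cases
# implied by those figures under the single-table assumption.
prevalence = implied_prevalence(0.742, 0.811, 0.705)
print(round(prevalence, 3))
```

With the reported figures this gives an implied prevalence of roughly 35%, which is a consistency check on the abstract's numbers rather than a value the abstract states.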
Cardiac Society of Australia and New Zealand Position Statement: Coronary Artery Calcium Scoring.
Liew, Gary; Chow, Clara; van Pelt, Niels; Younger, John; Jelinek, Michael; Chan, Jonathan; Hamilton-Craig, Christian
2017-12-01
Coronary Artery Calcium Scoring (CAC) is a non-invasive quantitation of coronary artery calcification using computed tomography (CT). It is a marker of atherosclerotic plaque burden and an independent predictor of future myocardial infarction and mortality. Coronary Artery Calcium Scoring provides incremental risk information beyond traditional risk calculators (e.g., Framingham Risk Score). Its use for risk stratification is confined to primary prevention of cardiovascular events, and can be considered as "individualised coronary risk scoring" for those not considered to be of high or low risk. Medical practitioners should carefully counsel patients prior to CAC. Coronary Artery Calcium Scoring should only be undertaken if an alteration in therapy, including embarking on pharmacotherapy, is being considered based on the test result.
Patient Groups to Consider Coronary Calcium Scoring:
Patient Groups in Whom Coronary Calcium Scoring Should Not be Considered: Coronary Artery Calcium Scoring is not recommended for patients who are:
Interpretation of CAC:
CAC=0: A zero score confers a very low risk of death, <1% at 10 years.
CAC=1-100: Low risk, <10%.
CAC=101-400: Intermediate risk, 10-20%.
CAC=101-400 & >75th centile: Moderately high risk, 15-20%.
CAC >400: High risk, >20%.
Management Recommendations Based on CAC: Optimal diet and lifestyle measures are encouraged in all risk groups and form the basis of primary prevention strategies. Patients with moderately-high or high risk based on CAC score are recommended to receive preventative medical therapy such as aspirin and statins. The evidence for pharmacotherapy is less robust in patients at intermediate levels of CAC 100-400, with modest benefit for aspirin use, though statins may be reasonable if they are above the 75th centile. Aspirin and statins are generally not recommended in patients with CAC <100.
Repeat CAC Testing: In patients with a CAC of 0, a repeat CAC may be considered in 5 years but not sooner. In patients with a positive calcium score, routine re-scanning is not currently recommended. However, an annual increase in CAC of >15% or an annual increase in CAC of >100 units is predictive of future myocardial infarction and mortality.
Cost Effectiveness of CAC Based Primary Prevention Recommendations: There are currently no data in Australia and New Zealand showing that CAC is cost-effective in informing primary prevention decisions. Given that the cost of testing is currently borne entirely by the patient, discussion regarding the implications of CAC results should occur before CAC is recommended and undertaken.
Copyright © 2017 Australian and New Zealand Society of Cardiac and Thoracic Surgeons (ANZSCTS) and the Cardiac Society of Australia and New Zealand (CSANZ). Published by Elsevier B.V. All rights reserved.
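The CAC interpretation bands listed in this abstract can be written as a simple lookup. The sketch below only encodes the thresholds as the abstract states them; the function name, the boolean centile flag, and the treatment of fractional scores between band edges are assumptions made for illustration, not part of the position statement.

```python
def cac_risk_category(score, above_75th_centile=False):
    """Map a CAC score to the risk band named in the abstract.

    `above_75th_centile` is a hypothetical flag indicating that the score
    exceeds the 75th centile (for age and sex, presumably) and only
    affects the 101-400 band.
    """
    if score < 0:
        raise ValueError("CAC score cannot be negative")
    if score == 0:
        return "very low"          # <1% at 10 years
    if score <= 100:
        return "low"               # <10%
    if score <= 400:
        # 15-20% if above the 75th centile, otherwise 10-20%
        return "moderately high" if above_75th_centile else "intermediate"
    return "high"                  # >20%


for example_score, flag in [(0, False), (57, False), (250, False), (250, True), (812, False)]:
    print(example_score, flag, cac_risk_category(example_score, flag))
```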
INTERPRETING PHYSICAL AND BEHAVIORAL HEALTH SCORES FROM NEW WORK DISABILITY INSTRUMENTS
Marfeo, Elizabeth E.; Ni, Pengsheng; Chan, Leighton; Rasch, Elizabeth K.; McDonough, Christine M.; Brandt, Diane E.; Bogusz, Kara; Jette, Alan M.
2015-01-01
Objective: To develop a system to guide interpretation of scores generated from 2 new instruments measuring work-related physical and behavioral health functioning (Work Disability – Physical Function (WD-PF) and WD – Behavioral Function (WD-BH)). Design: Cross-sectional, secondary data from 3 independent samples to develop and validate the functional levels for physical and behavioral health functioning. Subjects: Physical group: 999 general adult subjects, 1,017 disability applicants and 497 work-disabled subjects. Behavioral health group: 1,000 general adult subjects, 1,015 disability applicants and 476 work-disabled subjects. Methods: A three-phase analytic approach, including item mapping, a modified-Delphi technique, and known-groups validation analysis, was used to develop and validate cut-points for functional levels within each of the WD-PF and WD-BH instruments' scales. Results: Four and 5 functional levels were developed for each of the scales in the WD-PF and WD-BH instruments. Distribution of the comparative samples was in the expected direction: the general adult samples consistently demonstrated scores at higher functional levels compared with the claimant and work-disabled samples. Conclusion: Using an item-response theory-based methodology paired with a qualitative process appears to be a feasible and valid approach for translating the WD-BH and WD-PF scores into meaningful levels useful for interpreting a person’s work-related physical and behavioral health functioning. PMID:25729901
Di Giacomo, Daniela; Gaildrat, Pascaline; Abuli, Anna; Abdat, Julie; Frébourg, Thierry; Tosi, Mario; Martins, Alexandra
2013-11-01
Exonic variants can alter pre-mRNA splicing either by changing splice sites or by modifying splicing regulatory elements. Often these effects are difficult to predict and are only detected by performing RNA analyses. Here, we analyzed, in a minigene assay, 26 variants identified in exon 7 of BRCA2, a cancer predisposition gene. Our results revealed eight new exon skipping mutations in this exon: one directly altering the 5' splice site and seven affecting potential regulatory elements. This brings the number of splicing regulatory mutations detected in BRCA2 exon 7 to a total of 11, a remarkably high number considering the total number of variants reported in this exon (n = 36), all tested in our minigene assay. We then exploited this large set of splicing data to test the predictive value of splicing regulator hexamers' scores recently established by Ke et al. Comparisons of hexamer-based predictions with our experimental data revealed high sensitivity in detecting variants that increased exon skipping, an important feature for prescreening variants before RNA analysis. In conclusion, hexamer scores represent a promising tool for predicting the biological consequences of exonic variants and may have important applications for the interpretation of variants detected by high-throughput sequencing. © 2013 WILEY PERIODICALS, INC.
Gervais, Roger O; Ben-Porath, Yossef S; Wygant, Dustin B; Green, Paul
2008-12-01
The MMPI-2 Response Bias Scale (RBS) is designed to detect response bias in forensic neuropsychological and disability assessment settings. Validation studies have demonstrated that the scale is sensitive to cognitive response bias as determined by failure on the Word Memory Test (WMT) and other symptom validity tests. Exaggerated memory complaints are a common feature of cognitive response bias. The present study was undertaken to determine the extent to which the RBS is sensitive to memory complaints and how it compares in this regard to other MMPI-2 validity scales and indices. This archival study used MMPI-2 and Memory Complaints Inventory (MCI) data from 1550 consecutive non-head-injury disability-related referrals to the first author's private practice. ANOVA results indicated significant increases in memory complaints across increasing RBS score ranges with large effect sizes. Regression analyses indicated that the RBS was a better predictor of the mean memory complaints score than the F, F(B), and F(P) validity scales and the FBS. There was no correlation between the RBS and the CVLT, an objective measure of verbal memory. These findings suggest that elevated scores on the RBS are associated with over-reporting of memory problems, which provides further external validation of the RBS as a sensitive measure of cognitive response bias. Interpretive guidelines for the RBS are provided.