error scoring system: Topics by Science.gov

Sample records for error scoring system

Intra-Rater and Inter-Rater Reliability of the Balance Error Scoring System in Pre-Adolescent School Children

ERIC Educational Resources Information Center

Sheehan, Dwayne P.; Lafave, Mark R.; Katz, Larry

2011-01-01

This study was designed to test the intra- and inter-rater reliability of the University of North Carolina's Balance Error Scoring System in 9- and 10-year-old children. Additionally, a modified version of the Balance Error Scoring System was tested to determine if it was more sensitive in this population ("raw scores"). Forty-six…
Monte Carlo simulation of expert judgments on human errors in chemical analysis--a case study of ICP-MS.

PubMed

Kuselman, Ilya; Pennecchi, Francesca; Epstein, Malka; Fajgelj, Ales; Ellison, Stephen L R

2014-12-01

Monte Carlo simulation of expert judgments on human errors in a chemical analysis was used for determination of distributions of the error quantification scores (scores of likelihood and severity, and scores of effectiveness of a laboratory quality system in prevention of the errors). The simulation was based on modeling of an expert behavior: confident, reasonably doubting and irresolute expert judgments were taken into account by means of different probability mass functions (pmfs). As a case study, 36 scenarios of human errors which may occur in elemental analysis of geological samples by ICP-MS were examined. Characteristics of the score distributions for three pmfs of an expert behavior were compared. Variability of the scores, as standard deviation of the simulated score values from the distribution mean, was used for assessment of the score robustness. A range of the score values, calculated directly from elicited data and simulated by a Monte Carlo method for different pmfs, was also discussed from the robustness point of view. It was shown that robustness of the scores, obtained in the case study, can be assessed as satisfactory for the quality risk management and improvement of a laboratory quality system against human errors. Copyright © 2014 Elsevier B.V. All rights reserved.
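The simulation idea described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the 1-5 score scale, the three pmf shapes, and the names `PMFS`, `simulate_scores`, and `robustness` are all assumptions introduced here. Each expert behaviour is modelled as a pmf over the offset from the elicited score, and robustness is summarised by the standard deviation of the simulated scores.

```python
import random
import statistics

# Hypothetical pmfs over the offset from the elicited score.
# These shapes are illustrative assumptions, not the values used in the study.
PMFS = {
    "confident": {0: 1.0},
    "reasonably_doubting": {-1: 0.1, 0: 0.8, 1: 0.1},
    "irresolute": {-1: 0.25, 0: 0.5, 1: 0.25},
}


def simulate_scores(elicited, pmf, n_trials=10_000, lo=1, hi=5, seed=0):
    """Draw simulated scores around an elicited score, clipped to [lo, hi]."""
    rng = random.Random(seed)
    offsets = list(pmf)
    weights = [pmf[o] for o in offsets]
    draws = rng.choices(offsets, weights=weights, k=n_trials)
    return [min(hi, max(lo, elicited + d)) for d in draws]


def robustness(elicited, behaviour, **kwargs):
    """Return (mean, standard deviation) of the simulated score distribution."""
    scores = simulate_scores(elicited, PMFS[behaviour], **kwargs)
    return statistics.mean(scores), statistics.pstdev(scores)


if __name__ == "__main__":
    for behaviour in PMFS:
        mean, sd = robustness(3, behaviour)
        print(f"{behaviour}: mean={mean:.3f}, sd={sd:.3f}")
```

Under these assumed pmfs, a confident expert produces no spread at all, and the spread grows as the judgment becomes less resolute, which is the comparison the abstract describes.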
Improvement of the Mair scoring system using structural equations modeling for classifying the diagnostic adequacy of cytology material from thyroid lesions.

PubMed

Kulkarni, H R; Kamal, M M; Arjune, D G

1999-12-01

The scoring system developed by Mair et al. (Acta Cytol 1989;33:809-813) is frequently used to grade the quality of cytology smears. Using a one-factor analytic structural equations model, we demonstrate that the errors in measurement of the parameters used in the Mair scoring system are highly and significantly correlated. We recommend the use of either a multiplicative scoring system, using linear scores, or an additive scoring system, using exponential scores, to correct for the correlated errors. We suggest that the 0, 1, and 2 points used in the Mair scoring system be replaced by 1, 2, and 4, respectively. Using data on fine-needle biopsies of 200 thyroid lesions by both fine-needle aspiration (FNA) and fine-needle capillary sampling (FNC), we demonstrate that our modification of the Mair scoring system is more sensitive and more consistent with the structural equations model. Therefore, we recommend that the modified Mair scoring system be used for classifying the diagnostic adequacy of cytology smears. Diagn. Cytopathol. 1999;21:387-393. Copyright 1999 Wiley-Liss, Inc.
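The proposed point substitution (0, 1, 2 replaced by 1, 2, 4) can be illustrated with a short sketch. The function names and the example ratings are hypothetical, and the number of parameters is left open because the abstract does not state it. The sketch shows only the additive variant with exponential scores.

```python
# Point substitution recommended in the abstract: 0, 1, 2 -> 1, 2, 4.
REMAP = {0: 1, 1: 2, 2: 4}


def mair_original(points):
    """Additive total of the original 0/1/2 parameter points."""
    _validate(points)
    return sum(points)


def mair_modified(points):
    """Additive total after remapping each point to its exponential score."""
    _validate(points)
    return sum(REMAP[p] for p in points)


def _validate(points):
    # Each parameter must be rated 0, 1, or 2 before any total is computed.
    for p in points:
        if p not in REMAP:
            raise ValueError(f"parameter points must be 0, 1, or 2, got {p!r}")


if __name__ == "__main__":
    ratings = [0, 1, 2]  # hypothetical ratings for three parameters
    print(mair_original(ratings), mair_modified(ratings))
```

The remapped values are powers of two (1, 2, 4), which is what makes the additive total behave like a product of the underlying linear scores.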
Automated Quantification of the Landing Error Scoring System With a Markerless Motion-Capture System.

PubMed

Mauntel, Timothy C; Padua, Darin A; Stanley, Laura E; Frank, Barnett S; DiStefano, Lindsay J; Peck, Karen Y; Cameron, Kenneth L; Marshall, Stephen W

2017-11-01

The Landing Error Scoring System (LESS) can be used to identify individuals with an elevated risk of lower extremity injury. The limitation of the LESS is that raters identify movement errors from video replay, which is time-consuming and, therefore, may limit its use by clinicians. A markerless motion-capture system may be capable of automating LESS scoring, thereby removing this obstacle. To determine the reliability of an automated markerless motion-capture system for scoring the LESS. Cross-sectional study. United States Military Academy. A total of 57 healthy, physically active individuals (47 men, 10 women; age = 18.6 ± 0.6 years, height = 174.5 ± 6.7 cm, mass = 75.9 ± 9.2 kg). Participants completed 3 jump-landing trials that were recorded by standard video cameras and a depth camera. Their movement quality was evaluated by expert LESS raters (standard video recording) using the LESS rubric and by software that automates LESS scoring (depth-camera data). We recorded an error for a LESS item if it was present on at least 2 of 3 jump-landing trials. We calculated κ statistics, prevalence- and bias-adjusted κ (PABAK) statistics, and percentage agreement for each LESS item. Interrater reliability was evaluated between the 2 expert rater scores and between a consensus expert score and the markerless motion-capture system score. We observed reliability between the 2 expert LESS raters (average κ = 0.45 ± 0.35, average PABAK = 0.67 ± 0.34; percentage agreement = 0.83 ± 0.17). The markerless motion-capture system had similar reliability with consensus expert scores (average κ = 0.48 ± 0.40, average PABAK = 0.71 ± 0.27; percentage agreement = 0.85 ± 0.14). However, reliability was poor for 5 LESS items in both LESS score comparisons. A markerless motion-capture system had the same level of reliability as expert LESS raters, suggesting that an automated system can accurately assess movement. 
Therefore, clinicians can use the markerless motion-capture system to reliably score the LESS without being limited by the time requirements of manual LESS scoring.
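The agreement statistics named in this abstract can be computed for a single binary LESS item as in the sketch below. The ratings are invented for illustration and the function names are assumptions. The formulas are the standard ones for two raters and a binary outcome: Cohen's κ = (Po − Pe) / (1 − Pe) and PABAK = 2·Po − 1, where Po is observed agreement and Pe is chance agreement.

```python
def item_error(trials):
    """Score an item as an error (1) if present on at least 2 of 3 trials."""
    return int(sum(trials) >= 2)


def percent_agreement(a, b):
    """Proportion of participants on whom the two raters agree."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa for two raters and a binary (0/1) item."""
    po = percent_agreement(a, b)
    pa1 = sum(a) / len(a)  # proportion rated 1 by rater A
    pb1 = sum(b) / len(b)  # proportion rated 1 by rater B
    pe = pa1 * pb1 + (1 - pa1) * (1 - pb1)  # chance agreement
    if pe == 1.0:
        return float("nan")  # kappa is undefined when chance agreement is 1
    return (po - pe) / (1 - pe)


def pabak(a, b):
    """Prevalence- and bias-adjusted kappa for a binary item."""
    return 2 * percent_agreement(a, b) - 1


if __name__ == "__main__":
    # Hypothetical item scores for 10 participants (1 = error recorded).
    rater_a = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
    rater_b = [1, 0, 0, 0, 1, 0, 1, 1, 1, 0]
    print(percent_agreement(rater_a, rater_b))
    print(cohen_kappa(rater_a, rater_b))
    print(pabak(rater_a, rater_b))
```

κ falls when an item is very common or very rare even if raw agreement is high, which is one reason studies of this kind report PABAK and percentage agreement alongside it.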
Towards reporting standards for neuropsychological study results: A proposal to minimize communication errors with standardized qualitative descriptors for normalized test scores.

PubMed

Schoenberg, Mike R; Rum, Ruba S

2017-11-01

Rapid, clear and efficient communication of neuropsychological results is essential to benefit patient care. Errors in communication are a leading cause of medical errors; nevertheless, there remains a lack of consistency in how neuropsychological scores are communicated. A major limitation in the communication of neuropsychological results is the inconsistent use of qualitative descriptors for standardized test scores and the use of vague terminology. A PubMed search from 1 Jan 2007 to 1 Aug 2016 was conducted to identify guidelines or consensus statements for the description and reporting of qualitative terms to communicate neuropsychological test scores. The review found the use of confusing and overlapping terms to describe various ranges of percentile standardized test scores. In response, we propose a simplified set of qualitative descriptors for normalized test scores (Q-Simple) as a means to reduce errors in communicating test results. The Q-Simple qualitative terms are: 'very superior', 'superior', 'high average', 'average', 'low average', 'borderline' and 'abnormal/impaired'. A case example illustrates the proposed Q-Simple qualitative classification system to communicate neuropsychological results for neurosurgical planning. The Q-Simple qualitative descriptor system is intended as a means to improve and standardize communication of standardized neuropsychological test scores. Research is needed to further evaluate neuropsychological communication errors. Conveying the clinical implications of neuropsychological results in a manner that minimizes risk for communication errors is a quintessential component of evidence-based practice. Copyright © 2017 Elsevier B.V. All rights reserved.
Evaluating the Effective Factors for Reporting Medical Errors among Midwives Working at Teaching Hospitals Affiliated to Isfahan University of Medical Sciences.

PubMed

Khorasani, Fahimeh; Beigi, Marjan

2017-01-01

Recently, hospital evaluation and accreditation systems have placed special emphasis on reporting malpractice and sharing errors or lessons learnt from errors, but because a systematic approach to solving problems within the same system has not been promoted, this issue has remained unattended. This study was conducted to determine the effective factors for reporting medical errors among midwives. This project was a descriptive cross-sectional observational study. Data gathering tools were a standard checklist and two researcher-made questionnaires. All midwives who worked at teaching hospitals affiliated to Isfahan University of Medical Sciences were sampled through the census method (convenient); sampling lasted for 3 months. Data were analyzed using descriptive and inferential statistics through SPSS 16. Results showed that 79.1% of the staff reported errors and the highest rate of errors was in the process of patients' tests. In this study, the mean score of midwives' knowledge about the errors was 79.1 and the mean score of their attitude toward reporting errors was 70.4. There was a direct relation between the midwifery staff's scores for knowledge of and attitude toward errors and their reporting of errors. Given the appropriate knowledge and attitude of midwifery staff regarding errors and their action toward reporting them, it is recommended to strengthen the system for handling errors and hospital risks.
Error Rates in Measuring Teacher and School Performance Based on Student Test Score Gains. NCEE 2010-4004

ERIC Educational Resources Information Center

Schochet, Peter Z.; Chiang, Hanley S.

2010-01-01

This paper addresses likely error rates for measuring teacher and school performance in the upper elementary grades using value-added models applied to student test score gain data. Using realistic performance measurement system schemes based on hypothesis testing, we develop error rate formulas based on OLS and Empirical Bayes estimators.…
Associations between communication climate and the frequency of medical error reporting among pharmacists within an inpatient setting.

PubMed

Patterson, Mark E; Pace, Heather A; Fincham, Jack E

2013-09-01

Although error-reporting systems enable hospitals to accurately track safety climate through the identification of adverse events, these systems may be underused within a work climate of poor communication. The objective of this analysis is to identify the extent to which perceived communication climate among hospital pharmacists impacts medical error reporting rates. This cross-sectional study used survey responses from more than 5000 pharmacists responding to the 2010 Hospital Survey on Patient Safety Culture (HSOPSC). Two composite scores were constructed for "communication openness" and "feedback about error," respectively. Error reporting frequency was defined from the survey question, "In the past 12 months, how many event reports have you filled out and submitted?" Multivariable logistic regressions were used to estimate the likelihood of medical error reporting conditional upon communication openness or feedback levels, controlling for pharmacist years of experience, hospital geographic region, and ownership status. Pharmacists with higher communication openness scores compared with lower scores were 40% more likely to have filed or submitted a medical error report in the past 12 months (OR, 1.4; 95% CI, 1.1-1.7; P = 0.004). In contrast, pharmacists with higher communication feedback scores were not any more likely than those with lower scores to have filed or submitted a medical report in the past 12 months (OR, 1.0; 95% CI, 0.8-1.3; P = 0.97). Hospital work climates that encourage pharmacists to freely communicate about problems related to patient safety are conducive to medical error reporting. The presence of feedback infrastructures about error may not be sufficient to induce error-reporting behavior.
Syntactic error modeling and scoring normalization in speech recognition

NASA Technical Reports Server (NTRS)

Olorenshaw, Lex

1991-01-01

The objective was to develop the speech recognition system to be able to detect speech which is pronounced incorrectly, given that the text of the spoken speech is known to the recognizer. Research was performed in the following areas: (1) syntactic error modeling; (2) score normalization; and (3) phoneme error modeling. The study into the types of errors that a reader makes will provide the basis for creating tests which will approximate the use of the system in the real world. NASA-Johnson will develop this technology into a 'Literacy Tutor' in order to bring innovative concepts to the task of teaching adults to read.
Accuracy and Efficiency of Recording Pediatric Early Warning Scores Using an Electronic Physiological Surveillance System Compared With Traditional Paper-Based Documentation.

PubMed

Sefton, Gerri; Lane, Steven; Killen, Roger; Black, Stuart; Lyon, Max; Ampah, Pearl; Sproule, Cathryn; Loren-Gosling, Dominic; Richards, Caitlin; Spinty, Jean; Holloway, Colette; Davies, Coral; Wilson, April; Chean, Chung Shen; Carter, Bernie; Carrol, E D

2017-05-01

Pediatric Early Warning Scores are advocated to assist health professionals to identify early signs of serious illness or deterioration in hospitalized children. Scores are derived from the weighting applied to recorded vital signs and clinical observations reflecting deviation from a predetermined "norm." Higher aggregate scores trigger an escalation in care aimed at preventing critical deterioration. Process errors made while recording these data, including plotting or calculation errors, have the potential to impede the reliability of the score. To test this hypothesis, we conducted a controlled study of documentation using five clinical vignettes. We measured the accuracy of vital sign recording, score calculation, and time taken to complete documentation using a handheld electronic physiological surveillance system, VitalPAC Pediatric, compared with traditional paper-based charts. We explored the user acceptability of both methods using a Web-based survey. Twenty-three staff participated in the controlled study. The electronic physiological surveillance system improved the accuracy of vital sign recording, 98.5% versus 85.6%, P < .02, Pediatric Early Warning Score calculation, 94.6% versus 55.7%, P < .02, and saved time, 68 versus 98 seconds, compared with paper-based documentation, P < .002. Twenty-nine staff completed the Web-based survey. They perceived that the electronic physiological surveillance system offered safety benefits by reducing human error while providing instant visibility of recorded data to the entire clinical team.
Correlation of Head Impacts to Change in Balance Error Scoring System Scores in Division I Men's Lacrosse Players.

PubMed

Miyashita, Theresa L; Diakogeorgiou, Eleni; Marrie, Kaitlyn

Investigation into the effect of cumulative subconcussive head impacts has yielded various results in the literature, with many supporting a link to neurological deficits. Little research has been conducted on men's lacrosse and associated balance deficits from head impacts. (1) Athletes will commit more errors on the postseason Balance Error Scoring System (BESS) test. (2) There will be a positive correlation to change in BESS scores and head impact exposure data. Prospective longitudinal study. Level 3. Thirty-four Division I men's lacrosse players (age, 19.59 ± 1.42 years) wore helmets instrumented with a sensor to collect head impact exposure data over the course of a competitive season. Players completed a BESS test at the start and end of the competitive season. The number of errors from pre- to postseason increased during the double-leg stance on foam (P < 0.001), tandem stance on foam (P = 0.009), total number of errors on a firm surface (P = 0.042), and total number of errors on a foam surface (P = 0.007). There were significant correlations only between the total errors on a foam surface and linear acceleration (P = 0.038, r = 0.36), head injury criteria (P = 0.024, r = 0.39), and Gadd Severity Index scores (P = 0.031, r = 0.37). Changes in the total number of errors on a foam surface may be considered a sensitive measure to detect balance deficits associated with cumulative subconcussive head impacts sustained over the course of 1 lacrosse season, as measured by average linear acceleration, head injury criteria, and Gadd Severity Index scores. If there is microtrauma to the vestibular system due to repetitive subconcussive impacts, only an assessment that highly stresses the vestibular system may be able to detect these changes. Cumulative subconcussive impacts may result in neurocognitive dysfunction, including balance deficits, which are associated with an increased risk for injury. 
The development of a strategy to reduce total number of head impacts may curb the associated sequelae. Incorporation of a modified BESS test, firm surface only, may not be recommended as it may not detect changes due to repetitive impacts over the course of a competitive season.
Identification of patient information corruption in the intensive care unit: using a scoring tool to direct quality improvements in handover.

PubMed

Pickering, Brian W; Hurley, Killian; Marsh, Brian

2009-11-01

To use a handover assessment tool for identifying patient information corruption and objectively evaluating interventions designed to reduce handover errors and improve medical decision making. The continuous monitoring, intervention, and evaluation of the patient in modern intensive care unit practice generates large quantities of information, the platform on which medical decisions are made. Information corruption, defined as errors of distortion/omission compared with the medical record, may result in medical judgment errors. Identifying these errors may lead to quality improvements in intensive care unit care delivery and safety. Handover assessment instrument development study divided into two phases by the introduction of a handover intervention. Closed, 17-bed, university-affiliated mixed surgical/medical intensive care unit. Senior and junior medical members of the intensive care unit team. Electronic handover page. Study subjects were asked to recall clinical information commonly discussed at handover on individual patients. The handover score measured the percentage of information correctly retained for each individual doctor-patient interaction. The clinical intention score, a subjective measure of medical judgment, was graded (1-5) by three blinded intensive care unit experts. A total of 137 interactions were scored. Median (interquartile range) handover scores for phases 1 and 2 were 79.07% (67.44-84.50) and 83.72% (76.16-88.37), respectively. Score variance was reduced by the handover intervention (p < .05). Increasing median handover scores, 68.60 to 83.72, were associated with increases in clinical intention scores from 1 to 5 (chi-square = 23.59, df = 4, p < .0001). When asked to recall clinical information discussed at handover, medical members of the intensive care unit team provide data that are significantly corrupted compared with the medical record. Low subjective clinical judgment scores are significantly associated with low handover scores. 
The handover/clinical intention scores may, therefore, be useful screening tools for intensive care unit system vulnerability to medical error. Additionally, handover instruments can identify interventions that reduce system vulnerability to error and may be used to guide quality improvements in handover practice.
Qualitative and quantitative assessment of degeneration of cervical intervertebral discs and facet joints.

PubMed

Walraevens, Joris; Liu, Baoge; Meersschaert, Joke; Demaerel, Philippe; Delye, Hans; Depreitere, Bart; Vander Sloten, Jos; Goffin, Jan

2009-03-01

Degeneration of intervertebral discs and facet joints is one of the most frequently encountered spinal disorders. In order to describe and quantify degeneration and evaluate a possible relationship between degeneration and biomechanical parameters, e.g., the intervertebral range of motion and intradiscal pressure, a scoring system for degeneration is mandatory. However, few scoring systems for the assessment of degeneration of the cervical spine exist. Therefore, two separate objective scoring systems to qualitatively and quantitatively assess the degree of cervical intervertebral disc and facet joint degeneration were developed and validated. The scoring system for cervical disc degeneration consists of three variables which are individually scored on neutral lateral radiographs: "height loss" (0-4 points), "anterior osteophytes" (0-3 points) and "endplate sclerosis" (0-2 points). The scoring system for facet joint degeneration consists of four variables which are individually scored on neutral computed tomography scans: "hypertrophy" (0-2 points), "osteophytes" (0-1 point), "irregularity" on the articular surface (0-1 point) and "joint space narrowing" (0-1 point). Each variable contributes with varying importance to the overall degeneration score (max 9 points for the scoring system of cervical disc degeneration and max 5 points for facet joint degeneration). Degeneration of 20 discs and facet joints of 20 patients was blindly assessed by four raters: two neurosurgeons (one senior and one junior) and two radiologists (one senior and one junior), firstly based on first subjective impression and secondly using the scoring systems. Measurement errors and inter- and intra-rater agreement were determined. The measurement error of the scoring system for cervical disc degeneration was 11.1 versus 17.9% of the subjective impression results. 
This scoring system showed excellent intra-rater agreement (ICC = 0.86, 0.75-0.93) and excellent inter-rater agreement (ICC = 0.78, 0.64-0.88). Surgeons as well as radiologists and seniors as well as juniors obtained excellent inter- and intra-rater agreement. The measurement error of the scoring system for cervical facet joint degeneration was 20.1 versus 24.2% of the subjective impression results. This scoring system showed good intra-rater agreement (ICC = 0.71, 0.42-0.89) and fair inter-rater agreement (ICC = 0.49, 0.26-0.74). Both scoring systems fulfilled the criteria for recommendation proposed by Kettler and Wilke. Our scoring systems can be reliable and objective tools for assessing cervical disc and facet joint degeneration. Moreover, the scoring system of cervical disc degeneration was shown to be experience- and discipline-independent.
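The two additive scoring systems described in this abstract can be written down as range-checked sums. The variable names and point ranges below are taken from the abstract; the dictionary keys, the function name `total_score`, and the example ratings are illustrative assumptions.

```python
# Maximum points per variable, as listed in the abstract.
DISC_ITEMS = {
    "height_loss": 4,
    "anterior_osteophytes": 3,
    "endplate_sclerosis": 2,
}
FACET_ITEMS = {
    "hypertrophy": 2,
    "osteophytes": 1,
    "irregularity": 1,
    "joint_space_narrowing": 1,
}


def total_score(ratings, items):
    """Sum the per-variable points after checking names and ranges."""
    if set(ratings) != set(items):
        raise ValueError("ratings must cover exactly the variables of the system")
    for name, value in ratings.items():
        if not 0 <= value <= items[name]:
            raise ValueError(f"{name} must be between 0 and {items[name]}")
    return sum(ratings.values())


if __name__ == "__main__":
    # Hypothetical ratings for one disc and one facet joint.
    disc = {"height_loss": 2, "anterior_osteophytes": 1, "endplate_sclerosis": 0}
    facet = {"hypertrophy": 1, "osteophytes": 1, "irregularity": 0,
             "joint_space_narrowing": 1}
    print(total_score(disc, DISC_ITEMS), "of", sum(DISC_ITEMS.values()))
    print(total_score(facet, FACET_ITEMS), "of", sum(FACET_ITEMS.values()))
```

The per-variable maxima sum to 9 for the disc system and 5 for the facet joint system, matching the maxima quoted in the abstract.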
Interobserver Reliability of the Total Body Score System for Quantifying Human Decomposition.

PubMed

Dabbs, Gretchen R; Connor, Melissa; Bytheway, Joan A

2016-03-01

Several authors have tested the accuracy of the Total Body Score (TBS) method for quantifying decomposition, but none have examined the reliability of the method as a scoring system by testing interobserver error rates. Sixteen participants used the TBS system to score 59 observation packets including photographs and written descriptions of 13 human cadavers in different stages of decomposition (postmortem interval: 2-186 days). Data analysis used a two-way random model intraclass correlation in SPSS (v. 17.0). The TBS method showed "almost perfect" agreement between observers, with average absolute correlation coefficients of 0.990 and average consistency correlation coefficients of 0.991. While the TBS method may have sources of error, scoring reliability is not one of them. Individual component scores were examined, and the influences of education and experience levels were investigated. Overall, the trunk component scores were the least concordant. Suggestions are made to improve the reliability of the TBS method. © 2016 American Academy of Forensic Sciences.
WE-H-BRC-09: Simulated Errors in Mock Radiotherapy Plans to Quantify the Effectiveness of the Physics Plan Review

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gopan, O; Kalet, A; Smith, W

2016-06-15

Purpose: A standard tool for ensuring the quality of radiation therapy treatments is the initial physics plan review. However, little is known about its performance in practice. The goal of this study is to measure the effectiveness of physics plan review by introducing simulated errors into “mock” treatment plans and measuring the performance of plan review by physicists. Methods: We generated six mock treatment plans containing multiple errors. These errors were based on incident learning system data both within the department and internationally (SAFRON). These errors were scored for severity and frequency. Those with the highest scores were included in the simulations (13 errors total). Observer bias was minimized using a multiple co-correlated distractor approach. Eight physicists reviewed these plans for errors, with each physicist reviewing, on average, 3/6 plans. The confidence interval for the proportion of errors detected was computed using the Wilson score interval. Results: Simulated errors were detected in 65% of reviews [51–75%] (95% confidence interval [CI] in brackets). The following error scenarios had the highest detection rates: incorrect isocenter in DRRs/CBCT (91% [73–98%]) and a planned dose different from the prescribed dose (100% [61–100%]). Errors with low detection rates involved incorrect field parameters in record and verify system (38%, [18–61%]) and incorrect isocenter localization in planning system (29% [8–64%]). Though pre-treatment QA failure was reliably identified (100%), less than 20% of participants reported the error that caused the failure. Conclusion: This is one of the first quantitative studies of error detection. Although physics plan review is a key safety measure and can identify some errors with high fidelity, other errors are more challenging to detect. This data will guide future work on standardization and automation. 
Creating new checks or improving existing ones (i.e., via automation) will help in detecting those errors with low detection rates.
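The Wilson score interval mentioned in the methods can be computed as in the sketch below. The function name is an assumption, and the example counts are not given in the abstract: 6 detections out of 6 reviews is an inferred count that happens to reproduce the reported "100% [61–100%]", so treat it as an illustration only.

```python
import math


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion (default: 95% level)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the boundaries.
    return max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    # Assumed counts for illustration: all 6 of 6 reviews detected the error.
    low, high = wilson_interval(6, 6)
    print(f"{low:.0%} to {high:.0%}")
```

Unlike the simple normal-approximation interval, the Wilson interval does not collapse to zero width when the observed proportion is 0% or 100%, which matters for the small review counts in a study of this size.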
Accuracy and Efficiency of Recording Pediatric Early Warning Scores Using an Electronic Physiological Surveillance System Compared With Traditional Paper-Based Documentation

PubMed Central

Sefton, Gerri; Lane, Steven; Killen, Roger; Black, Stuart; Lyon, Max; Ampah, Pearl; Sproule, Cathryn; Loren-Gosling, Dominic; Richards, Caitlin; Spinty, Jean; Holloway, Colette; Davies, Coral; Wilson, April; Chean, Chung Shen; Carter, Bernie; Carrol, E.D.

2017-01-01

Pediatric Early Warning Scores are advocated to assist health professionals to identify early signs of serious illness or deterioration in hospitalized children. Scores are derived from the weighting applied to recorded vital signs and clinical observations reflecting deviation from a predetermined “norm.” Higher aggregate scores trigger an escalation in care aimed at preventing critical deterioration. Process errors made while recording these data, including plotting or calculation errors, have the potential to impede the reliability of the score. To test this hypothesis, we conducted a controlled study of documentation using five clinical vignettes. We measured the accuracy of vital sign recording, score calculation, and time taken to complete documentation using a handheld electronic physiological surveillance system, VitalPAC Pediatric, compared with traditional paper-based charts. We explored the user acceptability of both methods using a Web-based survey. Twenty-three staff participated in the controlled study. The electronic physiological surveillance system improved the accuracy of vital sign recording, 98.5% versus 85.6%, P < .02, Pediatric Early Warning Score calculation, 94.6% versus 55.7%, P < .02, and saved time, 68 versus 98 seconds, compared with paper-based documentation, P < .002. Twenty-nine staff completed the Web-based survey. They perceived that the electronic physiological surveillance system offered safety benefits by reducing human error while providing instant visibility of recorded data to the entire clinical team. PMID:27832032
Reliability and Construct Validity of Limits of Stability Test in Adolescents Using a Portable Forceplate System.

PubMed

Alsalaheen, Bara; Haines, Jamie; Yorke, Amy; Broglio, Steven P

2015-12-01

To examine the reliability, convergent, and discriminant validity of the limits of stability (LOS) test to assess dynamic postural stability in adolescents using a portable forceplate system. Cross-sectional reliability observational study. School setting. Adolescents (N=36) completed all measures during the first session. To examine the reliability of the LOS test, a subset of 15 participants repeated the LOS test after 1 week. Not applicable. Outcome measurements included the LOS test, Balance Error Scoring System, Instrumented Balance Error Scoring System, and Modified Clinical Test for Sensory Interaction on Balance. A significant relation was observed among LOS composite scores (r=.36-.87, P<.05). However, no relation was observed between LOS and static balance outcome measurements. The reliability of the LOS composite scores ranged from moderate to good (intraclass correlation coefficient model 2,1=.73-.96). The results suggest that the LOS composite scores provide unique information about dynamic postural stability, and the LOS test completed at 100% of the theoretical limit appeared to be a reliable test of dynamic postural stability in adolescents. Clinicians should use dynamic balance measurement as part of their balance assessment and should not use static balance testing (eg, Balance Error Scoring System) to make inferences about dynamic balance, especially when balance assessment is used to determine rehabilitation outcomes, or when making return to play decisions after injury. Copyright © 2015 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Longitudinal Improvement in Balance Error Scoring System Scores among NCAA Division-I Football Athletes.

PubMed

Mathiasen, Ross; Hogrefe, Christopher; Harland, Kari; Peterson, Andrew; Smoot, M Kyle

2018-02-15

The Balance Error Scoring System (BESS) is a commonly used concussion assessment tool. Recent studies have questioned the stability and reliability of baseline BESS scores. The purpose of this longitudinal prospective cohort study is to examine differences in yearly baseline BESS scores in athletes participating on an NCAA Division-I football team. NCAA Division-I freshman football athletes were videotaped performing the BESS test at matriculation and after 1 year of participation in the football program. Twenty-three athletes were enrolled in year 1 of the study, and 25 athletes were enrolled in year 2. Those athletes enrolled in year 1 were again videotaped after year 2 of the study. The paired t-test was used to assess for change in score over time for the firm surface, foam surface, and the cumulative BESS score. Additionally, inter- and intrarater reliability values were calculated. Cumulative errors on the BESS significantly decreased from a mean of 20.3 at baseline to 16.8 after 1 year of participation. The mean number of errors following the second year of participation was 15.0. Inter-rater reliability for the cumulative score ranged from 0.65 to 0.75. Intrarater reliability was 0.81. After 1 year of participation, there is a statistically and clinically significant improvement in BESS scores in an NCAA Division-I football program. Although additional improvement in BESS scores was noted after a second year of participation, it did not reach statistical significance. Football athletes should undergo baseline BESS testing at least yearly if the BESS is to be optimally useful as a diagnostic test for concussion.
Examining the Relationship Between the Functional Movement Screen and the Landing Error Scoring System in an Active, Male Collegiate Population.

PubMed

Everard, Eoin M; Harrison, Andrew J; Lyons, Mark

2017-05-01

Everard, EM, Harrison, AJ, and Lyons, M. Examining the relationship between the functional movement screen and the landing error scoring system in an active, male collegiate population. J Strength Cond Res 31(5): 1265-1272, 2017. In recent years, there has been an increasing focus on movement screening as the principal aspect of preparticipation testing. Two of the most common movement screening tools are the Functional Movement Screen (FMS) and the Landing Error Scoring System (LESS). Several studies have examined the reliability and validity of these tools, but so far, there have been no studies comparing the results of these 2 screening tools against each other. Therefore, the purpose of this study was to determine the relationship between FMS scores and LESS scores. Ninety-eight male college athletes actively competing in sport (Gaelic games, soccer, athletics, boxing/mixed martial arts, Olympic weightlifting) participated in the study and performed the FMS and LESS screens. Both the 21-point and 100-point scoring systems were used to score the FMS. Spearman's correlation coefficients were used to determine the relationship between the 2 screening scores. The results showed a significant moderate correlation between FMS and LESS scores (rho 100 and 21 point = -0.528; -0.487; p < 0.001). In addition, r² values of 0.26 and 0.23 indicate a poor shared variance between the 2 screens. The results indicate that performing well in one of the screens does not necessarily equate to performing well in the other. This has practical implications as it highlights that both screens may assess different movement patterns and should not be used as a substitute for each other.
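
As a rough illustration of the statistics in the abstract above, the sketch below computes Spearman's rho from average ranks and squares it to get shared variance. The helper names are hypothetical, and the study's own software and data are not given in the record.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

def shared_variance(rho):
    """Proportion of variance shared by two measures, as rho squared."""
    return rho ** 2
```

Squaring the reported rho values (-0.528 and -0.487) gives roughly 0.28 and 0.24, the same order as the 0.26 and 0.23 quoted in the abstract, so about three quarters of the variance in one screen is not accounted for by the other.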
Syntactic error modeling and scoring normalization in speech recognition: Error modeling and scoring normalization in the speech recognition task for adult literacy training

NASA Technical Reports Server (NTRS)

Olorenshaw, Lex; Trawick, David

1991-01-01

The purpose was to develop a speech recognition system to be able to detect speech which is pronounced incorrectly, given that the text of the spoken speech is known to the recognizer. Better mechanisms are provided for using speech recognition in a literacy tutor application. Using a combination of scoring normalization techniques and cheater-mode decoding, a reasonable acceptance/rejection threshold was provided. In continuous speech, the system was tested to be able to provide above 80 pct. correct acceptance of words, while correctly rejecting over 80 pct. of incorrectly pronounced words.

Anatomic, clinical, and neuropsychological correlates of spelling errors in primary progressive aphasia.

PubMed

Shim, Hyungsub; Hurley, Robert S; Rogalski, Emily; Mesulam, M-Marsel

2012-07-01

This study evaluates spelling errors in the three subtypes of primary progressive aphasia (PPA): agrammatic (PPA-G), logopenic (PPA-L), and semantic (PPA-S). Forty-one PPA patients and 36 age-matched healthy controls were administered a test of spelling. The total number of errors and types of errors in spelling to dictation of regular words, exception words and nonwords, were recorded. Error types were classified based on phonetic plausibility. In the first analysis, scores were evaluated by clinical diagnosis. Errors in spelling exception words and phonetically plausible errors were seen in PPA-S. Conversely, PPA-G was associated with errors in nonword spelling and phonetically implausible errors. In the next analysis, spelling scores were correlated to other neuropsychological language test scores. Significant correlations were found between exception word spelling and measures of naming and single word comprehension. Nonword spelling correlated with tests of grammar and repetition. Global language measures did not correlate significantly with spelling scores, however. Cortical thickness analysis based on MRI showed that atrophy in several language regions of interest were correlated with spelling errors. Atrophy in the left supramarginal gyrus and inferior frontal gyrus (IFG) pars orbitalis correlated with errors in nonword spelling, while thinning in the left temporal pole and fusiform gyrus correlated with errors in exception word spelling. Additionally, phonetically implausible errors in regular word spelling correlated with thinning in the left IFG pars triangularis and pars opercularis. Together, these findings suggest two independent systems for spelling to dictation, one phonetic (phoneme to grapheme conversion), and one lexical (whole word retrieval). Copyright © 2012 Elsevier Ltd. All rights reserved.
A Risk Score Model for Evaluation and Management of Patients with Thyroid Nodules.

PubMed

Zhang, Yongwen; Meng, Fanrong; Hong, Lianqing; Chu, Lanfang

2018-06-12

This study aimed to establish a simplified and practical tool for analyzing thyroid nodules. A novel risk score model was designed: risk factors including patient history, patient characteristics, physical examination, symptoms of compression, thyroid function, and ultrasonography (US) of the thyroid and cervical lymph nodes were evaluated and classified into high, intermediate, and low risk factors. A total of 243 thyroid nodules in 162 patients were assessed with the risk score system and the Thyroid Imaging-Reporting and Data System (TI-RADS), and the diagnostic performance of the two was compared. The accuracy in the diagnosis of thyroid nodules was 89.3% for the risk score system and 74.9% for TI-RADS. The specificity, accuracy and positive predictive value (PPV) of the risk score system were significantly higher than those of TI-RADS (χ² = 26.287, 17.151, 11.983; p < 0.05), whereas no statistically significant differences were observed in sensitivity and negative predictive value (NPV) (χ² = 1.276, 0.290; p > 0.05). The area under the curve (AUC) for the risk score system was 0.963 (standard error 0.014, 95% confidence interval (CI) = 0.934-0.991) and the AUC for TI-RADS was 0.912 (standard error 0.021, 95% CI = 0.871-0.953); the two AUCs differed significantly (Z = 2.02; p < 0.05). The risk score model is a reliable, simplified and cost-effective tool for the diagnosis of thyroid cancer. The higher the score, the higher the risk of malignancy. © Georg Thieme Verlag KG Stuttgart · New York.
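
The Z statistic for the AUC comparison in the abstract above can be reproduced from the reported AUCs and standard errors with a simple two-sample z formula. The record does not say which test the authors used; the sketch below assumes the two AUC estimates are treated as independent, and the function name is hypothetical.

```python
from math import sqrt

def auc_difference_z(auc_a, se_a, auc_b, se_b):
    """z statistic for the difference of two AUCs, assuming independence."""
    return (auc_a - auc_b) / sqrt(se_a ** 2 + se_b ** 2)

# Values as reported: risk score system vs. TI-RADS.
z = auc_difference_z(0.963, 0.014, 0.912, 0.021)
print(round(z, 2))  # 2.02
```

Because both systems were applied to the same nodules, a method that accounts for the correlation between the two curves could give a different value; the agreement with the reported Z only shows that the simple formula is consistent with the published figures.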
Can binary early warning scores perform as well as standard early warning scores for discriminating a patient's risk of cardiac arrest, death or unanticipated intensive care unit admission?

PubMed

Jarvis, Stuart; Kovacs, Caroline; Briggs, Jim; Meredith, Paul; Schmidt, Paul E; Featherstone, Peter I; Prytherch, David R; Smith, Gary B

2015-08-01

Although the weightings to be summed in an early warning score (EWS) calculation are small, calculation and other errors occur frequently, potentially impacting on hospital efficiency and patient care. Use of a simpler EWS has the potential to reduce errors. We truncated 36 published 'standard' EWSs so that, for each component, only two scores were possible: 0 when the standard EWS scored 0 and 1 when the standard EWS scored greater than 0. Using 1,564,153 vital signs observation sets from 68,576 patient care episodes, we compared the discrimination (measured using the area under the receiver operator characteristic curve--AUROC) of each standard EWS and its truncated 'binary' equivalent. The binary EWSs had lower AUROCs than the standard EWSs in most cases, although for some the difference was not significant. One system, the binary form of the National Early Warning System (NEWS), had significantly better discrimination than all standard EWSs, except for NEWS. Overall, Binary NEWS at a trigger value of 3 would detect as many adverse outcomes as are detected by NEWS using a trigger of 5, but would require a 15% higher triggering rate. The performance of Binary NEWS is only exceeded by that of standard NEWS. It may be that Binary NEWS, as a simplified system, can be used with fewer errors. However, its introduction could lead to significant increases in workload for ward and rapid response team staff. The balance between fewer errors and a potentially greater workload needs further investigation. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
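
The truncation rule described in the abstract above (0 stays 0, anything greater than 0 becomes 1) can be sketched in a few lines. The function names, the example component scores, and the reading of "trigger value" as "total at or above the threshold" are assumptions made for illustration.

```python
def binarize_component(score):
    """Truncate one component: 0 if the standard EWS scored 0, else 1."""
    return 0 if score == 0 else 1

def standard_ews(component_scores):
    """Standard EWS: sum of the weighted component scores."""
    return sum(component_scores)

def binary_ews(component_scores):
    """Binary EWS: count of components with a non-zero standard score."""
    return sum(binarize_component(s) for s in component_scores)

def triggers(total, threshold):
    """Assumed trigger rule: escalate when the total reaches the threshold."""
    return total >= threshold

# Hypothetical component scores for one observation set.
observation = [0, 3, 1, 0, 2, 0, 0]
```

For this hypothetical observation set the standard total is 6 and the binary total is 3, so it would trigger under both the standard threshold of 5 and the binary threshold of 3 mentioned in the abstract.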
The Automated Assessment of Postural Stability: Balance Detection Algorithm.

PubMed

Napoli, Alessandro; Glass, Stephen M; Tucker, Carole; Obeid, Iyad

2017-12-01

Impaired balance is a common indicator of mild traumatic brain injury, concussion and musculoskeletal injury. Given the clinical relevance of such injuries, especially in military settings, it is paramount to develop more accurate and reliable on-field evaluation tools. This work presents the design and implementation of the automated assessment of postural stability (AAPS) system, for on-field evaluations following concussion. The AAPS is a computer system, based on inexpensive off-the-shelf components and custom software, that aims to automatically and reliably evaluate balance deficits, by replicating a known on-field clinical test, namely, the Balance Error Scoring System (BESS). The AAPS main innovation is its balance error detection algorithm that has been designed to acquire data from a Microsoft Kinect® sensor and convert them into clinically-relevant BESS scores, using the same detection criteria defined by the original BESS test. In order to assess the AAPS balance evaluation capability, a total of 15 healthy subjects (7 male, 8 female) were required to perform the BESS test, while simultaneously being tracked by a Kinect 2.0 sensor and a professional-grade motion capture system (Qualisys AB, Gothenburg, Sweden). High definition videos with BESS trials were scored off-line by three experienced observers for reference scores. AAPS performance was assessed by comparing the AAPS automated scores to those derived by three experienced observers. Our results show that the AAPS error detection algorithm presented here can accurately and precisely detect balance deficits with performance levels that are comparable to those of experienced medical personnel. Specifically, agreement levels between the AAPS algorithm and the human average BESS scores ranging between 87.9% (single-leg on foam) and 99.8% (double-leg on firm ground) were detected. Moreover, statistically significant differences in balance scores were not detected by an ANOVA test with alpha equal to 0.05. Despite some level of disagreement between human and AAPS-generated scores, the use of an automated system yields important advantages over currently available human-based alternatives. These results underscore the value of using the AAPS, which can be quickly deployed in the field and/or in outdoor settings with minimal set-up time. Finally, the AAPS can record multiple error types and their time course with extremely high temporal resolution. These features are not achievable by humans, who cannot keep track of multiple balance errors with such a high resolution. Together, these results suggest that computerized BESS calculation may provide more accurate and consistent measures of balance than those derived from human experts.
Does Field Reliability for Static-99 Scores Decrease as Scores Increase?

PubMed Central

Rice, Amanda K.; Boccaccini, Marcus T.; Harris, Paige B.; Hawes, Samuel W.

2015-01-01

This study examined the field reliability of Static-99 (Hanson & Thornton, 2000) scores among 21,983 sex offenders and focused on whether rater agreement decreased as scores increased. As expected, agreement was lowest for high-scoring offenders. Initial and most recent Static-99 scores were identical for only about 40% of offenders who had been assigned a score of 6 during their initial evaluations, but for more than 60% of offenders who had been assigned a score of 2 or lower. In addition, the size of the difference between scores increased as scores increased, with pairs of scores differing by 2 or more points for about 30% of offenders scoring in the high-risk range. Because evaluators and systems use high Static-99 scores to identify sexual offenders who may require intensive supervision or even postrelease civil commitment, it is important to recognize that there may be more measurement error for high scores than low scores and to consider adopting procedures for minimizing or accounting for measurement error. PMID:24932647
SU-E-T-192: FMEA Severity Scores - Do We Really Know?

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tonigan, J; Johnson, J; Kry, S

2014-06-01

Purpose: Failure modes and effects analysis (FMEA) is a subjective risk mitigation technique that has not been applied to physics-specific quality management practices. There is a need for quantitative FMEA data as called for in the literature. This work focuses specifically on quantifying FMEA severity scores for physics components of IMRT delivery and comparing to subjective scores. Methods: Eleven physical failure modes (FMs) for head and neck IMRT dose calculation and delivery are examined near commonly accepted tolerance criteria levels. Phantom treatment planning studies and dosimetry measurements (requiring decommissioning in several cases) are performed to determine the magnitude of dose delivery errors for the FMs (i.e., severity of the FM). Resultant quantitative severity scores are compared to FMEA scores obtained through an international survey and focus group studies. Results: Physical measurements for six FMs have resulted in significant PTV dose errors up to 4.3% as well as close to 1 mm significant distance-to-agreement error between PTV and OAR. Of the 129 survey responses, the vast majority of the responders used Varian machines with Pinnacle and Eclipse planning systems. The average years of experience was 17, yet familiarity with FMEA less than expected. Survey reports perception of dose delivery error magnitude varies widely, in some cases 50% difference in dose delivery error expected amongst respondents. Substantial variance is also seen for all FMs in occurrence, detectability, and severity scores assigned with average variance values of 5.5, 4.6, and 2.2, respectively. Survey shows for MLC positional FM (2 mm) average of 7.6% dose error expected (range 0–50%) compared to 2% error seen in measurement. Analysis of ranking in survey, treatment planning studies, and quantitative value comparison will be presented. Conclusion: Resultant quantitative severity scores will expand the utility of FMEA for radiotherapy and verify accuracy of FMEA results compared to highly variable subjective scores.
Nurses' systems thinking competency, medical error reporting, and the occurrence of adverse events: a cross-sectional study.

PubMed

Hwang, Jee-In; Park, Hyeoun-Ae

2017-12-01

Healthcare professionals' systems thinking is emphasized for patient safety. To report nurses' systems thinking competency, and its relationship with medical error reporting and the occurrence of adverse events. A cross-sectional survey using a previously validated Systems Thinking Scale (STS), was conducted. Nurses from two teaching hospitals were invited to participate in the survey. There were 407 (60.3%) completed surveys. The mean STS score was 54.5 (SD 7.3) out of 80. Nurses with higher STS scores were more likely to report medical errors (odds ratio (OR) = 1.05; 95% confidence interval (CI) = 1.02-1.08) and were less likely to be involved in the occurrence of adverse events (OR = 0.96; 95% CI = 0.93-0.98). Nurses showed moderate systems thinking competency. Systems thinking was a significant factor associated with patient safety. Impact Statement: The findings of this study highlight the importance of enhancing nurses' systems thinking capacity to promote patient safety.
Decreasing scoring errors on Wechsler Scale Vocabulary, Comprehension, and Similarities subtests: a preliminary study.

PubMed

Linger, Michele L; Ray, Glen E; Zachar, Peter; Underhill, Andrea T; LoBello, Steven G

2007-10-01

Studies of graduate students learning to administer the Wechsler scales have generally shown that training is not associated with the development of scoring proficiency. Many studies report on the reduction of aggregated administration and scoring errors, a strategy that does not highlight the reduction of errors on subtests identified as most prone to error. This study evaluated the development of scoring proficiency specifically on the Wechsler (WISC-IV and WAIS-III) Vocabulary, Comprehension, and Similarities subtests during training by comparing a set of 'early test administrations' to 'later test administrations.' Twelve graduate students enrolled in an intelligence-testing course participated in the study. Scoring errors (e.g., incorrect point assignment) were evaluated on the students' actual practice administration test protocols. Errors on all three subtests declined significantly when scoring errors on 'early' sets of Wechsler scales were compared to those made on 'later' sets. However, correcting these subtest scoring errors did not cause significant changes in subtest scaled scores. Implications for clinical instruction and future research are discussed.
Automated Error Detection in Physiotherapy Training.

PubMed

Jovanović, Marko; Seiffarth, Johannes; Kutafina, Ekaterina; Jonas, Stephan M

2018-01-01

Manual skills teaching, such as physiotherapy education, requires immediate teacher feedback for the students during the learning process, which to date can only be provided by expert trainers. We propose a machine-learning system, trained only on correct performances, to classify and score performed movements, identify sources of errors in the movement, and give feedback to the learner. We acquire IMU and sEMG sensor data from a commercial-grade wearable device and construct an HMM-based model for gesture classification, scoring and feedback. We evaluate the model on publicly available and self-generated data of exemplary movement pattern executions. The model achieves an overall accuracy of 90.71% on the public dataset and 98.9% on our dataset. The scoring method achieved an AUC of 0.99 for the ROC when discriminating between correct executions and incorrect executions not seen in training. The proposed system demonstrated its suitability for scoring and feedback in manual skills training.
What Are Error Rates for Classifying Teacher and School Performance Using Value-Added Models?

ERIC Educational Resources Information Center

Schochet, Peter Z.; Chiang, Hanley S.

2013-01-01

This article addresses likely error rates for measuring teacher and school performance in the upper elementary grades using value-added models applied to student test score gain data. Using a realistic performance measurement system scheme based on hypothesis testing, the authors develop error rate formulas based on ordinary least squares and…
Observation chart design features affect the detection of patient deterioration: a systematic experimental evaluation.

PubMed

Christofidis, Melany J; Hill, Andrew; Horswill, Mark S; Watson, Marcus O

2016-01-01

To systematically evaluate the impact of several design features on chart-users' detection of patient deterioration on observation charts with early-warning scoring-systems. Research has shown that observation chart design affects the speed and accuracy with which abnormal observations are detected. However, little is known about the contribution of individual design features to these effects. A 2 × 2 × 2 × 2 mixed factorial design, with data-recording format (drawn dots vs. written numbers), scoring-system integration (integrated colour-based system vs. non-integrated tabular system) and scoring-row placement (grouped vs. separate) varied within-participants and scores (present vs. absent) varied between-participants by random assignment. 205 novice chart-users, tested between March 2011-March 2014, completed 64 trials where they saw real patient data presented on an observation chart. Each participant saw eight cases (four containing abnormal observations) on each of eight designs (which represented a factorial combination of the within-participants variables). On each trial, they assessed whether any of the observations were physiologically abnormal, or whether all observations were normal. Response times and error rates were recorded for each design. Participants responded faster (scores present and absent) and made fewer errors (scores absent) using drawn-dot (vs. written-number) observations and an integrated colour-based (vs. non-integrated tabular) scoring-system. Participants responded faster using grouped (vs. separate) scoring-rows when scores were absent, but separate scoring-rows when scores were present. Our findings suggest that several individual design features can affect novice chart-users' ability to detect patient deterioration. More broadly, the study further demonstrates the need to evaluate chart designs empirically. © 2015 John Wiley & Sons Ltd.
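
The trial counts in the abstract above follow directly from the factorial structure: three two-level within-participants factors give eight chart designs, and eight cases per design give 64 trials. A small sketch, with hypothetical variable names, enumerates the designs:

```python
from itertools import product

# Within-participants factors and their levels, as named in the abstract.
WITHIN_FACTORS = {
    "recording_format": ("drawn dots", "written numbers"),
    "scoring_integration": ("integrated colour-based", "non-integrated tabular"),
    "scoring_row_placement": ("grouped", "separate"),
}
CASES_PER_DESIGN = 8  # four of which contained abnormal observations

def chart_designs(factors=WITHIN_FACTORS):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]

designs = chart_designs()
trials_per_participant = len(designs) * CASES_PER_DESIGN
```

The fourth factor (scores present vs. absent) is between-participants, so it doubles the number of experimental cells to 16 without adding trials for any one participant.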
Qualitative Dimensions in Scoring the Rey Visual Memory Test of Malingering.

ERIC Educational Resources Information Center

Griffin, G. A. Elmer; And Others

1996-01-01

A new qualitative scoring system for the Rey Visual Memory Test was tested for its ability to distinguish between malingerers and nonmalingerers. The new system, based on the types of errors made, was able to distinguish between 53 psychiatrically disabled and 64 normal nonmalingerers, and between nonmalingerers and 91 possible malingerers. (SLD)
Relationships of Measurement Error and Prediction Error in Observed-Score Regression

ERIC Educational Resources Information Center

Moses, Tim

2012-01-01

The focus of this paper is assessing the impact of measurement errors on the prediction error of an observed-score regression. Measures are presented and described for decomposing the linear regression's prediction error variance into parts attributable to the true score variance and the error variances of the dependent variable and the predictor…
Synoptic scale forecast skill and systematic errors in the MASS 2.0 model. [Mesoscale Atmospheric Simulation System

NASA Technical Reports Server (NTRS)

Koch, S. E.; Skillman, W. C.; Kocin, P. J.; Wetzel, P. J.; Brill, K. F.

1985-01-01

The synoptic scale performance characteristics of MASS 2.0 are determined by comparing filtered 12-24 hr model forecasts to same-case forecasts made by the National Meteorological Center's synoptic-scale Limited-area Fine Mesh model. Characteristics of the two systems are contrasted, and the analysis methodology used to determine statistical skill scores and systematic errors is described. The overall relative performance of the two models in the sample is documented, and important systematic errors uncovered are presented.
Evaluation of precipitation forecasts from 3D-Var and hybrid GSI-based system during Indian summer monsoon 2015

NASA Astrophysics Data System (ADS)

Singh, Sanjeev Kumar; Prasad, V. S.

2018-02-01

This paper presents a systematic investigation of medium-range rainfall forecasts from two versions of the National Centre for Medium Range Weather Forecasting (NCMRWF)-Global Forecast System based on three-dimensional variational (3D-Var) and hybrid analysis system namely, NGFS and HNGFS, respectively, during Indian summer monsoon (June-September) 2015. The NGFS uses gridpoint statistical interpolation (GSI) 3D-Var data assimilation system, whereas HNGFS uses hybrid 3D ensemble-variational scheme. The analysis includes the evaluation of rainfall fields and comparisons of rainfall using statistical score such as mean precipitation, bias, correlation coefficient, root mean square error and forecast improvement factor. In addition to these, categorical scores like Peirce skill score and bias score are also computed to describe particular aspects of forecasts performance. The comparison results of mean precipitation reveal that both the versions of model produced similar large-scale feature of Indian summer monsoon rainfall for day-1 through day-5 forecasts. The inclusion of fully flow-dependent background error covariance significantly improved the wet biases in HNGFS over the Indian Ocean. The forecast improvement factor and Peirce skill score in the HNGFS have also found better than NGFS for day-1 through day-5 forecasts.
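
The two categorical scores mentioned in the abstract above are conventionally computed from a 2 × 2 contingency table of forecast versus observed rain events. The sketch below uses the standard textbook definitions; the function names and the example counts are hypothetical, and the record does not give the paper's rainfall thresholds.

```python
def peirce_skill_score(hits, misses, false_alarms, correct_negatives):
    """Probability of detection minus probability of false detection."""
    pod = hits / (hits + misses)
    pofd = false_alarms / (false_alarms + correct_negatives)
    return pod - pofd

def bias_score(hits, misses, false_alarms):
    """Frequency bias: forecast event count over observed event count."""
    return (hits + false_alarms) / (hits + misses)

# Hypothetical counts for one rainfall threshold.
pss = peirce_skill_score(hits=30, misses=10, false_alarms=20, correct_negatives=40)
bias = bias_score(hits=30, misses=10, false_alarms=20)
```

A Peirce skill score of 1 indicates a perfect forecast and 0 indicates no skill; a bias score above 1 means the event was forecast more often than it was observed.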
Comparison of Static and Dynamic Balance in Female Collegiate Soccer, Basketball, and Gymnastics Athletes

PubMed Central

Bressel, Eadric; Yonker, Joshua C; Kras, John; Heath, Edward M

2007-01-01

Context: How athletes from different sports perform on balance tests is not well understood. When prescribing balance exercises to athletes in different sports, it may be important to recognize performance variations. Objective: To compare static and dynamic balance among collegiate athletes competing or training in soccer, basketball, and gymnastics. Design: A quasi-experimental, between-groups design. Independent variables included limb (dominant and nondominant) and sport played. Setting: A university athletic training facility. Patients or Other Participants: Thirty-four female volunteers who competed in National Collegiate Athletic Association Division I soccer (n = 11), basketball (n = 11), or gymnastics (n = 12). Intervention(s): To assess static balance, participants performed 3 stance variations (double leg, single leg, and tandem leg) on 2 surfaces (stiff and compliant). For assessment of dynamic balance, participants performed multidirectional maximal single-leg reaches from a unilateral base of support. Main Outcome Measure(s): Errors from the Balance Error Scoring System and normalized leg reach distances from the Star Excursion Balance Test were used to assess static and dynamic balance, respectively. Results: Balance Error Scoring System error scores for the gymnastics group were 55% lower than for the basketball group (P = .01), and Star Excursion Balance Test scores were 7% higher in the soccer group than the basketball group (P = .04). Conclusions: Gymnasts and soccer players did not differ in terms of static and dynamic balance. In contrast, basketball players displayed inferior static balance compared with gymnasts and inferior dynamic balance compared with soccer players. PMID:17597942
Effects of Diaphragmatic Breathing Patterns on Balance: A Preliminary Clinical Trial.

PubMed

Stephens, Rylee J; Haas, Mitchell; Moore, William L; Emmil, Jordan R; Sipress, Jayson A; Williams, Alex

The purpose of this study was to determine the feasibility of performing a larger study to determine if training in diaphragmatic breathing influences static and dynamic balance. A group of 13 healthy persons (8 men, 5 women), who were staff, faculty, or students at the University of Western States participated in an 8-week breathing and balance study using an uncontrolled clinical trial design. Participants were given a series of breathing exercises to perform weekly in the clinic and at home. Balance and breathing were assessed at the weekly clinic sessions. Breathing was evaluated with Liebenson's breathing assessment, static balance with the Modified Balance Error Scoring System, and dynamic balance with OptoGait's March in Place protocol. Improvement was noted in mean diaphragmatic breathing scores (1.3 to 2.6, P < .001), number of single-leg stance balance errors (7.1 to 3.8, P = .001), and tandem stance balance errors (3.2 to 0.9, P = .039). A decreasing error rate in single-leg stance was associated with improvement in breathing score within participants over the 8 weeks of the study (-1.4 errors/unit breathing score change, P < .001). Tandem stance performance did not reach statistical significance (-0.5 error/unit change, P = .118). Dynamic balance was insensitive to balance change, being error free for all participants throughout the study. This proof-of-concept study indicated that promotion of a costal-diaphragmatic breathing pattern may be associated with improvement in balance and suggests that a study of this phenomenon using an experimental design is feasible. Copyright © 2017. Published by Elsevier Inc.
Scoring Methods in the International Land Benchmarking (ILAMB) Package

NASA Astrophysics Data System (ADS)

Collier, N.; Hoffman, F. M.; Keppel-Aleks, G.; Lawrence, D. M.; Mu, M.; Riley, W. J.; Randerson, J. T.

2017-12-01

The International Land Model Benchmarking (ILAMB) project is a model-data intercomparison and integration project designed to improve the performance of the land component of Earth system models. This effort is disseminated in the form of a python package which is openly developed (https://bitbucket.org/ncollier/ilamb). ILAMB is more than a workflow system that automates the generation of common scalars and plot comparisons to observational data. We aim to provide scientists and model developers with a tool to gain insight into model behavior. Thus, a salient feature of the ILAMB package is our synthesis methodology, which provides users with a high-level understanding of model performance. Within ILAMB, we calculate a non-dimensional score of a model's performance in a given dimension of the physics, chemistry, or biology with respect to an observational dataset. For example, we compare the Fluxnet-MTE Gross Primary Productivity (GPP) product against model output in the corresponding historical period. We compute common statistics such as the bias, root mean squared error, phase shift, and spatial distribution. We take these measures and find relative errors by normalizing the values, and then use the exponential to map this relative error to the unit interval. This allows for the scores to be combined into an overall score representing multiple aspects of model performance. In this presentation we give details of this process as well as a proposal for tuning the exponential mapping to make scores more cross comparable. However, as many models are calibrated using these scalar measures with respect to observational datasets, we also score the relationships among relevant variables in the model. For example, in the case of GPP, we also consider its relationship to precipitation, evapotranspiration, and temperature. We do this by creating a mean response curve and a two-dimensional distribution based on the observational data and model results. 
The response curves are then scored using a relative measure of the root mean squared error and the exponential as before. The distributions are scored using the so-called Hellinger distance, a statistical measure for how well one distribution is represented by another, and included in the model's overall score.
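
The two scoring ingredients described in the abstract above can be sketched as follows. This is not the ILAMB package's code: the normalization by the reference value, the `alpha` tuning parameter, and the function names are assumptions chosen for illustration; only the use of an exponential mapping to the unit interval and of the Hellinger distance comes from the abstract.

```python
from math import exp, sqrt

def relative_error_score(model_value, reference_value, alpha=1.0):
    """Map a non-negative relative error to (0, 1] with an exponential.

    alpha is a hypothetical tuning knob for the steepness of the mapping.
    """
    rel = abs(model_value - reference_value) / abs(reference_value)
    return exp(-alpha * rel)

def hellinger_distance(p, q):
    """Hellinger distance between two discrete distributions (each sums to 1)."""
    return sqrt(sum((sqrt(a) - sqrt(b)) ** 2 for a, b in zip(p, q)) / 2)

def overall_score(scores):
    """Combine component scores; an unweighted mean is assumed here."""
    return sum(scores) / len(scores)
```

A perfect match yields a score of 1 and the score decays toward 0 as the relative error grows; the Hellinger distance is 0 for identical distributions and 1 for distributions with no overlap.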
Conditional Standard Errors of Measurement for Composite Scores Using IRT

ERIC Educational Resources Information Center

Kolen, Michael J.; Wang, Tianyou; Lee, Won-Chan

2012-01-01

Composite scores are often formed from test scores on educational achievement test batteries to provide a single index of achievement over two or more content areas or two or more item types on that test. Composite scores are subject to measurement error, and as with scores on individual tests, the amount of error variability typically depends on…
Can Functional Movement Assessment Predict Football Head Impact Biomechanics?

PubMed

Ford, Julia M; Campbell, Kody R; Ford, Cassie B; Boyd, Kenneth E; Padua, Darin A; Mihalik, Jason P

2018-06-01

The purposes of this study were to determine functional movement assessments' ability to predict head impact biomechanics in college football players and to determine whether head impact biomechanics could explain preseason to postseason changes in functional movement performance. Participants (N = 44; mass, 109.0 ± 20.8 kg; age, 20.0 ± 1.3 yr) underwent two preseason and postseason functional movement assessment screenings: 1) Fusionetics Movement Efficiency Test and 2) Landing Error Scoring System (LESS). Fusionetics is scored 0 to 100, and participants were categorized into the following movement quality groups as previously published: good (≥75), moderate (50-75), and poor (<50). The LESS is scored 0 to 17, and participants were categorized into the following previously published movement quality groups: good (≤5 errors), moderate (6-7 errors), and poor (>7 errors). The Head Impact Telemetry (HIT) System measured head impact frequency and magnitude (linear acceleration and rotational acceleration). An encoder with six single-axis accelerometers was inserted between the padding of a commercially available Riddell football helmet. We used random intercepts general linear-mixed models to analyze our data. There were no effects of preseason movement assessment group on the two Head Impact Telemetry System impact outcomes: linear acceleration and rotational acceleration. Head impact frequency did not significantly predict preseason to postseason score changes obtained from the Fusionetics (F1,36 = 0.22, P = 0.643, R = 0.006) or the LESS (F1,36 < 0.01, P = 0.988, R < 0.001) assessments. Previous research has demonstrated an association between concussion and musculoskeletal injury, as well as functional movement assessment performance and musculoskeletal injury. The functional movement assessments chosen may not be sensitive enough to detect neurological and neuromuscular differences within the sample and subtle changes after sustaining head impacts.
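
The movement quality cut-offs quoted in the abstract above translate into two small classification functions. The function names are hypothetical. The abstract lists both "good (≥75)" and "moderate (50-75)" for Fusionetics, so the sketch assumes a score of exactly 75 counts as good.

```python
def less_group(errors):
    """Classify a LESS score (0 to 17 errors; fewer is better)."""
    if errors <= 5:
        return "good"
    if errors <= 7:
        return "moderate"
    return "poor"

def fusionetics_group(score):
    """Classify a Fusionetics score (0 to 100; higher is better)."""
    if score >= 75:  # assumption: the shared boundary value 75 is "good"
        return "good"
    if score >= 50:
        return "moderate"
    return "poor"
```

Note that the two scales run in opposite directions: a low LESS score and a high Fusionetics score both indicate better movement quality.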

Using Student Test Scores to Measure Teacher Performance: Some Problems in the Design and Implementation of Evaluation Systems

ERIC Educational Resources Information Center

Ballou, Dale; Springer, Matthew G.

2015-01-01

Our aim in this article is to draw attention to some underappreciated problems in the design and implementation of evaluation systems that incorporate value-added measures. We focus on four: (1) taking into account measurement error in teacher assessments, (2) revising teachers' scores as more information becomes available about their students,…
Near field communications technology and the potential to reduce medication errors through multidisciplinary application

PubMed Central

Pegler, Joe; Lehane, Elaine; Livingstone, Vicki; McCarthy, Nora; Sahm, Laura J.; Tabirca, Sabin; O’Driscoll, Aoife; Corrigan, Mark

2016-01-01

Background: Patient safety requires optimal management of medications. Electronic systems are encouraged to reduce medication errors. Near field communications (NFC) is an emerging technology that may be used to develop novel medication management systems. Methods: An NFC-based system was designed to facilitate prescribing, administration and review of medications commonly used on surgical wards. Final year medical, nursing, and pharmacy students were recruited to test the electronic system in a cross-over observational setting on a simulated ward. Medication errors were compared against errors recorded using a paper-based system. Results: A significant difference in the commission of medication errors was seen when NFC and paper-based medication systems were compared. Paper use resulted in a mean of 4.09 errors per prescribing round while NFC prescribing resulted in a mean of 0.22 errors per simulated prescribing round (P=0.000). Likewise, medication administration errors were reduced from a mean of 2.30 per drug round with a paper system to a mean of 0.80 errors per round using NFC (P<0.015). A mean satisfaction score of 2.30 was reported by users (rated on a seven-point scale with 1 denoting total satisfaction with system use and 7 denoting total dissatisfaction). Conclusions: An NFC-based medication system may be used to effectively reduce medication errors in a simulated ward environment. PMID:28293602
Near field communications technology and the potential to reduce medication errors through multidisciplinary application.

PubMed

O'Connell, Emer; Pegler, Joe; Lehane, Elaine; Livingstone, Vicki; McCarthy, Nora; Sahm, Laura J; Tabirca, Sabin; O'Driscoll, Aoife; Corrigan, Mark

2016-01-01

Patient safety requires optimal management of medications. Electronic systems are encouraged to reduce medication errors. Near field communications (NFC) is an emerging technology that may be used to develop novel medication management systems. An NFC-based system was designed to facilitate prescribing, administration and review of medications commonly used on surgical wards. Final year medical, nursing, and pharmacy students were recruited to test the electronic system in a cross-over observational setting on a simulated ward. Medication errors were compared against errors recorded using a paper-based system. A significant difference in the commission of medication errors was seen when NFC and paper-based medication systems were compared. Paper use resulted in a mean of 4.09 errors per prescribing round while NFC prescribing resulted in a mean of 0.22 errors per simulated prescribing round (P=0.000). Likewise, medication administration errors were reduced from a mean of 2.30 per drug round with a paper system to a mean of 0.80 errors per round using NFC (P<0.015). A mean satisfaction score of 2.30 was reported by users (rated on a seven-point scale with 1 denoting total satisfaction with system use and 7 denoting total dissatisfaction). An NFC-based medication system may be used to effectively reduce medication errors in a simulated ward environment.
Standard Errors and Confidence Intervals of Norm Statistics for Educational and Psychological Tests.

PubMed

Oosterhuis, Hannah E M; van der Ark, L Andries; Sijtsma, Klaas

2016-11-14

Norm statistics allow for the interpretation of scores on psychological and educational tests, by relating the test score of an individual test taker to the test scores of individuals belonging to the same gender, age, or education groups, et cetera. Given the uncertainty due to sampling error, one would expect researchers to report standard errors for norm statistics. In practice, standard errors are seldom reported; they are either unavailable or derived under strong distributional assumptions that may not be realistic for test scores. We derived standard errors for four norm statistics (standard deviation, percentile ranks, stanine boundaries and Z-scores) under the mild assumption that the test scores are multinomially distributed. A simulation study showed that the standard errors were unbiased and that corresponding Wald-based confidence intervals had good coverage. Finally, we discuss the possibilities for applying the standard errors in practical test use in education and psychology. The procedure is provided via the R function check.norms, which is available in the mokken package.
Scoring systems for the Clock Drawing Test: A historical review

PubMed Central

Spenciere, Bárbara; Alves, Heloisa; Charchat-Fichman, Helenice

2017-01-01

The Clock Drawing Test (CDT) is a simple neuropsychological screening instrument that is well accepted by patients and has solid psychometric properties. Several different CDT scoring methods have been developed, but no consensus has been reached regarding which scoring method is the most accurate. This article reviews the literature on these scoring systems and the changes they have undergone over the years. Historically, different types of scoring systems emerged. Initially, the focus was on screening for dementia, and the methods were both quantitative and semi-quantitative. Later, the need for an early diagnosis called for a scoring system that can detect subtle errors, especially those related to executive function. Therefore, qualitative analyses began to be used for both differential and early diagnoses of dementia. A widely used qualitative method was proposed by Rouleau et al. (1992). Tracing the historical path of these scoring methods is important for developing additional scoring systems and furthering dementia prevention research. PMID:29213488
Comparing Graphical and Verbal Representations of Measurement Error in Test Score Reports

ERIC Educational Resources Information Center

Zwick, Rebecca; Zapata-Rivera, Diego; Hegarty, Mary

2014-01-01

Research has shown that many educators do not understand the terminology or displays used in test score reports and that measurement error is a particularly challenging concept. We investigated graphical and verbal methods of representing measurement error associated with individual student scores. We created four alternative score reports, each…
Influence of age, sex, technique, and exercise program on movement patterns after an anterior cruciate ligament injury prevention program in youth soccer players.

PubMed

DiStefano, Lindsay J; Padua, Darin A; DiStefano, Michael J; Marshall, Stephen W

2009-03-01

Anterior cruciate ligament (ACL) injury prevention programs show promising results with changing movement; however, little information exists regarding whether a program designed for an individual's movements may be effective or how baseline movements may affect outcomes. A program designed to change specific movements would be more effective than a "one-size-fits-all" program. Greatest improvement would be observed among individuals with the most baseline error. Subjects of different ages and sexes respond similarly. Randomized controlled trial; Level of evidence, 1. One hundred seventy-three youth soccer players from 27 teams were randomly assigned to a generalized or stratified program. Subjects were videotaped during jump-landing trials before and after the program and were assessed using the Landing Error Scoring System (LESS), which is a valid clinical movement analysis tool. A high LESS score indicates more errors. Generalized players performed the same exercises, while the stratified players performed exercises to correct their initial movement errors. Change scores were compared between groups of varying baseline errors, ages, sexes, and programs. Subjects with the highest baseline LESS score improved the most (95% CI, -3.4 to -2.0). High school subjects (95% CI, -1.7 to -0.98) improved their technique more than pre-high school subjects (95% CI, -1.0 to -0.4). There was no difference between the programs or sexes. Players with the greatest amount of movement errors experienced the most improvement. A program's effectiveness may be enhanced if this population is targeted.
Taking the Error Term of the Factor Model into Account: The Factor Score Predictor Interval

ERIC Educational Resources Information Center

Beauducel, Andre

2013-01-01

The problem of factor score indeterminacy implies that the factor and the error scores cannot be completely disentangled in the factor model. It is therefore proposed to compute Harman's factor score predictor that contains an additive combination of factor and error variance. This additive combination is discussed in the framework of classical…
Examining the association of injury with the Functional Movement Screen and Landing Error Scoring System in military recruits undergoing 16 weeks of introductory fitness training.

PubMed

Everard, Eoin; Lyons, Mark; Harrison, Andrew J

2018-06-01

To examine the association of injury with the Functional Movement Screen (FMS) and Landing Error Scoring System (LESS) in military recruits undergoing an intensive 16-week training block. Prospective cohort study. One hundred and thirty-two entry-level male soldiers (18-25 years) were tested using the FMS and LESS. The participants underwent an intensive 16-week training program with injury data recorded daily. Chi-squared statistics were used to examine associations between injury risk and (1) poor LESS scores, (2) any score of 1 on the FMS and (3) composite FMS score of ≤14. A composite FMS score of ≤14 was not a significant predictor of injury. LESS scores of >5 and having a score of 1 on any FMS test were significantly associated with injury. LESS scores had greater relative risk, sensitivity and specificity (2.2 (95% CI=1.48-3.34); 71% and 87% respectively) than scores of 1 on the FMS (relative risk=1.32 (95% CI=1.0-1.7); sensitivity=50% and specificity=76%). There was no association between composite FMS score and injury, but LESS scores and scores of 1 in the FMS test were significantly associated with injury in varying degrees. LESS scores had a much better association with injury than both any scores of 1 on the FMS and a combination of LESS scores and scores of 1 on the FMS. Furthermore, the LESS provides comparable information related to injury risk as other well-established markers associated with injury such as age, muscular strength and previous injury. Copyright © 2017. Published by Elsevier Ltd.
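The relative risk, sensitivity and specificity reported above all derive from a 2 x 2 table of screening result against injury outcome. The sketch below shows the standard definitions; the class name and the cell counts are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    tp: int  # screened positive (e.g. LESS > 5) and injured
    fp: int  # screened positive and not injured
    fn: int  # screened negative and injured
    tn: int  # screened negative and not injured

    def sensitivity(self) -> float:
        # Share of injured participants flagged by the screen
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Share of uninjured participants cleared by the screen
        return self.tn / (self.tn + self.fp)

    def relative_risk(self) -> float:
        # Injury risk among screen-positives divided by risk among screen-negatives
        risk_pos = self.tp / (self.tp + self.fp)
        risk_neg = self.fn / (self.fn + self.tn)
        return risk_pos / risk_neg


# Hypothetical counts, for illustration only
table = TwoByTwo(tp=30, fp=10, fn=20, tn=40)
```

With these made-up counts, sensitivity is 30/50, specificity is 40/50, and the relative risk is (30/40) / (20/60).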
Validating Emergency Department Vital Signs Using a Data Quality Engine for Data Warehouse

PubMed Central

Genes, N; Chandra, D; Ellis, S; Baumlin, K

2013-01-01

Background: Vital signs in our emergency department information system were entered into free-text fields for heart rate, respiratory rate, blood pressure, temperature and oxygen saturation. Objective: We sought to convert these text entries into a more useful form, for research and QA purposes, upon entry into a data warehouse. Methods: We derived a series of rules and assigned quality scores to the transformed values, conforming to physiologic parameters for vital signs across the age range and spectrum of illness seen in the emergency department. Results: Validating these entries revealed that 98% of free-text data had perfect quality scores, conforming to established vital sign parameters. Average vital signs varied as expected by age. Degradations in quality scores were most commonly attributed to logging temperature in Fahrenheit instead of Celsius; vital signs with this error could still be transformed for use. Errors occurred more frequently during periods of high triage, though error rates did not correlate with triage volume. Conclusions: In developing a method for importing free-text vital sign data from our emergency department information system, we now have a data warehouse with a broad array of quality-checked vital signs, permitting analysis and correlation with demographics and outcomes. PMID:24403981
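A rule of the kind described above, in which a Fahrenheit entry is detected, converted, and given a degraded quality score, might look like the following sketch. The function name, the plausibility ranges, and the 0-2 quality scale are all assumptions made for illustration; the abstract does not give the authors' actual rules.

```python
from typing import Optional, Tuple


def clean_temperature(text: str) -> Tuple[Optional[float], int]:
    """Return (temperature in Celsius, quality score) for a free-text entry.

    Hypothetical quality scale: 2 = plausible Celsius as entered,
    1 = plausible only after Fahrenheit-to-Celsius conversion, 0 = unusable.
    """
    try:
        # Drop surrounding whitespace and a trailing C/F unit letter, if any
        value = float(text.strip().rstrip("CFcf").strip())
    except ValueError:
        return None, 0
    if 30.0 <= value <= 43.0:  # assumed plausible Celsius range
        return value, 2
    if 86.0 <= value <= 109.4:  # the same range expressed in Fahrenheit
        return round((value - 32.0) * 5.0 / 9.0, 1), 1
    return None, 0
```

For example, `clean_temperature("98.6F")` converts the value and returns the degraded score, while an entry such as `"n/a"` is rejected with a score of 0.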
Validating emergency department vital signs using a data quality engine for data warehouse.

PubMed

Genes, N; Chandra, D; Ellis, S; Baumlin, K

2013-01-01

Vital signs in our emergency department information system were entered into free-text fields for heart rate, respiratory rate, blood pressure, temperature and oxygen saturation. We sought to convert these text entries into a more useful form, for research and QA purposes, upon entry into a data warehouse. We derived a series of rules and assigned quality scores to the transformed values, conforming to physiologic parameters for vital signs across the age range and spectrum of illness seen in the emergency department. Validating these entries revealed that 98% of free-text data had perfect quality scores, conforming to established vital sign parameters. Average vital signs varied as expected by age. Degradations in quality scores were most commonly attributed to logging temperature in Fahrenheit instead of Celsius; vital signs with this error could still be transformed for use. Errors occurred more frequently during periods of high triage, though error rates did not correlate with triage volume. In developing a method for importing free-text vital sign data from our emergency department information system, we now have a data warehouse with a broad array of quality-checked vital signs, permitting analysis and correlation with demographics and outcomes.
Conditional Standard Errors, Reliability and Decision Consistency of Performance Levels Using Polytomous IRT.

ERIC Educational Resources Information Center

Wang, Tianyou; And Others

M. J. Kolen, B. A. Hanson, and R. L. Brennan (1992) presented a procedure for assessing the conditional standard error of measurement (CSEM) of scale scores using a strong true-score model. They also investigated the ways of using nonlinear transformation from number-correct raw score to scale score to equalize the conditional standard error along…
A Comparison of Three Methods for Computing Scale Score Conditional Standard Errors of Measurement. ACT Research Report Series, 2013 (7)

ERIC Educational Resources Information Center

Woodruff, David; Traynor, Anne; Cui, Zhongmin; Fang, Yu

2013-01-01

Professional standards for educational testing recommend that both the overall standard error of measurement and the conditional standard error of measurement (CSEM) be computed on the score scale used to report scores to examinees. Several methods have been developed to compute scale score CSEMs. This paper compares three methods, based on…
Forecast skill score assessment of a relocatable ocean prediction system, using a simplified objective analysis method

NASA Astrophysics Data System (ADS)

Onken, Reiner

2017-11-01

A relocatable ocean prediction system (ROPS) was applied to an observational data set that was collected in June 2014 in the waters to the west of Sardinia (western Mediterranean) in the framework of the REP14-MED experiment. The observational data, comprising more than 6000 temperature and salinity profiles from a fleet of underwater gliders and shipborne probes, were assimilated in the Regional Ocean Modeling System (ROMS), which is the heart of ROPS, and verified against independent observations from ScanFish tows by means of the forecast skill score as defined by Murphy (1993). A simplified objective analysis (OA) method was utilised for assimilation, taking account of only those profiles which were located within a predetermined time window W. As a result of a sensitivity study, the highest skill score was obtained for a correlation length scale C = 12.5 km, W = 24 h, and r = 1, where r is the ratio between the error of the observations and the background error, both for temperature and salinity. Additional ROPS runs showed that (i) the skill score of assimilation runs was mostly higher than the score of a control run without assimilation, (ii) the skill score increased with increasing forecast range, and (iii) the skill score for temperature was higher than the score for salinity in the majority of cases. Furthermore, it is demonstrated that the vast number of observations can be managed by the applied OA method without data reduction, enabling timely operational forecasts even on a commercially available personal computer or a laptop.
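A forecast skill score of this kind is commonly written as one minus the ratio of the forecast's mean squared error to that of a reference forecast. The sketch below assumes that MSE-based form; the function names and the sample temperatures are hypothetical, and the abstract does not state which reference forecast the study used.

```python
from typing import Sequence


def mse(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Mean squared error between predictions and observations."""
    if len(pred) != len(obs) or len(obs) == 0:
        raise ValueError("pred and obs must be non-empty and of equal length")
    return sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)


def skill_score(forecast: Sequence[float],
                reference: Sequence[float],
                obs: Sequence[float]) -> float:
    """1 - MSE(forecast) / MSE(reference).

    1 is a perfect forecast, 0 matches the reference, negative is worse.
    Undefined (division by zero) if the reference itself is perfect.
    """
    return 1.0 - mse(forecast, obs) / mse(reference, obs)


# Hypothetical temperatures in degrees Celsius
obs = [14.0, 15.0, 16.0]
forecast = [14.5, 15.0, 15.5]
reference = [13.0, 15.0, 17.0]
ss = skill_score(forecast, reference, obs)
```

Here the forecast's squared errors are a quarter of the reference's, so the score is positive and the forecast counts as skilful relative to that reference.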
Conditional Standard Errors of Measurement for Scale Scores.

ERIC Educational Resources Information Center

Kolen, Michael J.; And Others

1992-01-01

A procedure is described for estimating the reliability and conditional standard errors of measurement of scale scores incorporating the discrete transformation of raw scores to scale scores. The method is illustrated using a strong true score model, and practical applications are described. (SLD)
Sway Area and Velocity Correlated With MobileMat Balance Error Scoring System (BESS) Scores.

PubMed

Caccese, Jaclyn B; Buckley, Thomas A; Kaminski, Thomas W

2016-08-01

The Balance Error Scoring System (BESS) is often used for sport-related concussion balance assessment. However, moderate intratester and intertester reliability may cause low initial sensitivity, suggesting that a more objective balance assessment method is needed. The MobileMat BESS was designed for objective BESS scoring, but the outcome measures must be validated with reliable balance measures. Thus, the purpose of this investigation was to compare MobileMat BESS scores to linear and nonlinear measures of balance. Eighty-eight healthy collegiate student-athletes (age: 20.0 ± 1.4 y, height: 177.7 ± 10.7 cm, mass: 74.8 ± 13.7 kg) completed the MobileMat BESS. MobileMat BESS scores were compared with 95% area, sway velocity, approximate entropy, and sample entropy. MobileMat BESS scores were significantly correlated with 95% area for single-leg (r = .332) and tandem firm (r = .474), and double-leg foam (r = .660); and with sway velocity for single-leg (r = .406) and tandem firm (r = .601), and double-leg (r = .575) and single-leg foam (r = .434). MobileMat BESS scores were not correlated with approximate or sample entropy. MobileMat BESS scores were low to moderately correlated with linear measures, suggesting the ability to identify changes in the center of mass-center of pressure relationship, but not higher-order processing associated with nonlinear measures. These results suggest that the MobileMat BESS may be a clinically-useful tool that provides objective linear balance measures.
A novel color vision test for detection of diabetic macular edema.

PubMed

Shin, Young Joo; Park, Kyu Hyung; Hwang, Jeong-Min; Wee, Won Ryang; Lee, Jin Hak; Lee, In Bum; Hyon, Joon Young

2014-01-02

To determine the sensitivity of the Seoul National University (SNU) computerized color vision test for detecting diabetic macular edema. From May to September 2003, a total of 73 eyes of 73 patients with diabetes mellitus were examined using the SNU computerized color vision test and optical coherence tomography (OCT). Color deficiency was quantified as the total error score on the SNU test and as error scores for each of four color quadrants corresponding to yellows (Q1), greens (Q2), blues (Q3), and reds (Q4). SNU error scores were assessed as a function of OCT foveal thickness and total macular volume (TMV). The error scores in Q1, Q2, Q3, and Q4 measured by the SNU color vision test increased with foveal thickness (P < 0.05), whereas they were not correlated with TMV. Total error scores, the summation of Q1 and Q3, the summation of Q2 and Q4, and blue-yellow (B-Y) error scores were significantly correlated with foveal thickness (P < 0.05), but not with TMV. The observed correlation between SNU color test error scores and foveal thickness indicates that the SNU test may be useful for detection and monitoring of diabetic macular edema.
Evaluating the prevalence and impact of examiner errors on the Wechsler scales of intelligence: A meta-analysis.

PubMed

Styck, Kara M; Walsh, Shana M

2016-01-01

The purpose of the present investigation was to conduct a meta-analysis of the literature on examiner errors for the Wechsler scales of intelligence. Results indicate that a mean of 99.7% of protocols contained at least 1 examiner error when studies that included a failure to record examinee responses as an error were combined and a mean of 41.2% of protocols contained at least 1 examiner error when studies that ignored errors of omission were combined. Furthermore, graduate student examiners were significantly more likely to make at least 1 error on Wechsler intelligence test protocols than psychologists. However, psychologists made significantly more errors per protocol than graduate student examiners regardless of the inclusion or exclusion of failure to record examinee responses as errors. On average, 73.1% of Full-Scale IQ (FSIQ) scores changed as a result of examiner errors, whereas 15.8%-77.3% of scores on the Verbal Comprehension Index (VCI), Perceptual Reasoning Index (PRI), Working Memory Index (WMI), and Processing Speed Index changed as a result of examiner errors. In addition, results suggest that examiners tend to overestimate FSIQ scores and underestimate VCI scores. However, no strong pattern emerged for the PRI and WMI. It can be concluded that examiner errors occur frequently and impact index and FSIQ scores. Consequently, current estimates for the standard error of measurement of popular IQ tests may not adequately capture the variance due to the examiner. (c) 2016 APA, all rights reserved.
Landing Technique and Performance in Youth Athletes After a Single Injury-Prevention Program Session

PubMed Central

Root, Hayley; Trojian, Thomas; Martinez, Jessica; Kraemer, William; DiStefano, Lindsay J.

2015-01-01

Context: Injury-prevention programs (IPPs) performed as season-long warm-ups improve injury rates, performance outcomes, and jump-landing technique. However, concerns regarding program adoption exist. Identifying the acute benefits of using an IPP compared with other warm-ups may encourage IPP adoption. Objective: To examine the immediate effects of 3 warm-up protocols (IPP, static warm-up [SWU], or dynamic warm-up [DWU]) on jump-landing technique and performance measures in youth athletes. Design: Randomized controlled clinical trial. Setting: Gymnasiums. Patients or Other Participants: Sixty male and 29 female athletes (age = 13 ± 2 years, height = 162.8 ± 12.6 cm, mass = 37.1 ± 13.5 kg) volunteered to participate in a single session. Intervention(s): Participants were stratified by age, sex, and sport and then were randomized into 1 protocol: IPP, SWU, or DWU. The IPP consisted of dynamic flexibility, strengthening, plyometric, and balance exercises and emphasized proper technique. The SWU consisted of jogging and lower extremity static stretching. The DWU consisted of dynamic lower extremity flexibility exercises. Participants were assessed for landing technique and performance measures immediately before (PRE) and after (POST) completing their warm-ups. Main Outcome Measure(s): One rater graded each jump-landing trial using the Landing Error Scoring System. Participants performed a vertical jump, long jump, shuttle run, and jump-landing task in randomized order. The averages of all jump-landing trials and performance variables were used to calculate 1 composite score for each variable at PRE and POST. Change scores were calculated (POST − PRE) for all measures. Separate 1-way (group) analyses of variance were conducted for each dependent variable (α < .05). Results: No differences were observed among groups for any performance measures (P > .05). The Landing Error Scoring System scores improved after the IPP (change = −0.40 ± 1.24 errors) compared with the DWU (0.27 ± 1.09 errors) and SWU (0.43 ± 1.35 errors; P = .04). Conclusions: An IPP did not impair sport performance and may have reduced injury risk, which supports the use of these programs before sport activity. PMID:26523663
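The change-score calculation (POST − PRE) used in this abstract can be sketched in a few lines. The function name and the sample scores are hypothetical and are not data from the trial.

```python
from statistics import mean
from typing import List, Sequence


def change_scores(pre: Sequence[float], post: Sequence[float]) -> List[float]:
    """Per-participant change, POST minus PRE; negative means fewer landing errors."""
    if len(pre) != len(post):
        raise ValueError("pre and post must be paired")
    return [b - a for a, b in zip(pre, post)]


# Hypothetical LESS composite scores for three participants
pre = [6.0, 5.0, 7.0]
post = [5.0, 5.0, 6.0]
changes = change_scores(pre, post)
mean_change = mean(changes)
```

Because a higher LESS score means more errors, a negative mean change indicates improved landing technique.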
Evaluation of ensemble forecast uncertainty using a new proper score: application to medium-range and seasonal forecasts

NASA Astrophysics Data System (ADS)

Christensen, Hannah; Moroz, Irene; Palmer, Tim

2015-04-01

Forecast verification is important across scientific disciplines as it provides a framework for evaluating the performance of a forecasting system. In the atmospheric sciences, probabilistic skill scores are often used for verification as they provide a way of unambiguously ranking the performance of different probabilistic forecasts. In order to be useful, a skill score must be proper -- it must encourage honesty in the forecaster, and reward forecasts which are reliable and which have good resolution. A new score, the Error-spread Score (ES), is proposed which is particularly suitable for evaluation of ensemble forecasts. It is formulated with respect to the moments of the forecast. The ES is confirmed to be a proper score, and is therefore sensitive to both resolution and reliability. The ES is tested on forecasts made using the Lorenz '96 system, and found to be useful for summarising the skill of the forecasts. The European Centre for Medium-Range Weather Forecasts (ECMWF) ensemble prediction system (EPS) is evaluated using the ES. Its performance is compared to a perfect statistical probabilistic forecast -- the ECMWF high resolution deterministic forecast dressed with the observed error distribution. This generates a forecast that is perfectly reliable if considered over all time, but which does not vary from day to day with the predictability of the atmospheric flow. The ES distinguishes between the dynamically reliable EPS forecasts and the statically reliable dressed deterministic forecasts. Other skill scores are tested and found to be comparatively insensitive to this desirable forecast quality. The ES is used to evaluate seasonal range ensemble forecasts made with the ECMWF System 4. The ensemble forecasts are found to be skilful when compared with climatological or persistence forecasts, though this skill is dependent on region and time of year.

Standardized error severity score (ESS) ratings to quantify risk associated with child restraint system (CRS) and booster seat misuse.

PubMed

Rudin-Brown, Christina M; Kramer, Chelsea; Langerak, Robin; Scipione, Andrea; Kelsey, Shelley

2017-11-17

Although numerous research studies have reported high levels of error and misuse of child restraint systems (CRS) and booster seats in experimental and real-world scenarios, conclusions are limited because they provide little information regarding which installation issues pose the highest risk and thus should be targeted for change. Beneficial to legislating bodies and researchers alike would be a standardized, globally relevant assessment of the potential injury risk associated with more common forms of CRS and booster seat misuse, which could be applied with observed error frequency (for example, in car seat clinics or during prototype user testing) to better identify and characterize the installation issues of greatest risk to safety. A group of 8 leading world experts in CRS and injury biomechanics, who were members of an international child safety project, estimated the potential injury severity associated with common forms of CRS and booster seat misuse. These injury risk error severity score (ESS) ratings were compiled and compared to scores from previous research that had used a similar procedure but with fewer respondents. To illustrate their application, and as part of a larger study examining CRS and booster seat labeling requirements, the new standardized ESS ratings were applied to objective installation performance data from 26 adult participants who installed a convertible (rear- vs. forward-facing) CRS and booster seat in a vehicle, and a child test dummy in the CRS and booster seat, using labels that only just met minimal regulatory requirements. The outcome measure, the risk priority number (RPN), represented the composite scores of injury risk and observed installation error frequency. Variability within the sample of ESS ratings in the present study was smaller than that generated in previous studies, indicating better agreement among experts on what constituted injury risk. Application of the new standardized ESS ratings to installation performance data revealed several areas of misuse of the CRS/booster seat associated with high potential injury risk. Collectively, findings indicate that standardized ESS ratings are useful for estimating injury risk potential associated with real-world CRS and booster seat installation errors.
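One way to combine a severity rating with an observed error frequency into a risk priority number is to multiply them, as is conventional in failure mode and effects analysis. The abstract says only that the RPN is a composite of the two, so the product form is an assumption, and the misuse names, ESS values, and frequencies below are invented for illustration.

```python
def risk_priority(ess: float, frequency: float) -> float:
    """Assumed composite: severity rating multiplied by observed error frequency."""
    return ess * frequency


# Hypothetical (ESS rating, share of installations showing the error)
misuses = {
    "loose harness": (4.0, 0.50),
    "chest clip too low": (2.0, 0.25),
    "wrong belt path": (5.0, 0.20),
}

# Rank misuse modes from highest to lowest risk priority number
ranked = sorted(misuses, key=lambda name: risk_priority(*misuses[name]), reverse=True)
```

Under this assumption a moderately severe but very common error can outrank a more severe but rarer one, which is the kind of prioritisation the abstract describes.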
Assessment for Learning: Turkey Case

ERIC Educational Resources Information Center

San, Ismail

2016-01-01

Why do we test students? To label them or to teach them better? Labeling students is one of the most common errors of educators. If we do not want to fall into that error, then we should put those scores to better use. Marking pupils and focusing on the next examination is not the right approach for humanitarian systems. Then, what should we do?…
Effects of auditory radio interference on a fine, continuous, open motor skill.

PubMed

Lazar, J M; Koceja, D M; Morris, H H

1995-06-01

The effects of human speech on a fine, continuous, and open motor skill were examined. A tape of auditory human radio traffic was injected into a tank gunnery simulator during each training session for 4 wk. of training for 3 hr. a week. The dependent variables were identification time, fire time, kill time, systems errors, and acquisition errors. These were measured by the Unit Conduct Of Fire Trainer (UCOFT). The interference was interjected into the UCOFT Tank Table VIII gunnery test. A Solomon four-group design was used. A 2 x 2 analysis of variance was used to assess whether interference gunnery training resulted in improvements in interference posttest scores. During the first three weeks of training, the interference group committed 106% more systems errors and 75% more acquisition errors than the standard group. The interference training condition was associated with a significant improvement from pre- to posttest of 44% in over-all UCOFT scores; however, when examined on the posttest the standard training did not improve performance significantly over the same period. It was concluded that auditory radio interference degrades performance of this fine, continuous, open motor skill, and interference training appears to abate the effects of this degradation.
Wireless clinical alerts and patient outcomes in the surgical intensive care unit.

PubMed

Major, Kevin; Shabot, M Michael; Cunneen, Scott

2002-12-01

Errors in medicine have gained public interest since the Institute of Medicine published its 1999 report on this subject. Although errors of commission are frequently cited, errors of omission can be equally serious. A computerized surgical intensive care unit (SICU) information system when coupled to an event-driven alerting engine has the potential to reduce errors of omission for critical intensive care unit events. Automated alerts and patient outcomes were prospectively collected for all patients admitted to a tertiary-care SICU for a 2-year period. During the study period 3,973 patients were admitted to the SICU and received 13,608 days of care. A total of 15,066 alert pages were sent including alerts for physiologic condition (6,163), laboratory data (4,951), blood gas (3,774), drug allergy (130), and toxic drug levels (48). Admission Simplified Acute Physiology Score and Acute Physiology and Chronic Health Evaluation II score, SICU lengths of stay, and overall mortality rates were significantly higher in patients who triggered the alerting system. Patients triggering the alert paging system were 49.4 times more likely to die in the SICU compared with patients who did not generate an alert. Even after transfer to floor care the patients who triggered the alerting system were 5.7 times more likely to die in the hospital. An alert page identifies patients who will stay in the SICU longer and have a significantly higher chance of death compared with patients who do not trigger the alerting system.
Evaluation of causes and frequency of medication errors during information technology downtime.

PubMed

Hanuscak, Tara L; Szeinbach, Sheryl L; Seoane-Vazquez, Enrique; Reichert, Brendan J; McCluskey, Charles F

2009-06-15

The causes and frequency of medication errors occurring during information technology downtime were evaluated. Individuals from a convenience sample of 78 hospitals who were directly responsible for supporting and maintaining clinical information systems (CISs) and automated dispensing systems (ADSs) were surveyed using an online tool between February 2007 and May 2007 to determine if medication errors were reported during periods of system downtime. The errors were classified using the National Coordinating Council for Medication Error Reporting and Prevention severity scoring index. The percentage of respondents reporting downtime was estimated. Of the 78 eligible hospitals, 32 respondents with CIS and ADS responsibilities completed the online survey for a response rate of 41%. For computerized prescriber order entry, patch installations and system upgrades caused an average downtime of 57% over a 12-month period. Lost interface and interface malfunction were reported for centralized and decentralized ADSs, with an average downtime response of 34% and 29%, respectively. The average downtime response was 31% for software malfunctions linked to clinical decision-support systems. Although patient harm did not result from 30 (54%) medication errors, the potential for harm was present for 9 (16%) of these errors. Medication errors occurred during CIS and ADS downtime despite the availability of backup systems and standard protocols to handle periods of system downtime. Efforts should be directed to reduce the frequency and length of down-time in order to minimize medication errors during such downtime.
Sleep, mental health status, and medical errors among hospital nurses in Japan.

PubMed

Arimura, Mayumi; Imai, Makoto; Okawa, Masako; Fujimura, Toshimasa; Yamada, Naoto

2010-01-01

Medical error involving nurses is a critical issue since nurses' actions will have a direct and often significant effect on the prognosis of their patients. To investigate the significance of nurse health in Japan and its potential impact on patient services, a questionnaire-based survey amongst nurses working in hospitals was conducted, with the specific purpose of examining the relationship between shift work, mental health and self-reported medical errors. Multivariate analysis revealed significant associations between the shift work system, General Health Questionnaire (GHQ) scores and nurse errors: the odds ratios for shift system and GHQ were 2.1 and 1.1, respectively. It was confirmed that both sleep and mental health status among hospital nurses were relatively poor, and that shift work and poor mental health were significant factors contributing to medical errors.
Graduate Students' Administration and Scoring Errors on the Woodcock-Johnson III Tests of Cognitive Abilities

ERIC Educational Resources Information Center

Ramos, Erica; Alfonso, Vincent C.; Schermerhorn, Susan M.

2009-01-01

The interpretation of cognitive test scores often leads to decisions concerning the diagnosis, educational placement, and types of interventions used for children. Therefore, it is important that practitioners administer and score cognitive tests without error. This study assesses the frequency and types of examiner errors that occur during the…
Shared Dosimetry Error in Epidemiological Dose-Response Analyses

DOE Office of Scientific and Technical Information (OSTI.GOV)

Stram, Daniel O.; Preston, Dale L.; Sokolnikov, Mikhail

2015-03-23

Radiation dose reconstruction systems for large-scale epidemiological studies are sophisticated both in providing estimates of dose and in representing dosimetry uncertainty. For example, a computer program was used by the Hanford Thyroid Disease Study to provide 100 realizations of possible dose to study participants. The variation in realizations reflected the range of possible dose for each cohort member consistent with the data on dose determinants in the cohort. Another example is the Mayak Worker Dosimetry System 2013 which estimates both external and internal exposures and provides multiple realizations of "possible" dose history to workers given dose determinants. This paper takes up the problem of dealing with complex dosimetry systems that provide multiple realizations of dose in an epidemiologic analysis. In this paper we derive expected scores and the information matrix for a model used widely in radiation epidemiology, namely the linear excess relative risk (ERR) model that allows for a linear dose response (risk in relation to radiation) and distinguishes between modifiers of background rates and of the excess risk due to exposure. We show that treating the mean dose for each individual (calculated by averaging over the realizations) as if it was true dose (ignoring both shared and unshared dosimetry errors) gives asymptotically unbiased estimates (i.e. the score has expectation zero) and valid tests of the null hypothesis that the ERR slope β is zero. Although the score is unbiased the information matrix (and hence the standard errors of the estimate of β) is biased for β≠0 when ignoring errors in dose estimates, and we show how to adjust the information matrix to remove this bias, using the multiple realizations of dose. Use of these methods for several studies, including the Mayak Worker Cohort and the U.S. Atomic Veterans Study, is discussed.
Automated outcome scoring in a virtual reality simulator for endodontic surgery.

PubMed

Yin, Myat Su; Haddawy, Peter; Suebnukarn, Siriwan; Rhienmora, Phattanapon

2018-01-01

We address the problem of automated outcome assessment in a virtual reality (VR) simulator for endodontic surgery. Outcome assessment is an essential component of any system that provides formative feedback, which requires assessing the outcome, relating it to the procedure, and communicating in a language natural to dental students. This study takes a first step toward automated generation of such comprehensive feedback. Virtual reference templates are computed based on tooth anatomy and the outcome is assessed with a 3D score cube volume which consists of voxel-level non-linear weighted scores based on the templates. The detailed scores are transformed into standard scoring language used by dental schools. The system was evaluated on fifteen outcome samples that contained optimal results and those with errors including perforation of the walls, floor, and both, as well as various combinations of major and minor over and under drilling errors. Five endodontists who had professional training and varying levels of experience in root canal treatment participated as raters in the experiment. Results from evaluation of our system with expert endodontists show a high degree of agreement with expert scores (information-based measure of disagreement 0.04-0.21). At the same time they show some disagreement among human expert scores, reflecting the subjective nature of human outcome scoring. The discriminatory power of the AOS scores was analyzed with three grade tiers (A, B, C) using the area under the receiver operating characteristic curve (AUC). The AUC values are generally highest for the {AB: C} cutoff, which is the cutoff at the boundary between clinically acceptable (B) and clinically unacceptable (C) grades. The objective consistency of computed scores and high degree of agreement with experts make the proposed system a promising addition to existing VR simulators.
The translation of detailed level scores into terminology commonly used in dental surgery supports natural communication with students and instructors. With the reference virtual templates created automatically, the approach is robust and is applicable in scoring the outcome of any dental surgery procedure involving the act of drilling. Copyright © 2017 Elsevier B.V. All rights reserved.
National trends in safety performance of electronic health record systems in children's hospitals.

PubMed

Chaparro, Juan D; Classen, David C; Danforth, Melissa; Stockwell, David C; Longhurst, Christopher A

2017-03-01

To evaluate the safety of computerized physician order entry (CPOE) and associated clinical decision support (CDS) systems in electronic health record (EHR) systems at pediatric inpatient facilities in the US using the Leapfrog Group's pediatric CPOE evaluation tool. The Leapfrog pediatric CPOE evaluation tool, a previously validated tool to assess the ability of a CPOE system to identify orders that could potentially lead to patient harm, was used to evaluate 41 pediatric hospitals over a 2-year period. Evaluation of the last available test for each institution was performed, assessing performance overall as well as by decision support category (eg, drug-drug, dosing limits). Longitudinal analysis of test performance was also carried out to assess the impact of testing and the overall trend of CPOE performance in pediatric hospitals. Pediatric CPOE systems were able to identify 62% of potential medication errors in the test scenarios, but ranged widely from 23-91% in the institutions tested. The highest scoring categories included drug-allergy interactions, dosing limits (both daily and cumulative), and inappropriate routes of administration. We found that hospitals with longer periods since their CPOE implementation did not have better scores upon initial testing, but after initial testing there was a consistent improvement in testing scores of 4 percentage points per year. Pediatric computerized physician order entry (CPOE) systems on average are able to intercept a majority of potential medication errors, but vary widely among implementations. Prospective and repeated testing using the Leapfrog Group's evaluation tool is associated with improved ability to intercept potential medication errors. © The Author 2016. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com
Impairment of perception and recognition of faces, mimic expression and gestures in schizophrenic patients.

PubMed

Berndl, K; von Cranach, M; Grüsser, O J

1986-01-01

The perception and recognition of faces, mimic expression and gestures were investigated in normal subjects and schizophrenic patients by means of a movie test described in a previous report (Berndl et al. 1986). The error scores were compared with results from a semi-quantitative evaluation of psychopathological symptoms and with some data from the case histories. The overall error scores found in the three groups of schizophrenic patients (paranoic, hebephrenic, schizo-affective) were significantly increased (7-fold) over those of normals. No significant difference in the distribution of the error scores in the three different patient groups was found. In 10 different sub-tests following the movie the deficiencies found in the schizophrenic patients were analysed in detail. The error score for the averbal test was on average higher in paranoic patients than in the two other groups of patients, while the opposite was true for the error scores found in the verbal tests. Age and sex had some impact on the test results. In normals, female subjects were somewhat better than male. In schizophrenic patients the reverse was true. Thus female patients were more affected by the disease than male patients with respect to the task performance. The correlation between duration of the disease and error score was small; less than 10% of the error scores could be attributed to factors related to the duration of illness. Evaluation of psychopathological symptoms indicated that the stronger the schizophrenic defect, the higher the error score, but again this relationship was responsible for not more than 10% of the errors. The estimated degree of acute psychosis and overall sum of psychopathological abnormalities as scored in a semi-quantitative exploration did not correlate with the error score, but with each other. Similarly, treatment with psychopharmaceuticals, previous misuse of drugs or of alcohol had practically no effect on the outcome of the test data. 
The analysis of performance and test data of schizophrenic patients indicated that our findings are most likely not due to a "non-specific" impairment of cognitive function in schizophrenia, but point to a fairly selective defect in elementary cognitive visual functions necessary for averbal social communication. Some possible explanations of the data are discussed in relation to neuropsychological and neurophysiological findings on "face-specific" cortical areas located in the primate temporal lobe.
Error behaviors associated with loss of competency in Alzheimer's disease.

PubMed

Marson, D C; Annis, S M; McInturff, B; Bartolucci, A; Harrell, L E

1999-12-10

To investigate qualitative behavioral changes associated with declining medical decision-making capacity (competency) in patients with AD. Qualitative measures can yield clinical information about functional changes in neurologic disease not available through quantitative measures. Normal older controls (n = 21) and patients with mild and moderate probable AD (n = 72) were compared using a standardized competency measure and neuropsychological measures. A system of 16 qualitative error scores representing conceptual domains of language, executive dysfunction, affective dysfunction, and compensatory responses was used to analyze errors produced on the competency measure. Patterns of errors were examined across groups. Relationships between error behaviors and competency performance were determined, and neurocognitive correlates of specific error behaviors were identified. AD patients demonstrated more miscomprehension, factual confusion, intrusions, incoherent responses, nonresponsive answers, loss of task, and delegation than controls. Errors in the executive domain (loss of task, nonresponsive answer, and loss of detachment) were key predictors of declining competency performance by AD patients. Neuropsychological analyses in the AD group generally confirmed the conceptual domain assignments of the qualitative scores. Loss of task, nonresponsive answers, and loss of detachment were key behavioral changes associated with declining competency of AD patients and with neurocognitive measures of executive dysfunction. These findings support the growing linkage between executive dysfunction and competency loss.
Testing Intelligently Includes Double-Checking Wechsler IQ Scores

ERIC Educational Resources Information Center

Kuentzel, Jeffrey G.; Hetterscheidt, Lesley A.; Barnett, Douglas

2011-01-01

The rigors of standardized testing make for numerous opportunities for examiner error, including simple computational mistakes in scoring. Although experts recommend that test scoring be double-checked, the extent to which independent double-checking would reduce scoring errors is not known. A double-checking procedure was established at a…
Virtual reality computer simulation.

PubMed

Grantcharov, T P; Rosenberg, J; Pahle, E; Funch-Jensen, P

2001-03-01

Objective assessment of psychomotor skills should be an essential component of a modern surgical training program. There are computer systems that can be used for this purpose, but their wide application is not yet generally accepted. The aim of this study was to validate the role of virtual reality computer simulation as a method for evaluating surgical laparoscopic skills. The study included 14 surgical residents. On day 1, they performed two runs of all six tasks on the Minimally Invasive Surgical Trainer, Virtual Reality (MIST VR). On day 2, they performed a laparoscopic cholecystectomy on living pigs; afterward, they were tested again on the MIST VR. A group of experienced surgeons evaluated the trainees' performance on the animal operation, giving scores for total performance error and economy of motion. During the tasks on the MIST VR, errors and noneconomy of movements for the left and right hand were also recorded. There were significant correlations between error scores in vivo and three of the six in vitro tasks (p < 0.05). In vivo economy scores correlated significantly with non-economy right-hand scores for five of the six tasks and with non-economy left-hand scores for one of the six tasks (p < 0.05). In this study, laparoscopic performance in the animal model correlated significantly with performance on the computer simulator. Thus, the computer model seems to be a promising objective method for the assessment of laparoscopic psychomotor skills.
Analytic score distributions for a spatially continuous tridirectional Monte Carlo transport problem

DOE Office of Scientific and Technical Information (OSTI.GOV)

Booth, T.E.

1996-01-01

The interpretation of the statistical error estimates produced by Monte Carlo transport codes is still somewhat of an art. Empirically, there are variance reduction techniques whose error estimates are almost always reliable, and there are variance reduction techniques whose error estimates are often unreliable. Unreliable error estimates usually result from inadequate large-score sampling from the score distribution's tail. Statisticians believe that more accurate confidence interval statements are possible if the general nature of the score distribution can be characterized. Here, the analytic score distribution for the exponential transform applied to a simple, spatially continuous Monte Carlo transport problem is provided. Anisotropic scattering and implicit capture are included in the theory. In large part, the analytic score distributions that are derived provide the basis for the ten new statistical quality checks in MCNP.
Confidence Intervals for Weighted Composite Scores under the Compound Binomial Error Model

ERIC Educational Resources Information Center

Kim, Kyung Yong; Lee, Won-Chan

2018-01-01

Reporting confidence intervals with test scores helps test users make important decisions about examinees by providing information about the precision of test scores. Although a variety of estimation procedures based on the binomial error model are available for computing intervals for test scores, these procedures assume that items are randomly…
Ten years of preanalytical monitoring and control: Synthetic Balanced Score Card Indicator

PubMed Central

López-Garrigós, Maite; Flores, Emilio; Santo-Quiles, Ana; Gutierrez, Mercedes; Lugo, Javier; Lillo, Rosa; Leiva-Salinas, Carlos

2015-01-01

Introduction: Preanalytical control and monitoring continue to be an important issue for clinical laboratory professionals. The aim of the study was to evaluate a monitoring system of preanalytical errors regarding samples not suitable for analysis, based on different indicators; to compare such indicators in different phlebotomy centres; and finally to evaluate a single synthetic preanalytical indicator that may be included in the balanced scorecard management system (BSC). Materials and methods: We collected individual and global preanalytical errors in haematology, coagulation, chemistry, and urine samples analysis. We also analyzed a synthetic indicator that represents the sum of all types of preanalytical errors, expressed in a sigma level. We studied the evolution of those indicators over time and compared indicator results by way of the comparison of proportions and Chi-square. Results: There was a decrease in the number of errors along the years (P < 0.001). This pattern was confirmed in primary care patients, inpatients and outpatients. In blood samples, fewer errors occurred in outpatients, followed by inpatients. Conclusion: We present a practical and effective methodology to monitor unsuitable sample preanalytical errors. The synthetic indicator results summarize overall preanalytical sample errors, and can be used as part of BSC management system. PMID:25672466
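The abstract above reports its synthetic indicator as a sigma level but does not say how the conversion was done. The following is a minimal sketch of one common convention, assuming the error proportion is mapped through the inverse normal distribution with the customary 1.5-sigma shift; the function name `sigma_level`, the shift value, and the sample counts are illustrative assumptions, not the authors' method.

```python
from statistics import NormalDist


def sigma_level(errors: int, opportunities: int, shift: float = 1.5) -> float:
    """Convert an error proportion into a sigma level.

    Assumption: the conventional Six Sigma mapping, i.e. the standard
    normal quantile of the error-free proportion plus a 1.5-sigma shift.
    The study may have used a different convention.
    """
    if opportunities <= 0 or not 0 < errors < opportunities:
        raise ValueError("need 0 < errors < opportunities")
    defect_rate = errors / opportunities
    return NormalDist().inv_cdf(1.0 - defect_rate) + shift


# Hypothetical counts: unsuitable samples out of all samples received.
few_errors = sigma_level(10, 1000)
many_errors = sigma_level(100, 1000)
```

A lower error proportion gives a higher sigma level, so one number can summarize all error types and be tracked over time.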
WISC-R Examiner Errors: Cause for Concern.

ERIC Educational Resources Information Center

Slate, John R.; Chick, David

1989-01-01

Clinical psychology graduate students (N=14) administered Wechsler Intelligence Scale for Children-Revised. Found numerous scoring and mechanical errors that influenced full-scale intelligence quotient scores on two-thirds of protocols. Particularly prone to error were Verbal subtests of Vocabulary, Comprehension, and Similarities. Noted specific…
Physician Preferences to Communicate Neuropsychological Results: Comparison of Qualitative Descriptors and a Proposal to Reduce Communication Errors.

PubMed

Schoenberg, Mike R; Osborn, Katie E; Mahone, E Mark; Feigon, Maia; Roth, Robert M; Pliskin, Neil H

2017-11-08

Errors in communication are a leading cause of medical errors. A potential source of error in communicating neuropsychological results is confusion in the qualitative descriptors used to describe standardized neuropsychological data. This study sought to evaluate the extent to which medical consumers of neuropsychological assessments believed that results/findings were not clearly communicated. In addition, preference data for a variety of qualitative descriptors commonly used to communicate normative neuropsychological test scores were obtained. Preference data were obtained for five qualitative descriptor systems as part of a larger 36-item internet-based survey of physician satisfaction with neuropsychological services. A new qualitative descriptor system termed the Simplified Qualitative Classification System (Q-Simple) was proposed to reduce the potential for communication errors using seven terms: very superior, superior, high average, average, low average, borderline, and abnormal/impaired. A non-random convenience sample of 605 clinicians identified from four United States academic medical centers from January 1, 2015 through January 7, 2016 was invited to participate. A total of 182 surveys were completed. A minority of clinicians (12.5%) indicated that neuropsychological study results were not clearly communicated. When communicating neuropsychological standardized scores, the two most preferred qualitative descriptor systems were by Heaton and colleagues (Comprehensive norms for an extended Halstead-Reitan battery: Demographic corrections, research findings, and clinical applications. Odessa, TX: Psychological Assessment Resources) (26%) and the newly proposed Q-Simple system (22%). Initial findings highlight the need to improve and standardize communication of neuropsychological results.
These data offer initial guidance for preferred terms to communicate test results and form a foundation for more standardized practice among neuropsychologists. © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
How Accurate Is a Test Score?

ERIC Educational Resources Information Center

Doppelt, Jerome E.

1956-01-01

The standard error of measurement as a means for estimating the margin of error that should be allowed for in test scores is discussed. The true score measures the performance that is characteristic of the person tested; the variations, plus and minus, around the true score describe a characteristic of the test. When the standard deviation is used…
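The abstract above is truncated before it shows how the standard deviation enters the calculation. As a hedged illustration, the sketch below uses the classical test theory formula, SEM = SD × sqrt(1 − reliability), and builds a band around an observed score. The function names and the numbers (SD of 15, reliability of 0.91) are hypothetical and are not taken from the article.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical test theory SEM: SD * sqrt(1 - reliability)."""
    if sd < 0 or not 0.0 <= reliability <= 1.0:
        raise ValueError("sd must be >= 0 and reliability must lie in [0, 1]")
    return sd * math.sqrt(1.0 - reliability)


def score_band(observed: float, sem: float, z: float = 1.0) -> tuple:
    """Band of plus and minus z SEMs around an observed score."""
    return (observed - z * sem, observed + z * sem)


# Hypothetical test: score SD of 15 and reliability coefficient of 0.91.
sem = standard_error_of_measurement(15.0, 0.91)
low, high = score_band(100.0, sem)
```

The band describes the margin of error to allow around one observed score; a more reliable test gives a smaller SEM and a narrower band.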

Speech variability effects on recognition accuracy associated with concurrent task performance by pilots

NASA Technical Reports Server (NTRS)

Simpson, C. A.

1985-01-01

In the present study of the responses of pairs of pilots to aircraft warning classification tasks using an isolated word, speaker-dependent speech recognition system, the induced stress was manipulated by means of different scoring procedures for the classification task and by the inclusion of a competitive manual control task. Both speech patterns and recognition accuracy were analyzed, and recognition errors were recorded by type for an isolated word speaker-dependent system and by an offline technique for a connected word speaker-dependent system. While errors increased with task loading for the isolated word system, there was no such effect for task loading in the case of the connected word system.
Counting-backward test for executive function in idiopathic normal pressure hydrocephalus.

PubMed

Kanno, S; Saito, M; Hayashi, A; Uchiyama, M; Hiraoka, K; Nishio, Y; Hisanaga, K; Mori, E

2012-10-01

The aim of this study was to develop and validate a bedside test for executive function in patients with idiopathic normal pressure hydrocephalus (INPH). Twenty consecutive patients with INPH and 20 patients with Alzheimer's disease (AD) were enrolled in this study. We developed the counting-backward test for evaluating executive function in patients with INPH. Two indices that are considered to be reflective of the attention deficits and response suppression underlying executive dysfunction in INPH were calculated: the first-error score and the reverse-effect index. Performance on both the counting-backward test and standard neuropsychological tests for executive function was assessed in INPH and AD patients. The first-error score, reverse-effect index and the scores from the standard neuropsychological tests for executive function were significantly lower for individuals in the INPH group than in the AD group. The two indices for the counting-backward test in the INPH group were strongly correlated with the total scores for Frontal Assessment Battery and Phonemic Verbal Fluency. The first-error score was also significantly correlated with the error rate of the Stroop colour-word test and the score of the go/no-go test. In addition, we found that the first-error score highly distinguished patients with INPH from those with AD using these tests. The counting-backward test is useful for evaluating executive dysfunction in INPH and for differentiating between INPH and AD patients. In particular, the first-error score may reflect deficits in the response suppression related to executive dysfunction in INPH. © 2012 John Wiley & Sons A/S.
Administration of Neuropsychological Tests Using Interactive Voice Response Technology in the Elderly: Validation and Limitations

PubMed Central

Miller, Delyana Ivanova; Talbot, Vincent; Gagnon, Michèle; Messier, Claude

2013-01-01

Interactive voice response (IVR) systems are computer programs, which interact with people to provide a number of services from business to health care. We examined the ability of an IVR system to administer and score a verbal fluency task (fruits) and the digit span forward and backward in 158 community dwelling people aged between 65 and 92 years of age (full scale IQ of 68–134). Only six participants could not complete all tasks mostly due to early technical problems in the study. Participants were also administered the Wechsler Intelligence Scale fourth edition (WAIS-IV) and Wechsler Memory Scale fourth edition subtests. The IVR system correctly recognized 90% of the fruits in the verbal fluency task and 93–95% of the number sequences in the digit span. The IVR system typically underestimated the performance of participants because of voice recognition errors. In the digit span, these errors led to the erroneous discontinuation of the test: however the correlation between IVR scoring and clinical scoring was still high (93–95%). The correlation between the IVR verbal fluency and the WAIS-IV Similarities subtest was 0.31. The correlation between the IVR digit span forward and backward and the in-person administration was 0.46. We discuss how valid and useful IVR systems are for neuropsychological testing in the elderly. PMID:23950755
The Cut-Score Operating Function: A New Tool to Aid in Standard Setting

ERIC Educational Resources Information Center

Grabovsky, Irina; Wainer, Howard

2017-01-01

In this essay, we describe the construction and use of the Cut-Score Operating Function in aiding standard setting decisions. The Cut-Score Operating Function shows the relation between the cut-score chosen and the consequent error rate. It allows error rates to be defined by multiple loss functions and will show the behavior of each loss…
Measurement Error and Bias in Value-Added Models. Research Report. ETS RR-17-25

ERIC Educational Resources Information Center

Kane, Michael T.

2017-01-01

By aggregating residual gain scores (the differences between each student's current score and a predicted score based on prior performance) for a school or a teacher, value-added models (VAMs) can be used to generate estimates of school or teacher effects. It is known that random errors in the prior scores will introduce bias into predictions of…
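The aggregation described above can be sketched in a few lines. This is a deliberately naive illustration, assuming a single prior score and an ordinary least squares prediction; it makes no correction for measurement error in the prior scores, which is the issue the report addresses. The names `fit_line` and `teacher_effects` and the data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y on x; returns (slope, intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def teacher_effects(records):
    """records: iterable of (teacher, prior_score, current_score).

    Residual gain = current score minus the score predicted from the
    prior score; the estimated effect is the mean residual per teacher.
    """
    records = list(records)
    slope, intercept = fit_line([r[1] for r in records],
                                [r[2] for r in records])
    residuals = defaultdict(list)
    for teacher, prior, current in records:
        residuals[teacher].append(current - (intercept + slope * prior))
    return {teacher: mean(vals) for teacher, vals in residuals.items()}


# Hypothetical data: two teachers, two students each.
records = [("A", 50, 58), ("A", 60, 67), ("B", 50, 52), ("B", 60, 63)]
effects = teacher_effects(records)
```

With these made-up scores the fitted line is current = prior + 5, so teacher A's students sit above the prediction and teacher B's below it.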
Assessing dangerous driving behavior during driving inattention: Psychometric adaptation and validation of the Attention-Related Driving Errors Scale in China.

PubMed

Qu, Weina; Ge, Yan; Zhang, Qian; Zhao, Wenguo; Zhang, Kan

2015-07-01

Driver inattention is a significant cause of motor vehicle collisions and incidents. The purpose of this study was to translate the Attention-Related Driving Error Scale (ARDES) into Chinese and to verify its reliability and validity. A total of 317 drivers completed the Chinese version of the ARDES, the Dula Dangerous Driving Index (DDDI), the Attention-Related Cognitive Errors Scale (ARCES) and the Mindful Attention Awareness Scale (MAAS) questionnaires. Specific sociodemographic variables and traffic violations were also measured. Psychometric results confirm that the ARDES-China has adequate psychometric properties (Cronbach's alpha=0.88) to be a useful tool for evaluating proneness to attentional errors in the Chinese driving population. First, ARDES-China scores were positively correlated with both DDDI scores and number of accidents in the prior year; in addition, ARDES-China scores were a significant predictor of dangerous driving behavior as measured by DDDI. Second, we found that ARDES-China scores were strongly correlated with ARCES scores and negatively correlated with MAAS scores. Finally, different demographic groups exhibited significant differences in ARDES scores; in particular, ARDES scores varied with years of driving experience. Copyright © 2015 Elsevier Ltd. All rights reserved.
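The reliability figure quoted above (Cronbach's alpha = 0.88) is an internal-consistency coefficient. As a minimal sketch of how such a coefficient is computed, the block below uses the standard formula alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores). The function name and the tiny response matrix are hypothetical and are not ARDES data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """responses: list of respondents, each a list of k item scores."""
    k = len(responses[0])
    if k < 2 or any(len(row) != k for row in responses):
        raise ValueError("need at least two items and equal-length rows")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)


# Hypothetical questionnaire: three respondents, three items that move
# together perfectly, which is the limiting case for alpha.
responses = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
alpha = cronbach_alpha(responses)
```

Real questionnaire data would give a value below this limiting case; values near 0.9, as reported for the ARDES-China, are usually read as good internal consistency.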
Target Uncertainty Mediates Sensorimotor Error Correction

PubMed Central

Acerbi, Luigi; Vijayakumar, Sethu; Wolpert, Daniel M.

2017-01-01

Human movements are prone to errors that arise from inaccuracies in both our perceptual processing and execution of motor commands. We can reduce such errors by both improving our estimates of the state of the world and through online error correction of the ongoing action. Two prominent frameworks that explain how humans solve these problems are Bayesian estimation and stochastic optimal feedback control. Here we examine the interaction between estimation and control by asking if uncertainty in estimates affects how subjects correct for errors that may arise during the movement. Unbeknownst to participants, we randomly shifted the visual feedback of their finger position as they reached to indicate the center of mass of an object. Even though participants were given ample time to compensate for this perturbation, they only fully corrected for the induced error on trials with low uncertainty about center of mass, with correction only partial in trials involving more uncertainty. The analysis of subjects’ scores revealed that participants corrected for errors just enough to avoid significant decrease in their overall scores, in agreement with the minimal intervention principle of optimal feedback control. We explain this behavior with a term in the loss function that accounts for the additional effort of adjusting one’s response. By suggesting that subjects’ decision uncertainty, as reflected in their posterior distribution, is a major factor in determining how their sensorimotor system responds to error, our findings support theoretical models in which the decision making and control processes are fully integrated. PMID:28129323
Target Uncertainty Mediates Sensorimotor Error Correction.

PubMed

Acerbi, Luigi; Vijayakumar, Sethu; Wolpert, Daniel M

2017-01-01

Human movements are prone to errors that arise from inaccuracies in both our perceptual processing and execution of motor commands. We can reduce such errors by both improving our estimates of the state of the world and through online error correction of the ongoing action. Two prominent frameworks that explain how humans solve these problems are Bayesian estimation and stochastic optimal feedback control. Here we examine the interaction between estimation and control by asking if uncertainty in estimates affects how subjects correct for errors that may arise during the movement. Unbeknownst to participants, we randomly shifted the visual feedback of their finger position as they reached to indicate the center of mass of an object. Even though participants were given ample time to compensate for this perturbation, they only fully corrected for the induced error on trials with low uncertainty about center of mass, with correction only partial in trials involving more uncertainty. The analysis of subjects' scores revealed that participants corrected for errors just enough to avoid significant decrease in their overall scores, in agreement with the minimal intervention principle of optimal feedback control. We explain this behavior with a term in the loss function that accounts for the additional effort of adjusting one's response. By suggesting that subjects' decision uncertainty, as reflected in their posterior distribution, is a major factor in determining how their sensorimotor system responds to error, our findings support theoretical models in which the decision making and control processes are fully integrated.
[Failure modes and effects analysis in the prescription, validation and dispensing process].

PubMed

Delgado Silveira, E; Alvarez Díaz, A; Pérez Menéndez-Conde, C; Serna Pérez, J; Rodríguez Sagrado, M A; Bermejo Vicedo, T

2012-01-01

To apply a failure modes and effects analysis to the prescription, validation and dispensing process for hospitalised patients. A work group analysed all of the stages included in the process from prescription to dispensing, identifying the most critical errors and establishing potential failure modes which could produce a mistake. The possible causes, their potential effects, and the existing control systems were analysed to try and stop them from developing. The Hazard Score was calculated, choosing those that were ≥ 8, and a Severity Index = 4 was selected independently of the hazard Score value. Corrective measures and an implementation plan were proposed. A flow diagram that describes the whole process was obtained. A risk analysis was conducted of the chosen critical points, indicating: failure mode, cause, effect, severity, probability, Hazard Score, suggested preventative measure and strategy to achieve so. Failure modes chosen: Prescription on the nurse's form; progress or treatment order (paper); Prescription to incorrect patient; Transcription error by nursing staff and pharmacist; Error preparing the trolley. By applying a failure modes and effects analysis to the prescription, validation and dispensing process, we have been able to identify critical aspects, the stages in which errors may occur and the causes. It has allowed us to analyse the effects on the safety of the process, and establish measures to prevent or reduce them. Copyright © 2010 SEFH. Published by Elsevier Espana. All rights reserved.
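The selection rule described in this abstract can be sketched in a few lines. This is only an illustration: it assumes the Hazard Score is the product of severity and probability, each rated on a 1 to 4 scale (the abstract does not state the formula), and the `FailureMode` class, the `select_critical` helper, and all ratings below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int     # assumed 1-4 scale (hypothetical ratings)
    probability: int  # assumed 1-4 scale (hypothetical ratings)

    @property
    def hazard_score(self) -> int:
        # Assumption: Hazard Score = severity x probability.
        return self.severity * self.probability


def select_critical(modes, threshold=8, max_severity=4):
    """Keep modes with Hazard Score >= threshold, plus any mode with
    maximum severity regardless of its Hazard Score."""
    return [
        m for m in modes
        if m.hazard_score >= threshold or m.severity == max_severity
    ]


# Illustrative ratings only; they are not taken from the study.
modes = [
    FailureMode("Prescription to incorrect patient", severity=4, probability=1),
    FailureMode("Transcription error", severity=3, probability=3),
    FailureMode("Error preparing the trolley", severity=2, probability=2),
]
critical_names = [m.name for m in select_critical(modes)]
print(critical_names)
```

With these made-up ratings, the first mode is kept because of its severity alone and the second because its score reaches the threshold.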
A quantification of the effectiveness of EPID dosimetry and software-based plan verification systems in detecting incidents in radiotherapy

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bojechko, Casey; Phillps, Mark; Kalet, Alan

Purpose: Complex treatments in radiation therapy require robust verification in order to prevent errors that can adversely affect the patient. For this purpose, the authors estimate the effectiveness of detecting errors with a “defense in depth” system composed of electronic portal imaging device (EPID) based dosimetry and a software-based system composed of rules-based and Bayesian network verifications. Methods: The authors analyzed incidents with a high potential severity score, scored as a 3 or 4 on a 4 point scale, recorded in an in-house voluntary incident reporting system, collected from February 2012 to August 2014. The incidents were categorized into different failure modes. The detectability, defined as the number of incidents that are detectable divided by the total number of incidents, was calculated for each failure mode. Results: In total, 343 incidents were used in this study. Of the incidents 67% were related to photon external beam therapy (EBRT). The majority of the EBRT incidents were related to patient positioning and only a small number of these could be detected by EPID dosimetry when performed prior to treatment (6%). A large fraction could be detected by in vivo dosimetry performed during the first fraction (74%). Rules-based and Bayesian network verifications were found to be complementary to EPID dosimetry, able to detect errors related to patient prescriptions and documentation, and errors unrelated to photon EBRT. Combining all of the verification steps together, 91% of all EBRT incidents could be detected. Conclusions: This study shows that the defense in depth system is potentially able to detect a large majority of incidents. The most effective EPID-based dosimetry verification is in vivo measurements during the first fraction and is complemented by rules-based and Bayesian network plan checking.
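The detectability measure defined in this abstract (detectable incidents divided by total incidents, per failure mode) could be computed along the following lines. This is a minimal sketch: the `detectability` function, the input layout, and the example records are assumptions made for illustration and are not data from the study.

```python
from collections import Counter


def detectability(incidents):
    """Return {failure_mode: detectable / total} for an iterable of
    (failure_mode, is_detectable) pairs."""
    totals = Counter()
    detected = Counter()
    for mode, is_detectable in incidents:
        totals[mode] += 1
        if is_detectable:
            detected[mode] += 1
    return {mode: detected[mode] / totals[mode] for mode in totals}


# Hypothetical incident log: (failure mode, caught by the verification step?)
incidents = [
    ("patient positioning", True),
    ("patient positioning", False),
    ("patient positioning", True),
    ("patient positioning", True),
    ("documentation", False),
    ("documentation", True),
]
rates = detectability(incidents)
print(rates)
```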
Evaluation of tactual displays for flight control

NASA Technical Reports Server (NTRS)

Levison, W. H.; Tanner, R. B.; Triggs, T. J.

1973-01-01

Manual tracking experiments were conducted to determine the suitability of tactual displays for presenting flight-control information in multitask situations. Although tracking error scores are considerably greater than scores obtained with a continuous visual display, preliminary results indicate that inter-task interference effects are substantially less with the tactual display in situations that impose high visual scanning workloads. The single-task performance degradation found with the tactual display appears to be a result of the coding scheme rather than the use of the tactual sensory mode per se. Analysis with the state-variable pilot/vehicle model shows that reliable predictions of tracking errors can be obtained for wide-band tracking systems once the pilot-related model parameters have been adjusted to reflect the pilot-display interaction.
ADEPT, a dynamic next generation sequencing data error-detection program with trimming

DOE Office of Scientific and Technical Information (OSTI.GOV)

Feng, Shihai; Lo, Chien-Chi; Li, Po-E

Illumina is the most widely used next generation sequencing technology and produces millions of short reads that contain errors. These sequencing errors constitute a major problem in applications such as de novo genome assembly, metagenomics analysis and single nucleotide polymorphism discovery. In this study, we present ADEPT, a dynamic error detection method, based on the quality scores of each nucleotide and its neighboring nucleotides, together with their positions within the read and compares this to the position-specific quality score distribution of all bases within the sequencing run. This method greatly improves upon other available methods in terms of the true positive rate of error discovery without affecting the false positive rate, particularly within the middle of reads. We conclude that ADEPT is the only tool to date that dynamically assesses errors within reads by comparing position-specific and neighboring base quality scores with the distribution of quality scores for the dataset being analyzed. The result is a method that is less prone to position-dependent under-prediction, which is one of the most prominent issues in error prediction. The outcome is that ADEPT improves upon prior efforts in identifying true errors, primarily within the middle of reads, while reducing the false positive rate.
ADEPT, a dynamic next generation sequencing data error-detection program with trimming

DOE PAGES

Feng, Shihai; Lo, Chien-Chi; Li, Po-E; ...

2016-02-29

Illumina is the most widely used next generation sequencing technology and produces millions of short reads that contain errors. These sequencing errors constitute a major problem in applications such as de novo genome assembly, metagenomics analysis and single nucleotide polymorphism discovery. In this study, we present ADEPT, a dynamic error detection method, based on the quality scores of each nucleotide and its neighboring nucleotides, together with their positions within the read and compares this to the position-specific quality score distribution of all bases within the sequencing run. This method greatly improves upon other available methods in terms of the true positive rate of error discovery without affecting the false positive rate, particularly within the middle of reads. We conclude that ADEPT is the only tool to date that dynamically assesses errors within reads by comparing position-specific and neighboring base quality scores with the distribution of quality scores for the dataset being analyzed. The result is a method that is less prone to position-dependent under-prediction, which is one of the most prominent issues in error prediction. The outcome is that ADEPT improves upon prior efforts in identifying true errors, primarily within the middle of reads, while reducing the false positive rate.
Analyzing human errors in flight mission operations

NASA Technical Reports Server (NTRS)

Bruno, Kristin J.; Welz, Linda L.; Barnes, G. Michael; Sherif, Josef

1993-01-01

A long-term program is in progress at JPL to reduce cost and risk of flight mission operations through a defect prevention/error management program. The main thrust of this program is to create an environment in which the performance of the total system, both the human operator and the computer system, is optimized. To this end, 1580 Incident Surprise Anomaly reports (ISA's) from 1977-1991 were analyzed from the Voyager and Magellan projects. A Pareto analysis revealed that 38 percent of the errors were classified as human errors. A preliminary cluster analysis based on the Magellan human errors (204 ISA's) is presented here. The resulting clusters described the underlying relationships among the ISA's. Initial models of human error in flight mission operations are presented. Next, the Voyager ISA's will be scored and included in the analysis. Eventually, these relationships will be used to derive a theoretically motivated and empirically validated model of human error in flight mission operations. Ultimately, this analysis will be used to make continuous process improvements to end-user applications and training requirements. This Total Quality Management approach will enable the management and prevention of errors in the future.
Towards more reliable automated multi-dose dispensing: retrospective follow-up study on medication dose errors and product defects.

PubMed

Palttala, Iida; Heinämäki, Jyrki; Honkanen, Outi; Suominen, Risto; Antikainen, Osmo; Hirvonen, Jouni; Yliruusi, Jouko

2013-03-01

To date, little is known about the applicability of different types of pharmaceutical dosage forms in an automated high-speed multi-dose dispensing process. The purpose of the present study was to identify and further investigate various process-induced and/or product-related limitations associated with the multi-dose dispensing process. The rates of product defects and dose dispensing errors in automated multi-dose dispensing were retrospectively investigated during a 6-month follow-up period. The study was based on the analysis of process data from a total of nine automated high-speed multi-dose dispensing systems. Special attention was paid to the dependence of multi-dose dispensing errors/product defects on pharmaceutical tablet properties (such as shape, dimensions, weight, scored lines, coatings, etc.) to profile the most suitable forms of tablets for automated dose dispensing systems. The relationship between the risk of errors in dose dispensing and tablet characteristics was visualized by creating a principal component analysis (PCA) model for the outcome of dispensed tablets. The two most common process-induced failures identified in multi-dose dispensing are predisposal of tablet defects and unexpected product transitions in the medication cassette (dose dispensing error). The tablet defects are product-dependent failures, while the tablet transitions are dependent on the automated multi-dose dispensing systems used. The occurrence of tablet defects is approximately twice as common as tablet transitions. The optimal tablet preparation for high-speed multi-dose dispensing would be a round-shaped, relatively small/middle-sized, film-coated tablet without any scored line. Commercial tablet products can be profiled and classified based on their suitability for a high-speed multi-dose dispensing process.
Modified Balance Error Scoring System (M-BESS) test scores in athletes wearing protective equipment and cleats.

PubMed

Azad, Aftab Mohammad; Al Juma, Saad; Bhatti, Junaid Ahmad; Delaney, J Scott

2016-01-01

Balance testing is an important part of the initial concussion assessment. There is no research on the differences in Modified Balance Error Scoring System (M-BESS) scores when tested in real world as compared to control conditions. To assess the difference in M-BESS scores in athletes wearing their protective equipment and cleats on different surfaces as compared to control conditions. This cross-sectional study examined university North American football and soccer athletes. Three observers independently rated athletes performing the M-BESS test in three different conditions: (1) wearing shorts and T-shirt in bare feet on firm surface (control); (2) wearing athletic equipment with cleats on FieldTurf; and (3) wearing athletic equipment with cleats on firm surface. Mean M-BESS scores were compared between conditions. 60 participants were recruited: 39 from football (all males) and 21 from soccer (11 males and 10 females). Average age was 21.1 years (SD=1.8). Mean M-BESS scores were significantly lower (p<0.001) for cleats on FieldTurf (mean=26.3; SD=2.0) and for cleats on firm surface (mean=26.6; SD=2.1) as compared to the control condition (mean=28.4; SD=1.5). Females had lower scores than males for cleats on FieldTurf condition (24.9 (SD=1.9) vs 27.3 (SD=1.6), p=0.005). Players who had taping or bracing on their ankles/feet had lower scores when tested with cleats on firm surface condition (24.6 (SD=1.7) vs 26.9 (SD=2.0), p=0.002). Total M-BESS scores for athletes wearing protective equipment and cleats standing on FieldTurf or a firm surface are around two points lower than M-BESS scores performed on the same athletes under control conditions.
Modified Balance Error Scoring System (M-BESS) test scores in athletes wearing protective equipment and cleats

PubMed Central

Azad, Aftab Mohammad; Al Juma, Saad; Bhatti, Junaid Ahmad; Delaney, J Scott

2016-01-01

Background Balance testing is an important part of the initial concussion assessment. There is no research on the differences in Modified Balance Error Scoring System (M-BESS) scores when tested in real world as compared to control conditions. Objective To assess the difference in M-BESS scores in athletes wearing their protective equipment and cleats on different surfaces as compared to control conditions. Methods This cross-sectional study examined university North American football and soccer athletes. Three observers independently rated athletes performing the M-BESS test in three different conditions: (1) wearing shorts and T-shirt in bare feet on firm surface (control); (2) wearing athletic equipment with cleats on FieldTurf; and (3) wearing athletic equipment with cleats on firm surface. Mean M-BESS scores were compared between conditions. Results 60 participants were recruited: 39 from football (all males) and 21 from soccer (11 males and 10 females). Average age was 21.1 years (SD=1.8). Mean M-BESS scores were significantly lower (p<0.001) for cleats on FieldTurf (mean=26.3; SD=2.0) and for cleats on firm surface (mean=26.6; SD=2.1) as compared to the control condition (mean=28.4; SD=1.5). Females had lower scores than males for cleats on FieldTurf condition (24.9 (SD=1.9) vs 27.3 (SD=1.6), p=0.005). Players who had taping or bracing on their ankles/feet had lower scores when tested with cleats on firm surface condition (24.6 (SD=1.7) vs 26.9 (SD=2.0), p=0.002). Conclusions Total M-BESS scores for athletes wearing protective equipment and cleats standing on FieldTurf or a firm surface are around two points lower than M-BESS scores performed on the same athletes under control conditions. PMID:27900181
Shared dosimetry error in epidemiological dose-response analyses

DOE PAGES

Stram, Daniel O.; Preston, Dale L.; Sokolnikov, Mikhail; ...

2015-03-23

Radiation dose reconstruction systems for large-scale epidemiological studies are sophisticated both in providing estimates of dose and in representing dosimetry uncertainty. For example, a computer program was used by the Hanford Thyroid Disease Study to provide 100 realizations of possible dose to study participants. The variation in realizations reflected the range of possible dose for each cohort member consistent with the data on dose determinants in the cohort. Another example is the Mayak Worker Dosimetry System 2013 which estimates both external and internal exposures and provides multiple realizations of "possible" dose history to workers given dose determinants. This paper takes up the problem of dealing with complex dosimetry systems that provide multiple realizations of dose in an epidemiologic analysis. In this paper we derive expected scores and the information matrix for a model used widely in radiation epidemiology, namely the linear excess relative risk (ERR) model that allows for a linear dose response (risk in relation to radiation) and distinguishes between modifiers of background rates and of the excess risk due to exposure. We show that treating the mean dose for each individual (calculated by averaging over the realizations) as if it was true dose (ignoring both shared and unshared dosimetry errors) gives asymptotically unbiased estimates (i.e. the score has expectation zero) and valid tests of the null hypothesis that the ERR slope β is zero. Although the score is unbiased the information matrix (and hence the standard errors of the estimate of β) is biased for β≠0 when ignoring errors in dose estimates, and we show how to adjust the information matrix to remove this bias, using the multiple realizations of dose. The use of these methods in the context of several studies, including the Mayak Worker Cohort and the U.S. Atomic Veterans Study, is discussed.
A new approach to the characterization of subtle errors in everyday action: implications for mild cognitive impairment.

PubMed

Seligman, Sarah C; Giovannetti, Tania; Sestito, John; Libon, David J

2014-01-01

Mild functional difficulties have been associated with early cognitive decline in older adults and increased risk for conversion to dementia in mild cognitive impairment, but our understanding of this decline has been limited by a dearth of objective methods. This study evaluated the reliability and validity of a new system to code subtle errors on an established performance-based measure of everyday action and described preliminary findings within the context of a theoretical model of action disruption. Here 45 older adults completed the Naturalistic Action Test (NAT) and neuropsychological measures. NAT performance was coded for overt errors, and subtle action difficulties were scored using a novel coding system. An inter-rater reliability coefficient was calculated. Validity of the coding system was assessed using a repeated-measures ANOVA with NAT task (simple versus complex) and error type (overt versus subtle) as within-group factors. Correlation/regression analyses were conducted among overt NAT errors, subtle NAT errors, and neuropsychological variables. The coding of subtle action errors was reliable and valid, and episodic memory breakdown predicted subtle action disruption. Results suggest that the NAT can be useful in objectively assessing subtle functional decline. Treatments targeting episodic memory may be most effective in addressing early functional impairment in older age.
Imperfect practice makes perfect: error management training improves transfer of learning.

PubMed

Dyre, Liv; Tabor, Ann; Ringsted, Charlotte; Tolsgaard, Martin G

2017-02-01

Traditionally, trainees are instructed to practise with as few errors as possible during simulation-based training. However, transfer of learning may improve if trainees are encouraged to commit errors. The aim of this study was to assess the effects of error management instructions compared with error avoidance instructions during simulation-based ultrasound training. Medical students (n = 60) with no prior ultrasound experience were randomised to error management training (EMT) (n = 32) or error avoidance training (EAT) (n = 28). The EMT group was instructed to deliberately make errors during training. The EAT group was instructed to follow the simulator instructions and to commit as few errors as possible. Training consisted of 3 hours of simulation-based ultrasound training focusing on fetal weight estimation. Simulation-based tests were administered before and after training. Transfer tests were performed on real patients 7-10 days after the completion of training. Primary outcomes were transfer test performance scores and diagnostic accuracy. Secondary outcomes included performance scores and diagnostic accuracy during the simulation-based pre- and post-tests. A total of 56 participants completed the study. On the transfer test, EMT group participants attained higher performance scores (mean score: 67.7%, 95% confidence interval [CI]: 62.4-72.9%) than EAT group members (mean score: 51.7%, 95% CI: 45.8-57.6%) (p < 0.001; Cohen's d = 1.1, 95% CI: 0.5-1.7). There was a moderate improvement in diagnostic accuracy in the EMT group compared with the EAT group (16.7%, 95% CI: 10.2-23.3% weight deviation versus 26.6%, 95% CI: 16.5-36.7% weight deviation [p = 0.082; Cohen's d = 0.46, 95% CI: -0.06 to 1.0]). No significant interaction effects between group and performance improvements between the pre- and post-tests were found in either performance scores (p = 0.25) or diagnostic accuracy (p = 0.09). 
The provision of error management instructions during simulation-based training improves the transfer of learning to the clinical setting compared with error avoidance instructions. Rather than teaching to avoid errors, the use of errors for learning should be explored further in medical education theory and practice. © 2016 John Wiley & Sons Ltd and The Association for the Study of Medical Education.
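The abstract above reports effect sizes as Cohen's d. As a reminder of what that statistic measures, here is a minimal sketch using the common pooled-standard-deviation form; the abstract does not say which variant the authors used, so the formula choice, the `cohens_d` helper, and the sample values are assumptions for illustration only.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d with a pooled standard deviation:
    d = (mean_a - mean_b) / s_pooled."""
    n_a, n_b = len(group_a), len(group_b)
    # statistics.variance is the sample variance (n - 1 denominator).
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Made-up scores, not data from the study.
d = cohens_d([2, 4, 6], [1, 3, 5])
print(d)  # 0.5
```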

Conditional standard errors of measurement for composite scores on the Wechsler Preschool and Primary Scale of Intelligence-Third Edition.

PubMed

Price, Larry R; Raju, Nambury; Lurie, Anna; Wilkins, Charles; Zhu, Jianjun

2006-02-01

A specific recommendation of the 1999 Standards for Educational and Psychological Testing by the American Educational Research Association, the American Psychological Association, and the National Council on Measurement in Education is that test publishers report estimates of the conditional standard error of measurement (SEM). Procedures for calculating the conditional (score-level) SEM based on raw scores are well documented; however, few procedures have been developed for estimating the conditional SEM of subtest or composite scale scores resulting from a nonlinear transformation. Item response theory provided the psychometric foundation to derive the conditional standard errors of measurement and confidence intervals for composite scores on the Wechsler Preschool and Primary Scale of Intelligence-Third Edition.
Research on Operation Assessment Method for Energy Meter

NASA Astrophysics Data System (ADS)

Chen, Xiangqun; Huang, Rui; Shen, Liman; Chen, Hao; Xiong, Dezhi; Xiao, Xiangqi; Liu, Mouhai; Xu, Renheng

2018-03-01

The existing rotation maintenance strategy for electric energy meters checks each meter at regular intervals and evaluates its state. It considers only the influence of time and neglects other factors, which makes the evaluation inaccurate and wastes resources. In order to evaluate the running state of the electric energy meter in a timely manner, a method for evaluating the operation of the electric energy meter is proposed. The method extracts data that affect the state of energy meters from the existing data acquisition system, marketing business system and metrology production scheduling platform, and classifies the influencing factors into error stability, operational reliability, potential risks and other factors, which are scored using the basic test score, inspection score, monitoring score and family defect detection score. An evaluation model based on these scores is then used to evaluate the operating state of the electric energy meter, and finally a corresponding rotation maintenance strategy is put forward.
A Method of Evaluating Operation of Electric Energy Meter

NASA Astrophysics Data System (ADS)

Chen, Xiangqun; Li, Tianyang; Cao, Fei; Chu, Pengfei; Zhao, Xinwang; Huang, Rui; Liu, Liping; Zhang, Chenglin

2018-05-01

The existing rotation maintenance strategy for electric energy meters checks each meter at regular intervals and evaluates its state. It considers only the influence of time and neglects other factors, which makes the evaluation inaccurate and wastes resources. In order to evaluate the running state of the electric energy meter in a timely manner, a method for evaluating the operation of the electric energy meter is proposed. The method extracts data that affect the state of energy meters from the existing data acquisition system, marketing business system and metrology production scheduling platform, and classifies the influencing factors into error stability, operational reliability, potential risks and other factors, which are scored using the basic test score, inspection score, monitoring score and family defect detection score. An evaluation model based on these scores is then used to evaluate the operating state of the electric energy meter, and finally a corresponding rotation maintenance strategy is put forward.
Impact of Measurement Error on Statistical Power: Review of an Old Paradox.

ERIC Educational Resources Information Center

Williams, Richard H.; And Others

1995-01-01

The paradox that a Student t-test based on pretest-posttest differences can attain its greatest power when the difference score reliability is zero was explained by demonstrating that power is not a mathematical function of reliability unless either true score variance or error score variance is constant. (SLD)
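The argument summarized in this abstract can be made concrete with a short worked derivation. The notation below is an assumption introduced here for illustration (it is not taken from the paper): $\sigma_T^2$ and $\sigma_E^2$ are the true-score and error-score variances of the difference scores, $\mu_D$ is the mean difference, and $n$ is the sample size.

```latex
% Reliability of the difference score under classical test theory
\rho_D = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}

% Noncentrality parameter of the t-test on pretest-posttest differences
\delta = \frac{\mu_D \sqrt{n}}{\sqrt{\sigma_T^2 + \sigma_E^2}}

% Case 1: hold \sigma_E^2 fixed and let \sigma_T^2 \to 0.
%   Then \rho_D \to 0 while \delta increases to \mu_D \sqrt{n} / \sigma_E,
%   so power is greatest when reliability is zero.
% Case 2: hold \sigma_T^2 fixed and let \sigma_E^2 \to 0.
%   Then \rho_D \to 1 and \delta also increases,
%   so power rises together with reliability.
```

Because $\delta$ depends on the sum $\sigma_T^2 + \sigma_E^2$ while $\rho_D$ depends on their ratio, power can be written as a function of reliability only when one of the two variances is held constant, which is the condition the abstract states.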
Preschool speech error patterns predict articulation and phonological awareness outcomes in children with histories of speech sound disorders.

PubMed

Preston, Jonathan L; Hull, Margaret; Edwards, Mary Louise

2013-05-01

To determine if speech error patterns in preschoolers with speech sound disorders (SSDs) predict articulation and phonological awareness (PA) outcomes almost 4 years later. Twenty-five children with histories of preschool SSDs (and normal receptive language) were tested at an average age of 4;6 (years;months) and were followed up at age 8;3. The frequency of occurrence of preschool distortion errors, typical substitution and syllable structure errors, and atypical substitution and syllable structure errors was used to predict later speech sound production, PA, and literacy outcomes. Group averages revealed below-average school-age articulation scores and low-average PA but age-appropriate reading and spelling. Preschool speech error patterns were related to school-age outcomes. Children for whom >10% of their speech sound errors were atypical had lower PA and literacy scores at school age than children who produced <10% atypical errors. Preschoolers who produced more distortion errors were likely to have lower school-age articulation scores than preschoolers who produced fewer distortion errors. Different preschool speech error patterns predict different school-age clinical outcomes. Many atypical speech sound errors in preschoolers may be indicative of weak phonological representations, leading to long-term PA weaknesses. Preschoolers' distortions may be resistant to change over time, leading to persisting speech sound production problems.
Is Coefficient Alpha Robust to Non-Normal Data?

PubMed Central

Sheng, Yanyan; Sheng, Zhaohui

2011-01-01

Coefficient alpha has been a widely used measure by which internal consistency reliability is assessed. In addition to essential tau-equivalence and uncorrelated errors, normality has been noted as another important assumption for alpha. Earlier work on evaluating this assumption considered either exclusively non-normal error score distributions, or limited conditions. In view of this and the availability of advanced methods for generating univariate non-normal data, Monte Carlo simulations were conducted to show that non-normal distributions for true or error scores do create problems for using alpha to estimate the internal consistency reliability. The sample coefficient alpha is affected by leptokurtic true score distributions, or skewed and/or kurtotic error score distributions. Increased sample sizes, not test lengths, help improve the accuracy, bias, or precision of using it with non-normal data. PMID:22363306
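For readers unfamiliar with the statistic studied here, the sample coefficient alpha is commonly computed as $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i s_i^2}{s_X^2}\right)$, where $k$ is the number of items, $s_i^2$ the variance of item $i$, and $s_X^2$ the variance of the total score. The sketch below implements that standard formula; the `cronbach_alpha` helper and the response matrix are hypothetical and are not taken from the paper's simulations.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Sample coefficient alpha for a list of respondents, each a list
    of k item scores (k >= 2, at least two respondents)."""
    k = len(scores[0])
    # Variance of each item across respondents (n - 1 denominator).
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Variance of the total (summed) score across respondents.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: 4 respondents x 3 items.
responses = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 5, 4],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 4))  # 0.9444
```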
Developmental Eye Movement (DEM) Test Norms for Mandarin Chinese-Speaking Chinese Children.

PubMed

Xie, Yachun; Shi, Chunmei; Tong, Meiling; Zhang, Min; Li, Tingting; Xu, Yaqin; Guo, Xirong; Hong, Qin; Chi, Xia

2016-01-01

The Developmental Eye Movement (DEM) test is commonly used as a clinical visual-verbal ocular motor assessment tool to screen and diagnose reading problems at the onset. No established norm exists for using the DEM test with Mandarin Chinese-speaking Chinese children. This study aims to establish the normative values of the DEM test for the Mandarin Chinese-speaking population in China; it also aims to compare the values with three other published norms for English-, Spanish-, and Cantonese-speaking Chinese children. A random stratified sampling method was used to recruit children from eight kindergartens and eight primary schools in the main urban and suburban areas of Nanjing. A total of 1,425 Mandarin Chinese-speaking children aged 5 to 12 years took the DEM test in Mandarin Chinese. A digital recorder was used to record the process. All of the subjects completed a symptomatology survey, and their DEM scores were determined by a trained tester. The scores were computed using the formula in the DEM manual, except that the "vertical scores" were adjusted by taking the vertical errors into consideration. The results were compared with the three other published norms. In our subjects, a general decrease with age was observed for the four eye movement indexes: vertical score, adjusted horizontal score, ratio, and total error. For both the vertical and adjusted horizontal scores, the Mandarin Chinese-speaking children completed the tests much more quickly than the norms for English- and Spanish-speaking children. However, the same group completed the test slightly more slowly than the norms for Cantonese-speaking children. The differences in the means were significant (P<0.001) in all age groups. For several ages, the scores obtained in this study were significantly different from the reported scores of Cantonese-speaking Chinese children (P<0.005). 
Compared with English-speaking children, only the vertical score of the 6-year-old group, the vertical-horizontal time ratio of the 8-year-old group and the errors of 9-year-old group had no significant difference (P>0.05); compared with Spanish-speaking children, the scores were statistically significant (P<0.001) for the total error scores of the age groups, except the 6-, 9-, 10-, and 11-year-old age groups (P>0.05). DEM norms may be affected by differences in language, cultural, and educational systems among various ethnicities. The norms of the DEM test are proposed for use with Mandarin Chinese-speaking children in Nanjing and will be proposed for children throughout China.
Administration and Scoring Errors of Graduate Students Learning the WISC-IV: Issues and Controversies

ERIC Educational Resources Information Center

Mrazik, Martin; Janzen, Troy M.; Dombrowski, Stefan C.; Barford, Sean W.; Krawchuk, Lindsey L.

2012-01-01

A total of 19 graduate students enrolled in a graduate course conducted 6 consecutive administrations of the Wechsler Intelligence Scale for Children, 4th edition (WISC-IV, Canadian version). Test protocols were examined to obtain data describing the frequency of examiner errors, including administration and scoring errors. Results identified 511…
Exploring the Effectiveness of a Measurement Error Tutorial in Helping Teachers Understand Score Report Results

ERIC Educational Resources Information Center

Zapata-Rivera, Diego; Zwick, Rebecca; Vezzu, Margaret

2016-01-01

The goal of this study was to explore the effectiveness of a short web-based tutorial in helping teachers to better understand the portrayal of measurement error in test score reports. The short video tutorial included both verbal and graphical representations of measurement error. Results showed a significant difference in comprehension scores…
Medication errors: a prospective cohort study of hand-written and computerised physician order entry in the intensive care unit.

PubMed

Shulman, Rob; Singer, Mervyn; Goldstone, John; Bellingan, Geoff

2005-10-05

The study aimed to compare the impact of computerised physician order entry (CPOE) without decision support with hand-written prescribing (HWP) on the frequency, type and outcome of medication errors (MEs) in the intensive care unit. Details of MEs were collected before, and at several time points after, the change from HWP to CPOE. The study was conducted in a London teaching hospital's 22-bedded general ICU. The sampling periods were 28 weeks before and 2, 10, 25 and 37 weeks after introduction of CPOE. The unit pharmacist prospectively recorded details of MEs and the total number of drugs prescribed daily during the data collection periods, during the course of his normal chart review. The total proportion of MEs was significantly lower with CPOE (117 errors from 2429 prescriptions, 4.8%) than with HWP (69 errors from 1036 prescriptions, 6.7%) (p < 0.04). The proportion of errors reduced with time following the introduction of CPOE (p < 0.001). Two errors with CPOE led to patient harm requiring an increase in length of stay and, if administered, three prescriptions with CPOE could potentially have led to permanent harm or death. Differences in the types of error between systems were noted. There was a reduction in major/moderate patient outcomes with CPOE when non-intercepted and intercepted errors were combined (p = 0.01). The mean baseline APACHE II score did not differ significantly between the HWP and the CPOE periods (19.4 versus 20.0, respectively, p = 0.71). Introduction of CPOE was associated with a reduction in the proportion of MEs and an improvement in the overall patient outcome score (if intercepted errors were included). Moderate and major errors, however, remain a significant concern with CPOE.
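The proportions reported in this abstract can be checked with simple arithmetic. The sketch below recomputes them and applies a pooled two-proportion z-test. The abstract does not say which test the authors used, so the choice of test and the function names here are assumptions made for illustration only.

```python
from math import erfc, sqrt


def error_proportion(errors, prescriptions):
    """Fraction of prescriptions that contained a medication error."""
    return errors / prescriptions


def two_proportion_z_test(errors_a, n_a, errors_b, n_b):
    """Pooled two-proportion z-test (an assumed test, not necessarily the study's).

    Returns the z statistic and the two-sided p-value.
    """
    p_a = errors_a / n_a
    p_b = errors_b / n_b
    pooled = (errors_a + errors_b) / (n_a + n_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


# Counts taken from the abstract: CPOE 117/2429, HWP 69/1036.
cpoe_rate = error_proportion(117, 2429)
hwp_rate = error_proportion(69, 1036)
z_stat, p_val = two_proportion_z_test(117, 2429, 69, 1036)
print(f"CPOE {cpoe_rate:.1%}, HWP {hwp_rate:.1%}, z = {z_stat:.2f}, p = {p_val:.3f}")
```

Under these assumptions the recomputed rates round to the 4.8% and 6.7% given in the abstract, and the two-sided p-value falls below the reported 0.04 threshold.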
Lameness detection challenges in automated milking systems addressed with partial least squares discriminant analysis.

PubMed

Garcia, E; Klaas, I; Amigo, J M; Bro, R; Enevoldsen, C

2014-12-01

Lameness causes decreased animal welfare and leads to higher production costs. This study explored data from an automatic milking system (AMS) to model on-farm gait scoring from a commercial farm. A total of 88 cows were gait scored once per week, for two 5-wk periods. Eighty variables retrieved from AMS were summarized week-wise and used to predict 2 defined classes: nonlame and clinically lame cows. Variables were represented with 2 transformations of the week summarized variables, using 2-wk data blocks before gait scoring, totaling 320 variables (2 × 2 × 80). The reference gait scoring error was estimated in the first week of the study and was, on average, 15%. Two partial least squares discriminant analysis models were fitted to parity 1 and parity 2 groups, respectively, to assign the lameness class according to the predicted probability of being lame (score 3 or 4/4) or not lame (score 1/4). Both models achieved sensitivity and specificity values around 80%, both in calibration and cross-validation. At the optimum values in the receiver operating characteristic curve, the false-positive rate was 28% in the parity 1 model, whereas in the parity 2 model it was about half (16%), which makes it more suitable for practical application; the model error rates were 23 and 19%, respectively. Based on data registered automatically from one AMS farm, we were able to discriminate nonlame and lame cows, where partial least squares discriminant analysis achieved similar performance to the reference method. Copyright © 2014 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
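The classification measures named in this abstract (sensitivity, specificity, false-positive rate, error rate) follow from a 2x2 confusion table once predicted probabilities are cut at a threshold. The sketch below shows that calculation on made-up data. The probabilities, labels, threshold and function names are hypothetical and are not taken from the study.

```python
def classify(probabilities, threshold=0.5):
    """Label a case as lame when its predicted probability reaches the threshold."""
    return [p >= threshold for p in probabilities]


def confusion_counts(predicted, actual):
    """Count true/false positives and negatives for boolean labels."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    return tp, fp, fn, tn


def classification_rates(tp, fp, fn, tn):
    """Standard rates derived from a 2x2 confusion table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "false_positive_rate": fp / (fp + tn),
        "error_rate": (fp + fn) / (tp + fp + fn + tn),
    }


# Hypothetical predicted probabilities of lameness and reference gait labels.
example_probabilities = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1, 0.7, 0.4]
example_is_lame = [True, True, True, False, False, False, True, False]

example_predicted = classify(example_probabilities)
example_rates = classification_rates(*confusion_counts(example_predicted, example_is_lame))
print(example_rates)
```

Moving the threshold trades sensitivity against the false-positive rate, which is the trade-off a receiver operating characteristic curve summarizes.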
Undergraduate paramedic students cannot do drug calculations.

PubMed

Eastwood, Kathryn; Boyle, Malcolm J; Williams, Brett

2012-01-01

Previous investigation of drug calculation skills of qualified paramedics has highlighted poor mathematical ability with no published studies having been undertaken on undergraduate paramedics. There are three major error classifications. Conceptual errors involve an inability to formulate an equation from information given, arithmetical errors involve an inability to operate a given equation, and finally computational errors are simple errors of addition, subtraction, division and multiplication. The objective of this study was to determine if undergraduate paramedics at a large Australian university could accurately perform common drug calculations and basic mathematical equations normally required in the workplace. A cross-sectional study methodology using a paper-based questionnaire was administered to undergraduate paramedic students to collect demographical data, student attitudes regarding their drug calculation performance, and answers to a series of basic mathematical and drug calculation questions. Ethics approval was granted. The mean score of correct answers was 39.5% with one student scoring 100%, 3.3% of students (n=3) scoring greater than 90%, and 63% (n=58) scoring 50% or less, despite 62% (n=57) of the students stating they 'did not have any drug calculations issues'. On average those who completed a minimum of year 12 Specialist Maths achieved scores over 50%. Conceptual errors made up 48.5%, arithmetical 31.1% and computational 17.4%. This study suggests undergraduate paramedics have deficiencies in performing accurate calculations, with conceptual errors indicating a fundamental lack of mathematical understanding. The results suggest an unacceptable level of mathematical competence to practice safely in the unpredictable prehospital environment.
Psychomotor performance measured in a virtual environment correlates with technical skills in the operating room.

PubMed

Kundhal, Pavi S; Grantcharov, Teodor P

2009-03-01

This study was conducted to validate the role of virtual reality computer simulation as an objective method for assessing laparoscopic technical skills. The authors aimed to investigate whether performance in the operating room, assessed using a modified Objective Structured Assessment of Technical Skill (OSATS), correlated with the performance parameters registered by a virtual reality laparoscopic trainer (LapSim). The study enrolled 10 surgical residents (3 females) with a median of 5.5 years (range, 2-6 years) since graduation who had similar limited experience in laparoscopic surgery (median, 5; range, 1-16 laparoscopic cholecystectomies). All the participants performed three repetitions of seven basic skills tasks on the LapSim laparoscopic trainer and one laparoscopic cholecystectomy in the operating room. The operating room procedure was video recorded and blindly assessed by two independent observers using a modified OSATS rating scale. Assessment in the operating room was based on three parameters: time used, error score, and economy of motion score. During the tasks on the LapSim, time, error (tissue damage and millimeters of tissue damage [tasks 2-6], error score [incomplete target areas, badly placed clips, and dropped clips] [task 7]), and economy of movement parameters (path length and angular path) were registered. The correlation between time, economy, and error parameters during the simulated tasks and the operating room procedure was statistically assessed using Spearman's test. Significant correlations were demonstrated between the time used to complete the operating room procedure and time used for task 7 (r_s = 0.74; p = 0.015). The error score demonstrated during the laparoscopic cholecystectomy correlated well with the tissue damage in three of the seven tasks (p < 0.05), the millimeters of tissue damage during two of the tasks, and the error score in task 7 (r_s = 0.67; p = 0.034). 
Furthermore, statistically significant correlations were observed between the economy of motion score from the operative procedure and LapSim's economy parameters (path length and angular path in six of the tasks) (p < 0.05). The current study demonstrated significant correlations between operative performance in the operating room (assessed using a well-validated rating scale) and psychomotor performance in virtual environment assessed by a computer simulator. This provides strong evidence for the validity of the simulator system as an objective tool for assessing laparoscopic skills. Virtual reality simulation can be used in practice to assess technical skills relevant for minimally invasive surgery.
Sensitivity of the Balance Error Scoring System and the Sensory Organization Test in the Combat Environment.

PubMed

Haran, F Jay; Slaboda, Jill C; King, Laurie A; Wright, W Geoff; Houlihan, Daniel; Norris, Jacob N

2016-04-01

This study evaluated the utility of the Balance Error Scoring System (BESS) and the Sensory Organization Test (SOT) as tools for the screening and monitoring of Service members (SMs) with mild traumatic brain injury (mTBI) in a deployed setting during the acute and subacute phases of recovery. Patient records (N = 699) were reviewed for a cohort of SMs who sustained a blast-related mTBI while deployed to Afghanistan and were treated at the Concussion Restoration Care Center (CRCC) at Camp Leatherneck. On initial intake into the CRCC, participants completed two assessments of postural control, the BESS, and SOT. SMs with mTBI performed significantly worse on the BESS and SOT when compared with comparative samples. When the SOT data were further examined using sensory ratios, the results indicated that postural instability was primarily a result of vestibular and visual integration dysfunction (r > 0.62). The main finding of this study was that the sensitivity of the SOT composite score (50-58%) during the acute phase was higher than previous sensitivities found in the sports medicine literature for impact-related trauma.
The effectiveness of the error reporting promoting program on the nursing error incidence rate in Korean operating rooms.

PubMed

Kim, Myoung-Soo; Kim, Jung-Soon; Jung, In Sook; Kim, Young Hae; Kim, Ho Jung

2007-03-01

The purpose of this study was to develop and evaluate an error reporting promoting program (ERPP) to systematically reduce the incidence rate of nursing errors in the operating room. A non-equivalent control group non-synchronized design was used. Twenty-six operating room nurses from one university hospital in Busan participated in this study. They were stratified into four groups according to their operating room experience and were allocated to the experimental and control groups using a matching method. The Mann-Whitney U test was used to analyze the differences between the two groups in the pre- and post-test incidence rates of nursing errors. In the experimental group, the incidence rate of nursing errors decreased significantly relative to the pre-test score, from 28.4% to 15.7%. By domain, the incidence rate decreased significantly in three domains ("compliance of aseptic technique", "management of document", "environmental management") in the experimental group, while it decreased in the control group, to which the ordinary error-reporting method was applied. An error-reporting system makes it possible to share errors and to learn from them. The ERPP was effective in reducing errors in recognition-related nursing activities. For more effective error prevention, risk management efforts should be applied across the whole health care system together with this program.
Preschool speech error patterns predict articulation and phonological awareness outcomes in children with histories of speech sound disorders

PubMed Central

Preston, Jonathan L.; Hull, Margaret; Edwards, Mary Louise

2012-01-01

Purpose To determine if speech error patterns in preschoolers with speech sound disorders (SSDs) predict articulation and phonological awareness (PA) outcomes almost four years later. Method Twenty-five children with histories of preschool SSDs (and normal receptive language) were tested at an average age of 4;6 and followed up at 8;3. The frequency of occurrence of preschool distortion errors, typical substitution and syllable structure errors, and atypical substitution and syllable structure errors were used to predict later speech sound production, PA, and literacy outcomes. Results Group averages revealed below-average school-age articulation scores and low-average PA, but age-appropriate reading and spelling. Preschool speech error patterns were related to school-age outcomes. Children for whom more than 10% of their speech sound errors were atypical had lower PA and literacy scores at school-age than children who produced fewer than 10% atypical errors. Preschoolers who produced more distortion errors were likely to have lower school-age articulation scores. Conclusions Different preschool speech error patterns predict different school-age clinical outcomes. Many atypical speech sound errors in preschool may be indicative of weak phonological representations, leading to long-term PA weaknesses. Preschool distortions may be resistant to change over time, leading to persisting speech sound production problems. PMID:23184137
DOE Office of Scientific and Technical Information (OSTI.GOV)

Morley, Steven

The PyForecastTools package provides Python routines for calculating metrics for model validation, forecast verification and model comparison. For continuous predictands the package provides functions for calculating bias (mean error, mean percentage error, median log accuracy, symmetric signed bias), and for calculating accuracy (mean squared error, mean absolute error, mean absolute scaled error, normalized RMSE, median symmetric accuracy). Convenience routines to calculate the component parts (e.g. forecast error, scaled error) of each metric are also provided. To compare models the package provides: generic skill score; percent better. Robust measures of scale including median absolute deviation, robust standard deviation, robust coefficient of variation and the Sn estimator are all provided by the package. Finally, the package implements Python classes for NxN contingency tables. In the case of a multi-class prediction, accuracy and skill metrics such as proportion correct and the Heidke and Peirce skill scores are provided as object methods. The special case of a 2x2 contingency table inherits from the NxN class and provides many additional metrics for binary classification: probability of detection, probability of false detection, false alarm ratio, threat score, equitable threat score, bias. Confidence intervals for many of these quantities can be calculated using either the Wald method or Agresti-Coull intervals.
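Several of the metrics listed in this record have short textbook definitions. The sketch below writes a few of them out in plain Python so the quantities are concrete. It deliberately does not call PyForecastTools: every function name and argument here is hypothetical, and the definitions are the commonly used ones, which may differ in detail from what the package implements.

```python
from math import exp, log
from statistics import median


def forecast_errors(predicted, observed):
    """Forecast error for each pair, taken here as predicted minus observed."""
    return [p - o for p, o in zip(predicted, observed)]


def mean_error(predicted, observed):
    """Bias: average signed error."""
    errors = forecast_errors(predicted, observed)
    return sum(errors) / len(errors)


def mean_absolute_error(predicted, observed):
    errors = forecast_errors(predicted, observed)
    return sum(abs(e) for e in errors) / len(errors)


def mean_squared_error(predicted, observed):
    errors = forecast_errors(predicted, observed)
    return sum(e * e for e in errors) / len(errors)


def median_symmetric_accuracy(predicted, observed):
    """100 * (exp(median(|ln(predicted / observed)|)) - 1); needs positive values."""
    log_ratios = [abs(log(p / o)) for p, o in zip(predicted, observed)]
    return 100.0 * (exp(median(log_ratios)) - 1.0)


def binary_table_metrics(hits, false_alarms, misses, correct_negatives):
    """Common 2x2 contingency-table scores for a yes/no forecast."""
    return {
        "probability_of_detection": hits / (hits + misses),
        "probability_of_false_detection": false_alarms / (false_alarms + correct_negatives),
        "false_alarm_ratio": false_alarms / (hits + false_alarms),
        "threat_score": hits / (hits + false_alarms + misses),
        "bias": (hits + false_alarms) / (hits + misses),
    }


# Hypothetical data for illustration only.
sample_predicted = [2.0, 4.0, 6.0, 8.0]
sample_observed = [1.0, 5.0, 6.0, 10.0]
print(mean_error(sample_predicted, sample_observed))
print(binary_table_metrics(hits=30, false_alarms=10, misses=20, correct_negatives=40))
```

Note that the false alarm ratio is conditioned on "yes" forecasts, while the probability of false detection is conditioned on observed non-events, so the two are not interchangeable.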
Volcanic ash modeling with the NMMB-MONARCH-ASH model: quantification of offline modeling errors

NASA Astrophysics Data System (ADS)

Marti, Alejandro; Folch, Arnau

2018-03-01

Volcanic ash modeling systems are used to simulate the atmospheric dispersion of volcanic ash and to generate forecasts that quantify the impacts from volcanic eruptions on infrastructures, air quality, aviation, and climate. The efficiency of response and mitigation actions is directly associated with the accuracy of the volcanic ash cloud detection and modeling systems. Operational forecasts build on offline coupled modeling systems in which meteorological variables are updated at the specified coupling intervals. Despite the concerns from other communities regarding the accuracy of this strategy, the quantification of the systematic errors and shortcomings associated with the offline modeling systems has received no attention. This paper employs the NMMB-MONARCH-ASH model to quantify these errors by employing different quantitative and categorical evaluation scores. The skills of the offline coupling strategy are compared against those from an online forecast considered to be the best estimate of the true outcome. Case studies are considered for a synthetic eruption with constant eruption source parameters and for two historical events, which suitably illustrate the severe aviation disruptive effects of European (2010 Eyjafjallajökull) and South American (2011 Cordón Caulle) volcanic eruptions. Evaluation scores indicate that systematic errors due to the offline modeling are of the same order of magnitude as those associated with the source term uncertainties. In particular, traditional offline forecasts employed in operational model setups can result in significant uncertainties, failing to reproduce, in the worst cases, up to 45-70 % of the ash cloud of an online forecast. These inconsistencies are anticipated to be even more relevant in scenarios in which the meteorological conditions change rapidly in time. 
The outcome of this paper encourages operational groups responsible for real-time advisories for aviation to consider employing computationally efficient online dispersal models.
Standard Errors of Estimated Latent Variable Scores with Estimated Structural Parameters

ERIC Educational Resources Information Center

Hoshino, Takahiro; Shigemasu, Kazuo

2008-01-01

The authors propose a concise formula to evaluate the standard error of the estimated latent variable score when the true values of the structural parameters are not known and must be estimated. The formula can be applied to factor scores in factor analysis or ability parameters in item response theory, without bootstrap or Markov chain Monte…
The Importance of Relying on the Manual: Scoring Error Variance in the WISC-IV Vocabulary Subtest

ERIC Educational Resources Information Center

Erdodi, Laszlo A.; Richard, David C. S.; Hopwood, Christopher

2009-01-01

Classical test theory assumes that ability level has no effect on measurement error. Newer test theories, however, argue that the precision of a measurement instrument changes as a function of the examinee's true score. Research has shown that administration errors are common in the Wechsler scales and that subtests requiring subjective scoring…

The Impact of Misspelled Words on Automated Computer Scoring: A Case Study of Scientific Explanations

NASA Astrophysics Data System (ADS)

Ha, Minsu; Nehm, Ross H.

2016-06-01

Automated computerized scoring systems (ACSSs) are being increasingly used to analyze text in many educational settings. Nevertheless, the impact of misspelled words (MSW) on scoring accuracy remains to be investigated in many domains, particularly jargon-rich disciplines such as the life sciences. Empirical studies confirm that MSW are a pervasive feature of human-generated text and that despite improvements, spell-check and auto-replace programs continue to be characterized by significant errors. Our study explored four research questions relating to MSW and text-based computer assessments: (1) Do English language learners (ELLs) produce equivalent magnitudes and types of spelling errors as non-ELLs? (2) To what degree do MSW impact concept-specific computer scoring rules? (3) What impact do MSW have on computer scoring accuracy? and (4) Are MSW more likely to impact false-positive or false-negative feedback to students? We found that although ELLs produced twice as many MSW as non-ELLs, MSW were relatively uncommon in our corpora. The MSW in the corpora were found to be important features of the computer scoring models. Although MSW did not significantly or meaningfully impact computer scoring efficacy across nine different computer scoring models, MSW had a greater impact on the scoring algorithms for naïve ideas than key concepts. Linguistic and concept redundancy in student responses explains the weak connection between MSW and scoring accuracy. Lastly, we found that MSW tend to have a greater impact on false-positive feedback. We discuss the implications of these findings for the development of next-generation science assessments.
Developmental Eye Movement (DEM) Test Norms for Mandarin Chinese-Speaking Chinese Children

PubMed Central

Tong, Meiling; Zhang, Min; Li, Tingting; Xu, Yaqin; Guo, Xirong; Hong, Qin; Chi, Xia

2016-01-01

The Developmental Eye Movement (DEM) test is commonly used as a clinical visual-verbal ocular motor assessment tool to screen and diagnose reading problems at the onset. No established norm exists for using the DEM test with Mandarin Chinese-speaking Chinese children. This study aims to establish the normative values of the DEM test for the Mandarin Chinese-speaking population in China; it also aims to compare the values with three other published norms for English-, Spanish-, and Cantonese-speaking Chinese children. A random stratified sampling method was used to recruit children from eight kindergartens and eight primary schools in the main urban and suburban areas of Nanjing. A total of 1,425 Mandarin Chinese-speaking children aged 5 to 12 years took the DEM test in Mandarin Chinese. A digital recorder was used to record the process. All of the subjects completed a symptomatology survey, and their DEM scores were determined by a trained tester. The scores were computed using the formula in the DEM manual, except that the “vertical scores” were adjusted by taking the vertical errors into consideration. The results were compared with the three other published norms. In our subjects, a general decrease with age was observed for the four eye movement indexes: vertical score, adjusted horizontal score, ratio, and total error. For both the vertical and adjusted horizontal scores, the Mandarin Chinese-speaking children completed the tests much more quickly than the norms for English- and Spanish-speaking children. However, the same group completed the test slightly more slowly than the norms for Cantonese-speaking children. The differences in the means were significant (P<0.001) in all age groups. For several ages, the scores obtained in this study were significantly different from the reported scores of Cantonese-speaking Chinese children (P<0.005). 
Compared with English-speaking children, only the vertical score of the 6-year-old group, the vertical-horizontal time ratio of the 8-year-old group and the errors of 9-year-old group had no significant difference (P>0.05); compared with Spanish-speaking children, the scores were statistically significant (P<0.001) for the total error scores of the age groups, except the 6-, 9-, 10-, and 11-year-old age groups (P>0.05). DEM norms may be affected by differences in language, cultural, and educational systems among various ethnicities. The norms of the DEM test are proposed for use with Mandarin Chinese-speaking children in Nanjing and will be proposed for children throughout China. PMID:26881754
Safety culture perceptions of pharmacists in Malaysian hospitals and health clinics: a multicentre assessment using the Safety Attitudes Questionnaire

PubMed Central

Samsuri, Srima Elina; Pei Lin, Lua; Fahrni, Mathumalar Loganathan

2015-01-01

Objective To assess the safety attitudes of pharmacists, provide a profile of their domains of safety attitude and correlate their attitudes with self-reported rates of medication errors. Design A cross-sectional study utilising the Safety Attitudes Questionnaire (SAQ). Setting 3 public hospitals and 27 health clinics. Participants 117 pharmacists. Main outcome measure(s) Safety culture mean scores, variation in scores across working units and between hospitals versus health clinics, predictors of safety culture, and medication errors and their correlation. Results Response rate was 83.6% (117 valid questionnaires returned). Stress recognition (73.0±20.4) and working condition (54.8±17.4) received the highest and lowest mean scores, respectively. Pharmacists exhibited positive attitudes towards: stress recognition (58.1%), job satisfaction (46.2%), teamwork climate (38.5%), safety climate (33.3%), perception of management (29.9%) and working condition (15.4%). With the exception of stress recognition, those who worked in health clinics scored higher than those in hospitals (p<0.05) and higher scores (overall score as well as score for each domain except for stress recognition) correlated negatively with reported number of medication errors. Conversely, those working in hospital (versus health clinic) were 8.9 times more likely (p<0.01) to report a medication error (OR 8.9, CI 3.08 to 25.7). As stress recognition increased, the number of medication errors reported increased (p=0.023). Years of work experience (p=0.017) influenced the number of medication errors reported. For every additional year of work experience, pharmacists were 0.87 times less likely to report a medication error (OR 0.87, CI 0.78 to 0.98). Conclusions A minority (20.5%) of the pharmacists working in hospitals and health clinics was in agreement with the overall SAQ questions and scales. 
Pharmacists in outpatient and ambulatory units and those in health clinics had better perceptions of safety culture. As perceptions improved, the number of medication errors reported decreased. Group-specific interventions that target specific domains are necessary to improve the safety culture. PMID:26610761
An alternative to the balance error scoring system: using a low-cost balance board to improve the validity/reliability of sports-related concussion balance testing.

PubMed

Chang, Jasper O; Levy, Susan S; Seay, Seth W; Goble, Daniel J

2014-05-01

Recent guidelines advocate sports medicine professionals to use balance tests to assess sensorimotor status in the management of concussions. The present study sought to determine whether a low-cost balance board could provide a valid, reliable, and objective means of performing this balance testing. Criterion validity testing relative to a gold standard and 7 day test-retest reliability. University biomechanics laboratory. Thirty healthy young adults. Balance ability was assessed on 2 days separated by 1 week using (1) a gold standard measure (ie, scientific grade force plate), (2) a low-cost Nintendo Wii Balance Board (WBB), and (3) the Balance Error Scoring System (BESS). Validity of the WBB center of pressure path length and BESS scores were determined relative to the force plate data. Test-retest reliability was established based on intraclass correlation coefficients. Composite scores for the WBB had excellent validity (r = 0.99) and test-retest reliability (R = 0.88). Both the validity (r = 0.10-0.52) and test-retest reliability (r = 0.61-0.78) were lower for the BESS. These findings demonstrate that a low-cost balance board can provide improved balance testing accuracy/reliability compared with the BESS. This approach provides a potentially more valid/reliable, yet affordable, means of assessing sports-related concussion compared with current methods.
Impacts of motivational valence on the error-related negativity elicited by full and partial errors.

PubMed

Maruo, Yuya; Schacht, Annekathrin; Sommer, Werner; Masaki, Hiroaki

2016-02-01

Affect and motivation influence the error-related negativity (ERN) elicited by full errors; however, it is unknown whether they also influence ERNs to correct responses accompanied by covert incorrect response activation (partial errors). Here we compared a neutral condition with conditions, where correct responses were rewarded or where incorrect responses were punished with gains and losses of small amounts of money, respectively. Data analysis distinguished ERNs elicited by full and partial errors. In the reward and punishment conditions, ERN amplitudes to both full and partial errors were larger than in the neutral condition, confirming participants' sensitivity to the significance of errors. We also investigated the relationships between ERN amplitudes and the behavioral inhibition and activation systems (BIS/BAS). Regardless of reward/punishment condition, participants scoring higher on BAS showed smaller ERN amplitudes in full error trials. These findings provide further evidence that the ERN is related to motivational valence and that similar relationships hold for both full and partial errors. Copyright © 2015 The Authors. Published by Elsevier B.V. All rights reserved.
Emotion perception, non-social cognition and symptoms as predictors of theory of mind in schizophrenia.

PubMed

Vaskinn, Anja; Andersson, Stein; Østefjells, Tiril; Andreassen, Ole A; Sundet, Kjetil

2018-06-05

Theory of mind (ToM) can be divided into cognitive and affective ToM, and a distinction can be made between overmentalizing and undermentalizing errors. Research has shown that ToM in schizophrenia is associated with non-social and social cognition, and with clinical symptoms. In this study, we investigate cognitive and clinical predictors of different ToM processes. Ninety-one individuals with schizophrenia participated. ToM was measured with the Movie for the Assessment of Social Cognition (MASC) yielding six scores (total ToM, cognitive ToM, affective ToM, overmentalizing errors, undermentalizing errors and no mentalizing errors). Neurocognition was indexed by a composite score based on the non-social cognitive tests in the MATRICS Consensus Cognitive Battery (MCCB). Emotion perception was measured with Emotion in Biological Motion (EmoBio), a point-light walker task. Clinical symptoms were assessed with the Positive and Negative Syndrome Scale (PANSS). Seventy-one healthy control (HC) participants completed the MASC. Individuals with schizophrenia showed large impairments compared to HC for all MASC scores, except overmentalizing errors. Hierarchical regression analyses with the six different MASC scores as dependent variables revealed that MCCB was a significant predictor of all MASC scores, explaining 8-18% of the variance. EmoBio increased the explained variance significantly, to 17-28%, except for overmentalizing errors. PANSS excited symptoms increased explained variance for total ToM, affective ToM and no mentalizing errors. Both social and non-social cognition were significant predictors of ToM. Overmentalizing was only predicted by non-social cognition. Excited symptoms contributed to overall and affective ToM, and to no mentalizing errors. Copyright © 2018 Elsevier Inc. All rights reserved.
Sustained attention to response task (SART) shows impaired vigilance in a spectrum of disorders of excessive daytime sleepiness.

PubMed

Van Schie, Mojca K M; Thijs, Roland D; Fronczek, Rolf; Middelkoop, Huub A M; Lammers, Gert Jan; Van Dijk, J Gert

2012-08-01

The sustained attention to response task comprises withholding key presses to one in nine of 225 target stimuli; it proved to be a sensitive measure of vigilance in a small group of narcoleptics. We studied sustained attention to response task results in 96 patients from a tertiary narcolepsy referral centre. Diagnoses according to ICSD-2 criteria were narcolepsy with (n=42) and without cataplexy (n=5), idiopathic hypersomnia without long sleep time (n=37), and obstructive sleep apnoea syndrome (n=12). The sustained attention to response task was administered prior to each of five multiple sleep latency test sessions. Analysis concerned error rates, mean reaction time, reaction time variability and post-error slowing, as well as the correlation of sustained attention to response task results with mean latency of the multiple sleep latency test and possible time of day influences. Median sustained attention to response task error scores ranged from 8.4 to 11.1, and mean reaction times from 332 to 366ms. Sustained attention to response task error score and mean reaction time did not differ significantly between patient groups. Sustained attention to response task error score did not correlate with multiple sleep latency test sleep latency. Reaction time was more variable as the error score was higher. Sustained attention to response task error score was highest for the first session. We conclude that a high sustained attention to response task error rate reflects vigilance impairment in excessive daytime sleepiness irrespective of its cause. The sustained attention to response task and the multiple sleep latency test reflect different aspects of sleep/wakefulness and are complementary. © 2011 European Sleep Research Society.
The Impact of Measurement Error on the Accuracy of Individual and Aggregate SGP

ERIC Educational Resources Information Center

McCaffrey, Daniel F.; Castellano, Katherine E.; Lockwood, J. R.

2015-01-01

Student growth percentiles (SGPs) express students' current observed scores as percentile ranks in the distribution of scores among students with the same prior-year scores. A common concern about SGPs at the student level, and mean or median SGPs (MGPs) at the aggregate level, is potential bias due to test measurement error (ME). Shang,…
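The definition in this abstract can be illustrated with a minimal sketch. It assumes an exact match on the prior-year score and takes the percentile rank to be the share of peers scoring strictly below the student. Operational SGPs are typically estimated with quantile regression rather than exact matching, so the function name, arguments, and data below are hypothetical and are not the authors' procedure.

```python
def student_growth_percentile(prior, current, cohort):
    """Percentile rank of `current` among cohort members sharing `prior`.

    `cohort` is an iterable of (prior_score, current_score) pairs.
    Assumption: peers are students with exactly the same prior score,
    and the rank is the percentage of peers scoring strictly below.
    """
    peers = [c for p, c in cohort if p == prior]
    if not peers:
        raise ValueError("no students share this prior-year score")
    below = sum(1 for c in peers if c < current)
    return 100.0 * below / len(peers)


# Hypothetical cohort: four students with prior score 50, one with 60.
cohort = [(50, 40), (50, 55), (50, 60), (50, 70), (60, 80)]
sgp = student_growth_percentile(50, 60, cohort)
```

Because the rank depends on observed scores on both tests, measurement error in either score can shift a student's peer group or rank, which is the concern the abstract raises.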
Beyond the Total Score: A Preliminary Investigation into the Types of Phonological Awareness Errors Made by First Graders

ERIC Educational Resources Information Center

Hayward, Denyse V.; Annable, Caitlin D.; Fung, Jennifer E.; Williamson, Robert D.; Lovell-Johnston, Meridith A.; Phillips, Linda M.

2017-01-01

Current phonological awareness assessment procedures consider only the total score a child achieves. Such an approach may result in children who achieve the same total score receiving the same instruction even though the configuration of their errors represent fundamental knowledge differences. The purpose of this study was to develop a tool for…
Undergraduate paramedic students cannot do drug calculations

PubMed Central

Eastwood, Kathryn; Boyle, Malcolm J; Williams, Brett

2012-01-01

BACKGROUND: Previous investigation of drug calculation skills of qualified paramedics has highlighted poor mathematical ability with no published studies having been undertaken on undergraduate paramedics. There are three major error classifications. Conceptual errors involve an inability to formulate an equation from information given, arithmetical errors involve an inability to operate a given equation, and finally computational errors are simple errors of addition, subtraction, division and multiplication. The objective of this study was to determine if undergraduate paramedics at a large Australian university could accurately perform common drug calculations and basic mathematical equations normally required in the workplace. METHODS: A cross-sectional study methodology using a paper-based questionnaire was administered to undergraduate paramedic students to collect demographical data, student attitudes regarding their drug calculation performance, and answers to a series of basic mathematical and drug calculation questions. Ethics approval was granted. RESULTS: The mean score of correct answers was 39.5% with one student scoring 100%, 3.3% of students (n=3) scoring greater than 90%, and 63% (n=58) scoring 50% or less, despite 62% (n=57) of the students stating they ‘did not have any drug calculations issues’. On average those who completed a minimum of year 12 Specialist Maths achieved scores over 50%. Conceptual errors made up 48.5%, arithmetical 31.1% and computational 17.4%. CONCLUSIONS: This study suggests undergraduate paramedics have deficiencies in performing accurate calculations, with conceptual errors indicating a fundamental lack of mathematical understanding. The results suggest an unacceptable level of mathematical competence to practice safely in the unpredictable prehospital environment. PMID:25215067
Graduate Students' Administration and Scoring Errors on the WISC-IV: Reducing Inaccuracies with Training and Experience

ERIC Educational Resources Information Center

Alper, Jaclyn

2012-01-01

A total of 52 Wechsler Intelligence Scale for Children, Fourth Edition (WISC-IV) protocols, administered by graduate students were examined to obtain data on the type and frequency of examiner errors, the impact of errors on resultant test scores as well as improvement rate over the course of two years in training. Findings were consistent with…
Tests for detecting overdispersion in models with measurement error in covariates.

PubMed

Yang, Yingsi; Wong, Man Yu

2015-11-30

Measurement error in covariates can affect the accuracy in count data modeling and analysis. In overdispersion identification, the true mean-variance relationship can be obscured under the influence of measurement error in covariates. In this paper, we propose three tests for detecting overdispersion when covariates are measured with error: a modified score test and two score tests based on the proposed approximate likelihood and quasi-likelihood, respectively. The proposed approximate likelihood is derived under the classical measurement error model, and the resulting approximate maximum likelihood estimator is shown to have superior efficiency. Simulation results also show that the score test based on approximate likelihood outperforms the test based on quasi-likelihood and other alternatives in terms of empirical power. By analyzing a real dataset containing the health-related quality-of-life measurements of a particular group of patients, we demonstrate the importance of the proposed methods by showing that the analyses with and without measurement error correction yield significantly different results. Copyright © 2015 John Wiley & Sons, Ltd.
Lower Extremity Landing Biomechanics in Both Sexes After a Functional Exercise Protocol

PubMed Central

Wesley, Caroline A.; Aronson, Patricia A.; Docherty, Carrie L.

2015-01-01

Context: Sex differences in landing biomechanics play a role in increased rates of anterior cruciate ligament (ACL) injuries in female athletes. Exercising to various states of fatigue may negatively affect landing mechanics, resulting in a higher injury risk, but research is inconclusive regarding sex differences in response to fatigue. Objective: To use the Landing Error Scoring System (LESS), a valid clinical movement-analysis tool, to determine the effects of exercise on the landing biomechanics of males and females. Design: Cross-sectional study. Setting: University laboratory. Patients or Other Participants: Thirty-six (18 men, 18 women) healthy college-aged athletes (members of varsity, club, or intramural teams) with no history of ACL injury or prior participation in an ACL injury-prevention program. Intervention(s): Participants were videotaped performing 3 jump-landing trials before and after performance of a functional, sportlike exercise protocol consisting of repetitive sprinting, jumping, and cutting tasks. Main Outcome Measure(s): Landing technique was evaluated using the LESS. A higher LESS score indicates more errors. The mean of the 3 LESS scores in each condition (pre-exercise and postexercise) was used for statistical analysis. Results: Women scored higher on the LESS (6.3 ± 1.9) than men (5.0 ± 2.3) regardless of time (P = .04). Postexercise scores (6.3 ± 2.1) were higher than preexercise scores (5.0 ± 2.1) for both sexes (P = .01), but women were not affected to a greater degree than men (P = .62). Conclusions: As evidenced by their higher LESS scores, females demonstrated more errors in landing technique than males, which may contribute to their increased rate of ACL injury. Both sexes displayed poor technique after the exercise protocol, which may indicate that participants experience a higher risk of ACL injury in the presence of fatigue. PMID:26285090
SU-E-T-105: An FMEA Survey of Intensity Modulated Radiation Therapy (IMRT) Step and Shoot Dose Delivery Failure Modes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Faught, J Tonigan; Johnson, J; Stingo, F

2015-06-15

Purpose: To assess the perception of TG-142 tolerance level dose delivery failures in IMRT and the application of FMEA process to this specific aspect of IMRT. Methods: An online survey was distributed to medical physicists worldwide that briefly described 11 different failure modes (FMs) covered by basic quality assurance in step-and-shoot IMRT at or near TG-142 tolerance criteria levels. For each FM, respondents estimated the worst case H&N patient percent dose error and FMEA scores for Occurrence, Detectability, and Severity. Demographic data was also collected. Results: 181 individual and three group responses were submitted. 84% were from North America. Most (76%) individual respondents performed at least 80% clinical work and 92% were nationally certified. Respondent medical physics experience ranged from 2.5–45 years (average 18 years). 52% of individual respondents were at least somewhat familiar with FMEA, while 17% were not familiar. Several IMRT techniques, treatment planning systems and linear accelerator manufacturers were represented. All FMs received widely varying scores ranging from 1–10 for occurrence, at least 1–9 for detectability, and at least 1–7 for severity. Ranking FMs by RPN scores also resulted in large variability, with each FM being ranked both most risky (1st) and least risky (11th) by different respondents. On average MLC modeling had the highest RPN scores. Individual estimated percent dose errors and severity scores positively correlated (p<0.10) for each FM as expected. No universal correlations were found between the demographic information collected and scoring, percent dose errors, or ranking. Conclusion: FMs investigated overall were evaluated as low to medium risk, with average RPNs less than 110. The ranking of 11 FMs was not agreed upon by the community. Large variability in FMEA scoring may be caused by individual interpretation and/or experience, thus reflecting the subjective nature of the FMEA tool.
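The RPN ranking mentioned in this abstract can be sketched as follows. In conventional FMEA practice the risk priority number is the product of the occurrence, severity, and detectability scores; the abstract does not spell out the formula, so that convention is an assumption here. The class name, the 1 to 10 scale check, and the failure-mode labels and scores are hypothetical, not survey data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """One failure mode with its three FMEA scores (assumed 1-10 scale)."""
    name: str
    occurrence: int
    severity: int
    detectability: int

    def __post_init__(self):
        for score in (self.occurrence, self.severity, self.detectability):
            if not 1 <= score <= 10:
                raise ValueError("FMEA scores are assumed to lie in 1..10")

    @property
    def rpn(self) -> int:
        # Conventional risk priority number: O x S x D.
        return self.occurrence * self.severity * self.detectability


def rank_by_rpn(modes):
    """Return failure modes ordered from most to least risky."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


# Hypothetical scores from a single respondent, for illustration only.
modes = [FailureMode("FM-A", 2, 3, 4), FailureMode("FM-B", 5, 5, 5)]
ranking = [m.name for m in rank_by_rpn(modes)]
```

Because each respondent supplies their own three scores, two respondents can produce opposite rankings for the same list, which is consistent with the variability the survey reports.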
Forecasting the value of credit scoring

NASA Astrophysics Data System (ADS)

Saad, Shakila; Ahmad, Noryati; Jaffar, Maheran Mohd

2017-08-01

Credit scoring systems now play an important role in the banking sector. Credit scoring is important in assessing the creditworthiness of customers requesting credit from banks or other financial institutions. It is usually applied when customers submit an application for credit facilities. Based on the resulting score, a bank can separate "good" clients from "bad" clients. However, in most cases the score is valid only at that specific time and cannot be used to forecast the creditworthiness of the same applicant afterwards. Hence, the bank cannot know whether "good" clients will always remain good or whether "bad" clients may become "good" clients after a certain time. To fill this gap, this study proposes an equation to forecast the credit score of potential borrowers at a certain time by using the historical score related to the assumption. The Mean Absolute Percentage Error (MAPE) is used to measure the accuracy of the forecast score. The results show that the forecast score is highly accurate compared with the actual credit score.
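The accuracy measure named in this abstract, MAPE, is the mean of the absolute percentage deviations between actual and forecast values. A minimal sketch under the standard definition follows; the function name and the sample scores are hypothetical and do not come from the study.

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error, returned in percent.

    Standard definition: 100/n * sum(|(a_i - f_i) / a_i|).
    Undefined when any actual value is zero, so that case is rejected.
    """
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Hypothetical actual and forecast credit scores.
error_pct = mape([100, 200], [110, 180])
```

A lower MAPE indicates a closer forecast; what counts as "highly accurate" depends on a threshold that the abstract does not state.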
Clock Drawing as a Screen for Impaired Driving in Aging and Dementia: Is It Worth the Time?

PubMed Central

Manning, Kevin J.; Davis, Jennifer D.; Papandonatos, George D.; Ott, Brian R.

2014-01-01

Clock drawing is recommended by medical and transportation authorities as a screening test for unsafe drivers. The objective of the present study was to assess the usefulness of different clock drawing systems as screening measures of driving performance in 122 healthy and cognitively impaired older drivers. Clock drawing was measured using four different scoring systems. Driving outcomes included global ratings of safety and the error rate on a standardized on-road test. Findings revealed that clock drawing was significantly correlated with the driving score on the road test for each of the scoring systems. However, receiver operator curve analyses showed limited clinical utility for clock drawing as a screening instrument for impaired on-road driving performance with the area under the curve ranging from 0.53 to 0.61. Results from this study indicate that clock drawing has limited utility as a solitary screening measure of on-road driving, even when considering a variety of scoring approaches. PMID:24296110
Clock drawing as a screen for impaired driving in aging and dementia: is it worth the time?

PubMed

Manning, Kevin J; Davis, Jennifer D; Papandonatos, George D; Ott, Brian R

2014-02-01

Clock drawing is recommended by medical and transportation authorities as a screening test for unsafe drivers. The objective of the present study was to assess the usefulness of different clock drawing systems as screening measures of driving performance in 122 healthy and cognitively impaired older drivers. Clock drawing was measured using four different scoring systems. Driving outcomes included global ratings of safety and the error rate on a standardized on-road test. Findings revealed that clock drawing was significantly correlated with the driving score on the road test for each of the scoring systems. However, receiver operator curve analyses showed limited clinical utility for clock drawing as a screening instrument for impaired on-road driving performance with the area under the curve ranging from 0.53 to 0.61. Results from this study indicate that clock drawing has limited utility as a solitary screening measure of on-road driving, even when considering a variety of scoring approaches.
SIMulation of Medication Error induced by Clinical Trial drug labeling: the SIMME-CT study.

PubMed

Dollinger, Cecile; Schwiertz, Vérane; Sarfati, Laura; Gourc-Berthod, Chloé; Guédat, Marie-Gabrielle; Alloux, Céline; Vantard, Nicolas; Gauthier, Noémie; He, Sophie; Kiouris, Elena; Caffin, Anne-Gaelle; Bernard, Delphine; Ranchon, Florence; Rioufol, Catherine

2016-06-01

To assess the impact of investigational drug labels on the risk of medication error in drug dispensing. A simulation-based learning program focusing on investigational drug dispensing was conducted. The study was undertaken in an Investigational Drugs Dispensing Unit of a University Hospital of Lyon, France. Sixty-three pharmacy workers (pharmacists, residents, technicians or students) were enrolled. Ten risk factors were selected concerning label information or the risk of confusion with another clinical trial. Each risk factor was scored independently out of 5: the higher the score, the greater the risk of error. From 400 labels analyzed, two groups were selected for the dispensing simulation: 27 labels with high risk (score ≥3) and 27 with low risk (score ≤2). Each question in the learning program was displayed as a simulated clinical trial prescription. Medication error was defined as at least one erroneous answer (i.e. error in drug dispensing). For each question, response times were collected. High-risk investigational drug labels correlated with medication error and slower response time. Error rates were significantly 5.5-fold higher for high-risk series. Error frequency was not significantly affected by occupational category or experience in clinical trials. SIMME-CT is the first simulation-based learning tool to focus on investigational drug labels as a risk factor for medication error. SIMME-CT was also used as a training tool for staff involved in clinical research, to develop medication error risk awareness and to validate competence in continuing medical education. © The Author 2016. Published by Oxford University Press in association with the International Society for Quality in Health Care; all rights reserved.
Awareness of Diagnostic Error among Japanese Residents: a Nationwide Study.

PubMed

Nishizaki, Yuji; Shinozaki, Tomohiro; Kinoshita, Kensuke; Shimizu, Taro; Tokuda, Yasuharu

2018-04-01

Residents' understanding of diagnostic error may differ between countries. We sought to explore the relationship between diagnostic error knowledge and self-study, clinical knowledge, and experience. Our nationwide study involved postgraduate year 1 and 2 (PGY-1 and -2) Japanese residents. The Diagnostic Error Knowledge Assessment Test (D-KAT) and General Medicine In-Training Examination (GM-ITE) were administered at the end of the 2014 academic year. D-KAT scores were compared with the benchmark scores of US residents. Associations between D-KAT score and gender, PGY, emergency department (ED) rotations per month, mean number of inpatients handled at any given time, and mean daily minutes of self-study were also analyzed, both with and without adjusting for GM-ITE scores. Student's t test was used for comparisons with linear mixed models and structural equation models (SEM) to explore associations with D-KAT or GM-ITE scores. The mean D-KAT score among Japanese PGY-2 residents was significantly lower than that of their US PGY-2 counterparts (6.2 vs. 8.3, p < 0.001). GM-ITE scores correlated with ED rotations (≥6 rotations: 2.14; 0.16-4.13; p = 0.03), inpatient caseloads (5-9 patients: 1.79; 0.82-2.76; p < 0.001), and average daily minutes of self-study (≥91 min: 2.05; 0.56-3.53; p = 0.01). SEM revealed that D-KAT scores were directly associated with GM-ITE scores (ß = 0.37, 95% CI: 0.34-0.41) and indirectly associated with ED rotations (ß = 0.06, 95% CI: 0.02-0.10), inpatient caseload (ß = 0.04, 95% CI: 0.003-0.08), and average daily minutes of study (ß = 0.13, 95% CI: 0.09-0.17). Knowledge regarding diagnostic error among Japanese residents was poor compared with that among US residents. D-KAT scores correlated strongly with GM-ITE scores, and the latter scores were positively associated with a greater number of ED rotations, larger caseload (though only up to 15 patients), and more time spent studying.
Hope Modified the Association between Distress and Incidence of Self-Perceived Medical Errors among Practicing Physicians: Prospective Cohort Study

PubMed Central

Hayashino, Yasuaki; Utsugi-Ozaki, Makiko; Feldman, Mitchell D.; Fukuhara, Shunichi

2012-01-01

The presence of hope has been found to influence an individual's ability to cope with stressful situations. The objective of this study is to evaluate the relationship between medical errors, hope and burnout among practicing physicians using validated metrics. A prospective cohort study was conducted among hospital-based physicians practicing in Japan (N = 836). Measures included the validated Burnout Scale, self-assessment of medical errors and Herth Hope Index (HHI). The main outcome measure was the frequency of self-perceived medical errors, and Poisson regression analysis was used to evaluate the association between hope and medical error. A total of 361 errors were reported in 836 physician-years. We observed a significant association between hope and self-report of medical errors. Compared with the lowest tertile category of HHI, incidence rate ratios (IRRs) of self-perceived medical errors were 0.44 (95%CI, 0.34 to 0.58) and 0.54 (95%CI, 0.42 to 0.70) for physicians in the 2nd and 3rd tertiles, respectively. In stratified analysis by hope score, among physicians with a low hope score, those who experienced higher burnout reported higher incidence of errors; physicians with high hope scores did not report high incidences of errors, even if they experienced high burnout. Self-perceived medical errors showed a strong association with physicians' hope, and hope modified the association between physicians' burnout and self-perceived medical errors. PMID:22530055

Structural MRI-based detection of Alzheimer's disease using feature ranking and classification error.

PubMed

Beheshti, Iman; Demirel, Hasan; Farokhian, Farnaz; Yang, Chunlan; Matsuda, Hiroshi

2016-12-01

This paper presents an automatic computer-aided diagnosis (CAD) system based on feature ranking for detection of Alzheimer's disease (AD) using structural magnetic resonance imaging (sMRI) data. The proposed CAD system is composed of four systematic stages. First, global and local differences in the gray matter (GM) of AD patients compared to the GM of healthy controls (HCs) are analyzed using a voxel-based morphometry technique. The aim is to identify significant local differences in the volume of GM as volumes of interests (VOIs). Second, the voxel intensity values of the VOIs are extracted as raw features. Third, the raw features are ranked using seven feature-ranking methods, namely, statistical dependency (SD), mutual information (MI), information gain (IG), Pearson's correlation coefficient (PCC), t-test score (TS), Fisher's criterion (FC), and the Gini index (GI). The features with higher scores are more discriminative. To determine the number of top features, the estimated classification error based on training set made up of the AD and HC groups is calculated, with the vector size that minimized this error selected as the top discriminative feature. Fourth, the classification is performed using a support vector machine (SVM). In addition, a data fusion approach among feature ranking methods is introduced to improve the classification performance. The proposed method is evaluated using a data-set from ADNI (130 AD and 130 HC) with 10-fold cross-validation. The classification accuracy of the proposed automatic system for the diagnosis of AD is up to 92.48% using the sMRI data. An automatic CAD system for the classification of AD based on feature-ranking method and classification errors is proposed. In this regard, seven feature-ranking methods (i.e., SD, MI, IG, PCC, TS, FC, and GI) are evaluated. The optimal size of top discriminative features is determined by the classification error estimation in the training phase. The experimental results indicate that the performance of the proposed system is comparable to that of state-of-the-art classification models. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
An Ensemble Method for Spelling Correction in Consumer Health Questions

PubMed Central

Kilicoglu, Halil; Fiszman, Marcelo; Roberts, Kirk; Demner-Fushman, Dina

2015-01-01

Orthographic and grammatical errors are a common feature of informal texts written by lay people. Health-related questions asked by consumers are a case in point. Automatic interpretation of consumer health questions is hampered by such errors. In this paper, we propose a method that combines techniques based on edit distance and frequency counts with a contextual similarity-based method for detecting and correcting orthographic errors, including misspellings, word breaks, and punctuation errors. We evaluate our method on a set of spell-corrected questions extracted from the NLM collection of consumer health questions. Our method achieves a F1 score of 0.61, compared to an informed baseline of 0.29, achieved using ESpell, a spelling correction system developed for biomedical queries. Our results show that orthographic similarity is most relevant in spelling error correction in consumer health questions and that frequency and contextual information are complementary to orthographic features. PMID:26958208
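Two of the ingredients this abstract names, edit distance and frequency counts, can be combined in a minimal sketch: candidates are ranked by Levenshtein distance to the misspelled word, with ties broken by corpus frequency. This is an illustration only, not the authors' ensemble; it omits the contextual similarity component entirely, and the function names, vocabulary, and counts are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance via dynamic programming over two rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def best_correction(word, vocabulary):
    """Pick the closest vocabulary word; ties go to the more frequent one.

    `vocabulary` maps each known word to a (hypothetical) corpus count.
    """
    return min(vocabulary,
               key=lambda cand: (edit_distance(word, cand), -vocabulary[cand]))


# Hypothetical frequency counts, for illustration only.
vocab = {"diabetes": 50, "diabetic": 20, "dialysis": 30}
suggestion = best_correction("diabetis", vocab)
```

Here "diabetes" and "diabetic" are both one edit away from the input, so the frequency count decides between them. The abstract's finding that frequency and contextual information complement orthographic similarity corresponds to this kind of tie-breaking.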
The impact of statistical adjustment on conditional standard errors of measurement in the assessment of physician communication skills.

PubMed

Raymond, Mark R; Clauser, Brian E; Furman, Gail E

2010-10-01

The use of standardized patients to assess communication skills is now an essential part of assessing a physician's readiness for practice. To improve the reliability of communication scores, it has become increasingly common in recent years to use statistical models to adjust ratings provided by standardized patients. This study employed ordinary least squares regression to adjust ratings, and then used generalizability theory to evaluate the impact of these adjustments on score reliability and the overall standard error of measurement. In addition, conditional standard errors of measurement were computed for both observed and adjusted scores to determine whether the improvements in measurement precision were uniform across the score distribution. Results indicated that measurement was generally less precise for communication ratings toward the lower end of the score distribution; and the improvement in measurement precision afforded by statistical modeling varied slightly across the score distribution such that the most improvement occurred in the upper-middle range of the score scale. Possible reasons for these patterns in measurement precision are discussed, as are the limitations of the statistical models used for adjusting performance ratings.
Systematic review of the evidence for Trails B cut-off scores in assessing fitness-to-drive.

PubMed

Roy, Mononita; Molnar, Frank

2013-01-01

Fitness-to-drive guidelines recommend employing the Trail Making B Test (a.k.a. Trails B), but do not provide guidance regarding cut-off scores. There is ongoing debate regarding the optimal cut-off score on the Trails B test. The objective of this study was to address this controversy by systematically reviewing the evidence for specific Trails B cut-off scores (e.g., cut-offs in both time to completion and number of errors) with respect to fitness-to-drive. Systematic review of all prospective cohort, retrospective cohort, case-control, correlation, and cross-sectional studies reporting the ability of the Trails B to predict driving safety that were published in English-language, peer-reviewed journals. Forty-seven articles were reviewed. None of the articles justified sample sizes via formal calculations. Cut-off scores reported based on research include: 90 seconds, 133 seconds, 147 seconds, 180 seconds, and < 3 errors. There is support for the previously published Trails B cut-offs of 3 minutes or 3 errors (the '3 or 3 rule'). Major methodological limitations of this body of research were uncovered including (1) lack of justification of sample size leaving studies open to Type II error (i.e., false negative findings), and (2) excessive focus on associations rather than clinically useful cut-off scores.
Preparing a neuropediatric upper limb exergame rehabilitation system for home-use: a feasibility study.

PubMed

Gerber, Corinna N; Kunz, Bettina; van Hedel, Hubertus J A

2016-03-23

Home-based, computer-enhanced therapy of hand and arm function can complement conventional interventions and increase the amount and intensity of training, without interfering too much with family routines. The objective of the present study was to investigate the feasibility and usability of the new portable version of the YouGrabber® system (YouRehab AG, Zurich, Switzerland) in the home setting. Fifteen families of children (7 girls, mean age: 11.3y) with neuromotor disorders and affected upper limbs participated. They received instructions and took the system home to train for 2 weeks. After returning it, they answered questions about usability, motivation, and their general opinion of the system (Visual Analogue Scale; 0 indicating worst score, 100 indicating best score; ≤30 not satisfied, 31-69 average, ≥70 satisfied). Furthermore, total pure playtime and number of training sessions were quantified. To prove the usability of the system, number and sort of support requests were logged. The usability of the system was considered average to satisfying (mean 60.1-93.1). The lowest score was given for the occurrence of technical errors. Parents had to motivate their children to start (mean 66.5) and continue (mean 68.5) with the training. But in general, parents estimated the therapeutic benefit as high (mean 73.1) and the whole system as very good (mean 87.4). Children played on average 7 times during the 2 weeks; total pure playtime was 185 ± 45 min. Especially at the beginning of the trial, systems were very error-prone. Fortunately, we, or the company, solved most problems before the patients took the systems home. Nevertheless, 10 of 15 families contacted us at least once because of technical problems. Despite that the YouGrabber® is a promising and highly accepted training tool for home-use, currently, it is still error-prone, and the requested support exceeds the support that can be provided by clinical therapists. A technically more robust system, combined with additional attractive games, likely results in higher patient motivation and better compliance. This would reduce the need for parents to motivate their children extrinsically and allow for clinical trials to investigate the effectiveness of the system. ClinicalTrials.gov NCT02368223.
Safety culture perceptions of pharmacists in Malaysian hospitals and health clinics: a multicentre assessment using the Safety Attitudes Questionnaire.

PubMed

Samsuri, Srima Elina; Pei Lin, Lua; Fahrni, Mathumalar Loganathan

2015-11-26

To assess the safety attitudes of pharmacists, provide a profile of their domains of safety attitude and correlate their attitudes with self-reported rates of medication errors. A cross-sectional study utilising the Safety Attitudes Questionnaire (SAQ). 3 public hospitals and 27 health clinics. 117 pharmacists. Safety culture mean scores, variation in scores across working units and between hospitals versus health clinics, predictors of safety culture, and medication errors and their correlation. Response rate was 83.6% (117 valid questionnaires returned). Stress recognition (73.0±20.4) and working condition (54.8±17.4) received the highest and lowest mean scores, respectively. Pharmacists exhibited positive attitudes towards: stress recognition (58.1%), job satisfaction (46.2%), teamwork climate (38.5%), safety climate (33.3%), perception of management (29.9%) and working condition (15.4%). With the exception of stress recognition, those who worked in health clinics scored higher than those in hospitals (p<0.05) and higher scores (overall score as well as score for each domain except for stress recognition) correlated negatively with reported number of medication errors. Conversely, those working in hospital (versus health clinic) were 8.9 times more likely (p<0.01) to report a medication error (OR 8.9, CI 3.08 to 25.7). As stress recognition increased, the number of medication errors reported increased (p=0.023). Years of work experience (p=0.017) influenced the number of medication errors reported. For every additional year of work experience, pharmacists were 0.87 times less likely to report a medication error (OR 0.87, CI 0.78 to 0.98). A minority (20.5%) of the pharmacists working in hospitals and health clinics was in agreement with the overall SAQ questions and scales. Pharmacists in outpatient and ambulatory units and those in health clinics had better perceptions of safety culture. 
As perceptions improved, the number of medication errors reported decreased. Group-specific interventions that target specific domains are necessary to improve the safety culture. Published by the BMJ Publishing Group Limited.
Random Versus Nonrandom Peer Review: A Case for More Meaningful Peer Review.

PubMed

Itri, Jason N; Donithan, Adam; Patel, Sohil H

2018-05-10

Random peer review programs are not optimized to discover cases with diagnostic error and thus have inherent limitations with respect to educational and quality improvement value. Nonrandom peer review offers an alternative approach in which diagnostic error cases are targeted for collection during routine clinical practice. The objective of this study was to compare error cases identified through random and nonrandom peer review approaches at an academic center. During the 1-year study period, the number of discrepancy cases and score of discrepancy were determined from each approach. The nonrandom peer review process collected 190 cases, of which 60 were scored as 2 (minor discrepancy), 94 as 3 (significant discrepancy), and 36 as 4 (major discrepancy). In the random peer review process, 1,690 cases were reviewed, of which 1,646 were scored as 1 (no discrepancy), 44 were scored as 2 (minor discrepancy), and none were scored as 3 or 4. Several teaching lessons and quality improvement measures were developed as a result of analysis of error cases collected through the nonrandom peer review process. Our experience supports the implementation of nonrandom peer review as a replacement to random peer review, with nonrandom peer review serving as a more effective method for collecting diagnostic error cases with educational and quality improvement value. Copyright © 2018 American College of Radiology. Published by Elsevier Inc. All rights reserved.
Staging Sleep in Polysomnograms: Analysis of Inter-Scorer Variability

PubMed Central

Younes, Magdy; Raneri, Jill; Hanly, Patrick

2016-01-01

Study Objectives: To determine the reasons for inter-scorer variability in sleep staging of polysomnograms (PSGs). Methods: Fifty-six PSGs were scored (5-stage sleep scoring) by 2 experienced technologists (first manual, M1). Months later, the technologists edited their own scoring (second manual, M2) based upon feedback from the investigators that highlighted differences between their scoring. The PSGs were then scored with an automatic system (Auto) and the technologists edited them, epoch-by-epoch (Edited-Auto). This resulted in 6 different manual scores for each PSG. Epochs were classified as scorer errors (one M1 score differed from the other 5 scores), scorer bias (all 3 scores of each technologist were similar, but differed from the other technologist) and equivocal (sleep scoring was inconsistent within and between technologists). Results: Percent agreement after M1 was 78.9% ± 9.0% and was unchanged after M2 (78.1% ± 9.7%) despite numerous edits (≈40/PSG) by the scorers. Agreement in Edited-Auto was higher (86.5% ± 6.4%, p < 1E−9). Scorer errors (< 2% of epochs) and scorer bias (3.5% ± 2.3% of epochs) together accounted for < 20% of M1 disagreements. A large number of epochs (92 ± 44/PSG) with scoring agreement in M1 were subsequently changed in M2 and/or Edited-Auto. Equivocal epochs, which showed scoring inconsistency, accounted for 28% ± 12% of all epochs, and up to 76% of all epochs in individual patients. Disagreements were largely between awake/NREM, N1/N2, and N2/N3 sleep. Conclusion: Inter-scorer variability is largely due to epochs that are difficult to classify. Availability of digitally identified events (e.g., spindles) or calculated variables (e.g., depth of sleep, delta wave duration) during scoring may greatly reduce scoring variability. Citation: Younes M, Raneri J, Hanly P. Staging sleep in polysomnograms: analysis of inter-scorer variability. J Clin Sleep Med 2016;12(6):885–894. PMID:27070243
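The percent agreement reported in this record can be illustrated with a minimal Python sketch. It assumes that agreement is the share of epochs to which two scorers assigned the same stage; the function name, the stage labels, and the sample data are hypothetical and are not taken from the study.

```python
from typing import Sequence


def percent_agreement(scorer_a: Sequence[str], scorer_b: Sequence[str]) -> float:
    """Percentage of epochs to which both scorers assigned the same stage."""
    if len(scorer_a) != len(scorer_b):
        raise ValueError("both scorers must label the same epochs")
    if not scorer_a:
        raise ValueError("no epochs to compare")
    matches = sum(a == b for a, b in zip(scorer_a, scorer_b))
    return 100.0 * matches / len(scorer_a)


# Hypothetical epoch-by-epoch labels for one recording (illustrative only).
tech_1 = ["W", "N1", "N2", "N2", "N3", "R", "R", "N2"]
tech_2 = ["W", "N2", "N2", "N2", "N2", "R", "R", "N1"]

agreement = percent_agreement(tech_1, tech_2)
print(f"agreement: {agreement:.1f}%")
```

In this toy data the two scorers differ on three of eight epochs, all of them N1/N2 or N2/N3 boundaries, which is the kind of disagreement the abstract describes as most common.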
Prediction of drug synergy in cancer using ensemble-based machine learning techniques

NASA Astrophysics Data System (ADS)

Singh, Harpreet; Rana, Prashant Singh; Singh, Urvinder

2018-04-01

Drug synergy prediction plays a significant role in the medical field for inhibiting specific cancer agents, and it can be developed into a pre-processing tool for therapeutic success. Different drug-drug interactions can be examined through the drug synergy score, which requires efficient regression-based machine learning approaches that minimize prediction errors. Numerous machine learning techniques, such as neural networks, support vector machines, random forests, LASSO, and Elastic Nets, have been used in the past for this purpose. However, these techniques individually do not provide significant accuracy in predicting the drug synergy score. Therefore, the primary objective of this paper is to design a neuro-fuzzy-based ensembling approach. To achieve this, nine well-known machine learning techniques have been implemented on the drug synergy data. Based on the accuracy of each model, four techniques with high accuracy are selected to develop an ensemble-based machine learning model. These models are Random forest, Fuzzy Rules Using Genetic Cooperative-Competitive Learning method (GFS.GCCL), Adaptive-Network-Based Fuzzy Inference System (ANFIS) and Dynamic Evolving Neural-Fuzzy Inference System method (DENFIS). Ensembling is achieved by evaluating the biased weighted aggregation (i.e. adding more weight to the model with a higher prediction score) of the data predicted by the selected models. The proposed and existing machine learning techniques have been evaluated on drug synergy score data. The comparative analysis reveals that the proposed method outperforms the others in terms of accuracy, root mean square error and coefficient of correlation.
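The idea of a biased weighted aggregation can be sketched as follows. This is only an illustration under a stated assumption: each model's weight is taken to be proportional to its prediction score. The abstract does not give the paper's actual weighting formula, and the model names, scores, and predictions below are hypothetical.

```python
from typing import Dict, List


def weighted_ensemble(predictions: Dict[str, List[float]],
                      scores: Dict[str, float]) -> List[float]:
    """Combine per-model predictions, weighting each model by its score.

    Assumption: weights are the scores normalised to sum to 1, so a model
    with a higher score contributes more to the combined prediction.
    """
    total = sum(scores[name] for name in predictions)
    if total <= 0:
        raise ValueError("model scores must sum to a positive value")
    n_samples = len(next(iter(predictions.values())))
    combined = [0.0] * n_samples
    for name, preds in predictions.items():
        if len(preds) != n_samples:
            raise ValueError("all models must predict the same samples")
        weight = scores[name] / total
        for i, value in enumerate(preds):
            combined[i] += weight * value
    return combined


# Hypothetical synergy-score predictions from two models (illustrative only).
model_predictions = {"model_a": [10.0, 20.0], "model_b": [30.0, 40.0]}
model_scores = {"model_a": 0.75, "model_b": 0.25}

ensemble = weighted_ensemble(model_predictions, model_scores)
print(ensemble)
```

Because model_a carries three times the weight of model_b, the combined predictions sit closer to model_a's values than a plain average would.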
Prevalence of neurocognitive and balance deficits in collegiate aged football players without clinically diagnosed concussion.

PubMed

Mulligan, Ivan; Boland, Mark; Payette, Justin

2012-07-01

Prospective cohort. To identify the prevalence of neurocognitive and balance deficits in collegiate football players 48 hours following competition. Neurocognitive testing, balance assessments, and subjective report of symptoms are a commonly used test battery in examining athletes when concussion is suspected. Previous literature suggests many concussions go unreported. Little research exists examining the prevalence of neurocognitive or balance deficits in athletes who do not report concussion-like symptoms to a health care provider. Forty-five Division IA collegiate football players participated in this study. Preseason baseline scores using the Balance Error Scoring System, the Immediate Post-Concussion Assessment and Cognitive Testing, and the Postconcussion Symptom Scale were compared to posttest results obtained 48 hours following a game. Prevalence of symptoms was analyzed and reported. Thirty-two (71%) of the 45 athletes tested demonstrated at least 1 deficit in either the Postconcussion Symptom Scale, Balance Error Scoring System, or at least 1 composite score of the Immediate Post-Concussion Assessment and Cognitive Testing. Nineteen of the 32 subjects demonstrated a change in 2 or more categories of neurocognitive and balance function. In a cohort of football players tested 48 hours following their last game of the season, who did not seek medical attention related to a concussion, a significant number demonstrated limitations in neurocognitive and balance performance, suggesting that further research may need to be performed to improve recognition of an athlete's deficits and to improve the ability to assess concussion. Differential diagnosis/symptom prevalence, level 3b.
Hypothesis Testing Using Factor Score Regression

PubMed Central

Devlieger, Ines; Mayer, Axel; Rosseel, Yves

2015-01-01

In this article, an overview is given of four methods to perform factor score regression (FSR), namely regression FSR, Bartlett FSR, the bias avoiding method of Skrondal and Laake, and the bias correcting method of Croon. The bias correcting method is extended to include a reliable standard error. The four methods are compared with each other and with structural equation modeling (SEM) by using analytic calculations and two Monte Carlo simulation studies to examine their finite sample characteristics. Several performance criteria are used, such as the bias using the unstandardized and standardized parameterization, efficiency, mean square error, standard error bias, type I error rate, and power. The results show that the bias correcting method, with the newly developed standard error, is the only suitable alternative for SEM. While it has a higher standard error bias than SEM, it has a comparable bias, efficiency, mean square error, power, and type I error rate. PMID:29795886
Learning curves and impact of previous operative experience on performance on a virtual reality simulator to test laparoscopic surgical skills.

PubMed

Grantcharov, Teodor P; Bardram, Linda; Funch-Jensen, Peter; Rosenberg, Jacob

2003-02-01

The study was carried out to analyze the learning rate for laparoscopic skills on a virtual reality training system and to establish whether the simulator was able to differentiate between surgeons with different laparoscopic experience. Forty-one surgeons were divided into three groups according to their experience in laparoscopic surgery: masters (group 1, performed more than 100 cholecystectomies), intermediates (group 2, between 15 and 80 cholecystectomies), and beginners (group 3, fewer than 10 cholecystectomies). The participants were tested on the Minimally Invasive Surgical Trainer-Virtual Reality (MIST-VR) 10 consecutive times within a 1-month period. Assessment of laparoscopic skills included time, errors, and economy of hand movement, measured by the simulator. The learning curves regarding time reached a plateau after the second repetition for group 1, the fifth repetition for group 2, and the seventh repetition for group 3 (Friedman's tests, P < 0.05). Experienced surgeons did not improve their error or economy of movement scores (Friedman's tests, P > 0.2), indicating the absence of a learning curve for these parameters. Group 2 error scores reached a plateau after the first repetition, and group 3 after the fifth repetition. Group 2 improved their economy of movement score up to the third repetition and group 3 up to the sixth repetition (Friedman's tests, P < 0.05). Experienced surgeons (group 1) demonstrated the best performance parameters, followed by group 2 and group 3 (Mann-Whitney test, P < 0.05). Different learning curves existed for surgeons with different laparoscopic backgrounds. The familiarization rate on the simulator was proportional to the operative experience of the surgeons. Experienced surgeons demonstrated the best laparoscopic performance on the simulator, followed by those with intermediate experience and the beginners. 
These differences indicate that the scoring system of MIST-VR is sensitive and specific to measuring skills relevant for laparoscopic surgery.
Stabilizing Conditional Standard Errors of Measurement in Scale Score Transformations

ERIC Educational Resources Information Center

Moses, Tim; Kim, YoungKoung

2017-01-01

The focus of this article is on scale score transformations that can be used to stabilize conditional standard errors of measurement (CSEMs). Three transformations for stabilizing the estimated CSEMs are reviewed, including the traditional arcsine transformation, a recently developed general variance stabilization transformation, and a new method…
Alternative Matching Scores to Control Type I Error of the Mantel-Haenszel Procedure for DIF in Dichotomously Scored Items Conforming to 3PL IRT and Nonparametric 4PBCB Models

ERIC Educational Resources Information Center

Monahan, Patrick O.; Ankenmann, Robert D.

2010-01-01

When the matching score is either less than perfectly reliable or not a sufficient statistic for determining latent proficiency in data conforming to item response theory (IRT) models, Type I error (TIE) inflation may occur for the Mantel-Haenszel (MH) procedure or any differential item functioning (DIF) procedure that matches on summed-item…
Driving error and anxiety related to iPod mp3 player use in a simulated driving experience.

PubMed

Harvey, Ashley R; Carden, Randy L

2009-08-01

Driver distraction due to cellular phone usage has repeatedly been shown to increase the risk of vehicular accidents; however, the literature regarding the use of other personal electronic devices while driving is relatively sparse. It was hypothesized that the usage of an mp3 player would result in an increase in not only driving error while operating a driving simulator, but driver anxiety scores as well. It was also hypothesized that anxiety scores would be positively related to driving errors when using an mp3 player. 32 participants drove through a set course in a driving simulator twice, once with and once without an iPod mp3 player, with the order counterbalanced. Number of driving errors per course, such as leaving the road, impacts with stationary objects, loss of vehicular control, etc., and anxiety were significantly higher when an iPod was in use. Anxiety scores were unrelated to number of driving errors.
The association between frequency of self-reported medical errors and anesthesia trainee supervision: a survey of United States anesthesiology residents-in-training.

PubMed

De Oliveira, Gildasio S; Rahmani, Rod; Fitzgerald, Paul C; Chang, Ray; McCarthy, Robert J

2013-04-01

Poor supervision of physician trainees can be detrimental not only to resident education but also to patient care and safety. Inadequate supervision has been associated with more frequent deaths of patients under the care of junior residents. We hypothesized that residents reporting more medical errors would also report lower quality of supervision scores than the ones with lower reported medical errors. The primary objective of this study was to evaluate the association between the frequency of medical errors reported by residents and their perceived quality of faculty supervision. A cross-sectional nationwide survey was sent to 1000 residents randomly selected from anesthesiology training departments across the United States. Residents from 122 residency programs were invited to participate; the median (interquartile range) per institution was 7 (4-11). Participants were asked to complete a survey assessing demography, perceived quality of faculty supervision, and perceived causes of inadequate perceived supervision. Responses to the statements "I perform procedures for which I am not properly trained," "I make mistakes that have negative consequences for the patient," and "I have made a medication error (drug or incorrect dose) in the last year" were used to assess error rates. Average supervision scores were determined using the De Oliveira Filho et al. scale and compared among the frequency of self-reported error categories using the Kruskal-Wallis test. Six hundred four residents responded to the survey (60.4%). Forty-five (7.5%) of the respondents reported performing procedures for which they were not properly trained, 24 (4%) reported having made mistakes with negative consequences to patients, and 16 (3%) reported medication errors in the last year having occurred multiple times or often. Supervision scores were inversely correlated with the frequency of reported errors for all 3 questions evaluating errors. 
At a cutoff value of 3, supervision scores demonstrated an overall accuracy (area under the curve) (99% confidence interval) of 0.81 (0.73-0.86), 0.89 (0.77-0.95), and 0.93 (0.77-0.98) for predicting a response of multiple times or often to the question of performing procedures for which they were not properly trained, reported mistakes with negative consequences to patients, and reported medication errors in the last year, respectively. Anesthesiology trainees who reported a greater incidence of medical errors with negative consequences to patients and drug errors also reported lower scores for supervision by faculty. Our findings suggest that further studies of the association between supervision and patient safety are warranted. (Anesth Analg 2013;116:892-7).
The corneal transplant score: a simple corneal graft candidate calculator.

PubMed

Rosenfeld, Eldar; Varssano, David

2013-07-01

Shortage of corneas for transplantation has created long waiting lists in most countries. Transplant calculators are available for many organs. The purpose of this study is to describe a simple automatic scoring system for keratoplasty recipient candidates, based on several parameters that we consider most relevant for tissue allocation, and to compare the system's accuracy in predicting decisions made by a cornea specialist. Twenty pairs of candidate data were randomly created on an electronic spreadsheet. A single priority score was computed from the data of each candidate. A cornea surgeon and the automated system then decided independently which candidate in each pair should have surgery if only a single cornea was available. The scoring system can calculate values between 0 (lowest priority) and 18 (highest priority) for each candidate. Average score value in our randomly created cohort was 6.35 ± 2.38 (mean ± SD), range 1.28 to 10.76. Average score difference between the candidates in each pair was 3.12 ± 2.10, range 0.08 to 8.45. The manual scoring process, although theoretical, was mentally and emotionally demanding for the surgeon. Agreement was achieved between the human decision and the calculated value in 19 of 20 pairs. Disagreement was reached in the pair with the lowest score difference (0.08). With worldwide donor cornea shortage, waiting for transplantation can be long. Manual sorting of priority for transplantation in a long waiting list is difficult, time-consuming and prone to error. The suggested system may help achieve a justified distribution of available tissue.
Targeting safety improvements through identification of incident origination and detection in a near-miss incident learning system.

PubMed

Novak, Avrey; Nyflot, Matthew J; Ermoian, Ralph P; Jordan, Loucille E; Sponseller, Patricia A; Kane, Gabrielle M; Ford, Eric C; Zeng, Jing

2016-05-01

Radiation treatment planning involves a complex workflow that has multiple potential points of vulnerability. This study utilizes an incident reporting system to identify the origination and detection points of near-miss errors, in order to guide departmental safety improvement efforts. Previous studies have examined where errors arise, but have not examined where they are detected or applied a near-miss risk index (NMRI) to gauge severity. From 3/2012 to 3/2014, 1897 incidents were analyzed from a departmental incident learning system. All incidents were prospectively reviewed weekly by a multidisciplinary team and assigned an NMRI score ranging from 0 to 4 reflecting potential harm to the patient (no potential harm to potential critical harm). Incidents were classified by point of incident origination and detection based on a 103-step workflow. The individual steps were divided among nine broad workflow categories (patient assessment, imaging for radiation therapy (RT) planning, treatment planning, pretreatment plan review, treatment delivery, on-treatment quality management, post-treatment completion, equipment/software quality management, and other). The average NMRI scores of incidents originating or detected within each broad workflow area were calculated. Additionally, out of 103 individual process steps, 35 were classified as safety barriers, the process steps whose primary function is to catch errors. The safety barriers which most frequently detected incidents were identified and analyzed. Finally, the distance between event origination and detection was explored by grouping events by the number of broad workflow area events passed through before detection, and average NMRI scores were compared. Near-miss incidents most commonly originated within treatment planning (33%). 
However, the incidents with the highest average NMRI scores originated during imaging for RT planning (NMRI = 2.0, average NMRI of all events = 1.5), specifically during the documentation of patient positioning and localization of the patient. Incidents were most frequently detected during treatment delivery (30%), and incidents identified at this point also had higher severity scores than other workflow areas (NMRI = 1.6). Incidents identified during on-treatment quality management were also more severe (NMRI = 1.7), and the specific process steps of reviewing portal and CBCT images tended to catch highest-severity incidents. On average, safety barriers caught 46% of all incidents, most frequently at physics chart review, therapist's chart check, and the review of portal images; however, most of the incidents that pass through a particular safety barrier are not designed to be capable of being captured at that barrier. Incident learning systems can be used to assess the most common points of error origination and detection in radiation oncology. This can help tailor safety improvement efforts and target the highest impact portions of the workflow. The most severe near-miss events tend to originate during simulation, with the most severe near-miss events detected at the time of patient treatment. Safety barriers can be improved to allow earlier detection of near-miss events.
Minimally important change, measurement error, and responsiveness for the Self-Reported Foot and Ankle Score

PubMed Central

Cöster, Maria C; Nilsdotter, Anna; Brudin, Lars; Bremander, Ann

2017-01-01

Background and purpose Patient-reported outcome measures (PROMs) are increasingly used to evaluate results in orthopedic surgery. To enhance good responsiveness with a PROM, the minimally important change (MIC) should be established. MIC reflects the smallest measured change in score that is perceived as being relevant by the patients. We assessed MIC for the Self-reported Foot and Ankle Score (SEFAS) used in Swedish national registries. Patients and methods Patients with forefoot disorders (n = 83) or hindfoot/ankle disorders (n = 80) completed the SEFAS before surgery and 6 months after surgery. At 6 months also, a patient global assessment (PGA) scale—as external criterion—was completed. Measurement error was expressed as the standard error of a single determination. MIC was calculated by (1) median change scores in improved patients on the PGA scale, and (2) the best cutoff point (BCP) and area under the curve (AUC) using analysis of receiver operating characteristic curves (ROCs). Results The change in mean summary score was the same, 9 (SD 9), in patients with forefoot disorders and in patients with hindfoot/ankle disorders. MIC for SEFAS in the total sample was 5 score points (IQR: 2–8) and the measurement error was 2.4. BCP was 5 and AUC was 0.8 (95% CI: 0.7–0.9). Interpretation As previously shown, SEFAS has good responsiveness. The score change in SEFAS 6 months after surgery should exceed 5 score points in both forefoot patients and hindfoot/ankle patients to be considered as being clinically relevant. PMID:28464751
Development and validation of a composite scoring system for robot-assisted surgical training--the Robotic Skills Assessment Score.

PubMed

Chowriappa, Ashirwad J; Shi, Yi; Raza, Syed Johar; Ahmed, Kamran; Stegemann, Andrew; Wilding, Gregory; Kaouk, Jihad; Peabody, James O; Menon, Mani; Hassett, James M; Kesavadas, Thenkurussi; Guru, Khurshid A

2013-12-01

A standardized scoring system does not exist in virtual reality-based assessment metrics to describe safe and crucial surgical skills in robot-assisted surgery. This study aims to develop an assessment score along with its construct validation. All subjects performed key tasks on the previously validated Fundamental Skills of Robotic Surgery curriculum, which were recorded, and metrics were stored. After an expert consensus for the purpose of content validation (Delphi), critical safety determining procedural steps were identified from the Fundamental Skills of Robotic Surgery curriculum and a hierarchical task decomposition of multiple parameters using a variety of metrics was used to develop the Robotic Skills Assessment Score (RSA-Score). Robotic Skills Assessment mainly focuses on safety in the operative field, critical error, economy, bimanual dexterity, and time. Subsequently, the RSA-Score was further evaluated for construct validation and feasibility. Spearman correlation tests performed between tasks using the RSA-Scores indicate no cross correlation. Wilcoxon rank sum tests were performed between the two groups. The proposed RSA-Score was evaluated on non-robotic surgeons (n = 15) and on expert-robotic surgeons (n = 12). The expert group demonstrated significantly better performance on all four tasks in comparison to the novice group. Validation of the RSA-Score in this study was carried out on the Robotic Surgical Simulator. The RSA-Score is a valid scoring system that could be incorporated in any virtual reality-based surgical simulator to achieve standardized assessment of fundamental surgical tenets during robot-assisted surgery. Copyright © 2013 Elsevier Inc. All rights reserved.

A biometric identification system based on eigenpalm and eigenfinger features.

PubMed

Ribaric, Slobodan; Fratric, Ivan

2005-11-01

This paper presents a multimodal biometric identification system based on the features of the human hand. We describe a new biometric approach to personal identification using eigenfinger and eigenpalm features, with fusion applied at the matching-score level. The identification process can be divided into the following phases: capturing the image; preprocessing; extracting and normalizing the palm and strip-like finger subimages; extracting the eigenpalm and eigenfinger features based on the K-L transform; matching and fusion; and, finally, a decision based on the (k, l)-NN classifier and thresholding. The system was tested on a database of 237 people (1,820 hand images). The experimental results showed the effectiveness of the system in terms of the recognition rate (100 percent), the equal error rate (EER = 0.58 percent), and the total error rate (TER = 0.72 percent).
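The equal error rate quoted in this record is the operating point at which the false acceptance rate equals the false rejection rate. The Python sketch below shows one common way to estimate it from matching scores. It assumes similarity scores (higher means a better match) and approximates the EER at the threshold where the two rates are closest; the function name and sample scores are hypothetical and do not come from the paper.

```python
from typing import Sequence, Tuple


def equal_error_rate(genuine: Sequence[float],
                     impostor: Sequence[float]) -> Tuple[float, float]:
    """Return (eer, threshold) estimated from similarity scores.

    A comparison is accepted when its score is >= threshold (assumption).
    FAR is the share of impostor scores accepted; FRR is the share of
    genuine scores rejected. The EER is approximated as the mean of FAR
    and FRR at the candidate threshold where they are closest.
    """
    best_gap = None
    best = (0.0, 0.0)
    for threshold in sorted(set(genuine) | set(impostor)):
        far = sum(s >= threshold for s in impostor) / len(impostor)
        frr = sum(s < threshold for s in genuine) / len(genuine)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap = gap
            best = ((far + frr) / 2, threshold)
    return best


# Hypothetical fused matching scores (illustrative only).
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.6, 0.3, 0.2, 0.1]

eer, eer_threshold = equal_error_rate(genuine_scores, impostor_scores)
print(f"EER = {eer:.2f} at threshold {eer_threshold}")
```

If a system produces distances rather than similarities, the two comparisons in the loop would need to be reversed.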
Cue-based assertion classification for Swedish clinical text – developing a lexicon for pyConTextSwe

PubMed Central

Velupillai, Sumithra; Skeppstedt, Maria; Kvist, Maria; Mowery, Danielle; Chapman, Brian E.; Dalianis, Hercules; Chapman, Wendy W.

2014-01-01

Objective The ability of a cue-based system to accurately assert whether a disorder is affirmed, negated, or uncertain is dependent, in part, on its cue lexicon. In this paper, we continue our study of porting an assertion system (pyConTextNLP) from English to Swedish (pyConTextSwe) by creating an optimized assertion lexicon for clinical Swedish. Methods and material We integrated cues from four external lexicons, along with generated inflections and combinations. We used subsets of a clinical corpus in Swedish. We applied four assertion classes (definite existence, probable existence, probable negated existence and definite negated existence) and two binary classes (existence yes/no and uncertainty yes/no) to pyConTextSwe. We compared pyConTextSwe’s performance with and without the added cues on a development set, and improved the lexicon further after an error analysis. On a separate evaluation set, we calculated the system’s final performance. Results Following integration steps, we added 454 cues to pyConTextSwe. The optimized lexicon developed after an error analysis resulted in statistically significant improvements on the development set (83% F-score, overall). The system’s final F-scores on an evaluation set were 81% (overall). For the individual assertion classes, F-score results were 88% (definite existence), 81% (probable existence), 55% (probable negated existence), and 63% (definite negated existence). For the binary classifications existence yes/no and uncertainty yes/no, final system performance was 97%/87% and 78%/86% F-score, respectively. Conclusions We have successfully ported pyConTextNLP to Swedish (pyConTextSwe). We have created an extensive and useful assertion lexicon for Swedish clinical text, which could form a valuable resource for similar studies, and which is publicly available. PMID:24556644
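For readers unfamiliar with the metric, the F-score can be computed from counts of true positives, false positives, and false negatives. The sketch below assumes the balanced F1 variant (the harmonic mean of precision and recall); the abstract does not state which variant was used, and the counts in the example are hypothetical.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Balanced F-score: harmonic mean of precision and recall.

    Equivalent to 2*TP / (2*TP + FP + FN). Returns 0.0 when there are
    no positives at all, to avoid dividing by zero.
    """
    denominator = 2 * true_pos + false_pos + false_neg
    if denominator == 0:
        return 0.0
    return 2 * true_pos / denominator


# Hypothetical counts for one assertion class (illustrative only).
f1 = f1_score(true_pos=8, false_pos=2, false_neg=2)
print(f"F1 = {f1:.2f}")
```

A class with few examples can score much lower than the overall figure, which is consistent with the spread between the per-class F-scores listed in the abstract.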
Basic Diagnosis and Prediction of Persistent Contrail Occurrence using High-resolution Numerical Weather Analyses/Forecasts and Logistic Regression. Part I: Effects of Random Error

NASA Technical Reports Server (NTRS)

Duda, David P.; Minnis, Patrick

2009-01-01

Straightforward application of the Schmidt-Appleman contrail formation criteria to diagnose persistent contrail occurrence from numerical weather prediction data is hindered by significant bias errors in the upper tropospheric humidity. Logistic models of contrail occurrence have been proposed to overcome this problem, but basic questions remain about how random measurement error may affect their accuracy. A set of 5000 synthetic contrail observations is created to study the effects of random error in these probabilistic models. The simulated observations are based on distributions of temperature, humidity, and vertical velocity derived from Advanced Regional Prediction System (ARPS) weather analyses. The logistic models created from the simulated observations were evaluated using two common statistical measures of model accuracy, the percent correct (PC) and the Hanssen-Kuipers discriminant (HKD). To convert the probabilistic results of the logistic models into a dichotomous yes/no choice suitable for the statistical measures, two critical probability thresholds are considered. The HKD scores are higher when the climatological frequency of contrail occurrence is used as the critical threshold, while the PC scores are higher when the critical probability threshold is 0.5. For both thresholds, typical random errors in temperature, relative humidity, and vertical velocity are found to be small enough to allow for accurate logistic models of contrail occurrence. The accuracy of the models developed from synthetic data is over 85 percent for both the prediction of contrail occurrence and non-occurrence, although in practice, larger errors would be anticipated.
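The two accuracy measures named in this record can be sketched from a 2x2 contingency table. The Python example below assumes the standard definitions: percent correct is the share of correct yes/no forecasts, and the Hanssen-Kuipers discriminant is the hit rate minus the false alarm rate. The function names and the toy probabilities are hypothetical, and treating a probability equal to the threshold as a "yes" is also an assumption.

```python
from typing import Sequence, Tuple


def contingency_counts(probabilities: Sequence[float],
                       observed: Sequence[bool],
                       threshold: float) -> Tuple[int, int, int, int]:
    """Return (hits, misses, false_alarms, correct_negatives)."""
    hits = misses = false_alarms = correct_negatives = 0
    for p, occurred in zip(probabilities, observed):
        forecast_yes = p >= threshold  # assumption: ties count as "yes"
        if forecast_yes and occurred:
            hits += 1
        elif occurred:
            misses += 1
        elif forecast_yes:
            false_alarms += 1
        else:
            correct_negatives += 1
    return hits, misses, false_alarms, correct_negatives


def percent_correct(h: int, m: int, f: int, c: int) -> float:
    """Fraction of all forecasts that were correct."""
    return (h + c) / (h + m + f + c)


def hanssen_kuipers(h: int, m: int, f: int, c: int) -> float:
    """Hit rate minus false alarm rate."""
    return h / (h + m) - f / (f + c)


# Hypothetical model probabilities and observations (illustrative only).
probs = [0.8, 0.3, 0.6, 0.1, 0.4, 0.2, 0.1, 0.3]
obs = [True, False, False, False, True, False, False, False]

climatology = sum(obs) / len(obs)  # observed frequency of occurrence

pc_half = percent_correct(*contingency_counts(probs, obs, 0.5))
hkd_half = hanssen_kuipers(*contingency_counts(probs, obs, 0.5))
pc_clim = percent_correct(*contingency_counts(probs, obs, climatology))
hkd_clim = hanssen_kuipers(*contingency_counts(probs, obs, climatology))
print(pc_half, hkd_half, pc_clim, hkd_clim)
```

The toy data were chosen so that the lower climatological threshold trades some percent correct for a higher discriminant; they are not evidence for the study's result.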
Development of Web-Based Examination System Using Open Source Programming Model

ERIC Educational Resources Information Center

Abass, Olalere A.; Olajide, Samuel A.; Samuel, Babafemi O.

2017-01-01

The traditional method of assessment (examination) is often characterized by examination questions leakages, human errors during marking of scripts and recording of scores. The technological advancement in the field of computer science has necessitated the need for computer usage in majorly all areas of human life and endeavors, education sector…
Concussion History and Time Since Concussion Do not Influence Static and Dynamic Balance in Collegiate Athletes.

PubMed

Merritt, Eric D; Brown, Cathleen N; Queen, Robin M; Simpson, Kathy J; Schmidt, Julianne D

2017-11-01

Dynamic balance deficits exist following a concussion, sometimes years after injury. However, clinicians lack practical tools for assessing dynamic balance. To determine if there are significant differences in static and dynamic balance performance between individuals with and without a history of concussion. Cross sectional. Clinical research laboratory. 45 collegiate student-athletes with a history of concussion (23 males, 22 females; age = 20.0 ± 1.4 y; height = 175.8 ± 11.6 cm; mass = 76.4 ± 19.2 kg) and 45 matched controls with no history of concussion (23 males, 22 females; age = 20.0 ± 1.3 y; height = 178.8 ± 13.2 cm; mass = 75.7 ± 18.2 kg). Participants completed a static (Balance Error Scoring System) and dynamic (Y Balance Test-Lower Quarter) balance assessment. A composite score was calculated from the mean normalized Y Balance Test-Lower Quarter reach distances. Firm, foam, and overall errors were counted during the Balance Error Scoring System by a single reliable rater. One-way ANOVAs were used to compare balance performance between groups. Pearson's correlations were performed to determine the relationship between the time since the most recent concussion and balance performance. A Bonferroni-adjusted a priori α < 0.025 was used for all analyses. Static and dynamic balance performance did not significantly differ between groups. No significant correlation was found between the time since the most recent concussion and balance performance. Collegiate athletes with a history of concussion do not present with static or dynamic balance deficits when measured using clinical assessments. More research is needed to determine whether the Y Balance Test-Lower Quarter is sensitive to acute balance deficits following concussion.
Muscle Strength and Qualitative Jump-Landing Differences in Male and Female Military Cadets: The Jump-ACL Study

PubMed Central

Beutler, Anthony I.; de la Motte, Sarah J.; Marshall, Stephen W.; Padua, Darin A.; Boden, Barry P.

2009-01-01

Recent studies have focused on gender differences in movement patterns as risk factors for ACL injury. Understanding intrinsic and extrinsic factors which contribute to movement patterns is critical to ACL injury prevention efforts. Isometric lower-extremity muscular strength, anthropometrics, and jump-landing technique were analyzed for 2,753 cadets (1,046 female, 1,707 male) from the U.S. Air Force, Military and Naval Academies. Jump-landings were evaluated using the Landing Error Scoring System (LESS), a valid qualitative movement screening tool. We hypothesized that distinct anthropometric factors (Q-angle, navicular drop, bodyweight) and muscle strength would predict poor jump-landing technique in males versus females, and that female cadets would have higher scores (more errors) on a qualitative movement screen (LESS) than males. Mean LESS scores were significantly higher in female (5.34 ± 1.51) versus male (4.65 ± 1.69) cadets (p < 0.001). Qualitative movement scores were analyzed using factor analyses, yielding five factors, or “patterns”, contributing to poor landing technique. Females were significantly more likely to have poor technique due to landing with less hip and knee flexion at initial contact (p < 0.001), more knee valgus with wider landing stance (p < 0.001), and less flexion displacement over the entire landing (p < 0.001). Males were more likely to have poor technique due to landing toe-out (p < 0.001), with heels first, and with an asymmetric foot landing (p < 0.001). Many of the identified factor patterns have been previously proposed to contribute to ACL injury risk. However, univariate and multivariate analyses of muscular strength and anthropometric factors did not strongly predict LESS scores for either gender, suggesting that changing an athlete’s alignment, BMI, or muscle strength may not directly improve his or her movement patterns.
Key points: Important differences in male and female landing technique can be captured using a qualitative movement screen: the Landing Error Scoring System (LESS). Female cadets were more likely to land with shallow sagittal flexion, wide stance width, and more pronounced knee valgus. Male cadets were more likely to exhibit a heel-strike or asymmetric foot-strike and to land with toe out. Lower extremity muscle strength, Q-angle, and navicular drop do not significantly predict landing movement pattern in male or female cadets. PMID:21132103
Landing Error Scoring System Differences Between Single-Sport and Multi-Sport Female High School-Aged Athletes.

PubMed

Beese, Mark E; Joy, Elizabeth; Switzler, Craig L; Hicks-Little, Charlie A

2015-08-01

Single-sport specialization (SSS) is becoming more prevalent in youth athletes. Deficits in functional movement have been shown to predispose athletes to injury. It is unclear whether a link exists between SSS and the development of functional movement deficits that predispose SSS athletes to an increased risk of knee injury. To determine whether functional movement deficits exist in SSS athletes compared with multi-sport (M-S) athletes. Cross-sectional study. Soccer practice fields. A total of 40 (21 SSS [age = 15.05 ± 1.2 years], 19 M-S [age = 15.32 ± 1.2 years]) female high school athlete volunteers were recruited through local soccer clubs. All SSS athletes played soccer. Participants were grouped into 2 categories: SSS and M-S. All participants completed 3 trials of the standard Landing Error Scoring System (LESS) jump-landing task. They performed a double-legged jump from a 30-cm platform, landing on a rubber mat at a distance of half their body height. Upon landing, participants immediately performed a maximal vertical jump. Values were assigned to each trial using the LESS scoring criteria. We averaged the 3 scored trials and then used a Mann-Whitney U test to test for differences between groups. Participant scores from the jump-landing assessment for each group were also placed into the 4 defined LESS categories for group comparison using a Pearson χ² test. The α level was set a priori at .05. Mean scores were 6.84 ± 1.81 for the SSS group and 6.07 ± 1.93 for the M-S group. We observed no differences between groups (z = -1.44, P = .15). A Pearson χ² analysis revealed that the proportions of athletes classified as having excellent, good, moderate, or poor LESS scores were not different between the SSS and M-S groups (χ² = 1.999, P = .57). Participation in soccer alone compared with multiple sports did not affect LESS scores in adolescent female soccer players.
However, the LESS scores indicated that most female adolescent athletes may be at an increased risk for knee injury, regardless of the number of sports played.
Systematic review of the evidence for Trails B cut-off scores in assessing fitness-to-drive

PubMed Central

Roy, Mononita; Molnar, Frank

2013-01-01

Background Fitness-to-drive guidelines recommend employing the Trail Making B Test (a.k.a. Trails B), but do not provide guidance regarding cut-off scores. There is ongoing debate regarding the optimal cut-off score on the Trails B test. The objective of this study was to address this controversy by systematically reviewing the evidence for specific Trails B cut-off scores (e.g., cut-offs in both time to completion and number of errors) with respect to fitness-to-drive. Methods Systematic review of all prospective cohort, retrospective cohort, case-control, correlation, and cross-sectional studies reporting the ability of the Trails B to predict driving safety that were published in English-language, peer-reviewed journals. Results Forty-seven articles were reviewed. None of the articles justified sample sizes via formal calculations. Cut-off scores reported based on research include: 90 seconds, 133 seconds, 147 seconds, 180 seconds, and < 3 errors. Conclusions There is support for the previously published Trails B cut-offs of 3 minutes or 3 errors (the ‘3 or 3 rule’). Major methodological limitations of this body of research were uncovered including (1) lack of justification of sample size leaving studies open to Type II error (i.e., false negative findings), and (2) excessive focus on associations rather than clinically useful cut-off scores. PMID:23983828
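The '3 or 3 rule' named in the conclusions can be written as a simple screening check. This is only an illustrative sketch: the function name is hypothetical, and the boundary handling (whether exactly 180 seconds or exactly 3 errors counts as exceeding the cut-off) is an assumption, since the abstract reports the error cut-off as "< 3 errors" but does not spell out the time boundary.

```python
def exceeds_trails_b_cutoff(seconds: float, errors: int,
                            max_seconds: float = 180.0,
                            max_errors: int = 3) -> bool:
    """Return True when a Trails B result falls outside the '3 or 3' cut-offs.

    Assumed boundaries: more than 3 minutes, or 3 or more errors.
    """
    if seconds < 0 or errors < 0:
        raise ValueError("seconds and errors must be non-negative")
    return seconds > max_seconds or errors >= max_errors


# Hypothetical results: (completion time in seconds, number of errors).
for result in [(95.0, 1), (200.0, 0), (120.0, 3)]:
    print(result, exceeds_trails_b_cutoff(*result))
```

The other time cut-offs listed in the results (90, 133, or 147 seconds) can be tried by passing a different `max_seconds`.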
Comment on 3PL IRT Adjustment for Guessing

ERIC Educational Resources Information Center

Chiu, Ting-Wei; Camilli, Gregory

2013-01-01

Guessing behavior is an issue discussed widely with regard to multiple choice tests. Its primary effect is on number-correct scores for examinees at lower levels of proficiency. This is a systematic error or bias, which increases observed test scores. Guessing also can inflate random error variance. Correction or adjustment for guessing formulas…
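For reference, the three-parameter logistic (3PL) model named in the title gives the probability of a correct response as c + (1 − c) / (1 + exp(−a(θ − b))), where c is the lower asymptote usually read as the pseudo-guessing parameter. The sketch below evaluates that curve. The function name and the example parameter values are illustrative assumptions, and the scaling constant D = 1.7 that some texts include is omitted.

```python
import math


def p_correct_3pl(theta: float, a: float, b: float, c: float) -> float:
    """Probability of a correct response under the 3PL model.

    theta: examinee proficiency
    a: item discrimination
    b: item difficulty
    c: lower asymptote (pseudo-guessing), between 0 and 1
    """
    if not 0.0 <= c <= 1.0:
        raise ValueError("c must lie in [0, 1]")
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Assumed item: a = 1.2, b = 0.0, c = 0.2 (e.g. a five-option item).
for theta in (-3.0, 0.0, 3.0):
    print(theta, round(p_correct_3pl(theta, 1.2, 0.0, 0.2), 3))
```

Low-proficiency examinees stay near c rather than near zero, which is the upward bias in number-correct scores that the abstract describes.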
An Effectiveness Index and Profile for Instructional Media.

ERIC Educational Resources Information Center

Bond, Jack H.

A scale was developed for judging the relative value of various media in teaching children. Posttest scores were partitioned into several components: error, prior knowledge, guessing, and gain from the learning exercise. By estimating the amounts of prior knowledge, guessing, and error, and then subtracting these from the total score, an index of…
Nonparametric Item Response Curve Estimation with Correction for Measurement Error

ERIC Educational Resources Information Center

Guo, Hongwen; Sinharay, Sandip

2011-01-01

Nonparametric or kernel regression estimation of item response curves (IRCs) is often used in item analysis in testing programs. These estimates are biased when the observed scores are used as the regressor because the observed scores are contaminated by measurement error. Accuracy of this estimation is a concern theoretically and operationally.…
How Do Clinical Information Systems Affect the Cognitive Demands of General Practitioners?: Usability Study with a Focus on Cognitive Workload.

PubMed

Ariza, Ferran; Kalra, Dipak; Potts, Henry Ww

2015-11-20

Clinical information systems in the National Health Service do not need to conform to any explicit usability requirements. Poor usability can increase the mental workload experienced by clinicians and cause fatigue, increase error rates and impact the overall patient safety. Mental workload can be used as a measure of usability. To assess the subjective cognitive workload experienced by general practitioners (GPs) with their systems. To raise awareness of the importance of usability in system design among users, designers, developers and policymakers. We used a modified version of the NASA Task Load Index, adapted for web. We developed a set of common clinical scenarios and computer tasks on an online survey. We emailed the study link to 199 clinical commissioning groups and 1,646 GP practices in England. Sixty-seven responders completed the survey. The respondents had spent an average of 17 years in general practice, had experience of using a mean of 1.5 GP computer systems and had used their current system for a mean time of 6.7 years. The mental workload score was not different among systems. There were significant differences among the task scores, but these differences were not specific to particular systems. The overall score and task scores were related to the length of experience with their present system. Four tasks imposed a higher mental workload on GPs: 'repeat prescribing', 'find episode', 'drug management' and 'overview records'. Further usability studies on GP systems should focus on these tasks. Users, policymakers, designers and developers should remain aware of the importance of usability in system design.
What does this study add?
• Current GP systems in England do not need to conform to explicit usability requirements. Poor usability can increase the mental workload of clinicians and lead to errors.
• Some clinical computer tasks incur more cognitive workload than others and should be considered carefully during the design of a system.
• GPs did not report overall very high levels of subjective cognitive workload when undertaking common clinical tasks with their systems.
• Further usability studies on GP systems should focus on the tasks incurring higher cognitive workload.
• Users, policymakers, and designers and developers should remain aware of the importance of usability in system design.
Basecalling with LifeTrace

PubMed Central

Walther, Dirk; Bartha, Gábor; Morris, Macdonald

2001-01-01

A pivotal step in electrophoresis sequencing is the conversion of the raw, continuous chromatogram data into the actual sequence of discrete nucleotides, a process referred to as basecalling. We describe a novel algorithm for basecalling implemented in the program LifeTrace. Like Phred, currently the most widely used basecalling software program, LifeTrace takes processed trace data as input. It was designed to be tolerant to variable peak spacing by means of an improved peak-detection algorithm that emphasizes local chromatogram information over global properties. LifeTrace is shown to generate high-quality basecalls and reliable quality scores. It proved particularly effective when applied to MegaBACE capillary sequencing machines. In a benchmark test of 8372 dye-primer MegaBACE chromatograms, LifeTrace generated 17% fewer substitution errors, 16% fewer insertion/deletion errors, and 2.4% more aligned bases to the finished sequence than did Phred. For two sets totaling 6624 dye-terminator chromatograms, the performance improvement was 15% fewer substitution errors, 10% fewer insertion/deletion errors, and 2.1% more aligned bases. The processing time required by LifeTrace is comparable to that of Phred. The predicted quality scores were in line with observed quality scores, permitting direct use for quality clipping and in silico single nucleotide polymorphism (SNP) detection. Furthermore, we introduce a new type of quality score associated with every basecall: the gap-quality. It estimates the probability of a deletion error between the current and the following basecall. This additional quality score improves detection of single basepair deletions when used for locating potential basecalling errors during the alignment. We also describe a new protocol for benchmarking that we believe better discerns basecaller performance differences than methods previously published. PMID:11337481
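Quality scores of the kind produced by Phred are conventionally expressed on a logarithmic scale, Q = −10 · log10(p), where p is the estimated probability that the basecall is wrong. The abstract does not say how LifeTrace computes its scores or its gap-quality, so the sketch below only illustrates the standard scale; the function names are hypothetical.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred-scaled quality score to an error probability."""
    return 10.0 ** (-q / 10.0)


def error_prob_to_phred(p: float) -> float:
    """Convert an error probability (0 < p <= 1) to a Phred-scaled quality."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return -10.0 * math.log10(p)


# Q20 corresponds to a 1-in-100 chance of error, Q30 to 1 in 1000.
for q in (10, 20, 30):
    print(q, phred_to_error_prob(q))
```

Comparing a predicted score with the score implied by the observed error rate, as the abstract describes, amounts to applying `error_prob_to_phred` to the empirical fraction of miscalled bases in each predicted-quality bin.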
An examination of the interrater reliability between practitioners and researchers on the Static-99.

PubMed

Quesada, Stephen P; Calkins, Cynthia; Jeglic, Elizabeth L

2014-11-01

Many studies have validated the psychometric properties of the Static-99, the most widely used measure of sexual offender recidivism risk. However, much of this research relied on instrument coding completed by well-trained researchers. This study is the first to examine the interrater reliability (IRR) of the Static-99 between practitioners in the field and researchers. Using archival data from a sample of 1,973 formerly incarcerated sex offenders, field raters' scores on the Static-99 were compared with those of researchers. Overall, clinicians and researchers had excellent IRR on Static-99 total scores, with IRR coefficients ranging from "substantial" to "outstanding" for the individual 10 items of the scale. The most common causes of discrepancies were coding manual errors, followed by item subjectivity, inaccurate item scoring, and calculation errors. These results offer important data with regard to the frequency and perceived nature of scoring errors. © The Author(s) 2013.
Verbal Serial List Learning in Mild Cognitive Impairment: A Profile Analysis of Interference, Forgetting, and Errors

PubMed Central

Libon, David J.; Bondi, Mark W.; Price, Catherine C.; Lamar, Melissa; Eppig, Joel; Wambach, Denene M.; Nieves, Christine; Delano-Wood, Lisa; Giovannetti, Tania; Lippa, Carol; Kabasakalian, Anahid; Cosentino, Stephanie; Swenson, Rod; Penney, Dana L.

2012-01-01

Using cluster analysis Libon et al. (2010) found three verbal serial list-learning profiles involving delay memory test performance in patients with mild cognitive impairment (MCI). Amnesic MCI (aMCI) patients presented with low scores on delay free recall and recognition tests; mixed MCI (mxMCI) patients scored higher on recognition compared to delay free recall tests; and dysexecutive MCI (dMCI) patients generated relatively intact scores on both delay test conditions. The aim of the current research was to further characterize memory impairment in MCI by examining forgetting/savings, interference from a competing word list, intrusion errors/perseverations, intrusion word frequency, and recognition foils in these three statistically determined MCI groups compared to normal control (NC) participants. The aMCI patients exhibited little savings, generated more highly prototypic intrusion errors, and displayed indiscriminate responding to delayed recognition foils. The mxMCI patients exhibited higher saving scores, fewer and less prototypic intrusion errors, and selectively endorsed recognition foils from the interference list. dMCI patients also selectively endorsed recognition foils from the interference list but performed similarly compared to NC participants. These data suggest the existence of distinct memory impairments in MCI and caution against the routine use of a single memory test score to operationally define MCI. PMID:21880171
Reliability, Validity, and Minimal Detectable Change of Balance Evaluation Systems Test and Its Short Versions in Older Cancer Survivors: A Pilot Study.

PubMed

Huang, Min H; Miller, Kara; Smith, Kristin; Fredrickson, Kayle; Shilling, Tracy

2016-01-01

Cancer is primarily a disease of older adults. About 77% of all cancers are diagnosed in persons aged 55 years and older. Cancer and its treatment can cause diverse sequelae impacting body systems underlying balance control. No study has examined the psychometric properties of balance assessment tools in older cancer survivors, presenting a significant challenge in the selection of outcome measures for clinicians treating this fast-growing population. This study aimed to determine the reliability, validity, and minimal detectable change (MDC) of the Balance Evaluation Systems Test (BESTest), Mini-Balance Evaluation Systems Test (Mini-BESTest), and Brief-Balance Evaluation Systems Test (Brief-BESTest) in community-dwelling older cancer survivors. This study was a cross-sectional design. Twenty breast and 8 prostate cancer survivors participated [age (SD) = 68.4 (8.13) years]. The BESTest and Activity-specific Balance Confidence (ABC) Scale were administered during the first session. Scores of Mini-BESTest and Brief-BESTest were extracted on the basis of the scores of BESTest. The BESTest was repeated within 1 to 2 weeks by the same rater to determine the test-retest reliability. For the analysis of the inter-rater reliability, 21 participants were randomly selected to be evaluated by 2 raters. A primary rater administered the test. The 2 raters independently and concurrently scored the performance of the participants. Each rater recorded the ratings separately on the scoring sheet. No discussion among the raters was allowed throughout the testing. Intraclass correlation coefficients (ICCs), standard error of measurement, minimal detectable change (MDC), and Bland-Altman plots were calculated. Concurrent validity of these balance tests with the ABC Scale was examined using the Spearman correlation.
The BESTest, Mini-BESTest, and Brief-BESTest had high test-retest (ICC = 0.90-0.94) and interrater reliability (ICC = 0.86-0.96), small standard error of measurement (0.86-2.47 points), and MDC (2.39-6.86 points). The Bland-Altman plot revealed no systematic errors. The scores of BESTest, Mini-BESTest, and Brief-BESTest were correlated significantly with those of ABC Scale (P < .01), supporting their concurrent validity. The BESTest, Mini-BESTest, and Brief-BESTest showed high interrater and test-retest reliability, and excellent concurrent validity with the ABC Scale for community-dwelling cancer survivors aged 55 years and older who had completed cancer treatments for at least 3 months. Future studies are necessary to determine the predictive values for determining fall risks using balance assessment tools in older cancer survivors. Clinicians can utilize the BESTest and its short versions to evaluate balance problems in community-dwelling older cancer survivors and apply the established MDC to assess the intervention outcomes.
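The reported standard errors of measurement (SEM) and MDC values are close to what the conventional 95% formula gives, MDC95 = 1.96 × √2 × SEM, with SEM commonly taken as SD × √(1 − ICC). The abstract does not state which formula the authors used, so treating it as this one is an assumption; the function names are hypothetical.

```python
import math


def sem_from_icc(sd: float, icc: float) -> float:
    """Standard error of measurement from a score SD and a reliability ICC."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie in [0, 1]")
    return sd * math.sqrt(1.0 - icc)


def mdc95(sem: float) -> float:
    """Minimal detectable change at the 95% confidence level."""
    return 1.96 * math.sqrt(2.0) * sem


# SEM range reported in the abstract: 0.86 to 2.47 points.
print(round(mdc95(0.86), 2))  # about 2.38, against the reported 2.39
print(round(mdc95(2.47), 2))  # about 6.85, against the reported 6.86
```

The small differences from the reported 2.39 and 6.86 are what rounding the SEM to two decimals before applying the formula would produce.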
ReQON: a Bioconductor package for recalibrating quality scores from next-generation sequencing data

PubMed Central

2012-01-01

Background Next-generation sequencing technologies have become important tools for genome-wide studies. However, the quality scores that are assigned to each base have been shown to be inaccurate. If the quality scores are used in downstream analyses, these inaccuracies can have a significant impact on the results. Results Here we present ReQON, a tool that recalibrates the base quality scores from an input BAM file of aligned sequencing data using logistic regression. ReQON also generates diagnostic plots showing the effectiveness of the recalibration. We show that ReQON produces quality scores that are both more accurate, in the sense that they more closely correspond to the probability of a sequencing error, and do a better job of discriminating between sequencing errors and non-errors than the original quality scores. We also compare ReQON to other available recalibration tools and show that ReQON is less biased and performs favorably in terms of quality score accuracy. Conclusion ReQON is an open source software package, written in R and available through Bioconductor, for recalibrating base quality scores for next-generation sequencing data. ReQON produces a new BAM file with more accurate quality scores, which can improve the results of downstream analysis, and produces several diagnostic plots showing the effectiveness of the recalibration. PMID:22946927
Minimizing Interrater Variability in Staging Sleep by Use of Computer-Derived Features

PubMed Central

Younes, Magdy; Hanly, Patrick J.

2016-01-01

Study Objectives: Inter-scorer variability in sleep staging of polysomnograms (PSGs) results primarily from difficulty in determining whether: (1) an electroencephalogram pattern of wakefulness spans > 15 sec in transitional epochs, (2) spindles or K complexes are present, and (3) duration of delta waves exceeds 6 sec in a 30-sec epoch. We hypothesized that providing digitally derived information about these variables to PSG scorers may reduce inter-scorer variability. Methods: Fifty-six PSGs were scored (five-stage) by two experienced technologists, (first manual, M1). Months later, the technologists edited their own scoring (second manual, M2). PSGs were then scored with an automatic system and the same two technologists and an additional experienced technologist edited them, epoch-by-epoch (Edited-Auto). This resulted in seven manual scores for each PSG. The two M2 scores were then independently modified using digitally obtained values for sleep depth and delta duration and digitally identified spindles and K complexes. Results: Percent agreement between scorers in M2 was 78.9 ± 9.0% before modification and 96.5 ± 2.6% after. Errors of this approach were defined as a change in a manual score to a stage that was not assigned by any scorer during the seven manual scoring sessions. Total errors averaged 7.1 ± 3.7% and 6.9 ± 3.8% of epochs for scorers 1 and 2, respectively, and there was excellent agreement between the modified score and the initial manual score of each technologist. Conclusions: Providing digitally obtained information about sleep depth, delta duration, spindles and K complexes during manual scoring can greatly reduce interrater variability in sleep staging by eliminating the guesswork in scoring epochs with equivocal features. Citation: Younes M, Hanly PJ. Minimizing interrater variability in staging sleep by use of computer-derived features. J Clin Sleep Med 2016;12(10):1347–1356. PMID:27448418
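Percent agreement of the kind reported in the results can be computed epoch by epoch from two scorers' stage labels. The sketch below is a minimal illustration: the function name, the stage labels, and the short example hypnograms are assumptions, not data from the study.

```python
from typing import Sequence


def percent_agreement(scorer_a: Sequence[str], scorer_b: Sequence[str]) -> float:
    """Percentage of epochs to which two scorers assigned the same stage."""
    if len(scorer_a) != len(scorer_b):
        raise ValueError("both scorers must rate the same number of epochs")
    if not scorer_a:
        raise ValueError("at least one epoch is required")
    matches = sum(a == b for a, b in zip(scorer_a, scorer_b))
    return 100.0 * matches / len(scorer_a)


# Hypothetical 30-second epochs scored by two technologists.
tech_1 = ["W", "N1", "N2", "N2", "N3", "REM"]
tech_2 = ["W", "N2", "N2", "N2", "N3", "REM"]
print(round(percent_agreement(tech_1, tech_2), 1))  # 5 of 6 epochs match: 83.3
```

Raw percent agreement does not correct for agreement expected by chance; a chance-corrected statistic would need a separate calculation.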
Asymptotic Standard Errors for Item Response Theory True Score Equating of Polytomous Items

ERIC Educational Resources Information Center

Cher Wong, Cheow

2015-01-01

Building on previous works by Lord and Ogasawara for dichotomous items, this article proposes an approach to derive the asymptotic standard errors of item response theory true score equating involving polytomous items, for equivalent and nonequivalent groups of examinees. This analytical approach could be used in place of empirical methods like…
Some Results on Mean Square Error for Factor Score Prediction

ERIC Educational Resources Information Center

Krijnen, Wim P.

2006-01-01

For the confirmatory factor model a series of inequalities is given with respect to the mean square error (MSE) of three main factor score predictors. The eigenvalues of these MSE matrices are a monotonic function of the eigenvalues of the matrix gamma[subscript rho] = theta[superscript 1/2] lambda[subscript rho] 'psi[subscript rho] [superscript…

Standard Error Estimation of 3PL IRT True Score Equating with an MCMC Method

ERIC Educational Resources Information Center

Liu, Yuming; Schulz, E. Matthew; Yu, Lei

2008-01-01

A Markov chain Monte Carlo (MCMC) method and a bootstrap method were compared in the estimation of standard errors of item response theory (IRT) true score equating. Three test form relationships were examined: parallel, tau-equivalent, and congeneric. Data were simulated based on Reading Comprehension and Vocabulary tests of the Iowa Tests of…
Standard Error of Linear Observed-Score Equating for the NEAT Design with Nonnormally Distributed Data

ERIC Educational Resources Information Center

Zu, Jiyun; Yuan, Ke-Hai

2012-01-01

In the nonequivalent groups with anchor test (NEAT) design, the standard error of linear observed-score equating is commonly estimated by an estimator derived assuming multivariate normality. However, real data are seldom normally distributed, causing this normal estimator to be inconsistent. A general estimator, which does not rely on the…
Measurement Error in Nonparametric Item Response Curve Estimation. Research Report. ETS RR-11-28

ERIC Educational Resources Information Center

Guo, Hongwen; Sinharay, Sandip

2011-01-01

Nonparametric, or kernel, estimation of item response curve (IRC) is a concern theoretically and operationally. Accuracy of this estimation, often used in item analysis in testing programs, is biased when the observed scores are used as the regressor because the observed scores are contaminated by measurement error. In this study, we investigate…
Does Wechsler Intelligence Scale administration and scoring proficiency improve during assessment training?

PubMed

Platt, Tyson L; Zachar, Peter; Ray, Glen E; Lobello, Steven G; Underhill, Andrea T

2007-04-01

Studies have found that Wechsler scale administration and scoring proficiency is not easily attained during graduate training. These findings may be related to methodological issues. Using a single-group repeated measures design, this study documents statistically significant, though modest, error reduction on the WAIS-III and WISC-III during a graduate course in assessment. The study design does not permit the isolation of training factors related to error reduction, or assessment of whether error reduction is a function of mere practice. However, the results do indicate that previous study findings of no or inconsistent improvement in scoring proficiency may have been the result of methodological factors. Implications for teaching individual intelligence testing and further research are discussed.
Rank score and permutation testing alternatives for regression quantile estimates

USGS Publications Warehouse

Cade, B.S.; Richards, J.D.; Mielke, P.W.

2006-01-01

Performance of quantile rank score tests used for hypothesis testing and constructing confidence intervals for linear quantile regression estimates (0 ≤ τ ≤ 1) was evaluated by simulation for models with p = 2 and 6 predictors, moderate collinearity among predictors, homogeneous and heterogeneous errors, small to moderate samples (n = 20–300), and central to upper quantiles (0.50–0.99). Test statistics evaluated were the conventional quantile rank score T statistic distributed as a χ² random variable with q degrees of freedom (where q parameters are constrained by H0) and an F statistic with its sampling distribution approximated by permutation. The permutation F-test maintained better Type I errors than the T-test for homogeneous error models with smaller n and more extreme quantiles τ. An F distributional approximation of the F statistic provided some improvements in Type I errors over the T-test for models with > 2 parameters, smaller n, and more extreme quantiles but not as much improvement as the permutation approximation. Both rank score tests required weighting to maintain correct Type I errors when heterogeneity under the alternative model increased to 5 standard deviations across the domain of X. A double permutation procedure was developed to provide valid Type I errors for the permutation F-test when null models were forced through the origin. Power was similar for conditions where both T- and F-tests maintained correct Type I errors but the F-test provided some power at smaller n and extreme quantiles when the T-test had no power because of excessively conservative Type I errors. When the double permutation scheme was required for the permutation F-test to maintain valid Type I errors, power was less than for the T-test with decreasing sample size and increasing quantiles.
Confidence intervals on parameters and tolerance intervals for future predictions were constructed based on test inversion for an example application relating trout densities to stream channel width:depth.
The assessment of cognitive errors using an observer-rated method.

PubMed

Drapeau, Martin

2014-01-01

Cognitive Errors (CEs) are a key construct in cognitive behavioral therapy (CBT). Integral to CBT is that individuals with depression process information in an overly negative or biased way, and that this bias is reflected in specific depressotypic CEs which are distinct from normal information processing. Despite the importance of this construct in CBT theory, practice, and research, few methods are available to researchers and clinicians to reliably identify CEs as they occur. In this paper, the author presents a rating system, the Cognitive Error Rating Scale, which can be used by trained observers to identify and assess the cognitive errors of patients or research participants in vivo, i.e., as they are used or reported by the patients or participants. The method is described, including some of the more important rating conventions to be considered when using the method. This paper also describes the 15 cognitive errors assessed, and the different summary scores, including valence of the CEs, that can be derived from the method.
Pollen flow in the wildservice tree, Sorbus torminalis (L.) Crantz. I. Evaluating the paternity analysis procedure in continuous populations.

PubMed

Oddou-Muratorio, S; Houot, M-L; Demesure-Musch, B; Austerlitz, F

2003-12-01

The joint development of polymorphic molecular markers and paternity analysis methods provides new approaches to investigate ongoing patterns of pollen flow in natural plant populations. However, paternity studies are hindered by false paternity assignment and the nondetection of true fathers. To gauge the risk of these two types of errors, we performed a simulation study to investigate the impact on paternity analysis of: (i) the assumed values for the size of the breeding male population (NBMP), and (ii) the rate of scoring error in genotype assessment. Our simulations were based on microsatellite data obtained from a natural population of the entomophilous wild service tree, Sorbus torminalis (L.) Crantz. We show that an accurate estimate of NBMP is required to minimize both types of errors, and we assess the reliability of a technique used to estimate NBMP based on parent-offspring genetic data. We then show that scoring errors in genotype assessment only slightly affect the assessment of paternity relationships, and conclude that it is generally better to neglect the scoring error rate in paternity analyses within a nonisolated population.
Baseline Establishment Using Virtual Environment Traumatic Brain Injury Screen (VETS)

DTIC Science & Technology

2015-06-01

…indicator of mTBI. Further, these results establish a baseline data set, which may be useful in comparing concussed individuals. Subject terms: concussion, mild traumatic brain injury (mTBI), traumatic brain injury (TBI), balance, Sensory Organization Test, Balance Error Scoring System, center of…
LeadMine: a grammar and dictionary driven approach to entity recognition.

PubMed

Lowe, Daniel M; Sayle, Roger A

2015-01-01

Chemical entity recognition has traditionally been performed by machine learning approaches. Here we describe an approach using grammars and dictionaries. This approach has the advantage that the entities found can be directly related to a given grammar or dictionary, which allows the type of an entity to be known and, if an entity is misannotated, indicates which resource should be corrected. As recognition is driven by what is expected, if spelling errors occur, they can be corrected. Correcting such errors is highly useful when attempting to lookup an entity in a database or, in the case of chemical names, converting them to structures. Our system uses a mixture of expertly curated grammars and dictionaries, as well as dictionaries automatically derived from public resources. We show that the heuristics developed to filter our dictionary of trivial chemical names (from PubChem) yields a better performing dictionary than the previously published Jochem dictionary. Our final system performs post-processing steps to modify the boundaries of entities and to detect abbreviations. These steps are shown to significantly improve performance (2.6% and 4.0% F1-score respectively). Our complete system, with incremental post-BioCreative workshop improvements, achieves 89.9% precision and 85.4% recall (87.6% F1-score) on the CHEMDNER test set. Grammar and dictionary approaches can produce results at least as good as the current state of the art in machine learning approaches. While machine learning approaches are commonly thought of as "black box" systems, our approach directly links the output entities to the input dictionaries and grammars. Our approach also allows correction of errors in detected entities, which can assist with entity resolution.
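The reported F1-score follows from the reported precision and recall, since F1 is their harmonic mean: F1 = 2PR / (P + R). The short check below uses only the figures quoted in the abstract; the function name is a hypothetical helper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both as fractions in [0, 1])."""
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Precision and recall reported on the CHEMDNER test set.
f1 = f1_score(0.899, 0.854)
print(round(100.0 * f1, 1))  # 87.6, matching the reported F1-score
```

Because the harmonic mean is pulled toward the smaller value, the result sits slightly below the arithmetic mean of the two inputs (87.65%).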
LeadMine: a grammar and dictionary driven approach to entity recognition

PubMed Central

2015-01-01

Background: Chemical entity recognition has traditionally been performed by machine learning approaches. Here we describe an approach using grammars and dictionaries. This approach has the advantage that the entities found can be directly related to a given grammar or dictionary, which allows the type of an entity to be known and, if an entity is misannotated, indicates which resource should be corrected. As recognition is driven by what is expected, if spelling errors occur, they can be corrected. Correcting such errors is highly useful when attempting to look up an entity in a database or, in the case of chemical names, converting them to structures. Results: Our system uses a mixture of expertly curated grammars and dictionaries, as well as dictionaries automatically derived from public resources. We show that the heuristics developed to filter our dictionary of trivial chemical names (from PubChem) yield a better-performing dictionary than the previously published Jochem dictionary. Our final system performs post-processing steps to modify the boundaries of entities and to detect abbreviations. These steps are shown to significantly improve performance (2.6% and 4.0% F1-score respectively). Our complete system, with incremental post-BioCreative workshop improvements, achieves 89.9% precision and 85.4% recall (87.6% F1-score) on the CHEMDNER test set. Conclusions: Grammar and dictionary approaches can produce results at least as good as the current state of the art in machine learning approaches. While machine learning approaches are commonly thought of as "black box" systems, our approach directly links the output entities to the input dictionaries and grammars. Our approach also allows correction of errors in detected entities, which can assist with entity resolution. PMID:25810776
OSA severity assessment based on sleep breathing analysis using ambient microphone.

PubMed

Dafna, E; Tarasiuk, A; Zigel, Y

2013-01-01

In this paper, an audio-based system for severity estimation of obstructive sleep apnea (OSA) is proposed. The system estimates the apnea-hypopnea index (AHI), which is the average number of apneic events per hour of sleep. This system is based on a Gaussian mixture regression algorithm that was trained and validated on full-night audio recordings. A feature selection process using a genetic algorithm was applied to select the best features extracted from the time and spectral domains. A total of 155 subjects, referred for an in-laboratory polysomnography (PSG) study, were recruited. Using the PSG's AHI score as a gold standard, the performance of the proposed system was evaluated using Pearson correlation, AHI error, and diagnostic agreement methods. Correlation of R=0.89, AHI error of 7.35 events/hr, and diagnostic agreement of 77.3% were achieved, showing encouraging performance and a reliable non-contact alternative method for OSA severity estimation.
Abnormal Error Monitoring in Math-Anxious Individuals: Evidence from Error-Related Brain Potentials

PubMed Central

Suárez-Pellicioni, Macarena; Núñez-Peña, María Isabel; Colomé, Àngels

2013-01-01

This study used event-related brain potentials to investigate whether math anxiety is related to abnormal error monitoring processing. Seventeen high math-anxious (HMA) and seventeen low math-anxious (LMA) individuals were presented with a numerical and a classical Stroop task. Groups did not differ in terms of trait or state anxiety. We found enhanced error-related negativity (ERN) in the HMA group when subjects committed an error on the numerical Stroop task, but not on the classical Stroop task. Groups did not differ in terms of the correct-related negativity component (CRN), the error positivity component (Pe), classical behavioral measures or post-error measures. The amplitude of the ERN was negatively related to participants’ math anxiety scores, showing a more negative amplitude as the score increased. Moreover, using standardized low resolution electromagnetic tomography (sLORETA) we found greater activation of the insula in errors on a numerical task as compared to errors in a non-numerical task only for the HMA group. The results were interpreted according to the motivational significance theory of the ERN. PMID:24236212
Challenges in clinical natural language processing for automated disorder normalization.

PubMed

Leaman, Robert; Khare, Ritu; Lu, Zhiyong

2015-10-01

Identifying key variables such as disorders within the clinical narratives in electronic health records has wide-ranging applications within clinical practice and biomedical research. Previous research has demonstrated lower performance of disorder named entity recognition (NER) and normalization (or grounding) in clinical narratives than in biomedical publications. In this work, we aim to identify the cause for this performance difference and introduce general solutions. We use closure properties to compare the richness of the vocabulary in clinical narrative text to biomedical publications. We approach both disorder NER and normalization using machine learning methodologies. Our NER methodology is based on linear-chain conditional random fields with a rich feature approach, and we introduce several improvements to enhance the lexical knowledge of the NER system. Our normalization method - never previously applied to clinical data - uses pairwise learning to rank to automatically learn term variation directly from the training data. We find that while the size of the overall vocabulary is similar between clinical narrative and biomedical publications, clinical narrative uses a richer terminology to describe disorders than publications. We apply our system, DNorm-C, to locate disorder mentions in the clinical narratives from the recent ShARe/CLEF eHealth Task. For NER (strict span-only), our system achieves precision=0.797, recall=0.713, f-score=0.753. For the normalization task (strict span+concept) it achieves precision=0.712, recall=0.637, f-score=0.672. The improvements described in this article increase the NER f-score by 0.039 and the normalization f-score by 0.036. We also describe a high recall version of the NER, which increases the normalization recall to as high as 0.744, albeit with reduced precision. We perform an error analysis, demonstrating that NER errors outnumber normalization errors by more than 4-to-1.
Abbreviations and acronyms are found to be frequent causes of error, in addition to the mentions the annotators were not able to identify within the scope of the controlled vocabulary. Disorder mentions in text from clinical narratives use a rich vocabulary that results in high term variation, which we believe to be one of the primary causes of reduced performance in clinical narrative. We show that pairwise learning to rank offers high performance in this context, and introduce several lexical enhancements - generalizable to other clinical NER tasks - that improve the ability of the NER system to handle this variation. DNorm-C is a high performing, open source system for disorders in clinical text, and a promising step toward NER and normalization methods that are trainable to a wide variety of domains and entities. (DNorm-C is open source software, and is available with a trained model at the DNorm demonstration website: http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/tmTools/#DNorm.). Published by Elsevier Inc.
Effects of Hip Strengthening on Neuromuscular Control, Hip Strength, and Self-Reported Functional Deficits in Individuals With Chronic Ankle Instability.

PubMed

Smith, Brent I; Curtis, Denice; Docherty, Carrie L

2018-06-12

Context: Deficits in ankle and hip strength and lower-extremity postural control are associated with chronic ankle instability (CAI). Following strength training, muscle groups demonstrate increased strength. This change is partially credited to improved neuromuscular control, and many studies have investigated ankle protocols for subjects with CAI. The effects of isolating hip musculature in strength training protocols in this population are not well understood. Objective: To examine the effects of hip strengthening on clinical and self-reported outcomes in patients with CAI. Design: Prospective randomized controlled clinical trial. Setting: Athletic training facility. Participants: Twenty-six participants with CAI (12 males and 14 females; age = 20.9 [1.5] y, height = 170.0 [12.7] cm, and mass = 77.5 [17.5] kg) were randomly assigned to training or control groups. Intervention: Participants completed either 4 weeks of supervised hip strengthening (resistance bands 3 times a week) or no intervention. Main outcome measures: Participants were assessed on 4 clinical measures (Star Excursion Balance Test in the anterior, posteromedial, and posterolateral directions; Balance Error Scoring System; hip external rotation strength; and hip abduction strength) and a patient-reported measure (the Foot and Ankle Ability Measure activities of daily living and sports subscales) before and after the 4-week training period.
Results: The training group displayed significantly improved posttest measures compared with the control group for hip abduction strength (training: 446.3 [77.4] N, control: 314.7 [49.6] N, P < .01); hip external rotation strength (training: 222.1 [48.7] N, control: 169.4 [34.6] N, P < .01); Star Excursion Balance Test reach in the anterior (training: 93.1% [7.4%], control: 90.2% [7.9%], P < .01), posteromedial (training: 96.3% [8.9%], control: 88.0% [8.8%], P < .01), and posterolateral (training: 95.4% [11.1%], control: 86.6% [9.6%], P < .01) directions; Balance Error Scoring System total errors (training: 9.9 [6.3] errors, control: 21.2 [6.3] errors, P < .01); and the Foot and Ankle Ability Measure-sports score (training: 88.0 [12.6], control: 84.8 [10.9], P < .01). Conclusions: Improved clinical and patient-reported outcomes in the training group suggest hip strengthening is beneficial in the management and prevention of recurrent symptoms associated with CAI.
Analysis of Covariance: Is It the Appropriate Model to Study Change?

ERIC Educational Resources Information Center

Marston, Paul T.; Borich, Gary D.

The four main approaches to measuring treatment effects in schools (raw gain, residual gain, covariance, and true scores) were compared. A simulation study showed that true score analysis produced a large number of Type-I errors. When corrected for this error, this method showed the least power of the four. This outcome was clearly the result of the…
A Guide for Setting the Cut-Scores to Minimize Weighted Classification Errors in Test Batteries

ERIC Educational Resources Information Center

Grabovsky, Irina; Wainer, Howard

2017-01-01

In this article, we extend the methodology of the Cut-Score Operating Function that we introduced previously and apply it to a testing scenario with multiple independent components and different testing policies. We derive analytically the overall classification error rate for a test battery under the policy when several retakes are allowed for…
How Achievement Error Patterns of Students with Mild Intellectual Disability Differ from Low IQ and Low Achievement Students without Diagnoses

ERIC Educational Resources Information Center

Root, Melissa M.; Marchis, Lavinia; White, Erica; Courville, Troy; Choi, Dowon; Bray, Melissa A.; Pan, Xingyu; Wayte, Jessica

2017-01-01

This study investigated the differences in error factor scores on the Kaufman Test of Educational Achievement-Third Edition between individuals with mild intellectual disabilities (Mild IDs), those with low achievement scores but average intelligence, and those with low intelligence but without a Mild ID diagnosis. The two control groups were…
Evaluation of Two Methods for Modeling Measurement Errors When Testing Interaction Effects with Observed Composite Scores

ERIC Educational Resources Information Center

Hsiao, Yu-Yu; Kwok, Oi-Man; Lai, Mark H. C.

2018-01-01

Path models with observed composites based on multiple items (e.g., mean or sum score of the items) are commonly used to test interaction effects. Under this practice, researchers generally assume that the observed composites are measured without errors. In this study, we reviewed and evaluated two alternative methods within the structural…
Relationships between evidence-based practice, quality improvement and clinical error experience of nurses in Korean hospitals.

PubMed

Hwang, Jee-In; Park, Hyeoun-Ae

2015-07-01

This study investigated individual and work-related factors associated with nurses' perceptions of evidence-based practice (EBP) and quality improvement (QI), and the relationships between evidence-based practice, quality improvement and clinical errors. Understanding the factors affecting evidence-based practice and quality improvement activities and their relationships with clinical errors is important for designing strategies to promote evidence-based practice, quality improvement and patient safety. A cross-sectional survey was conducted with 594 nurses in two Korean teaching hospitals using the evidence-based practice Questionnaire and quality improvement scale developed in this study. Four hundred and forty-three nurses (74.6%) returned the completed survey. Nurses' ages and educational levels were significantly associated with evidence-based practice scores whereas age and job position were associated with quality improvement scores. There were positive, moderate correlations between evidence-based practice and quality improvement scores. Nurses who had not made any clinical errors during the past 12 months had significantly higher quality improvement skills scores than those who had. The findings indicated the necessity of educational support regarding evidence-based practice and quality improvement for younger staff nurses who have no master's degree. Enhancing quality improvement skills may reduce clinical errors. Nurse managers should consider the characteristics of their staff when implementing educational and clinical strategies for evidence-based practice and quality improvement. © 2013 John Wiley & Sons Ltd.
Validation of the Kp Geomagnetic Index Forecast at CCMC

NASA Astrophysics Data System (ADS)

Frechette, B. P.; Mays, M. L.

2017-12-01

The Community Coordinated Modeling Center (CCMC) Space Weather Research Center (SWRC) sub-team provides space weather services to NASA robotic mission operators and science campaigns and prototypes new models, forecasting techniques, and procedures. The Kp index is a measure of geomagnetic disturbances for space weather in the magnetosphere such as geomagnetic storms and substorms. In this study, we performed validation on the Newell et al. (2007) Kp prediction equation from December 2010 to July 2017. The purpose of this research is to understand the Kp forecast performance because it's critical for NASA missions to have confidence in the space weather forecast. This research was done by computing the Kp error for each forecast (average, minimum, maximum) and each synoptic period. Then to quantify forecast performance we computed the mean error, mean absolute error, root mean square error, multiplicative bias and correlation coefficient. A contingency table was made for each forecast and skill scores were computed. The results are compared to the perfect score and reference forecast skill score. In conclusion, the skill score and error results show that the minimum of the predicted Kp over each synoptic period from the Newell et al. (2007) Kp prediction equation performed better than the maximum or average of the prediction. However, persistence (reference forecast) outperformed all of the Kp forecasts (minimum, maximum, and average). Overall, the Newell Kp prediction still predicts within a range of 1, even though persistence beats it.
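The verification statistics named in the abstract above (mean error, mean absolute error, root mean square error, multiplicative bias, correlation coefficient, and a contingency table with a skill score) can be sketched as follows. This is an illustrative sketch only: the sample Kp values, the event threshold of Kp >= 5, and the choice of the Heidke skill score are assumptions made for the example and are not taken from the study, which does not state which skill scores it used.

```python
import math


def continuous_metrics(forecast, observed):
    """Error statistics for paired forecast and observed values."""
    n = len(forecast)
    errors = [f - o for f, o in zip(forecast, observed)]
    mean_f = sum(forecast) / n
    mean_o = sum(observed) / n
    cov = sum((f - mean_f) * (o - mean_o) for f, o in zip(forecast, observed))
    var_f = sum((f - mean_f) ** 2 for f in forecast)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return {
        "mean_error": sum(errors) / n,
        "mean_absolute_error": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
        # Ratio of mean forecast to mean observation; 1.0 means no bias.
        "multiplicative_bias": mean_f / mean_o,
        "correlation": cov / math.sqrt(var_f * var_o),
    }


def contingency_table(forecast, observed, threshold):
    """Count hits, false alarms, misses, and correct negatives."""
    hits = false_alarms = misses = correct_negatives = 0
    for f, o in zip(forecast, observed):
        if f >= threshold and o >= threshold:
            hits += 1
        elif f >= threshold:
            false_alarms += 1
        elif o >= threshold:
            misses += 1
        else:
            correct_negatives += 1
    return hits, false_alarms, misses, correct_negatives


def heidke_skill_score(hits, false_alarms, misses, correct_negatives):
    """Heidke skill score: 1 is perfect, 0 is no better than chance."""
    a, b, c, d = hits, false_alarms, misses, correct_negatives
    return 2 * (a * d - b * c) / ((a + c) * (c + d) + (a + b) * (b + d))


# Hypothetical Kp values, for illustration only (not data from the study).
forecast_kp = [2, 3, 5, 6, 4, 1]
observed_kp = [2, 4, 5, 4, 5, 1]

metrics = continuous_metrics(forecast_kp, observed_kp)
table = contingency_table(forecast_kp, observed_kp, threshold=5)
hss = heidke_skill_score(*table)
```

A reference forecast such as persistence can be scored with the same functions by passing the previous period's observed values as the forecast, which allows the kind of comparison the abstract describes.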

Improving the quality of cognitive screening assessments: ACEmobile, an iPad-based version of the Addenbrooke's Cognitive Examination-III.

PubMed

Newman, Craig G J; Bevins, Adam D; Zajicek, John P; Hodges, John R; Vuillermoz, Emil; Dickenson, Jennifer M; Kelly, Denise S; Brown, Simona; Noad, Rupert F

2018-01-01

Ensuring reliable administration and reporting of cognitive screening tests is fundamental to establishing good clinical practice and research. This study captured the rate and type of errors in clinical practice, using the Addenbrooke's Cognitive Examination-III (ACE-III), and then the reduction in error rate using a computerized alternative, the ACEmobile app. In study 1, we evaluated ACE-III assessments completed in National Health Service (NHS) clinics (n = 87) for administrator error. In study 2, ACEmobile and ACE-III were then evaluated for their ability to capture accurate measurement. In study 1, 78% of clinically administered ACE-IIIs were either scored incorrectly or had arithmetical errors. In study 2, error rates seen in the ACE-III were reduced by 85%-93% using ACEmobile. Errors are ubiquitous in routine clinical use of cognitive screening tests and the ACE-III. ACEmobile provides a framework for supporting reduced administration, scoring, and arithmetical error during cognitive screening.
Trends in Classroom Observation Scores.

PubMed

Casabianca, Jodi M; Lockwood, J R; McCaffrey, Daniel F

2015-04-01

Observations and ratings of classroom teaching and interactions collected over time are susceptible to trends in both the quality of instruction and rater behavior. These trends have potential implications for inferences about teaching and for study design. We use scores on the Classroom Assessment Scoring System-Secondary (CLASS-S) protocol from 458 middle school teachers over a 2-year period to study changes over time in (a) the average quality of teaching for the population of teachers, (b) the average severity of the population of raters, and (c) the severity of individual raters. To obtain these estimates and assess them in the context of other factors that contribute to the variability in scores, we develop an augmented G study model that is broadly applicable for modeling sources of variability in classroom observation ratings data collected over time. In our data, we found that trends in teaching quality were small. Rater drift was very large during raters' initial days of observation and persisted throughout nearly 2 years of scoring. Raters did not converge to a common level of severity; using our model we estimate that variability among raters actually increases over the course of the study. Variance decompositions based on the model find that trends are a modest source of variance relative to overall rater effects, rater errors on specific lessons, and residual error. The discussion provides possible explanations for trends and rater divergence as well as implications for designs collecting ratings over time.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Novak, Avrey; Nyflot, Matthew J.; Ermoian, Ralph P.

Purpose: Radiation treatment planning involves a complex workflow that has multiple potential points of vulnerability. This study utilizes an incident reporting system to identify the origination and detection points of near-miss errors, in order to guide their departmental safety improvement efforts. Previous studies have examined where errors arise, but not where they are detected or applied a near-miss risk index (NMRI) to gauge severity. Methods: From 3/2012 to 3/2014, 1897 incidents were analyzed from a departmental incident learning system. All incidents were prospectively reviewed weekly by a multidisciplinary team and assigned a NMRI score ranging from 0 to 4 reflecting potential harm to the patient (no potential harm to potential critical harm). Incidents were classified by point of incident origination and detection based on a 103-step workflow. The individual steps were divided among nine broad workflow categories (patient assessment, imaging for radiation therapy (RT) planning, treatment planning, pretreatment plan review, treatment delivery, on-treatment quality management, post-treatment completion, equipment/software quality management, and other). The average NMRI scores of incidents originating or detected within each broad workflow area were calculated. Additionally, out of 103 individual process steps, 35 were classified as safety barriers, the process steps whose primary function is to catch errors. The safety barriers which most frequently detected incidents were identified and analyzed. Finally, the distance between event origination and detection was explored by grouping events by the number of broad workflow area events passed through before detection, and average NMRI scores were compared. Results: Near-miss incidents most commonly originated within treatment planning (33%).
However, the incidents with the highest average NMRI scores originated during imaging for RT planning (NMRI = 2.0, average NMRI of all events = 1.5), specifically during the documentation of patient positioning and localization of the patient. Incidents were most frequently detected during treatment delivery (30%), and incidents identified at this point also had higher severity scores than other workflow areas (NMRI = 1.6). Incidents identified during on-treatment quality management were also more severe (NMRI = 1.7), and the specific process steps of reviewing portal and CBCT images tended to catch highest-severity incidents. On average, safety barriers caught 46% of all incidents, most frequently at physics chart review, therapist’s chart check, and the review of portal images; however, most of the incidents that pass through a particular safety barrier are not designed to be capable of being captured at that barrier. Conclusions: Incident learning systems can be used to assess the most common points of error origination and detection in radiation oncology. This can help tailor safety improvement efforts and target the highest impact portions of the workflow. The most severe near-miss events tend to originate during simulation, with the most severe near-miss events detected at the time of patient treatment. Safety barriers can be improved to allow earlier detection of near-miss events.
Association between reading speed, cycloplegic refractive error, and oculomotor function in reading disabled children versus controls.

PubMed

Quaid, Patrick; Simpson, Trefford

2013-01-01

Approximately one in ten students aged 6 to 16 in Ontario (Canada) school boards have an individual education plan (IEP) in place due to various learning disabilities, many of which are specific to reading difficulties. The relationships between reading (specifically objectively determined reading speed and eye movement data), refractive error, and binocular vision related clinical measurements remain elusive. One hundred patients were examined in this study (50 IEP and 50 controls, age range 6 to 16 years). IEP patients were referred by three local school boards, with controls being recruited from the routine clinic population (non-IEP patients in the same age group). A comprehensive eye examination was performed on all subjects, in addition to a full binocular vision work-up and cycloplegic refraction. In addition to the cycloplegic refractive error, the following binocular vision related data were also acquired: vergence facility, vergence amplitudes, accommodative facility, accommodative amplitudes, near point of convergence, stereopsis, and a standardized symptom scoring scale. Both the IEP and control groups were also examined using the Visagraph III system, which permits recording of the following reading parameters objectively: (i) reading speed, both raw values and values compared to grade normative data, and (ii) the number of eye movements made per 100 words read. Comprehension was assessed via a questionnaire administered at the end of the reading task, with each subject requiring 80% or greater comprehension. The IEP group had significantly greater hyperopia compared to the control group on cycloplegic examination. Vergence facility was significantly correlated to (i) reading speed, (ii) number of eye movements made when reading, and (iii) a standardized symptom scoring system. Vergence facility was also significantly reduced in the IEP group versus controls. Significant differences in several other binocular vision related scores were also found.
This research indicates there are significant associations between reading speed, refractive error, and in particular vergence facility. It appears sensible that students being considered for reading specific IEP status should have a full eye examination (including cycloplegia), in addition to a comprehensive binocular vision evaluation.
Assessing team performance in the operating room: development and use of a "black-box" recorder and other tools for the intraoperative environment.

PubMed

Guerlain, Stephanie; Adams, Reid B; Turrentine, F Beth; Shin, Thomas; Guo, Hui; Collins, Stephen R; Calland, J Forrest

2005-01-01

The objective of this research was to develop a digital system to archive the complete operative environment along with the assessment tools for analysis of this data, allowing prospective studies of operative performance, intraoperative errors, team performance, and communication. Ability to study this environment will yield new insights, allowing design of systems to avoid preventable errors that contribute to perioperative complications. A multitrack, synchronized, digital audio-visual recording system (RATE tool) was developed to monitor intraoperative performance, including software to synchronize data and allow assignment of independent observational scores. Cases were scored for technical performance, participants' situational awareness (knowledge of critical information), and their comfort and satisfaction with the conduct of the procedure. Laparoscopic cholecystectomy (n = 10) was studied. Technical performance of the RATE tool was excellent. The RATE tool allowed real time, multitrack data collection of all aspects of the operative environment, while permitting digital recording of the objective assessment data in a time synchronized and annotated fashion during the procedure. The mean technical performance score was 73% +/- 28% of maximum (perfect) performance. Situational awareness varied widely among team members, with the attending surgeon typically the only team member having comprehensive knowledge of critical case information. The RATE tool allows prospective analysis of performance measures such as technical judgments, team performance, and communication patterns, offers the opportunity to conduct prospective intraoperative studies of human performance, and allows for postoperative discussion, review, and teaching. This study also suggests that gaps in situational awareness might be an underappreciated source of operative adverse events. Future uses of this system will aid teaching, failure or adverse event analysis, and intervention research.
Factors that influence the generation of autobiographical memory conjunction errors

PubMed Central

Devitt, Aleea L.; Monk-Fromont, Edwin; Schacter, Daniel L.; Addis, Donna Rose

2015-01-01

The constructive nature of memory is generally adaptive, allowing us to efficiently store, process and learn from life events, and simulate future scenarios to prepare ourselves for what may come. However, the cost of a flexibly constructive memory system is the occasional conjunction error, whereby the components of an event are authentic, but the combination of those components is false. Using a novel recombination paradigm, it was demonstrated that details from one autobiographical memory may be incorrectly incorporated into another, forming autobiographical memory conjunction errors that elude typical reality monitoring checks. The factors that contribute to the creation of these conjunction errors were examined across two experiments. Conjunction errors were more likely to occur when the corresponding details were partially rather than fully recombined, likely due to increased plausibility and ease of simulation of partially recombined scenarios. Brief periods of imagination increased conjunction error rates, in line with the imagination inflation effect. Subjective ratings suggest that this inflation is due to similarity of phenomenological experience between conjunction and authentic memories, consistent with a source monitoring perspective. Moreover, objective scoring of memory content indicates that increased perceptual detail may be particularly important for the formation of autobiographical memory conjunction errors. PMID:25611492
Factors that influence the generation of autobiographical memory conjunction errors.

PubMed

Devitt, Aleea L; Monk-Fromont, Edwin; Schacter, Daniel L; Addis, Donna Rose

2016-01-01

The constructive nature of memory is generally adaptive, allowing us to efficiently store, process and learn from life events, and simulate future scenarios to prepare ourselves for what may come. However, the cost of a flexibly constructive memory system is the occasional conjunction error, whereby the components of an event are authentic, but the combination of those components is false. Using a novel recombination paradigm, it was demonstrated that details from one autobiographical memory (AM) may be incorrectly incorporated into another, forming AM conjunction errors that elude typical reality monitoring checks. The factors that contribute to the creation of these conjunction errors were examined across two experiments. Conjunction errors were more likely to occur when the corresponding details were partially rather than fully recombined, likely due to increased plausibility and ease of simulation of partially recombined scenarios. Brief periods of imagination increased conjunction error rates, in line with the imagination inflation effect. Subjective ratings suggest that this inflation is due to similarity of phenomenological experience between conjunction and authentic memories, consistent with a source monitoring perspective. Moreover, objective scoring of memory content indicates that increased perceptual detail may be particularly important for the formation of AM conjunction errors.
Automated assessment of joint synovitis activity from medical ultrasound and power doppler examinations using image processing and machine learning methods.

PubMed

Cupek, Rafal; Ziębiński, Adam

2016-01-01

Rheumatoid arthritis is the most common rheumatic disease with arthritis, and causes substantial functional disability in approximately 50% of patients after 10 years. Accurate measurement of the disease activity is crucial to provide adequate treatment and care to the patients. The aim of this study is a computer aided diagnostic system that supports an assessment of synovitis severity. This paper focuses on a computer aided diagnostic system that was developed within a joint Polish-Norwegian research project related to the automated assessment of the severity of synovitis. Semiquantitative ultrasound with power Doppler is a reliable and widely used method of assessing synovitis. Synovitis is estimated by the ultrasound examiner using a scoring system graded from 0 to 3. The activity score is estimated on the basis of the examiner's experience or standardized ultrasound atlases. The method needs trained medical personnel and the result can be affected by human error. The prototype of a computer-aided diagnostic system and the algorithms essential for an analysis of ultrasonic images of finger joints are the main scientific output of the MEDUSA project. The Medusa Evaluation System prototype uses bone, skin, joint and synovitis area detectors for mutual structural model based evaluation of synovitis. Finally, several algorithms that support the semi-automatic or automatic detection of the bone region were prepared, as well as a system that uses the statistical data processing approach in order to automatically localize the regions of interest. Semiquantitative ultrasound with power Doppler is a reliable and widely used method of assessing synovitis. The activity score is estimated on the basis of the examiner's experience and the result can be affected by human error. In this paper we presented the MEDUSA project, which is focused on a computer aided diagnostic system that supports an assessment of synovitis severity.
Errors and Understanding: The Effects of Error-Management Training on Creative Problem-Solving

ERIC Educational Resources Information Center

Robledo, Issac C.; Hester, Kimberly S.; Peterson, David R.; Barrett, Jamie D.; Day, Eric A.; Hougen, Dean P.; Mumford, Michael D.

2012-01-01

People make errors in their creative problem-solving efforts. The intent of this article was to assess whether error-management training would improve performance on creative problem-solving tasks. Undergraduates were asked to solve an educational leadership problem known to call for creative thought where problem solutions were scored for…
Developing a Machine-Supported Coding System for Constructed-Response Items in PISA. Research Report. ETS RR-17-47

ERIC Educational Resources Information Center

Yamamoto, Kentaro; He, Qiwei; Shin, Hyo Jeong; von Davier, Mattias

2017-01-01

Approximately a third of the Programme for International Student Assessment (PISA) items in the core domains (math, reading, and science) are constructed-response items and require human coding (scoring). This process is time-consuming, expensive, and prone to error as often (a) humans code inconsistently, and (b) coding reliability in…
The Relationship between Mean Square Differences and Standard Error of Measurement: Comment on Barchard (2012)

ERIC Educational Resources Information Center

Pan, Tianshu; Yin, Yue

2012-01-01

In the discussion of mean square difference (MSD) and standard error of measurement (SEM), Barchard (2012) concluded that the MSD between 2 sets of test scores is greater than 2(SEM)² and SEM underestimates the score difference between 2 tests when the 2 tests are not parallel. This conclusion has limitations for 2 reasons. First,…
A Brief Look at: Test Scores and the Standard Error of Measurement. E&R Report No. 10.13

ERIC Educational Resources Information Center

Holdzkom, David; Sumner, Brian; McMillen, Brad

2010-01-01

In the context of standardized testing, the standard error of measurement (SEM) is a measure of the factors other than the student's actual knowledge of the tested material that may affect the student's test score. Such factors may include distractions in the testing environment, fatigue, hunger, or even luck. This means that a student's observed…
Malingering in Toxic Exposure. Classification Accuracy of Reliable Digit Span and WAIS-III Digit Span Scaled Scores

ERIC Educational Resources Information Center

Greve, Kevin W.; Springer, Steven; Bianchini, Kevin J.; Black, F. William; Heinly, Matthew T.; Love, Jeffrey M.; Swift, Douglas A.; Ciota, Megan A.

2007-01-01

This study examined the sensitivity and false-positive error rate of reliable digit span (RDS) and the WAIS-III Digit Span (DS) scaled score in persons alleging toxic exposure and determined whether error rates differed from published rates in traumatic brain injury (TBI) and chronic pain (CP). Data were obtained from the files of 123 persons…
The developmental eye movement (DEM) test and Cantonese-speaking children in Hong Kong SAR, China.

PubMed

Pang, Peter C; Lam, Carly S; Woo, George C

2010-07-01

There is no published norm for the Developmental Eye Movement (DEM) Test for Cantonese-speaking Chinese children. This study aimed to determine the normative values of this test for Cantonese-speaking Chinese children in Hong Kong SAR and to compare the results with the published norms of English-speaking and Spanish-speaking children. Cantonese-speaking students aged from 6 to 11 years were tested by the DEM test in Cantonese and a digital recorder was used to record the process. The DEM scores for the 305 students were determined by listening again to the audio records after the test and computed by using the formula from the DEM manual, except that the 'vertical scores' were adjusted by taking the vertical errors into consideration. The results were compared with other norms that have been published. Our subjects made more vertical errors than in other normative studies and adjusted vertical scores were proposed. In both adjusted vertical and horizontal scores, the Cantonese-speaking children completed the tests much faster than the norms for English- and Spanish-speaking children, the differences of the means being significant (p < 0.0001) in all age groups. The DEM norms may be affected by differences in languages, cultures and education systems among different ethnicities. The norms of the DEM test are proposed for Cantonese-speaking children in Hong Kong SAR, China.
Simulated Driving Assessment (SDA) for Teen Drivers: Results from a Validation Study

PubMed Central

McDonald, Catherine C.; Kandadai, Venk; Loeb, Helen; Seacrist, Thomas S.; Lee, Yi-Ching; Winston, Zachary; Winston, Flaura K.

2015-01-01

Background: Driver error and inadequate skill are common critical reasons for novice teen driver crashes, yet few validated, standardized assessments of teen driving skills exist. The purpose of this study was to evaluate the construct and criterion validity of a newly developed Simulated Driving Assessment (SDA) for novice teen drivers. Methods: The SDA's 35-minute simulated drive incorporates 22 variations of the most common teen driver crash configurations. Driving performance was compared for 21 inexperienced teens (age 16–17 years, provisional license ≤90 days) and 17 experienced adults (age 25–50 years, license ≥5 years, drove ≥100 miles per week, no collisions or moving violations ≤3 years). SDA driving performance (Error Score) was based on driving safety measures derived from simulator and eye-tracking data. Negative driving outcomes included simulated collisions or run-off-the-road incidents. A professional driving evaluator/instructor reviewed videos of SDA performance (DEI Score). Results: The SDA demonstrated construct validity: (1) teens had a higher Error Score than adults (30 vs. 13, p=0.02); (2) for each additional error committed, the relative risk of a participant's propensity for a simulated negative driving outcome increased by 8% (95% CI: 1.05–1.10, p<0.01). The SDA demonstrated criterion validity: Error Score was correlated with DEI Score (r=−0.66, p<0.001). Conclusions: This study supports the concept of validated simulated driving tests like the SDA to assess novice driver skill in complex and hazardous driving scenarios. The SDA, as a standard protocol to evaluate teen driver performance, has the potential to facilitate screening and assessment of teen driving readiness and could be used to guide targeted skill training. PMID:25740939
Design and validation of a portable, inexpensive and multi-beam timing light system using the Nintendo Wii hand controllers.

PubMed

Clark, Ross A; Paterson, Kade; Ritchie, Callan; Blundell, Simon; Bryant, Adam L

2011-03-01

Commercial timing light systems (CTLS) provide precise measurement of athletes' running velocity; however, they are often expensive and difficult to transport. In this study an inexpensive, wireless and portable timing light system was created using the infrared camera in Nintendo Wii hand controllers (NWHC). System creation with gold-standard validation. A Windows-based software program using NWHC to replicate a dual-beam timing gate was created. Firstly, data collected during 2m walking and running trials were validated against a 3D kinematic system. Secondly, data recorded during 5m running trials at various intensities from standing or flying starts were compared to a single beam CTLS and the independent and average scores of three handheld stopwatch (HS) operators. Intraclass correlation coefficient and Bland-Altman plots were used to assess validity. Absolute error quartiles and percentage of trials in absolute error threshold ranges were used to determine accuracy. The NWHC system was valid when compared against the 3D kinematic system (ICC=0.99, median absolute error (MAR)=2.95%). For the flying 5m trials the NWHC system possessed excellent validity and precision (ICC=0.97, MAR<3%) when compared with the CTLS. In contrast, the NWHC system and the HS values during standing start trials possessed only modest validity (ICC<0.75) and accuracy (MAR>8%). An NWHC timing light system is inexpensive, portable and valid for assessing running velocity. Errors in the 5m standing start trials may have been due to erroneous event detection by either the commercial or NWHC-based timing light systems. Copyright © 2010 Sports Medicine Australia. Published by Elsevier Ltd. All rights reserved.
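The agreement statistics named in this abstract can be illustrated with a minimal sketch. It assumes paired timings from a reference system and a test system; the function names and the sample values are hypothetical and are not taken from the study.

```python
from statistics import mean, median, stdev


def bland_altman(reference, test):
    """Return (bias, lower limit, upper limit) of agreement for paired data.

    The limits are bias +/- 1.96 sample standard deviations of the
    paired differences (test minus reference).
    """
    diffs = [t - r for r, t in zip(reference, test)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


def median_abs_pct_error(reference, test):
    """Median of the absolute errors, each expressed as a percentage of the reference."""
    return median(abs(t - r) / r * 100 for r, t in zip(reference, test))


# Hypothetical paired readings (arbitrary time units), not study data.
reference = [10, 20, 40]
test = [11, 19, 42]
bias, lower, upper = bland_altman(reference, test)
error_pct = median_abs_pct_error(reference, test)
```

A bias near zero with narrow limits suggests the two systems can be used interchangeably; the percentage error gives a scale-free summary of typical disagreement.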
Measuring Error Identification and Recovery Skills in Surgical Residents.

PubMed

Sternbach, Joel M; Wang, Kevin; El Khoury, Rym; Teitelbaum, Ezra N; Meyerson, Shari L

2017-02-01

Although error identification and recovery skills are essential for the safe practice of surgery, they have not traditionally been taught or evaluated in residency training. This study validates a method for assessing error identification and recovery skills in surgical residents using a thoracoscopic lobectomy simulator. We developed a 5-station, simulator-based examination containing the most commonly encountered cognitive and technical errors occurring during division of the superior pulmonary vein for left upper lobectomy. Successful completion of each station requires identification and correction of these errors. Examinations were video recorded and scored in a blinded fashion using an examination-specific rating instrument evaluating task performance as well as error identification and recovery skills. Evidence of validity was collected in the categories of content, response process, internal structure, and relationship to other variables. Fifteen general surgical residents (9 interns and 6 third-year residents) completed the examination. Interrater reliability was high, with an intraclass correlation coefficient of 0.78 between 4 trained raters. Station scores ranged from 64% to 84% correct. All stations adequately discriminated between high- and low-performing residents, with discrimination ranging from 0.35 to 0.65. The overall examination score was significantly higher for intermediate residents than for interns (mean, 74 versus 64 of 90 possible; p = 0.03). The described simulator-based examination with embedded errors and its accompanying assessment tool can be used to measure error identification and recovery skills in surgical residents. This examination provides a valid method for comparing teaching strategies designed to improve error recognition and recovery to enhance patient safety. Copyright © 2017 The Society of Thoracic Surgeons. Published by Elsevier Inc. All rights reserved.
Higher mental workload is associated with poorer laparoscopic performance as measured by the NASA-TLX tool.

PubMed

Yurko, Yuliya Y; Scerbo, Mark W; Prabhu, Ajita S; Acker, Christina E; Stefanidis, Dimitrios

2010-10-01

Increased workload during task performance may increase fatigue and facilitate errors. The National Aeronautics and Space Administration-Task Load Index (NASA-TLX) is a previously validated tool for workload self-assessment. We assessed the relationship of workload and performance during simulator training on a complex laparoscopic task. NASA-TLX workload data from three separate trials were analyzed. All participants were novices (n = 28), followed the same curriculum on the fundamentals of laparoscopic surgery suturing model, and were tested in the animal operating room (OR) on a Nissen fundoplication model after training. Performance and workload scores were recorded at baseline, after proficiency achievement, and during the test. Performance, NASA-TLX scores, and inadvertent injuries during the test were analyzed and compared. Workload scores declined during training and mirrored performance changes. NASA-TLX scores correlated significantly with performance scores (r = -0.5, P < 0.001). Participants with higher workload scores caused more inadvertent injuries to adjacent structures in the OR (r = 0.38, P < 0.05). Increased mental and physical workload scores at baseline correlated with higher workload scores in the OR (r = 0.52-0.82; P < 0.05) and more inadvertent injuries (r = 0.52, P < 0.01). Increased workload is associated with inferior task performance and higher likelihood of errors. The NASA-TLX questionnaire accurately reflects workload changes during simulator training and may identify individuals more likely to experience high workload and more prone to errors during skill transfer to the clinical environment.
Validation of scores of use of inhalation devices: valoration of errors

PubMed Central

Zambelli-Simões, Letícia; Martins, Maria Cleusa; Possari, Juliana Carneiro da Cunha; Carvalho, Greice Borges; Coelho, Ana Carla Carvalho; Cipriano, Sonia Lucena; de Carvalho-Pinto, Regina Maria; Cukier, Alberto; Stelmach, Rafael

2015-01-01

Abstract Objective: To validate two scores quantifying the ability of patients to use metered dose inhalers (MDIs) or dry powder inhalers (DPIs); to identify the most common errors made during their use; and to identify the patients in need of an educational program for the use of these devices. Methods: This study was conducted in three phases: validation of the reliability of the inhaler technique scores; validation of the contents of the two scores using a convenience sample; and testing for criterion validation and discriminant validation of these instruments in patients who met the inclusion criteria. Results: The convenience sample comprised 16 patients. Interobserver disagreement was found in 19% and 25% of the DPI and MDI scores, respectively. After expert analysis on the subject, the scores were modified and were applied in 72 patients. The most relevant difficulty encountered during the use of both types of devices was the maintenance of total lung capacity after a deep inhalation. The degree of correlation of the scores by observer was 0.97 (p < 0.0001). There was good interobserver agreement in the classification of patients as able/not able to use a DPI (50%/50% and 52%/58%; p < 0.01) and an MDI (49%/51% and 54%/46%; p < 0.05). Conclusions: The validated scores allow the identification and correction of inhaler technique errors during consultations and, as a result, improvement in the management of inhalation devices. PMID:26398751
An FMEA evaluation of intensity modulated radiation therapy dose delivery failures at tolerance criteria levels.

PubMed

Faught, Jacqueline Tonigan; Balter, Peter A; Johnson, Jennifer L; Kry, Stephen F; Court, Laurence E; Stingo, Francesco C; Followill, David S

2017-11-01

The objective of this work was to assess both the perception of failure modes in Intensity Modulated Radiation Therapy (IMRT) when the linac is operated at the edge of tolerances given in AAPM TG-40 (Kutcher et al.) and TG-142 (Klein et al.) as well as the application of FMEA to this specific section of the IMRT process. An online survey was distributed to approximately 2000 physicists worldwide that participate in quality services provided by the Imaging and Radiation Oncology Core - Houston (IROC-H). The survey briefly described eleven different failure modes covered by basic quality assurance in step-and-shoot IMRT at or near TG-40 (Kutcher et al.) and TG-142 (Klein et al.) tolerance criteria levels. Respondents were asked to estimate the worst case scenario percent dose error that could be caused by each of these failure modes in a head and neck patient as well as the FMEA scores: Occurrence, Detectability, and Severity. Risk probability number (RPN) scores were calculated as the product of these scores. Demographic data were also collected. A total of 181 individual and three group responses were submitted. 84% were from North America. Most (76%) individual respondents performed at least 80% clinical work and 92% were nationally certified. Respondent medical physics experience ranged from 2.5 to 45 yr (average 18 yr). A total of 52% of individual respondents were at least somewhat familiar with FMEA, while 17% were not familiar. Several IMRT techniques, treatment planning systems, and linear accelerator manufacturers were represented. All failure modes received widely varying scores ranging from 1 to 10 for occurrence, at least 1-9 for detectability, and at least 1-7 for severity. Ranking failure modes by RPN scores also resulted in large variability, with each failure mode being ranked both most risky (1st) and least risky (11th) by different respondents. On average MLC modeling had the highest RPN scores. Individual estimated percent dose errors and severity scores positively correlated (P < 0.01) for each FM as expected. No universal correlations were found between the demographic information collected and scoring, percent dose errors or ranking. Failure modes investigated overall were evaluated as low to medium risk, with average RPNs less than 110. The ranking of 11 failure modes was not agreed upon by the community. Large variability in FMEA scoring may be caused by individual interpretation and/or experience, reflecting the subjective nature of the FMEA tool. © 2017 American Association of Physicists in Medicine.
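The RPN calculation described in this abstract (the product of the Occurrence, Detectability, and Severity scores) can be sketched as follows. The 1 to 10 scale check, the function names, and all scores are illustrative assumptions; only "MLC modeling" is a failure-mode name taken from the abstract, and the numbers attached to it are invented.

```python
def rpn(occurrence, detectability, severity):
    """Product of the three FMEA scores, each assumed to lie on a 1-10 scale."""
    for score in (occurrence, detectability, severity):
        if not 1 <= score <= 10:
            raise ValueError("FMEA scores are assumed to lie between 1 and 10")
    return occurrence * detectability * severity


def rank_failure_modes(scores):
    """Order failure-mode names from highest to lowest RPN.

    `scores` maps a name to an (occurrence, detectability, severity) tuple.
    """
    return sorted(scores, key=lambda name: rpn(*scores[name]), reverse=True)


# Hypothetical scores from one respondent, not survey data.
one_respondent = {
    "MLC modeling": (4, 6, 5),           # RPN 120
    "hypothetical mode A": (2, 3, 4),    # RPN 24
    "hypothetical mode B": (5, 2, 3),    # RPN 30
}
ranking = rank_failure_modes(one_respondent)
```

Because each respondent supplies their own three scores, two respondents can produce opposite rankings from the same list of failure modes, which is the variability the abstract reports.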

Visuomotor adaptability in older adults with mild cognitive decline.

PubMed

Schaffert, Jeffrey; Lee, Chi-Mei; Neill, Rebecca; Bo, Jin

2017-02-01

The current study examined the augmentation of error feedback on visuomotor adaptability in older adults with varying degrees of cognitive decline (assessed by the Montreal Cognitive Assessment; MoCA). Twenty-three participants performed a center-out computerized visuomotor adaptation task when the visual feedback of their hand movement error was presented in a regular (ratio=1:1) or enhanced (ratio=1:2) error feedback schedule. Results showed that older adults with lower scores on the MoCA had less adaptability than those with higher MoCA scores during the regular feedback schedule. However, participants demonstrated similar adaptability during the enhanced feedback schedule, regardless of their cognitive ability. Furthermore, individuals with lower MoCA scores showed larger after-effects in spatial control during the enhanced schedule compared to the regular schedule, whereas individuals with higher MoCA scores displayed the opposite pattern. Additional neuro-cognitive assessments revealed that spatial working memory and processing speed were positively related to motor adaptability during the regular schedule but negatively related to adaptability during the enhanced schedule. We argue that individuals with mild cognitive decline employed different adaptation strategies when encountering enhanced visual feedback, suggesting older adults with mild cognitive impairment (MCI) may benefit from enhanced visual error feedback during sensorimotor adaptation. Copyright © 2016 Elsevier B.V. All rights reserved.
The relationship between somatic and cognitive-affective depression symptoms and error-related ERP’s

PubMed Central

Bridwell, David A.; Steele, Vaughn R.; Maurer, J. Michael; Kiehl, Kent A.; Calhoun, Vince D.

2014-01-01

Background The symptoms that contribute to the clinical diagnosis of depression likely emerge from, or are related to, underlying cognitive deficits. To understand this relationship further, we examined the relationship between self-reported somatic and cognitive-affective Beck’s Depression Inventory-II (BDI-II) symptoms and aspects of cognitive control reflected in error event-related potential (ERP) responses. Methods Task and assessment data were analyzed within 51 individuals. The group contained a broad distribution of depressive symptoms, as assessed by BDI-II scores. ERP’s were collected following error responses within a go/no-go task. Individual error ERP amplitudes were estimated by conducting group independent component analysis (ICA) on the electroencephalographic (EEG) time series and analyzing the individual reconstructed source epochs. Source error amplitudes were correlated with the subset of BDI-II scores representing somatic and cognitive-affective symptoms. Results We demonstrate a negative relationship between somatic depression symptoms (i.e. fatigue or loss of energy) (after regressing out cognitive-affective scores, age and IQ) and the central-parietal ERP response that peaks at 359 ms. The peak amplitudes within this ERP response were not significantly related to cognitive-affective symptom severity (after regressing out the somatic symptom scores, age, and IQ). Limitations These findings were obtained within a population of female adults from a maximum-security correctional facility. Thus, additional research is required to verify that they generalize to the broad population. Conclusions These results suggest that individuals with greater somatic depression symptoms demonstrate a reduced awareness of behavioral errors, and help clarify the relationship between clinical measures of self-reported depression symptoms and cognitive control. PMID:25451400
The relationship between somatic and cognitive-affective depression symptoms and error-related ERPs.

PubMed

Bridwell, David A; Steele, Vaughn R; Maurer, J Michael; Kiehl, Kent A; Calhoun, Vince D

2015-02-01

The symptoms that contribute to the clinical diagnosis of depression likely emerge from, or are related to, underlying cognitive deficits. To understand this relationship further, we examined the relationship between self-reported somatic and cognitive-affective Beck's Depression Inventory-II (BDI-II) symptoms and aspects of cognitive control reflected in error event-related potential (ERP) responses. Task and assessment data were analyzed within 51 individuals. The group contained a broad distribution of depressive symptoms, as assessed by BDI-II scores. ERPs were collected following error responses within a go/no-go task. Individual error ERP amplitudes were estimated by conducting group independent component analysis (ICA) on the electroencephalographic (EEG) time series and analyzing the individual reconstructed source epochs. Source error amplitudes were correlated with the subset of BDI-II scores representing somatic and cognitive-affective symptoms. We demonstrate a negative relationship between somatic depression symptoms (i.e. fatigue or loss of energy) (after regressing out cognitive-affective scores, age and IQ) and the central-parietal ERP response that peaks at 359 ms. The peak amplitudes within this ERP response were not significantly related to cognitive-affective symptom severity (after regressing out the somatic symptom scores, age, and IQ). These findings were obtained within a population of female adults from a maximum-security correctional facility. Thus, additional research is required to verify that they generalize to the broad population. These results suggest that individuals with greater somatic depression symptoms demonstrate a reduced awareness of behavioral errors, and help clarify the relationship between clinical measures of self-reported depression symptoms and cognitive control. Copyright © 2014 Elsevier B.V. All rights reserved.
Personal authentication using hand vein triangulation and knuckle shape.

PubMed

Kumar, Ajay; Prathyusha, K Venkata

2009-09-01

This paper presents a new approach to authenticate individuals using triangulation of hand vein images and simultaneous extraction of knuckle shape information. The proposed method is fully automated and employs palm dorsal hand vein images acquired from low-cost, near-infrared, contactless imaging. The knuckle tips are used as key points for the image normalization and extraction of the region of interest. The matching scores are generated in two parallel stages: (i) a hierarchical matching score from the four topologies of triangulation in the binarized vein structures and (ii) a score from the geometrical features consisting of knuckle point perimeter distances in the acquired images. The weighted score-level combination of these two matching scores is used to authenticate the individuals. The experimental results achieved by the proposed system using contactless palm dorsal-hand vein images are promising (equal error rate of 1.14%) and suggest a more user-friendly alternative for user identification.
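Weighted score-level combination and the equal error rate (EER) mentioned in this abstract can be sketched in a few lines. This is an illustrative sketch only: the weight, the threshold scan, the assumption that higher scores mean better matches, and all sample scores are hypothetical, and the paper's actual fusion rule and EER estimation may differ.

```python
def fuse(vein_score, knuckle_score, weight=0.5):
    """Weighted sum of two matching scores; `weight` applies to the vein score."""
    return weight * vein_score + (1 - weight) * knuckle_score


def equal_error_rate(genuine, impostor):
    """Approximate the EER by scanning every observed score as a threshold.

    A comparison is accepted when its score is >= the threshold. The EER is
    taken at the threshold where the false accept rate (FAR) and the false
    reject rate (FRR) are closest, as the mean of the two rates.
    """
    best_gap, best_rate = None, None
    for threshold in sorted(set(genuine) | set(impostor)):
        frr = sum(g < threshold for g in genuine) / len(genuine)
        far = sum(i >= threshold for i in impostor) / len(impostor)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_rate = gap, (far + frr) / 2
    return best_rate


# Hypothetical fused scores, not data from the paper.
genuine_scores = [0.9, 0.8, 0.7, 0.6]
impostor_scores = [0.1, 0.2, 0.3, 0.65]
eer = equal_error_rate(genuine_scores, impostor_scores)
```

In practice the fusion weight would be tuned on held-out data so that the fused scores give a lower EER than either matcher alone.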
Outlier removal, sum scores, and the inflation of the Type I error rate in independent samples t tests: the power of alternatives and recommendations.

PubMed

Bakker, Marjan; Wicherts, Jelte M

2014-09-01

In psychology, outliers are often excluded before running an independent samples t test, and data are often nonnormal because of the use of sum scores based on tests and questionnaires. This article concerns the handling of outliers in the context of independent samples t tests applied to nonnormal sum scores. After reviewing common practice, we present results of simulations of artificial and actual psychological data, which show that the removal of outliers based on commonly used Z value thresholds severely increases the Type I error rate. We found Type I error rates of above 20% after removing outliers with a threshold value of Z = 2 in a short and difficult test. Inflations of Type I error rates are particularly severe when researchers are given the freedom to alter threshold values of Z after having seen the effects thereof on outcomes. We recommend the use of nonparametric Mann-Whitney-Wilcoxon tests or robust Yuen-Welch tests without removing outliers. These alternatives to independent samples t tests are found to have nominal Type I error rates with a minimal loss of power when no outliers are present in the data and to have nominal Type I error rates and good power when outliers are present. PsycINFO Database Record (c) 2014 APA, all rights reserved.
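The rank-based alternative recommended in this abstract can be illustrated with a small sketch of the Mann-Whitney U statistic. The two-sided p-value below uses the large-sample normal approximation without tie or continuity correction, which is an assumption made for brevity and is rough for small samples; the function name is hypothetical.

```python
from math import erfc, sqrt


def mann_whitney_u(a, b):
    """Return (U for sample a, approximate two-sided p-value).

    U counts the pairs (x, y) with x from a and y from b where x > y;
    tied pairs count one half.
    """
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    n1, n2 = len(a), len(b)
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p_value = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u, p_value


# Hypothetical sum scores for two groups; every score is kept,
# so no outlier threshold has to be chosen.
group_a = [12, 15, 9, 30, 11]
group_b = [14, 18, 16, 13, 40]
u_stat, p = mann_whitney_u(group_a, group_b)
```

Because the statistic depends only on the ordering of the observations, extreme scores such as 30 or 40 cannot dominate the result the way they can in a t test.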
Error and Error Mitigation in Low-Coverage Genome Assemblies

PubMed Central

Hubisz, Melissa J.; Lin, Michael F.; Kellis, Manolis; Siepel, Adam

2011-01-01

The recent release of twenty-two new genome sequences has dramatically increased the data available for mammalian comparative genomics, but twenty of these new sequences are currently limited to ∼2× coverage. Here we examine the extent of sequencing error in these 2× assemblies, and its potential impact in downstream analyses. By comparing 2× assemblies with high-quality sequences from the ENCODE regions, we estimate the rate of sequencing error to be 1–4 errors per kilobase. While this error rate is fairly modest, sequencing error can still have surprising effects. For example, an apparent lineage-specific insertion in a coding region is more likely to reflect sequencing error than a true biological event, and the length distribution of coding indels is strongly distorted by error. We find that most errors are contributed by a small fraction of bases with low quality scores, in particular, by the ends of reads in regions of single-read coverage in the assembly. We explore several approaches for automatic sequencing error mitigation (SEM), making use of the localized nature of sequencing error, the fact that it is well predicted by quality scores, and information about errors that comes from comparisons across species. Our automatic methods for error mitigation cannot replace the need for additional sequencing, but they do allow substantial fractions of errors to be masked or eliminated at the cost of modest amounts of over-correction, and they can reduce the impact of error in downstream phylogenomic analyses. Our error-mitigated alignments are available for download. PMID:21340033
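One simple form of quality-based masking can be sketched as follows. The sketch assumes the quality scores are Phred-scaled, where a score Q corresponds to an error probability of 10^(-Q/10); the threshold of 20 and the function names are illustrative choices, not the thresholds or methods used by the authors.

```python
def phred_to_error_prob(q):
    """Error probability implied by a Phred-scaled quality score."""
    return 10 ** (-q / 10)


def mask_low_quality(bases, quals, min_q=20):
    """Replace every base whose quality is below `min_q` with 'N'."""
    return "".join(
        base if qual >= min_q else "N" for base, qual in zip(bases, quals)
    )


# Hypothetical read with low-quality bases at positions 2 and 4.
masked = mask_low_quality("ACGT", [30, 10, 25, 5])
```

Raising the threshold masks more true errors but also discards more correct bases, which is the over-correction trade-off the abstract describes.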
Speech abilities in preschool children with speech sound disorder with and without co-occurring language impairment.

PubMed

Macrae, Toby; Tyler, Ann A

2014-10-01

The authors compared preschool children with co-occurring speech sound disorder (SSD) and language impairment (LI) to children with SSD only in their numbers and types of speech sound errors. In this post hoc quasi-experimental study, independent samples t tests were used to compare the groups in the standard score from different tests of articulation/phonology, percent consonants correct, and the number of omission, substitution, distortion, typical, and atypical error patterns used in the production of different wordlists that had similar levels of phonetic and structural complexity. In comparison with children with SSD only, children with SSD and LI used similar numbers but different types of errors, including more omission patterns ( p < .001, d = 1.55) and fewer distortion patterns ( p = .022, d = 1.03). There were no significant differences in substitution, typical, and atypical error pattern use. Frequent omission error pattern use may reflect a more compromised linguistic system characterized by absent phonological representations for target sounds (see Shriberg et al., 2005). Research is required to examine the diagnostic potential of early frequent omission error pattern use in predicting later diagnoses of co-occurring SSD and LI and/or reading problems.
AGILE: Autonomous Global Integrated Language Exploitation

DTIC Science & Technology

2009-12-01

combination, including METEOR-based alignment (with stemming and WordNet synonym matching) and GIZA++ based alignment. So far, we have not seen any...parse trees and a detailed analysis of how function words operate in translation. This program lets us fix alignment errors that systems like GIZA ...correlates better with Pyramid than with Responsiveness scoring (i.e., it is a more precise, careful, measure) • BE generally outperforms ROUGE
A Note on Some Characteristics and Correlates of the Meier Art Test of Aesthetic Perception.

ERIC Educational Resources Information Center

Stallings, William M.; Anderson, Frances E.

The reliability and the predictive and concurrent validity of the MATAP were investigated with the implicit goal of improving the prediction of course grades in the College of Fine and Applied Arts. It was found that reliability and validity coefficients were low, and it was suggested that the scoring system was a source of error variance. (MS)
Recruitment into diabetes prevention programs: what is the impact of errors in self-reported measures of obesity?

PubMed

Hernan, Andrea; Philpot, Benjamin; Janus, Edward D; Dunbar, James A

2012-07-08

Error in self-reported measures of obesity has been frequently described, but the effect of self-reported error on recruitment into diabetes prevention programs is not well established. The aim of this study was to examine the effect of using self-reported obesity data from the Finnish diabetes risk score (FINDRISC) on recruitment into the Greater Green Triangle Diabetes Prevention Project (GGT DPP). The GGT DPP was a structured group-based lifestyle modification program delivered in primary health care settings in South-Eastern Australia. Between 2004-05, 850 FINDRISC forms were collected during recruitment for the GGT DPP. Eligible individuals, at moderate to high risk of developing diabetes, were invited to undertake baseline tests, including anthropometric measurements performed by specially trained nurses. In addition to errors in calculating total risk scores, accuracy of self-reported data (height, weight, waist circumference (WC) and Body Mass Index (BMI)) from FINDRISCs was compared with baseline data, with impact on participation eligibility presented. Overall, calculation errors impacted on eligibility in 18 cases (2.1%). Of n = 279 GGT DPP participants with measured data, errors (total score calculation, BMI or WC) in self-report were found in n = 90 (32.3%). These errors were equally likely to result in under- or over-reported risk. Under-reporting was more common in those reporting lower risk scores (Spearman-rho = -0.226, p-value < 0.001). However, underestimation resulted in only 6% of individuals at high risk of diabetes being incorrectly categorised as moderate or low risk of diabetes. Overall FINDRISC was found to be an effective tool to screen and recruit participants at moderate to high risk of diabetes, accurately categorising levels of overweight and obesity using self-report data. The results could be generalisable to other diabetes prevention programs using screening tools which include self-reported levels of obesity.
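How a self-report error can shift a weight category can be shown with a short sketch. It uses the standard BMI formula (weight in kilograms divided by height in metres squared) and the WHO cut-offs of 25 and 30 kg/m²; it does not reproduce FINDRISC point values, and the function names and the participant's numbers are hypothetical.

```python
def bmi(weight_kg, height_m):
    """Body Mass Index in kg/m^2."""
    return weight_kg / height_m ** 2


def bmi_category(value):
    """WHO categories: overweight from 25, obese from 30 kg/m^2."""
    if value >= 30:
        return "obese"
    if value >= 25:
        return "overweight"
    return "not overweight"


def category_changed(self_report, measured):
    """True when self-reported and measured (weight_kg, height_m) disagree on category."""
    return bmi_category(bmi(*self_report)) != bmi_category(bmi(*measured))


# Hypothetical participant: under-reported weight, over-reported height.
changed = category_changed(self_report=(78, 1.80), measured=(84, 1.78))
```

Small errors matter most for people whose true BMI lies close to a cut-off, where a few kilograms or centimetres move them across a category boundary.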
Qualities of dental chart recording and coding.

PubMed

Chantravekin, Yosananda; Tasananutree, Munchulika; Santaphongse, Supitcha; Aittiwarapoj, Anchisa

2013-01-01

Chart recording and coding are important processes in healthcare informatics systems, but few reports exist in the field of dentistry. The objectives of this study were to assess the quality of dental chart recording and coding, as well as the effectiveness of a lecture/workshop on this topic. The study was performed by auditing patient charts at the TU Dental Student Clinic from July 2011 to August 2012. Mean chart recording scores ranged from 51.0% to 55.7%, and errors in the coding process occurred more often in the coder part than in the doctor part. The lecture/workshop improved the scores only for some topics.
The Truth about Scores Children Achieve on Tests.

ERIC Educational Resources Information Center

Brown, Jonathan R.

1989-01-01

The importance of using the standard error of measurement (SEm) in determining reliability in test scores is emphasized. The SEm is compared to the hypothetical true score for standardized tests, and procedures for calculation of the SEm are explained. (JDD)
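The abstract above does not reproduce the calculation procedure, so the following is a minimal sketch of the conventional classical-test-theory formula, SEm = SD × sqrt(1 − reliability), and of the band it places around an observed score. The function names and the example values (SD of 15, reliability of 0.91, observed score of 100) are illustrative assumptions, not figures from the article.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """Conventional SEm: SD * sqrt(1 - reliability coefficient)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def true_score_band(observed, sd, reliability, z=1.0):
    """Band of +/- z SEm around an observed score (z=1 covers about 68%)."""
    sem = standard_error_of_measurement(sd, reliability)
    return observed - z * sem, observed + z * sem


# Hypothetical standardized test: SD = 15, reliability = 0.91
sem = standard_error_of_measurement(sd=15.0, reliability=0.91)
low, high = true_score_band(observed=100.0, sd=15.0, reliability=0.91)
print(f"SEm = {sem:.2f}; band = {low:.1f} to {high:.1f}")
```

With these assumed inputs the SEm is about 4.5 points, so an observed score of 100 is better read as a range of roughly 95.5 to 104.5 than as a single exact value.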
The search for causal inferences: using propensity scores post hoc to reduce estimation error with nonexperimental research.

PubMed

Tumlinson, Samuel E; Sass, Daniel A; Cano, Stephanie M

2014-03-01

While experimental designs are regarded as the gold standard for establishing causal relationships, such designs are usually impractical owing to common methodological limitations. The objective of this article is to illustrate how propensity score matching (PSM) and using propensity scores (PS) as a covariate are viable alternatives to reduce estimation error when experimental designs cannot be implemented. To mimic common pediatric research practices, data from 140 simulated participants were used to resemble an experimental and nonexperimental design that assessed the effect of treatment status on participant weight loss for diabetes. Pretreatment participant characteristics (age, gender, physical activity, etc.) were then used to generate PS for use in the various statistical approaches. Results demonstrate how PSM and using the PS as a covariate can be used to reduce estimation error and improve statistical inferences. References for issues related to the implementation of these procedures are provided to assist researchers.
Smoothing of the bivariate LOD score for non-normal quantitative traits.

PubMed

Buil, Alfonso; Dyer, Thomas D; Almasy, Laura; Blangero, John

2005-12-30

Variance component analysis provides an efficient method for performing linkage analysis for quantitative traits. However, type I error of variance components-based likelihood ratio testing may be affected when phenotypic data are non-normally distributed (especially with high values of kurtosis). This results in inflated LOD scores when the normality assumption does not hold. Even though different solutions have been proposed to deal with this problem with univariate phenotypes, little work has been done in the multivariate case. We present an empirical approach to adjust the inflated LOD scores obtained from a bivariate phenotype that violates the assumption of normality. Using the Collaborative Study on the Genetics of Alcoholism data available for the Genetic Analysis Workshop 14, we show how bivariate linkage analysis with leptokurtotic traits gives an inflated type I error. We perform a novel correction that achieves acceptable levels of type I error.
Clinical implementation and failure mode and effects analysis of HDR skin brachytherapy using Valencia and Leipzig surface applicators.

PubMed

Sayler, Elaine; Eldredge-Hindy, Harriet; Dinome, Jessie; Lockamy, Virginia; Harrison, Amy S

2015-01-01

The planning procedure for Valencia and Leipzig surface applicators (VLSAs) (Nucletron, Veenendaal, The Netherlands) differs substantially from CT-based planning; the unfamiliarity could lead to significant errors. This study applies failure modes and effects analysis (FMEA) to high-dose-rate (HDR) skin brachytherapy using VLSAs to ensure safety and quality. A multidisciplinary team created a protocol for HDR VLSA skin treatments and applied FMEA. Failure modes were identified and scored by severity, occurrence, and detectability. The clinical procedure was then revised to address high-scoring process nodes. Several key components were added to the protocol to minimize risk probability numbers. (1) Diagnosis, prescription, applicator selection, and setup are reviewed at weekly quality assurance rounds. Peer review reduces the likelihood of an inappropriate treatment regime. (2) A template for HDR skin treatments was established in the clinic's electronic medical record system to standardize treatment instructions. This reduces the chances of miscommunication between the physician and planner as well as increases the detectability of an error. (3) A screen check was implemented during the second check to increase detectability of an error. (4) To reduce error probability, the treatment plan worksheet was designed to display plan parameters in a format visually similar to the treatment console display, facilitating data entry and verification. (5) VLSAs are color coded and labeled to match the electronic medical record prescriptions, simplifying in-room selection and verification. Multidisciplinary planning and FMEA increased detectability and reduced error probability during VLSA HDR brachytherapy. This clinical model may be useful to institutions implementing similar procedures. Copyright © 2015 American Brachytherapy Society. Published by Elsevier Inc. All rights reserved.
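The abstract states that failure modes were scored by severity, occurrence, and detectability, but not how the scores were combined. A minimal sketch, assuming the conventional FMEA product of the three scores on 1-10 scales; the failure-mode names and all numeric scores below are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int       # 1 (negligible) to 10 (catastrophic), assumed scale
    occurrence: int     # 1 (rare) to 10 (frequent), assumed scale
    detectability: int  # 1 (always caught) to 10 (never caught), assumed scale

    def rpn(self):
        """Conventional risk number: severity x occurrence x detectability."""
        return self.severity * self.occurrence * self.detectability


def rank_failure_modes(modes):
    """Sort failure modes so the highest-scoring process nodes come first."""
    return sorted(modes, key=lambda m: m.rpn(), reverse=True)


# Hypothetical failure modes with made-up scores
modes = [
    FailureMode("wrong applicator selected", severity=8, occurrence=3, detectability=4),
    FailureMode("plan parameter mistyped at console", severity=9, occurrence=2, detectability=6),
    FailureMode("treatment instruction miscommunicated", severity=7, occurrence=4, detectability=5),
]

for mode in rank_failure_modes(modes):
    print(f"{mode.rpn():4d}  {mode.name}")
```

Under this scheme, an intervention such as a second screen check lowers the detectability score, while a standardized template lowers the occurrence score; either reduces the product.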
FIASCO II failure to achieve a satisfactory cardiac outcome study: the elimination of system errors.

PubMed

Farid, Shakil; Page, Aravinda; Jenkins, David; Jones, Mark T; Freed, Darren; Nashef, Samer A M

2013-07-01

Death in low-risk cardiac surgical patients provides a simple and accessible method by which modifiable causes of death can be identified. In the first FIASCO study published in 2009, local potentially modifiable causes of preventable death in low-risk patients with a logistic EuroSCORE of 0-2 undergoing cardiac surgery were inadequate myocardial protection and lack of clarity in the chain of responsibility. As a result, myocardial protection was improved, and a formalized system introduced to ensure clarity of the chain of responsibility in the care of all cardiac surgical patients. The purpose of the current study was to re-audit outcomes in low-risk patients to see if improvements have been achieved. Patients with a logistic EuroSCORE of 0-2 who had cardiac surgery from January 2006 to August 2012 were included. Data were prospectively collected and retrospectively analysed. The case notes of patients who died in hospital were subject to internal and external review and classified according to preventability. Two thousand five hundred and forty-nine patients with a logistic EuroSCORE of 0-2 underwent cardiac surgery during the study period. Seven deaths occurred in truly low-risk patients, giving a mortality of 0.27%. Of the seven, three were considered preventable and four non-preventable. Mortality was marginally lower than in our previous study (0.37%), and no death occurred as a result of inadequate myocardial protection or communication failures. We postulate that the regular study of such events in all institutions may unmask systemic errors that can be remedied to prevent or reduce future occurrences. We encourage all units to use this methodology to detect any similarly modifiable factors in their practice.
Boundary overlap for medical image segmentation evaluation

NASA Astrophysics Data System (ADS)

Yeghiazaryan, Varduhi; Voiculescu, Irina

2017-03-01

All medical image segmentation algorithms need to be validated and compared, and yet no evaluation framework is widely accepted within the imaging community. Collections of segmentation results often need to be compared and ranked by their effectiveness. Evaluation measures which are popular in the literature are based on region overlap or boundary distance. None of these are consistent in the way they rank segmentation results: they tend to be sensitive to one or another type of segmentation error (size, location, shape) but no single measure covers all error types. We introduce a new family of measures, with hybrid characteristics. These measures quantify similarity/difference of segmented regions by considering their overlap around the region boundaries. This family is more sensitive than other measures in the literature to combinations of segmentation error types. We compare measure performance on collections of segmentation results sourced from carefully compiled 2D synthetic data, and also on 3D medical image volumes. We show that our new measure: (1) penalises errors successfully, especially those around region boundaries; (2) gives a low similarity score when existing measures disagree, thus avoiding overly inflated scores; and (3) scores segmentation results over a wider range of values. We consider a representative measure from this family and the effect of its only free parameter on error sensitivity, typical value range, and running time.
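For context, the region-overlap measures that the abstract describes as popular can be sketched as below. This shows the standard Dice and Jaccard coefficients only, not the authors' new boundary-overlap family, whose definition is not given here. Regions are represented as sets of pixel coordinates, and the example shapes are hypothetical.

```python
def dice(region_a, region_b):
    """Dice coefficient: 2|A & B| / (|A| + |B|)."""
    a, b = set(region_a), set(region_b)
    if not a and not b:
        return 1.0  # two empty regions agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))


def jaccard(region_a, region_b):
    """Jaccard index: |A & B| / |A | B|."""
    a, b = set(region_a), set(region_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def rectangle(rows, cols):
    """Set of (row, col) pixels for the given row and column ranges."""
    return {(r, c) for r in rows for c in cols}


# Hypothetical 4x4 reference region and two erroneous segmentations
truth = rectangle(range(0, 4), range(0, 4))
shifted = rectangle(range(0, 4), range(1, 5))  # location error
shrunk = rectangle(range(0, 4), range(0, 3))   # size error

print("shifted:", dice(truth, shifted), jaccard(truth, shifted))
print("shrunk: ", dice(truth, shrunk), jaccard(truth, shrunk))
```

Both erroneous regions share the same 12 pixels with the reference, yet the two measures score them differently, which illustrates how a measure's sensitivity depends on the type of segmentation error.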
The Iatroref study: medical errors are associated with symptoms of depression in ICU staff but not burnout or safety culture.

PubMed

Garrouste-Orgeas, Maité; Perrin, Marion; Soufir, Lilia; Vesin, Aurélien; Blot, François; Maxime, Virginie; Beuret, Pascal; Troché, Gilles; Klouche, Kada; Argaud, Laurent; Azoulay, Elie; Timsit, Jean-François

2015-02-01

Staff behaviours to optimise patient safety may be influenced by burnout, depression and strength of the safety culture. We evaluated whether burnout, symptoms of depression and safety culture affected the frequency of medical errors and adverse events (selected using Delphi techniques) in ICUs. Prospective, observational, multicentre (31 ICUs) study from August 2009 to December 2011. Burnout, depression symptoms and safety culture were evaluated using the Maslach Burnout Inventory (MBI), CES-Depression scale and Safety Attitudes Questionnaire, respectively. Of 1,988 staff members, 1,534 (77.2 %) participated. Frequencies of medical errors and adverse events were 804.5/1,000 and 167.4/1,000 patient-days, respectively. Burnout prevalence was 3 or 40 % depending on the definition (severe emotional exhaustion, depersonalisation and low personal accomplishment; or MBI score greater than -9). Depression symptoms were identified in 62/330 (18.8 %) physicians and 188/1,204 (15.6 %) nurses/nursing assistants. Median safety culture score was 60.7/100 [56.8-64.7] in physicians and 57.5/100 [52.4-61.9] in nurses/nursing assistants. Depression symptoms were an independent risk factor for medical errors. Burnout was not associated with medical errors. The safety culture score had a limited influence on medical errors. Other independent risk factors for medical errors or adverse events were related to ICU organisation (40 % of ICU staff off work on the previous day), staff (specific safety training) and patients (workload). One-on-one training of junior physicians during duties and existence of a hospital risk-management unit were associated with lower risks. The frequency of selected medical errors in ICUs was high and was increased when staff members had symptoms of depression.
Predictors of driving safety in early Alzheimer disease.

PubMed

Dawson, J D; Anderson, S W; Uc, E Y; Dastrup, E; Rizzo, M

2009-02-10

To measure the association of cognition, visual perception, and motor function with driving safety in Alzheimer disease (AD). Forty drivers with probable early AD (mean Mini-Mental State Examination score 26.5) and 115 elderly drivers without neurologic disease underwent a battery of cognitive, visual, and motor tests, and drove a standardized 35-mile route in urban and rural settings in an instrumented vehicle. A composite cognitive score (COGSTAT) was calculated for each subject based on eight neuropsychological tests. Driving safety errors were noted and classified by a driving expert based on video review. Drivers with AD committed an average of 42.0 safety errors/drive (SD = 12.8), compared to an average of 33.2 (SD = 12.2) for drivers without AD (p < 0.0001); the most common errors were lane violations. Increased age was predictive of errors, with a mean of 2.3 more errors per drive observed for each 5-year age increment. After adjustment for age and gender, COGSTAT was a significant predictor of safety errors in subjects with AD, with a 4.1 increase in safety errors observed for a 1 SD decrease in cognitive function. Significant increases in safety errors were also found in subjects with AD with poorer scores on Benton Visual Retention Test, Complex Figure Test-Copy, Trail Making Subtest-A, and the Functional Reach Test. Drivers with Alzheimer disease (AD) exhibit a range of performance on tests of cognition, vision, and motor skills. Since these tests provide additional predictive value of driving performance beyond diagnosis alone, clinicians may use these tests to help predict whether a patient with AD can safely operate a motor vehicle.
Assessing the learning curve for the acquisition of laparoscopic skills on a virtual reality simulator.

PubMed

Sherman, V; Feldman, L S; Stanbridge, D; Kazmi, R; Fried, G M

2005-05-01

The aim of this study was to develop summary metrics and assess the construct validity for a virtual reality laparoscopic simulator (LapSim) by comparing the learning curves of three groups with different levels of laparoscopic expertise. Three groups of subjects ('expert', 'junior', and 'naïve') underwent repeated trials on three LapSim tasks. Formulas were developed to calculate scores for efficiency ('time-error') and economy of 'motion' ('motion') using metrics generated by the software after each drill. Data (mean +/- SD) were evaluated by analysis of variance (ANOVA). Significance was set at p < 0.05. All three groups improved significantly from baseline to final for both 'time-error' and 'motion' scores. There were significant differences between groups in time error performances at baseline and final, due to higher scores in the 'expert' group. A significant difference in 'motion' scores was seen only at baseline. We have developed summary metrics for the LapSim that differentiate among levels of laparoscopic experience. This study also provides evidence of construct validity for the LapSim.

A Family of Algorithms for Computing Consensus about Node State from Network Data

PubMed Central

Brush, Eleanor R.; Krakauer, David C.; Flack, Jessica C.

2013-01-01

Biological and social networks are composed of heterogeneous nodes that contribute differentially to network structure and function. A number of algorithms have been developed to measure this variation. These algorithms have proven useful for applications that require assigning scores to individual nodes, from ranking websites to determining critical species in ecosystems, yet the mechanistic basis for why they produce good rankings remains poorly understood. We show that a unifying property of these algorithms is that they quantify consensus in the network about a node's state or capacity to perform a function. The algorithms capture consensus by either taking into account the number of a target node's direct connections, and, when the edges are weighted, the uniformity of its weighted in-degree distribution (breadth), or by measuring net flow into a target node (depth). Using data from communication, social, and biological networks we find that how an algorithm measures consensus (through breadth or depth) impacts its ability to correctly score nodes. We also observe variation in sensitivity to source biases in interaction/adjacency matrices: errors arising from systematic error at the node level or direct manipulation of network connectivity by nodes. Our results indicate that the breadth algorithms, which are derived from information theory, correctly score nodes (assessed using independent data) and are robust to errors. However, in cases where nodes “form opinions” about other nodes using indirect information, like reputation, depth algorithms, like Eigenvector Centrality, are required. One caveat is that Eigenvector Centrality is not robust to error unless the network is transitive or assortative. In these cases the network structure allows the depth algorithms to effectively capture breadth as well as depth. Finally, we discuss the algorithms' cognitive and computational demands. This is an important consideration in systems in which individuals use the collective opinions of others to make decisions. PMID:23874167
Prediction of Beck Depression Inventory (BDI-II) Score Using Acoustic Measurements in a Sample of IIUM Engineering Students

NASA Astrophysics Data System (ADS)

Fikri Zanil, Muhamad; Nur Wahidah Nik Hashim, Nik; Azam, Huda

2017-11-01

Psychiatrists currently rely on questionnaires and interviews for psychological assessment. These conservative methods often miss true positives and might lead to death, especially in cases where a patient might be experiencing suicidal predisposition but was only diagnosed with major depressive disorder (MDD). With modern technology, an assessment tool might aid psychiatrists in reaching a more accurate diagnosis and thus help to reduce casualties. This project explores the relationship between speech features of a spoken audio signal (reading) in Bahasa Malaysia and Beck Depression Inventory scores. The speech features used in this project were Power Spectral Density (PSD), Mel-frequency Cepstral Coefficients (MFCC), Transition Parameter, formant and pitch. According to the analysis, the optimum combination of speech features to predict BDI-II scores includes PSD, MFCC and Transition Parameters. A linear regression approach with a sequential forward/backward method was used to predict the BDI-II scores from reading speech. The result showed a mean absolute error (MAE) of 0.4096 for female reading speech. For male reading speech, all predicted BDI-II scores were within 1 point of the actual scores, with an MAE of 0.098437. A prediction system called Depression Severity Evaluator (DSE) was developed. The DSE correctly predicted the score for one out of five subjects. Although the prediction rate was low, the system predicted each person's score within a maximum difference of 4.93. This demonstrates that the scores are not random numbers.
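The two summary figures used in this abstract, the mean absolute error and the share of predictions within a given score difference, can be sketched as follows. The function names and the four score pairs are hypothetical and serve only to show the arithmetic; they are not data from the study.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all subjects."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def share_within(actual, predicted, tolerance):
    """Fraction of predictions whose absolute error is below the tolerance."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(abs(a - p) < tolerance for a, p in zip(actual, predicted))
    return hits / len(actual)


# Hypothetical questionnaire scores and model predictions
actual_scores = [10, 14, 21, 7]
predicted_scores = [10.5, 13.0, 21.25, 7.25]

print("MAE:", mean_absolute_error(actual_scores, predicted_scores))
print("within 1 point:", share_within(actual_scores, predicted_scores, 1.0))
```

Reporting both figures is useful because a low MAE can hide a few large misses, while the within-tolerance share shows how many individual predictions are usable.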
Event-Related-Potential (ERP) Correlates of Performance Monitoring in Adults With Attention-Deficit Hyperactivity Disorder (ADHD)

PubMed Central

Marquardt, Lynn; Eichele, Heike; Lundervold, Astri J.; Haavik, Jan; Eichele, Tom

2018-01-01

Introduction: Attention-deficit hyperactivity disorder (ADHD) is one of the most frequent neurodevelopmental disorders in children and tends to persist into adulthood. Evidence from neuropsychological, neuroimaging, and electrophysiological studies indicates that alterations of error processing are core symptoms in children and adolescents with ADHD. To test whether adults with ADHD show persisting deficits and compensatory processes, we investigated performance monitoring during stimulus-evaluation and response-selection, with a focus on errors, as well as within-group correlations with symptom scores. Methods: Fifty-five participants (27 ADHD and 28 controls) aged 19–55 years performed a modified flanker task during EEG recording with 64 electrodes, and the ADHD and control groups were compared on measures of behavioral task performance, event-related potentials of performance monitoring (N2, P3), and error processing (ERN, Pe). Adult ADHD Self-Report Scale (ASRS) was used to assess ADHD symptom load. Results: Adults with ADHD showed higher error rates in incompatible trials, and these error rates correlated positively with the ASRS scores. Also, we observed lower P3 amplitudes in incompatible trials, which were inversely correlated with symptom load in the ADHD group. Adults with ADHD also displayed reduced error-related ERN and Pe amplitudes. There were no significant differences in reaction time (RT) and RT variability between the two groups. Conclusion: Our findings show deviations of electrophysiological measures, suggesting reduced effortful engagement of attentional and error-monitoring processes in adults with ADHD. Associations between ADHD symptom scores, event-related potential amplitudes, and poorer task performance in the ADHD group further support this notion. PMID:29706908
Assessment of a model for achieving competency in administration and scoring of the WAIS-IV in post-graduate psychology students.

PubMed

Roberts, Rachel M; Davis, Melissa C

2015-01-01

There is a need for an evidence-based approach to training professional psychologists in the administration and scoring of standardized tests such as the Wechsler Adult Intelligence Scale (WAIS) due to substantial evidence that these tasks are associated with numerous errors that have the potential to significantly impact clients' lives. Twenty-three post-graduate psychology students underwent training in using the WAIS-IV according to a best-practice teaching model that involved didactic teaching, independent study of the test manual, and in-class practice with teacher supervision and feedback. Video recordings and test protocols from a role-played test administration were analyzed for errors according to a comprehensive checklist with self, peer, and faculty member reviews. 91.3% of students were rated as having demonstrated competency in administration and scoring. All students were found to make errors, with substantially more errors being detected by the faculty member than by self or peers. Across all subtests, the most frequent errors related to failure to deliver standardized instructions verbatim from the manual. The failure of peer and self-reviews to detect the majority of the errors suggests that novice feedback (self or peers) may be ineffective in eliminating errors and the use of more senior peers may be preferable. It is suggested that involving senior trainees, recent graduates and/or experienced practitioners in the training of post-graduate students may have benefits for both parties, promoting a peer-learning and continuous professional development approach to the development and maintenance of skills in psychological assessment.
Commentary on Values and Standards in Performance Assessment.

ERIC Educational Resources Information Center

Guion, Robert M.

1995-01-01

This commentary discusses three essential themes in performance assessment and its scoring. First, scores should mean something. Second, performance scores should permit fair and meaningful comparisons. Third, validity-reducing errors should be minimal. Increased attention to performance assessment may overcome these problems. (SLD)
The characteristics of patients with uncertain/mild cognitive impairment on the Alzheimer disease assessment scale-cognitive subscale.

PubMed

Pyo, Geunyeong; Elble, Rodger J; Ala, Thomas; Markwell, Stephen J

2006-01-01

The performances of the uncertain/mild cognitive impairment (MCI) patients on the Alzheimer Disease Assessment Scale-Cognitive (ADAS-Cog) subscale were compared with those of normal controls, Alzheimer disease patients with CDR 0.5, and Alzheimer disease patients with CDR 1.0. The Uncertain/MCI group was significantly different from normal controls and Alzheimer disease CDR 0.5 or 1.0 groups on the ADAS-Cog except on a few non-memory subtests. Age was significantly correlated with total error score in the normal group, but there was no significant correlation between age and ADAS-Cog scores in the patient groups. Education was not significantly correlated with the ADAS-Cog scores in any group. Regardless of age and educational level, there were clear differences between the normal group and the Uncertain/MCI group, especially on the total error scores. We found that the total error score of the ADAS-Cog was the most reliable variable in detecting patients with mild cognitive impairment. The present study demonstrated that the ADAS-Cog is a promising tool for detecting and studying patients with mild cognitive impairment. The results also indicated that demographic variables such as age and education do not play a significant role in the diagnosis of mild cognitive impaired patients based on the ADAS-Cog scores.
To twist, roll, stroke or poke? A study of input devices for menu navigation in the cockpit.

PubMed

Stanton, Neville A; Harvey, Catherine; Plant, Katherine L; Bolton, Luke

2013-01-01

Modern interfaces within the aircraft cockpit integrate many flight management system (FMS) functions into a single system. The success of a user's interaction with an interface depends upon the optimisation between the input device, tasks and environment within which the system is used. In this study, four input devices were evaluated using a range of Human Factors methods, in order to assess aspects of usability including task interaction times, error rates, workload, subjective usability and physical discomfort. The performance of the four input devices was compared using a holistic approach and the findings showed that no single input device produced consistently high performance scores across all of the variables evaluated. The touch screen produced the highest number of 'best' scores; however, discomfort ratings for this device were high, suggesting that it is not an ideal solution as both physical and cognitive aspects of performance must be accounted for in design. This study evaluated four input devices for control of a screen-based flight management system. A holistic approach was used to evaluate both cognitive and physical performance. Performance varied across the dependent variables and between the devices; however, the touch screen produced the largest number of 'best' scores.
Outcomes of a Failure Mode and Effects Analysis for medication errors in pediatric anesthesia.

PubMed

Martin, Lizabeth D; Grigg, Eliot B; Verma, Shilpa; Latham, Gregory J; Rampersad, Sally E; Martin, Lynn D

2017-06-01

The Institute of Medicine has called for development of strategies to prevent medication errors, which are one important cause of preventable harm. Although the field of anesthesiology is considered a leader in patient safety, recent data suggest high medication error rates in anesthesia practice. Unfortunately, few error prevention strategies for anesthesia providers have been implemented. Using Toyota Production System quality improvement methodology, a multidisciplinary team observed 133 h of medication practice in the operating room at a tertiary care freestanding children's hospital. A failure mode and effects analysis was conducted to systematically deconstruct and evaluate each medication handling process step and score possible failure modes to quantify areas of risk. A bundle of five targeted countermeasures were identified and implemented over 12 months. Improvements in syringe labeling (73 to 96%), standardization of medication organization in the anesthesia workspace (0 to 100%), and two-provider infusion checks (23 to 59%) were observed. Medication error reporting improved during the project and was subsequently maintained. After intervention, the median medication error rate decreased from 1.56 to 0.95 per 1000 anesthetics. The frequency of medication error harm events reaching the patient also decreased. Systematic evaluation and standardization of medication handling processes by anesthesia providers in the operating room can decrease medication errors and improve patient safety. © 2017 John Wiley & Sons Ltd.
Test-retest reliability and minimal detectable change of two simplified 3-point balance measures in patients with stroke.

PubMed

Chen, Yi-Miau; Huang, Yi-Jing; Huang, Chien-Yu; Lin, Gong-Hong; Liaw, Lih-Jiun; Lee, Shih-Chieh; Hsieh, Ching-Lin

2017-10-01

The 3-point Berg Balance Scale (BBS-3P) and 3-point Postural Assessment Scale for Stroke Patients (PASS-3P) were simplified from the BBS and PASS to overcome the complex scoring systems. The BBS-3P and PASS-3P were more feasible in busy clinical practice and showed similarly sound validity and responsiveness to the original measures. However, the reliability of the BBS-3P and PASS-3P is unknown, limiting their utility and the interpretability of scores. We aimed to examine the test-retest reliability and minimal detectable change (MDC) of the BBS-3P and PASS-3P in patients with stroke. Cross-sectional study. The rehabilitation departments of a medical center and a community hospital. A total of 51 chronic stroke patients (64.7% male). Both balance measures were administered twice 7 days apart. The test-retest reliability of both the BBS-3P and PASS-3P was examined by intraclass correlation coefficients (ICC). The MDC and its percentage over the total score (MDC%) of each measure were calculated for examining the random measurement errors. The ICC values of the BBS-3P and PASS-3P were 0.99 and 0.97, respectively. The MDC% (MDC) of the BBS-3P and PASS-3P were 9.1% (5.1 points) and 8.4% (3.0 points), respectively, indicating that both measures had small and acceptable random measurement errors. Our results showed that both the BBS-3P and the PASS-3P had good test-retest reliability, with small and acceptable random measurement error. These two simplified 3-level balance measures can provide reliable results over time. Our findings support the repeated administration of the BBS-3P and PASS-3P to monitor the balance of patients with stroke. The MDC values can help clinicians and researchers interpret the change scores more precisely.
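The abstract reports MDC and MDC% but not the formula behind them. A minimal sketch, assuming the commonly used 95% form MDC = 1.96 × sqrt(2) × SEM with SEM = SD × sqrt(1 − ICC), and MDC% = MDC divided by the scale's total score. The SD of 10 and ICC of 0.75 in the first example are hypothetical, and the total score of 56 for the BBS-3P is an assumption carried over from the original BBS.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement from the between-subject SD and the ICC."""
    return sd * math.sqrt(1.0 - icc)


def mdc95(sd, icc):
    """Minimal detectable change at 95% confidence (assumed z = 1.96)."""
    return 1.96 * math.sqrt(2.0) * sem_from_icc(sd, icc)


def mdc_percent(mdc, total_score):
    """MDC expressed as a percentage of the scale's total score."""
    return 100.0 * mdc / total_score


# Hypothetical inputs: SD = 10 points, ICC = 0.75
print(f"MDC95 = {mdc95(sd=10.0, icc=0.75):.2f} points")

# Reported BBS-3P MDC of 5.1 points against an assumed total score of 56
print(f"MDC% = {mdc_percent(5.1, 56):.1f}%")
```

A change between two administrations that is smaller than the MDC cannot be distinguished from random measurement error at the chosen confidence level.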
The effect of information provision on reduction of errors in intravenous drug preparation and administration by nurses in ICU and surgical wards.

PubMed

Abbasinazari, Mohammad; Zareh-Toranposhti, Samaneh; Hassani, Abdollah; Sistanizad, Mohammad; Azizian, Homa; Panahi, Yunes

2012-01-01

Malpractice in preparation and administration of intravenous (IV) medications has been reported frequently. Inadequate knowledge of nurses has been reported as a cause of such errors. We aimed to evaluate the role of nurses' education via installation of wall posters and distribution of informative pamphlets in reducing errors in the preparation and administration of intravenous drugs in 2 wards (ICU and surgery) of a teaching hospital in Tehran, Iran. A trained observer was stationed in the 2 wards during different work shifts. He recorded the nurses' practice regarding the preparation and administration of IV drugs and scored it before and after the education process. Of the 400 observations evaluated, 200 were made before education and 200 after education. Scores were assigned on a 0-10 quality scale. Mean ± SD scores before and after education at the 2 wards were 4.51 (± 1.24) and 6.15 (± 1.23), respectively. There was a significant difference between the scores before and after intervention in ICU (P<0.001), surgery (P<0.001), and both wards combined (P<0.001). Nurses' education using wall posters and informative pamphlets regarding the correct preparation and administration of IV drugs can reduce the number of errors.
A Practical Method for Identifying Significant Change Scores

ERIC Educational Resources Information Center

Cascio, Wayne F.; Kurtines, William M.

1977-01-01

A test of significance for identifying individuals who are most influenced by an experimental treatment as measured by pre-post test change score is presented. The technique requires true difference scores, the reliability of obtained differences, and their standard error of measurement. (Author/JKS)
Gender nonconformity, intelligence, and sexual orientation.

PubMed

Rahman, Qazi; Bhanot, Suraj; Emrith-Small, Hanna; Ghafoor, Shilan; Roberts, Steven

2012-06-01

The present study explored whether there were relationships among gender nonconformity, intelligence, and sexual orientation. A total of 106 heterosexual men, 115 heterosexual women, and 103 gay men completed measures of demographic variables, recalled childhood gender nonconformity (CGN), and the National Adult Reading Test (NART). NART error scores were used to estimate Wechsler Adult Intelligence Scale (WAIS) Full-Scale IQ (FSIQ) and Verbal IQ (VIQ) scores. Gay men had significantly fewer NART errors than heterosexual men and women (controlling for years of education). In heterosexual men, correlational analysis revealed significant associations between CGN, NART, and FSIQ scores (elevated boyhood femininity correlated with higher IQ scores). In heterosexual women, the direction of the correlations between CGN and all IQ scores was reversed (elevated girlhood femininity correlating with lower IQ scores). There were no significant correlations among these variables in gay men. These data may indicate a "sexuality-specific" effect on general cognitive ability but with limitations. They also support growing evidence that quantitative measures of sex-atypicality are useful in the study of trait sexual orientation.
Differences in Error Detection Skills by Band and Choral Preservice Teachers

ERIC Educational Resources Information Center

Stambaugh, Laura A.

2016-01-01

Band and choral preservice teachers (N = 44) studied band and choral scores, listened to recordings of school ensembles, and identified errors in the recordings. Results indicated that preservice teachers identified significantly more errors when listening to recordings of their primary area (band majors listening to band, p = 0.045; choral majors…
Reliability of Total Test Scores When Considered as Ordinal Measurements

ERIC Educational Resources Information Center

Biswas, Ajoy Kumar

2006-01-01

This article studies the ordinal reliability of (total) test scores. This study is based on a classical-type linear model of observed score (X), true score (T), and random error (E). Based on the idea of Kendall's tau-a coefficient, a measure of ordinal reliability for small-examinee populations is developed. This measure is extended to large…
Asymptotic Standard Errors of Observed-Score Equating with Polytomous IRT Models

ERIC Educational Resources Information Center

Andersson, Björn

2016-01-01

In observed-score equipercentile equating, the goal is to make scores on two scales or tests measuring the same construct comparable by matching the percentiles of the respective score distributions. If the tests consist of different items with multiple categories for each item, a suitable model for the responses is a polytomous item response…
Quantitative computed tomography (QCT) as a radiology reporting tool by using optical character recognition (OCR) and macro program.

PubMed

Lee, Young Han; Song, Ho-Taek; Suh, Jin-Suck

2012-12-01

The objectives are (1) to introduce a new concept of making a quantitative computed tomography (QCT) reporting system by using optical character recognition (OCR) and macro program and (2) to illustrate the practical usages of the QCT reporting system in radiology reading environment. This reporting system was created as a development tool by using an open-source OCR software and an open-source macro program. The main module was designed for OCR to report QCT images in radiology reading process. The principal processes are as follows: (1) to save a QCT report as a graphic file, (2) to recognize the characters from an image as a text, (3) to extract the T scores from the text, (4) to perform error correction, (5) to reformat the values into QCT radiology reporting template, and (6) to paste the reports into the electronic medical record (EMR) or picture archiving and communicating system (PACS). The accuracy test of OCR was performed on randomly selected QCTs. QCT as a radiology reporting tool successfully acted as OCR of QCT. The diagnosis of normal, osteopenia, or osteoporosis is also determined. Error correction of OCR is done with AutoHotkey-coded module. The results of T scores of femoral neck and lumbar vertebrae had an accuracy of 100 and 95.4 %, respectively. A convenient QCT reporting system could be established by utilizing open-source OCR software and open-source macro program. This method can be easily adapted for other QCT applications and PACS/EMR.
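Steps (3) and the diagnosis step of the pipeline above can be sketched in Python (the original module was written in AutoHotkey). This is a minimal illustration, not the authors' code: the OCR text layout, the regular expression, and the WHO-style T-score cut-offs (at or above -1.0 normal, at or below -2.5 osteoporosis) are all assumptions made for the example.

```python
import re

# Hypothetical OCR output; the label wording is an assumption, not the report format used in the study.
OCR_TEXT = "Femoral neck T-score: -1.8\nLumbar spine T-score: -2.7"

# Captures a site name followed by "T-score:" and a signed decimal value.
T_SCORE_PATTERN = re.compile(
    r"(?P<site>[A-Za-z ]+?)\s*T-score:\s*(?P<value>[-+]?\d+(?:\.\d+)?)"
)


def extract_t_scores(text):
    """Return a {site: T score} mapping parsed from OCR text."""
    return {
        match.group("site").strip(): float(match.group("value"))
        for match in T_SCORE_PATTERN.finditer(text)
    }


def classify(t_score):
    """Map a T score to a category using assumed WHO-style thresholds."""
    if t_score >= -1.0:
        return "normal"
    if t_score <= -2.5:
        return "osteoporosis"
    return "osteopenia"


scores = extract_t_scores(OCR_TEXT)
diagnoses = {site: classify(value) for site, value in scores.items()}
```

A real reporting tool would also need the error-correction step the abstract mentions, since OCR can misread digits and minus signs.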
Simulated Driving Assessment (SDA) for teen drivers: results from a validation study.

PubMed

McDonald, Catherine C; Kandadai, Venk; Loeb, Helen; Seacrist, Thomas S; Lee, Yi-Ching; Winston, Zachary; Winston, Flaura K

2015-06-01

Driver error and inadequate skill are common critical reasons for novice teen driver crashes, yet few validated, standardised assessments of teen driving skills exist. The purpose of this study is to evaluate the construct and criterion validity of a newly developed Simulated Driving Assessment (SDA) for novice teen drivers. The SDA's 35 min simulated drive incorporates 22 variations of the most common teen driver crash configurations. Driving performance was compared for 21 inexperienced teens (age 16-17 years, provisional license ≤90 days) and 17 experienced adults (age 25-50 years, license ≥5 years, drove ≥100 miles per week, no collisions or moving violations ≤3 years). SDA driving performance (Error Score) was based on driving safety measures derived from simulator and eye-tracking data. Negative driving outcomes included simulated collisions or run-off-the-road incidents. A professional driving evaluator/instructor (DEI Score) reviewed videos of SDA performance. The SDA demonstrated construct validity: (1) teens had a higher Error Score than adults (30 vs. 13, p=0.02); (2) for each additional error committed, the RR of a participant's propensity for a simulated negative driving outcome increased by 8% (95% CI 1.05 to 1.10, p<0.01). The SDA demonstrated criterion validity: Error Score was correlated with DEI Score (r=-0.66, p<0.001). This study supports the concept of validated simulated driving tests like the SDA to assess novice driver skill in complex and hazardous driving scenarios. The SDA, as a standard protocol to evaluate teen driver performance, has the potential to facilitate screening and assessment of teen driving readiness and could be used to guide targeted skill training. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
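To illustrate what a per-error relative risk of 1.08 implies, the following Python sketch compounds it over several additional errors. It assumes a multiplicative (log-linear) model in which each extra error scales the risk by the same factor; the abstract does not state the model form, so this is an interpretation, and the function name is hypothetical.

```python
RR_PER_ERROR = 1.08  # reported point estimate: risk rises 8% per additional error


def relative_risk_multiplier(extra_errors, rr_per_error=RR_PER_ERROR):
    """Risk multiplier for a given number of additional errors.

    Assumes the per-error relative risk compounds multiplicatively.
    """
    return rr_per_error ** extra_errors


# Example: multiplier for the 17-point gap between the teen (30) and adult (13) mean Error Scores.
gap_multiplier = relative_risk_multiplier(30 - 13)
```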
The reliability and validity of a portfolio designed as a programmatic assessment of performance in an integrated clinical placement.

PubMed

Roberts, Chris; Shadbolt, Narelle; Clark, Tyler; Simpson, Phillip

2014-09-20

Little is known about the technical adequacy of portfolios in reporting multiple complex academic and performance-based assessments. We explored, first, the influencing factors on the precision of scoring within a programmatic assessment of student learning outcomes within an integrated clinical placement. Second, the degree to which validity evidence supported interpretation of student scores. Within generalisability theory, we estimated the contribution that each wanted factor (i.e. student capability) and unwanted factors (e.g. the impact of assessors) made to the variation in portfolio task scores. Relative and absolute standard errors of measurement provided a confidence interval around a pre-determined pass/fail standard for all six tasks. Validity evidence was sought through demonstrating the internal consistency of the portfolio and exploring the relationship of student scores with clinical experience. The mean portfolio mark for 257 students, across 372 raters, based on six tasks, was 75.56 (SD, 6.68). For a single student on one assessment task, 11% of the variance in scores was due to true differences in student capability. The most significant interaction was context specificity (49%), the tendency for one student to engage with one task and not engage with another task. Rater subjectivity was 29%. An absolute standard error of measurement of 4.74%, gave a 95% CI of +/- 9.30%, and a 68% CI of +/- 4.74% around a pass/fail score of 57%. Construct validity was supported by demonstration of an assessment framework, the internal consistency of the portfolio tasks, and higher scores for students who did the clinical placement later in the academic year. A portfolio designed as a programmatic assessment of an integrated clinical placement has sufficient evidence of validity to support a specific interpretation of student scores around passing a clinical placement. It has modest precision in assessing students' achievement of a competency standard. 
There were identifiable areas for reducing measurement error and providing more certainty around decision-making. Reducing the measurement error would require engaging with the student body on the value of the tasks, more focussed academic and clinical supervisor training, and revisiting the rubric of the assessment in the light of feedback.
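The confidence intervals quoted above follow from the standard error of measurement (SEM) and a normal-distribution multiplier. A minimal Python sketch, assuming z = 1.96 for the 95% band and z = 1.0 for the 68% band (the abstract does not state the multipliers it used):

```python
SEM = 4.74         # absolute standard error of measurement, in percentage points
PASS_MARK = 57.0   # pass/fail score, in percent


def confidence_band(cut_score, sem, z):
    """Return (lower, upper) limits of the band cut_score +/- z * sem."""
    half_width = z * sem
    return cut_score - half_width, cut_score + half_width


band_95 = confidence_band(PASS_MARK, SEM, 1.96)  # half-width about 9.29
band_68 = confidence_band(PASS_MARK, SEM, 1.0)   # half-width 4.74
```

With these multipliers the 95% half-width comes out at about 9.29, which agrees with the reported +/- 9.30% to within rounding.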
Evaluating role of interactive visualization tool in improving students' conceptual understanding of chemical equilibrium

NASA Astrophysics Data System (ADS)

Sampath Kumar, Bharath

The purpose of this study is to examine the role of a partnering visualization tool, such as simulation, in the development of students' concrete conceptual understanding of chemical equilibrium. Students find chemistry concepts abstract, especially at the microscopic level. Chemical equilibrium is one such topic. While research studies have explored the effectiveness of low-tech instructional strategies such as analogies, jigsaw, cooperative learning, and using modeling blocks, fewer studies have explored the use of visualization tools such as simulations in the context of dynamic chemical equilibrium. Research studies have identified key reasons behind misconceptions, such as lack of systematic understanding of foundational chemistry concepts, failure to recognize that the system is dynamic, solving numerical problems on chemical equilibrium in an algorithmic fashion, erroneous application of Le Chatelier's principle (LCP), etc. Kress et al. (2001) suggested that external representation in the form of visualization is more than a tool for learning, because it enables learners to make meanings or express ideas in ways that cannot readily be done through a verbal representation alone. A mixed-method study design was used for data collection. The qualitative portion of the study is aimed at understanding the change in students' mental models before and after the intervention. A quantitative instrument was developed based on common areas of misconceptions identified by research studies. A pilot study was conducted prior to the actual study to obtain feedback from students on the quantitative instrument and the simulation. Participants for the pilot study were sampled from a single general chemistry class. Following the pilot study, the research study was conducted with a total of 27 students (N=15 in experimental group and N=12 in control group). Prior to participating in the study, students had completed their midterm test on the topic of chemical equilibrium. 
Qualitative interviews pre and post revealed students' mental model or thought process towards chemical equilibrium. Simulations used in the study were developed using the SCRATCH software platform. In order to test the effect of the visualization tool on students' conceptual understanding of chemical equilibrium, an ANCOVA analysis was conducted. Results from a one-factor ANCOVA showed posttest scores were significantly higher for the experimental group (Mpostadj. = 7.27, SDpost = 1.387) relative to the control group (Mpostadj. = 2.67, SDpost = 1.371) after adjusting for pretest scores, F (1,24) = 71.82, MSE = 1.497, p = 0.03, partial eta squared = 0.75, d = 3.33. Cohen's d was converted to an attenuated effect size d* using the procedure outlined in Thompson (2006). The adjusted (for pretest scores) group mean difference estimate without measurement error correction for the posttest scores and the pretest scores was 4.2 with a Cohen's d = 3.04. An alternate approach reported in Cho and Preacher (2015) was used to determine effect size. The adjusted (for pretest scores) group mean difference estimate with measurement error correction only for the posttest scores (but not with measurement error correction for the pretest scores) was 4.99 with a Cohen's d = 3.61. Finally, the adjusted (for pretest scores) group mean difference estimate with measurement error correction for both pretest and posttest scores was 4.23 with a Cohen's d = 3.07. From a quantitative perspective, these effect sizes indicate a strong relationship between the experimental intervention provided and students' conceptual understanding of chemical equilibrium concepts. That is, those students who received the experimental intervention had exceptionally higher. KEYWORDS: Chemical Equilibrium, Visualization, Alternate Conceptions, Ontological Shift. Simulations.
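The first reported effect sizes (d = 3.33 and partial eta squared = 0.75) can be reproduced from the summary statistics in the abstract. The Python sketch below assumes the usual pooled-standard-deviation form of Cohen's d and the standard conversion of an F statistic to partial eta squared; the abstract does not say which formulas the author used, and the measurement-error-corrected estimates are not reproduced here.

```python
import math


def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))


def cohens_d(mean1, mean2, sd1, n1, sd2, n2):
    """Standardized mean difference using the pooled standard deviation."""
    return (mean1 - mean2) / pooled_sd(sd1, n1, sd2, n2)


def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Adjusted posttest means, SDs and group sizes as reported in the abstract.
d = cohens_d(7.27, 2.67, 1.387, 15, 1.371, 12)
eta_p2 = partial_eta_squared(71.82, 1, 24)
```

Rounded to two decimals these give 3.33 and 0.75, matching the reported values.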
Normative Values of the Sport Concussion Assessment Tool 3 (SCAT3) in High School Athletes.

PubMed

Snedden, Traci R; Brooks, Margaret Alison; Hetzel, Scott; McGuine, Tim

2017-09-01

Establish sex, age, and concussion history-specific normative baseline sport concussion assessment tool 3 (SCAT3) values in adolescent athletes. Prospective cohort. Seven Wisconsin high schools. Seven hundred fifty-eight high school athletes participating in 19 sports. Sex, age, and concussion history. Sport Concussion Assessment Tool 3 (SCAT3): total number of symptoms; symptom severity; total Standardized Assessment of Concussion (SAC); and each SAC component (orientation, immediate memory, concentration, delayed recall); Balance Error Scoring System (BESS) total errors (BESS, floor and foam pad). Males reported a higher total number of symptoms [median (interquartile range): 0 (0-2) vs 0 (0-1), P = 0.001] and severity of symptoms [0 (0-3) vs 0 (0-2), P = 0.001] and a lower mean (SD) total SAC [26.0 (2.3) vs 26.4 (2.0), P = 0.026], and orientation [5 (4-5) vs 5 (5-5), P = 0.021]. There was no difference in baseline scores between sexes for immediate memory, concentration, delayed recall or BESS total errors. No differences were found for any test domain based on age. Previously concussed athletes reported a higher total number of symptoms [1 (0-4) vs 0 (0-2), P = 0.001] and symptom severity [2 (0-5) vs 0 (0-2), P = 0.001]. BESS total scores did not differ by concussion history. This study represents the first published normative baseline SCAT3 values in high school athletes. Results varied by sex and history of previous concussion but not by age. The normative baseline values generated from this study will help clinicians better evaluate and interpret SCAT3 results of concussed adolescent athletes.

MUSCLE STRENGTH AND QUALITATIVE JUMP-LANDING DIFFERENCES IN MALE AND FEMALE MILITARY CADETS: THE JUMP-ACL STUDY.

PubMed

Beutler, Ai; de la Motte, Sj; Marshall, Sw; Padua, DA; Boden, Bp

2009-01-01

Recent studies have focused on gender differences in movement patterns as risk factors for ACL injury. Understanding intrinsic and extrinsic factors which contribute to movement patterns is critical to ACL injury prevention efforts. Isometric lower-extremity muscular strength, anthropometrics, and jump-landing technique were analyzed for 2,753 cadets (1,046 female, 1,707 male) from the U.S. Air Force, Military and Naval Academies. Jump-landings were evaluated using the Landing Error Scoring System (LESS), a valid qualitative movement screening tool. We hypothesized that distinct anthropometric factors (Q-angle, navicular drop, bodyweight) and muscle strength would predict poor jump-landing technique in males versus females, and that female cadets would have higher scores (more errors) on a qualitative movement screen (LESS) than males. Mean LESS scores were significantly higher in female (5.34 ± 1.51) versus male (4.65 ± 1.69) cadets (P<.001). Qualitative movement scores were analyzed using factor analyses, yielding five factors, or "patterns", contributing to poor landing technique. Females were significantly more likely to have poor technique due to landing with less hip and knee flexion at initial contact (P<.001), more knee valgus with wider landing stance (P<.001), and less flexion displacement over the entire landing (P<.001). Males were more likely to have poor technique due to landing toe-out (P<.001), with heels first, and with an asymmetric foot landing (P<.001). Many of the identified factor patterns have been previously proposed to contribute to ACL injury risk. However, univariate and multivariate analyses of muscular strength and anthropometric factors did not strongly predict LESS scores for either gender, suggesting that changing an athlete's alignment, BMI, or muscle strength may not directly improve his or her movement patterns.
Empirically Defined Patterns of Executive Function Deficits in Schizophrenia and Their Relation to Everyday Functioning: A Person-Centered Approach

PubMed Central

Iampietro, Mary; Giovannetti, Tania; Drabick, Deborah A. G.; Kessler, Rachel K.

2013-01-01

Executive function (EF) deficits in schizophrenia (SZ) are well documented, although much less is known about patterns of EF deficits and their association to differential impairments in everyday functioning. The present study empirically defined SZ groups based on measures of various EF abilities and then compared these EF groups on everyday action errors. Participants (n=45) completed various subtests from the Delis–Kaplan Executive Function System (D-KEFS) and the Naturalistic Action Test (NAT), a performance-based measure of everyday action that yields scores reflecting total errors and a range of different error types (e.g., omission, perseveration). Results of a latent class analysis revealed three distinct EF groups, characterized by (a) multiple EF deficits, (b) relatively spared EF, and (c) perseverative responding. Follow-up analyses revealed that the classes differed significantly on NAT total errors, total commission errors, and total perseveration errors; the two classes with EF impairment performed comparably on the NAT but performed worse than the class with relatively spared EF. In sum, people with SZ demonstrate variable patterns of EF deficits, and distinct aspects of these EF deficit patterns (i.e., poor mental control abilities) may be associated with everyday functioning capabilities. PMID:23035705
Medication knowledge, certainty, and risk of errors in health care: a cross-sectional study

PubMed Central

2011-01-01

Background Medication errors are often involved in reported adverse events. Drug therapy, prescribed by physicians, is mostly carried out by nurses, who are expected to master all aspects of medication. Research has revealed the need for improved knowledge in drug dose calculation, and medication knowledge as a whole is poorly investigated. The purpose of this survey was to study registered nurses' medication knowledge, certainty and estimated risk of errors, and to explore factors associated with good results. Methods Nurses from hospitals and primary health care establishments were invited to carry out a multiple-choice test in pharmacology, drug management and drug dose calculations (score range 0-14). Self-estimated certainty in each answer was recorded, graded from 0 = very uncertain to 3 = very certain. Background characteristics and sense of coping were recorded. Risk of error was estimated by combining knowledge and certainty scores. The results are presented as mean (±SD). Results Two-hundred and three registered nurses participated (including 16 males), aged 42.0 (9.3) years with a working experience of 12.4 (9.2) years. Knowledge scores in pharmacology, drug management and drug dose calculations were 10.3 (1.6), 7.5 (1.6), and 11.2 (2.0), respectively, and certainty scores were 1.8 (0.4), 1.9 (0.5), and 2.0 (0.6), respectively. Fifteen percent of the total answers showed a high risk of error, with 25% in drug management. Independent factors associated with high medication knowledge were working in hospitals (p < 0.001), postgraduate specialization (p = 0.01) and completion of courses in drug management (p < 0.01). Conclusions Medication knowledge was found to be unsatisfactory among practicing nurses, with a significant risk for medication errors. The study revealed a need to improve the nurses' basic knowledge, especially when referring to drug management. PMID:21791106
Verification of different forecasts of Hungarian Meteorological Service

NASA Astrophysics Data System (ADS)

Feher, B.

2009-09-01

In this paper I show the results of the forecasts made by the Hungarian Meteorological Service. I focus on the general short- and medium-range forecasts, which contain cloudiness, precipitation, wind speed and temperature for six regions of Hungary. I would like to show the results of some special forecasts as well, such as precipitation predictions made for the catchment areas of the Danube and Tisza rivers, and daily mean temperature predictions used by Hungarian energy companies. The product received by the user is made by the general forecaster, but these predictions are based on the ALADIN and ECMWF outputs. Because of this, both the product of the forecaster and the models were verified. A method like this is able to show us which weather elements are more difficult to forecast or which regions have higher errors. During the verification procedure the basic errors (mean error, mean absolute error) are calculated. Precipitation amount is classified into five categories, and scores like POD, TS, PC, etc. were defined by a contingency table determined by these categories. The procedure runs fully automatically; all the forecasters have to do is print the daily result each morning. Besides the daily result, verification is also made for longer periods such as a week, month or year. Analyzing the results of longer periods we can say that the best predictions are made for the first few days, and precipitation forecasts are less good for mountainous areas; the scores of the forecasters are sometimes even higher than the errors of the models. Since forecasters receive the results the next day, this can help them reduce mistakes and learn the weaknesses of the models. This paper contains the verification scores, their trends, the method by which these scores are calculated, and some case studies on worse forecasts.
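The categorical scores named above (POD, TS, PC) are conventionally computed from a contingency table of forecast versus observed events. The Python sketch below uses the simplest 2x2 case (event / no event) with the standard textbook definitions. The abstract describes five precipitation categories, so this is a simplified illustration rather than the Service's actual procedure, and the counts are invented.

```python
def contingency_scores(hits, misses, false_alarms, correct_negatives):
    """Standard 2x2 verification scores.

    POD: probability of detection = hits / (hits + misses)
    TS:  threat score             = hits / (hits + misses + false alarms)
    PC:  percent correct          = (hits + correct negatives) / total
    """
    total = hits + misses + false_alarms + correct_negatives
    return {
        "POD": hits / (hits + misses),
        "TS": hits / (hits + misses + false_alarms),
        "PC": (hits + correct_negatives) / total,
    }


# Invented counts for illustration only.
example_scores = contingency_scores(hits=30, misses=10, false_alarms=20, correct_negatives=40)
```

For a five-category table, one common approach is to collapse it to a 2x2 table at each category threshold and compute the scores per threshold.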
Longitudinal Factor Score Estimation Using the Kalman Filter.

ERIC Educational Resources Information Center

Oud, Johan H.; And Others

1990-01-01

How longitudinal factor score estimation--the estimation of the evolution of factor scores for individual examinees over time--can profit from the Kalman filter technique is described. The Kalman estimates change more cautiously over time, have lower estimation error variances, and reproduce the LISREL program latent state correlations more…
The Accuracy of Aggregate Student Growth Percentiles as Indicators of Educator Performance

ERIC Educational Resources Information Center

Castellano, Katherine E.; McCaffrey, Daniel F.

2017-01-01

Mean or median student growth percentiles (MGPs) are a popular measure of educator performance, but they lack rigorous evaluation. This study investigates the error in MGP due to test score measurement error (ME). Using analytic derivations, we find that errors in the commonly used MGP are correlated with average prior latent achievement: Teachers…
Evaluating the usability of speech recognition to create clinical documentation using a commercial electronic health record.

PubMed

Hodgson, Tobias; Magrabi, Farah; Coiera, Enrico

2018-05-01

To conduct a usability study exploring the value of using speech recognition (SR) for clinical documentation tasks within an electronic health record (EHR) system. Thirty-five emergency department clinicians completed a system usability scale (SUS) questionnaire. The study was undertaken after participants undertook randomly allocated clinical documentation tasks using keyboard and mouse (KBM) or SR. SUS scores were analyzed and the results with KBM were compared to SR results. Significant difference in SUS scores between EHR system use with and without SR were observed (KBM 67, SR 61; P = 0.045; CI, 0.1 to 12.0). Nineteen of 35 participants scored higher for EHR with KBM, 11 higher for EHR with SR and 5 gave the same score for both. Factor analysis showed no significant difference in scores for the sub-element of usability (EHR with KBM 65, EHR with SR 62; P = 0.255; CI, -2.6 to 9.5). Scores for the sub-element of learnability were significantly different (KBM 72, SR 55; P < 0.001; CI, 9.8 to 23.5). A significant correlation was found between the perceived usability of the two system configurations (EHR with KBM or SR) and the efficiency of documentation (time to document) (P = 0.002; CI, 10.5 to -0.1) but not with safety (number of errors) (P = 0.90; CI, -2.3 to 2.6). SR was associated with significantly reduced overall usability scores, even though it is often positioned as ease of use technology. SR was perceived to impose larger costs in terms of learnability via training and support requirements for EHR based documentation when compared to using KBM. Lower usability scores were significantly associated with longer documentation times. The usability of EHR systems with any input modality is an area that requires continued development. The addition of an SR component to an EHR system may cause a significant reduction in terms of perceived usability by clinicians. Copyright © 2018 Elsevier B.V. All rights reserved.
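For readers unfamiliar with how a SUS score such as 67 or 61 is obtained, the Python sketch below applies the standard SUS scoring rule: ten items rated 1 to 5, odd-numbered items contributing (rating - 1), even-numbered items contributing (5 - rating), and the sum multiplied by 2.5 to give a 0-100 score. The example responses are invented, and the usability/learnability sub-element scoring used in the study is not reproduced.

```python
def sus_score(responses):
    """Compute a System Usability Scale score from ten ratings on a 1-5 scale."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    total = 0
    for item_number, rating in enumerate(responses, start=1):
        if not 1 <= rating <= 5:
            raise ValueError("each rating must be between 1 and 5")
        # Odd items are positively worded, even items negatively worded.
        total += (rating - 1) if item_number % 2 == 1 else (5 - rating)
    return total * 2.5


# Invented example: a neutral respondent who answers 3 to every item.
neutral = sus_score([3] * 10)
```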
A Comparison of the Forecast Skills among Three Numerical Models

NASA Astrophysics Data System (ADS)

Lu, D.; Reddy, S. R.; White, L. J.

2003-12-01

Three numerical weather forecast models, MM5, COAMPS and WRF, operated as a joint effort of NOAA HU-NCAS and Jackson State University (JSU) during summer 2003, have been chosen to study their forecast skills against observations. The models forecast over the same region with the same initialization, boundary condition, forecast length and spatial resolution. The AVN global dataset has been ingested as initial conditions. A grid resolution of 27 km is chosen to represent the current mesoscale model. Forecasts with a length of 36h are performed, with results output at 12h intervals. The key parameters used to evaluate the forecast skill include 12h accumulated precipitation, sea level pressure, wind, surface temperature and dew point. Precipitation is evaluated statistically using conventional skill scores, Threat Score (TS) and Bias Score (BS), for different threshold values based on 12h rainfall observations, whereas other statistical methods such as Mean Error (ME), Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) are applied to other forecast parameters.
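The continuous-variable statistics listed above have standard definitions, sketched below in Python. The forecast and observation values are invented for illustration; nothing here reproduces the study's data or code.

```python
import math


def mean_error(forecasts, observations):
    """ME: average of (forecast - observation); positive means over-forecasting."""
    return sum(f - o for f, o in zip(forecasts, observations)) / len(forecasts)


def mean_absolute_error(forecasts, observations):
    """MAE: average magnitude of the forecast errors."""
    return sum(abs(f - o) for f, o in zip(forecasts, observations)) / len(forecasts)


def root_mean_square_error(forecasts, observations):
    """RMSE: square root of the mean squared forecast error."""
    squared = [(f - o) ** 2 for f, o in zip(forecasts, observations)]
    return math.sqrt(sum(squared) / len(squared))


# Invented surface-temperature values (degrees C) for illustration only.
forecast_temps = [2.0, 4.0, 6.0]
observed_temps = [1.0, 5.0, 6.0]
me = mean_error(forecast_temps, observed_temps)
mae = mean_absolute_error(forecast_temps, observed_temps)
rmse = root_mean_square_error(forecast_temps, observed_temps)
```

In this example ME is zero because the positive and negative errors cancel, while MAE and RMSE remain positive, which is why the three are usually reported together.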
SU-E-T-310: Targeting Safety Improvements Through Analysis of Near-Miss Error Detection Points in An Incident Learning Database

DOE Office of Scientific and Technical Information (OSTI.GOV)

Novak, A; Nyflot, M; Sponseller, P

2014-06-01

Purpose: Radiation treatment planning involves a complex workflow that can make safety improvement efforts challenging. This study utilizes an incident reporting system to identify detection points of near-miss errors, in order to guide our departmental safety improvement efforts. Previous studies have examined where errors arise, but not where they are detected or their patterns. Methods: 1377 incidents were analyzed from a departmental near-miss error reporting system from 3/2012–10/2013. All incidents were prospectively reviewed weekly by a multi-disciplinary team, and assigned a near-miss severity score ranging from 0–4 reflecting potential harm (no harm to critical). A 98-step consensus workflow was used to determine origination and detection points of near-miss errors, categorized into 7 major steps (patient assessment/orders, simulation, contouring/treatment planning, pre-treatment plan checks, therapist/on-treatment review, post-treatment checks, and equipment issues). Categories were compared using ANOVA. Results: In the 7-step workflow, 23% of near-miss errors were detected within the same step in the workflow, while an additional 37% were detected by the next step in the workflow, and 23% were detected two steps downstream. Errors detected further from origination were more severe (p<.001; Figure 1). The most common source of near-miss errors was treatment planning/contouring, with 476 near misses (35%). Of those 476, only 72 (15%) were found before leaving treatment planning, 213 (45%) were found at physics plan checks, and 191 (40%) were caught at the therapist pre-treatment chart review or on portal imaging. Errors that passed through physics plan checks and were detected by therapists were more severe than other errors originating in contouring/treatment planning (1.81 vs 1.33, p<0.001). 
Conclusion: Errors caught by radiation treatment therapists tend to be more severe than errors caught earlier in the workflow, highlighting the importance of safety checks in dosimetry and physics. We are utilizing our findings to improve manual and automated checklists for dosimetry and physics.
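The percentages quoted in the Results can be checked directly from the reported counts. A small Python sketch of that arithmetic (the variable and key names are hypothetical labels, not fields from the incident database):

```python
TOTAL_INCIDENTS = 1377
PLANNING_ORIGIN = 476  # near misses originating in treatment planning/contouring

# Where the planning-origin near misses were detected, per the abstract.
detected_at = {
    "before leaving treatment planning": 72,
    "physics plan checks": 213,
    "therapist review or portal imaging": 191,
}


def percent(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


planning_share = percent(PLANNING_ORIGIN, TOTAL_INCIDENTS)
detection_shares = {step: percent(n, PLANNING_ORIGIN) for step, n in detected_at.items()}
```

The three detection counts sum to 476, and the rounded shares reproduce the 35%, 15%, 45% and 40% figures in the abstract.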
Review of Pre-Analytical Errors in Oral Glucose Tolerance Testing in a Tertiary Care Hospital.

PubMed

Nanda, Rachita; Patel, Suprava; Sahoo, Sibashish; Mohapatra, Eli

2018-03-13

The pre-pre-analytical and pre-analytical phases account for a major share of the errors in a laboratory. This study considered a very common procedure, the oral glucose tolerance test, to identify the pre-pre-analytical errors. Quality indicators provide evidence of quality, support accountability and help in the decision making of laboratory personnel. The aim of this research is to evaluate pre-analytical performance of the oral glucose tolerance test procedure. An observational study was conducted over a period of three months in the phlebotomy and accessioning unit of our laboratory, using a questionnaire that examined the pre-pre-analytical errors through a scoring system. The pre-analytical phase was analyzed for each sample collected as per seven quality indicators. About 25% of the population gave a wrong answer to the question that tested the knowledge of patient preparation. The appropriateness of test result QI-1 had the most error. Although QI-5 for sample collection had a low error rate, it is a very important indicator as any wrongly collected sample can alter the test result. Evaluating the pre-analytical and pre-pre-analytical phase is essential and must be conducted routinely on a yearly basis to identify errors and take corrective action and to facilitate their gradual introduction into routine practice.
Error quantification of abnormal extreme high waves in Operational Oceanographic System in Korea

NASA Astrophysics Data System (ADS)

Jeong, Sang-Hun; Kim, Jinah; Heo, Ki-Young; Park, Kwang-Soon

2017-04-01

In winter, large swell-like waves have occurred on the east coast of Korea, causing property damage and loss of life. These waves are known to be generated by strong local winds produced by temperate cyclones moving eastward over the East Sea off the Korean peninsula. Because the waves often occur in clear weather, the damage tends to be especially severe. It is therefore necessary to predict and forecast large swell-like waves in order to prevent and respond to coastal damage. In Korea, an operational oceanographic system (KOOS) has been developed by the Korea Institute of Ocean Science and Technology (KIOST). KOOS provides daily 72-hour ocean forecasts of wind, water elevation, sea currents, water temperature, salinity, and waves, computed not only from meteorological and hydrodynamic models (WRF, ROMS, MOM, and MOHID) but also from wave models (WW-III and SWAN). To evaluate model performance and guarantee a certain level of forecast accuracy, a Skill Assessment (SA) system was established as one of the modules in KOOS. Skill assessment is performed by comparing model results with in-situ observation data, and model errors are quantified with skill scores. The statistics used in skill assessment measure both errors and correlations: root-mean-square error (RMSE), root-mean-square error percentage (RMSE%), mean bias (MB), correlation coefficient (R), scatter index (SI), circular correlation (CC), and central frequency (CF), the frequency with which errors lie within acceptable error criteria. These statistics should be used together, not only to quantify error but also to improve forecast accuracy by providing interactive feedback.
However, abnormal phenomena such as the high swell-like waves on the east coast of Korea require a more advanced and optimized error quantification method, one that allows the abnormal waves to be predicted well and improves forecast accuracy by supporting modification of the physics and numerics of numerical models through sensitivity tests. In this study, we propose an error quantification method suited to abnormal high waves generated by local weather conditions. Furthermore, we show how the quantified errors contribute to improving wind-wave modeling by applying data assimilation and utilizing reanalysis data.
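The skill statistics named in the abstract above can be illustrated with a minimal sketch. The definitions are common textbook forms and are assumptions, not necessarily the ones implemented in KOOS: RMSE% and the scatter index are both taken relative to the mean observation, and the central frequency is the fraction of errors within ±`limit`. Circular correlation is omitted. The function name `skill_scores` and the sample wave heights are hypothetical.

```python
import math

def skill_scores(model, obs, limit):
    """Return skill statistics for paired model and observed values.

    `limit` is the acceptable absolute error used for the central frequency.
    """
    n = len(obs)
    errors = [m - o for m, o in zip(model, obs)]
    mean_obs = sum(obs) / n
    mean_mod = sum(model) / n

    mb = sum(errors) / n                               # mean bias
    rmse = math.sqrt(sum(e * e for e in errors) / n)   # root-mean-square error

    # Pearson correlation coefficient between model and observation
    cov = sum((m - mean_mod) * (o - mean_obs) for m, o in zip(model, obs))
    var_mod = sum((m - mean_mod) ** 2 for m in model)
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    r = cov / math.sqrt(var_mod * var_obs)

    return {
        "MB": mb,
        "RMSE": rmse,
        "RMSE%": 100.0 * rmse / mean_obs,  # assumed: RMSE relative to mean observation
        "SI": rmse / mean_obs,             # assumed scatter-index definition
        "R": r,
        "CF": sum(abs(e) <= limit for e in errors) / n,  # fraction of errors within limit
    }

# Hypothetical significant wave heights in metres
scores = skill_scores(model=[1.0, 2.0, 3.0, 4.0], obs=[1.0, 2.0, 3.0, 5.0], limit=0.5)
```

In this toy case the single underpredicted peak dominates RMSE while CF stays at 0.75, which is one reason the abstract argues that extreme waves need their own error measure.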
Teamwork and clinical error reporting among nurses in Korean hospitals.

PubMed

Hwang, Jee-In; Ahn, Jeonghoon

2015-03-01

To examine levels of teamwork and its relationships with clinical error reporting among Korean hospital nurses. The study employed a cross-sectional survey design. We distributed a questionnaire to 674 nurses in two teaching hospitals in Korea. The questionnaire included items on teamwork and the reporting of clinical errors. We measured teamwork using the Teamwork Perceptions Questionnaire, which has five subscales including team structure, leadership, situation monitoring, mutual support, and communication. Using logistic regression analysis, we determined the relationships between teamwork and error reporting. The response rate was 85.5%. The mean score of teamwork was 3.5 out of 5. At the subscale level, mutual support was rated highest, while leadership was rated lowest. Of the participating nurses, 522 responded that they had experienced at least one clinical error in the last 6 months. Among those, only 53.0% responded that they always or usually reported clinical errors to their managers and/or the patient safety department. Teamwork was significantly associated with better error reporting. Specifically, nurses with a higher team communication score were more likely to report clinical errors to their managers and the patient safety department (odds ratio = 1.82, 95% confidence intervals [1.05, 3.14]). Teamwork was rated as moderate and was positively associated with nurses' error reporting performance. Hospital executives and nurse managers should make substantial efforts to enhance teamwork, which will contribute to encouraging the reporting of errors and improving patient safety. Copyright © 2015. Published by Elsevier B.V.
Safety climate and its association with office type and team involvement in primary care.

PubMed

Gehring, Katrin; Schwappach, David L B; Battaglia, Markus; Buff, Roman; Huber, Felix; Sauter, Peter; Wieser, Markus

2013-09-01

To assess differences in safety climate perceptions between occupational groups and types of office organization in primary care. Primary care physicians and nurses working in outpatient offices were surveyed about safety climate. Explorative factor analysis was performed to determine the factorial structure. Differences in mean climate scores between staff groups and types of office were tested. Logistic regression analysis was conducted to determine predictors for a 'favorable' safety climate. 630 individuals returned the survey (response rate, 50%). Differences between occupational groups were observed in the means of the 'team-based error prevention'-scale (physician 4.0 vs. nurse 3.8, P < 0.001). Medical centers scored higher compared with single-handed offices and joint practices on the 'team-based error prevention'-scale (4.3 vs. 3.8 vs. 3.9, P < 0.001) but less favorable on the 'rules and risks'-scale (3.5 vs. 3.9 vs. 3.7, P < 0.001). Characteristics on the individual and office level predicted favorable 'team-based error prevention'-scores. Physicians (OR = 0.4, P = 0.01) and less experienced staff (OR 0.52, P = 0.04) were less likely to provide favorable scores. Individuals working at medical centers were more likely to provide positive scores compared with single-handed offices (OR 3.33, P = 0.001). The largest positive effect was associated with at least monthly team meetings (OR 6.2, P < 0.001) and participation in quality circles (OR 4.49, P < 0.001). Results indicate that frequent quality circle participation and team meetings involving all team members are effective ways to strengthen safety climate in terms of team-based strategies and activities in error prevention.
Improving the quality of child anthropometry: Manual anthropometry in the Body Imaging for Nutritional Assessment Study (BINA).

PubMed

Conkle, Joel; Ramakrishnan, Usha; Flores-Ayala, Rafael; Suchdev, Parminder S; Martorell, Reynaldo

2017-01-01

Anthropometric data collected in clinics and surveys are often inaccurate and unreliable due to measurement error. The Body Imaging for Nutritional Assessment Study (BINA) evaluated the ability of 3D imaging to correctly measure stature, head circumference (HC) and arm circumference (MUAC) for children under five years of age. This paper describes the protocol for and the quality of manual anthropometric measurements in BINA, a study conducted in 2016-17 in Atlanta, USA. Quality was evaluated by examining digit preference, biological plausibility of z-scores, z-score standard deviations, and reliability. We calculated z-scores and analyzed plausibility based on the 2006 WHO Child Growth Standards (CGS). For reliability, we calculated intra- and inter-observer Technical Error of Measurement (TEM) and Intraclass Correlation Coefficient (ICC). We found low digit preference; 99.6% of z-scores were biologically plausible, with z-score standard deviations ranging from 0.92 to 1.07. Total TEM was 0.40 for stature, 0.28 for HC, and 0.25 for MUAC in centimeters. ICC ranged from 0.99 to 1.00. The quality of manual measurements in BINA was high and similar to that of the anthropometric data used to develop the WHO CGS. We attributed high quality to vigorous training, motivated and competent field staff, reduction of non-measurement error through the use of technology, and reduction of measurement error through adequate monitoring and supervision. Our anthropometry measurement protocol, which builds on and improves upon the protocol used for the WHO CGS, can be used to improve anthropometric data quality. The discussion illustrates the need to standardize anthropometric data quality assessment, and we conclude that BINA can provide a valuable evaluation of 3D imaging for child anthropometry because there is comparison to gold-standard, manual measurements.
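As a rough illustration of the reliability statistic mentioned above, the sketch below computes the Technical Error of Measurement for the simplest case of two repeated measurements per child, using the conventional formula TEM = sqrt(sum(d^2) / 2N). This is an assumption about the form of the calculation; the BINA protocol may have used a different design. The function name and the sample values are hypothetical.

```python
import math

def technical_error_of_measurement(first, second):
    """TEM for two repeated measurements per subject, in the input units."""
    if len(first) != len(second) or not first:
        raise ValueError("need two equal-length, non-empty series")
    # Sum of squared differences between the paired measurements
    squared_diffs = sum((a - b) ** 2 for a, b in zip(first, second))
    return math.sqrt(squared_diffs / (2 * len(first)))

# Hypothetical repeated stature measurements in centimetres
tem = technical_error_of_measurement([100.0, 90.0], [100.4, 89.8])
```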
Quantifying usability: an evaluation of a diabetes mHealth system on effectiveness, efficiency, and satisfaction metrics with associated user characteristics.

PubMed

Georgsson, Mattias; Staggers, Nancy

2016-01-01

Mobile health (mHealth) systems are becoming more common for chronic disease management, but usability studies are still needed on patients' perspectives and mHealth interaction performance. This deficiency is addressed by our quantitative usability study of a mHealth diabetes system evaluating patients' task performance, satisfaction, and the relationship of these measures to user characteristics. We used metrics in the International Organization for Standardization (ISO) 9241-11 standard. After standardized training, 10 patients performed representative tasks and were assessed on individual task success, errors, efficiency (time on task), satisfaction (System Usability Scale [SUS]) and user characteristics. Tasks of exporting and correcting values proved the most difficult, had the most errors, the lowest task success rates, and consumed the longest times on task. The average SUS satisfaction score was 80.5, indicating good but not excellent system usability. Data trends showed males were more successful in task completion, and younger participants had higher performance scores. Educational level did not influence performance, but a more recent diabetes diagnosis did. Patients with more experience in information technology (IT) also had higher performance rates. Difficult task performance indicated areas for redesign. Our methods can assist others in identifying areas in need of improvement. Data about user background and IT skills also showed how user characteristics influence performance and can provide future considerations for targeted mHealth designs. Using the ISO 9241-11 usability standard, the SUS instrument for satisfaction and measuring user characteristics provided objective measures of patients' experienced usability. These could serve as an exemplar for standardized, quantitative methods for usability studies on mHealth systems. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association.
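For readers unfamiliar with how a SUS value such as 80.5 arises, the following sketch applies the standard SUS scoring rule: ten items rated 1 to 5, odd items contribute (rating - 1), even items contribute (5 - rating), and the sum is multiplied by 2.5. The function name `sus_score` is hypothetical and nothing here comes from the study's own analysis code.

```python
def sus_score(responses):
    """Score one SUS questionnaire: ten responses, each from 1 to 5."""
    if len(responses) != 10 or any(r not in range(1, 6) for r in responses):
        raise ValueError("expected ten integer responses between 1 and 5")
    total = 0
    for item, r in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded
        total += (r - 1) if item % 2 == 1 else (5 - r)
    return total * 2.5  # rescale the 0-40 raw sum to 0-100
```

A study-level figure like the one reported would then be the mean of the individual participants' scores.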
Quantifying usability: an evaluation of a diabetes mHealth system on effectiveness, efficiency, and satisfaction metrics with associated user characteristics

PubMed Central

Staggers, Nancy

2016-01-01

Objective: Mobile health (mHealth) systems are becoming more common for chronic disease management, but usability studies are still needed on patients’ perspectives and mHealth interaction performance. This deficiency is addressed by our quantitative usability study of a mHealth diabetes system evaluating patients’ task performance, satisfaction, and the relationship of these measures to user characteristics. Materials and Methods: We used metrics in the International Organization for Standardization (ISO) 9241-11 standard. After standardized training, 10 patients performed representative tasks and were assessed on individual task success, errors, efficiency (time on task), satisfaction (System Usability Scale [SUS]) and user characteristics. Results: Tasks of exporting and correcting values proved the most difficult, had the most errors, the lowest task success rates, and consumed the longest times on task. The average SUS satisfaction score was 80.5, indicating good but not excellent system usability. Data trends showed males were more successful in task completion, and younger participants had higher performance scores. Educational level did not influence performance, but a more recent diabetes diagnosis did. Patients with more experience in information technology (IT) also had higher performance rates. Discussion: Difficult task performance indicated areas for redesign. Our methods can assist others in identifying areas in need of improvement. Data about user background and IT skills also showed how user characteristics influence performance and can provide future considerations for targeted mHealth designs. Conclusion: Using the ISO 9241-11 usability standard, the SUS instrument for satisfaction and measuring user characteristics provided objective measures of patients’ experienced usability. These could serve as an exemplar for standardized, quantitative methods for usability studies on mHealth systems. PMID:26377990
Interrater and Test-Retest Reliability and Minimal Detectable Change of the Balance Evaluation Systems Test (BESTest) and Subsystems With Community-Dwelling Older Adults.

PubMed

Wang-Hsu, Elizabeth; Smith, Susan S

2017-01-10

Falls are a common cause of injuries and hospital admissions in older adults. Balance limitation is a potentially modifiable factor contributing to falls. The Balance Evaluation Systems Test (BESTest), a clinical balance measure, categorizes balance into 6 underlying subsystems. Each of the subsystems is scored individually and summed to obtain a total score. The reliability of the BESTest and its individual subsystems has been reported in patients with various neurological disorders and cancer survivors. However, the reliability and minimal detectable change (MDC) of the BESTest with community-dwelling older adults have not been reported. The purposes of our study were to (1) determine the interrater and test-retest reliability of the BESTest total and subsystem scores; and (2) estimate the MDC of the BESTest and its individual subsystem scores with community-dwelling older adults. We used a prospective cohort methodological design. Community-dwelling older adults (N = 70; aged 70-94 years; mean = 85.0 [5.5] years) were recruited from a senior independent living community. Trained testers (N = 3) administered the BESTest. All participants were tested with the BESTest by the same tester initially and then retested 7 to 14 days later. With 32 of the participants, a second tester concurrently scored the retest for interrater reliability. Testers were blinded to each other's scores. Intraclass correlation coefficients [ICC(2,1)] were used to determine the interrater and test-retest reliability. Test-retest reliability was also analyzed using method error and the associated coefficients of variation (CVME). MDC was calculated using standard error of measurement. Interrater reliability (N = 32) of the BESTest total score was ICC(2, 1) = 0.97 (95% confidence interval [CI], 0.94-0.99). The ICCs for the individual subsystem scores ranged from 0.85 to 0.94. Test-retest reliability (N = 70) of the BESTest total score was ICC(2,1) = 0.93 (95% CI, 0.89-0.96). 
ICCs for the individual subsystem scores ranged from 0.72 to 0.89. The CVME (N = 70) of the BESTest total score was 4.1%. The CVME for the subsystem scores ranged from 5.0% to 10.7%. MDC (N = 70) for the BESTest total score at the 95% CI was 7.6%, or 8.2 points. MDC at the 95% CI for subsystem scores ranged from 11.7% to 19.0% (2.1-3.4 points). Results demonstrated generally good to excellent interrater and test-retest reliability in both the BESTest total and subsystem scores with community-dwelling older adults. The BESTest total and individual subsystem scores demonstrate good to excellent interrater and test-retest reliability with community-dwelling older adults. A change of 7.6% (8.2 points) or more in the BESTest total and a percentage change ranged from 11.7% to 19.0% (2.1-3.4 points) in the subsystem scores are suggested for clinicians to be 95% confident of true change when evaluating change in this population.
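The abstract states that MDC was calculated from the standard error of measurement. A minimal sketch of the conventional formulas, SEM = SD * sqrt(1 - ICC) and MDC95 = 1.96 * sqrt(2) * SEM, is given below. Whether the authors used exactly these forms is an assumption, and the standard deviation in the example is hypothetical rather than taken from the study.

```python
import math

def standard_error_of_measurement(sd, icc):
    """SEM from the between-subject standard deviation and a reliability ICC."""
    return sd * math.sqrt(1.0 - icc)

def mdc95(sd, icc):
    """Minimal detectable change at the 95% confidence level."""
    # sqrt(2) accounts for measurement error in both the test and the retest
    return 1.96 * math.sqrt(2.0) * standard_error_of_measurement(sd, icc)

# Hypothetical example: SD of 10 points with a test-retest ICC of 0.91
example_mdc = mdc95(sd=10.0, icc=0.91)
```

With these inputs the SEM is 3 points and the MDC95 is roughly 8.3 points, so a smaller observed change could not be distinguished from measurement noise.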
Comparative study of anatomical normalization errors in SPM and 3D-SSP using digital brain phantom.

PubMed

Onishi, Hideo; Matsutake, Yuki; Kawashima, Hiroki; Matsutomo, Norikazu; Amijima, Hizuru

2011-01-01

In single photon emission computed tomography (SPECT) cerebral blood flow studies, two major algorithms are widely used: statistical parametric mapping (SPM) and three-dimensional stereotactic surface projections (3D-SSP). The aim of this study is to compare an SPM algorithm-based easy Z score imaging system (eZIS) and a 3D-SSP system in the errors of anatomical standardization using 3D-digital brain phantom images. We developed a 3D-brain digital phantom based on MR images to simulate the effects of head tilt, perfusion defective region size, and count value reduction rate on the SPECT images. This digital phantom was used to compare the errors of anatomical standardization by the eZIS and the 3D-SSP algorithms. While the eZIS allowed accurate standardization of the images of the phantom simulating a head in rotation, lateroflexion, anteflexion, or retroflexion without angle dependency, the standardization by 3D-SSP was not accurate enough at approximately 25° or more head tilt. When the simulated head contained perfusion defective regions, one of the 3D-SSP images showed an error of 6.9% from the true value. Meanwhile, one of the eZIS images showed an error as large as 63.4%, revealing a significant underestimation. When required to evaluate regions with decreased perfusion due to such causes as hemodynamic cerebral ischemia, the 3D-SSP is desirable. In a statistical image analysis, we must reconfirm the image after anatomical standardization by all means.
Synthetic Spike-in Standards Improve Run-Specific Systematic Error Analysis for DNA and RNA Sequencing

PubMed Central

Zook, Justin M.; Samarov, Daniel; McDaniel, Jennifer; Sen, Shurjo K.; Salit, Marc

2012-01-01

While the importance of random sequencing errors decreases at higher DNA or RNA sequencing depths, systematic sequencing errors (SSEs) dominate at high sequencing depths and can be difficult to distinguish from biological variants. These SSEs can cause base quality scores to underestimate the probability of error at certain genomic positions, resulting in false positive variant calls, particularly in mixtures such as samples with RNA editing, tumors, circulating tumor cells, bacteria, mitochondrial heteroplasmy, or pooled DNA. Most algorithms proposed for correction of SSEs require a data set used to calculate association of SSEs with various features in the reads and sequence context. This data set is typically either from a part of the data set being “recalibrated” (Genome Analysis ToolKit, or GATK) or from a separate data set with special characteristics (SysCall). Here, we combine the advantages of these approaches by adding synthetic RNA spike-in standards to human RNA, and use GATK to recalibrate base quality scores with reads mapped to the spike-in standards. Compared to conventional GATK recalibration that uses reads mapped to the genome, spike-ins improve the accuracy of Illumina base quality scores by a mean of 5 Phred-scaled quality score units, and by as much as 13 units at CpG sites. In addition, since the spike-in data used for recalibration are independent of the genome being sequenced, our method allows run-specific recalibration even for the many species without a comprehensive and accurate SNP database. We also use GATK with the spike-in standards to demonstrate that the Illumina RNA sequencing runs overestimate quality scores for AC, CC, GC, GG, and TC dinucleotides, while SOLiD has less dinucleotide SSEs but more SSEs for certain cycles. We conclude that using these DNA and RNA spike-in standards with GATK improves base quality score recalibration. PMID:22859977
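To put the reported improvements of 5 and 13 Phred-scaled units in context, the sketch below converts between a Phred quality score Q and an error probability p using the standard definition Q = -10 * log10(p). The helper names are hypothetical and are not part of GATK.

```python
import math

def phred_to_error_prob(q):
    """Error probability implied by a Phred-scaled quality score."""
    return 10.0 ** (-q / 10.0)

def error_prob_to_phred(p):
    """Phred-scaled quality score for an error probability 0 < p <= 1."""
    return -10.0 * math.log10(p)

# A 5-unit difference corresponds to a factor of 10**0.5 in error probability
ratio_5_units = phred_to_error_prob(25) / phred_to_error_prob(30)
```

A quality score that is off by 5 units therefore misstates the error probability by a factor of about 3.2, and one that is off by 13 units by a factor of about 20.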
Simultaneous treatment of unspecified heteroskedastic model error distribution and mismeasured covariates for restricted moment models.

PubMed

Garcia, Tanya P; Ma, Yanyuan

2017-10-01

We develop consistent and efficient estimation of parameters in general regression models with mismeasured covariates. We assume the model error and covariate distributions are unspecified, and the measurement error distribution is a general parametric distribution with unknown variance-covariance. We construct root-n consistent, asymptotically normal and locally efficient estimators using the semiparametric efficient score. We do not estimate any unknown distribution or model error heteroskedasticity. Instead, we form the estimator under possibly incorrect working distribution models for the model error, error-prone covariate, or both. Empirical results demonstrate robustness to different incorrect working models in homoscedastic and heteroskedastic models with error-prone covariates.

Is the Speech Transmission Index (STI) a robust measure of sound system speech intelligibility performance?

NASA Astrophysics Data System (ADS)

Mapp, Peter

2002-11-01

Although RaSTI is a good indicator of the speech intelligibility capability of auditoria and similar spaces, during the past 2-3 years it has been shown that RaSTI is not a robust predictor of sound system intelligibility performance. Instead, it is now recommended, within both national and international codes and standards, that full STI measurement and analysis be employed. However, new research is reported that indicates that STI is not as flawless or as robust as many believe. The paper highlights a number of potential error mechanisms. It is shown that the measurement technique and signal excitation stimulus can have a significant effect on the overall result and accuracy, particularly where DSP-based equipment is employed. It is also shown that in its current state of development, STI is not capable of appropriately accounting for a number of fundamental speech and system attributes, including typical sound system frequency response variations and anomalies. This is particularly shown to be the case when a system is operating under reverberant conditions. Comparisons between actual system measurements and corresponding word score data are reported, with errors of up to 50%. Implications for VA and PA system performance verification will be discussed.
SU-E-T-325: The New Evaluation Method of the VMAT Plan Delivery Using Varian DynaLog Files and Modulation Complexity Score (MCS)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tateoka, K; Graduate School of Medicine, Sapporo Medical University, Sapporo, JP; Fujimomo, K

2014-06-01

Purpose: The aim of the study is to evaluate the use of Varian DynaLog files to verify VMAT plan delivery and the modulation complexity score (MCS) of VMAT. Methods: Delivery accuracy of machine performance was quantified by multileaf collimator (MLC) position errors, gantry angle errors and fluence delivery accuracy for volumetric modulated arc therapy (VMAT). The relationship between machine performance and plan complexity was also investigated using the modulation complexity score (MCS). Planned and actual MLC positions, gantry angles and delivered fraction of monitor units were extracted from Varian DynaLog files. These factors were taken from the record and verify system of MLC control file. Planned and delivered beam data were compared to determine leaf position errors and gantry angle errors. Analysis was also performed on planned and actual fluence maps reconstructed from those of the DynaLog files. This analysis was performed for all treatment fractions of 5 prostate VMAT plans. The analysis of DynaLog files was carried out by in-house programming in Visual C++. Results: The root mean square of leaf position and gantry angle errors were about 0.12 and 0.15, respectively. The Gamma of planned and actual fluence maps at 3%/3 mm criterion was about 99.21. The gamma of the leaf position errors was not directly related to plan complexity as determined by the MCS. Therefore, the gamma of the gantry angle errors was directly related to plan complexity as determined by the MCS. Conclusion: This study shows that Varian DynaLog files for VMAT plans can diagnose delivery errors that cannot be detected with phantom-based quality assurance. Furthermore, the MCS of a VMAT plan can evaluate delivery accuracy for patients receiving VMAT. Machine performance was found to be directly related to plan complexity but this is not the dominant determinant of delivery accuracy.
Psychometric Properties of Raw and Scale Scores on Mixed-Format Tests

ERIC Educational Resources Information Center

Kolen, Michael J.; Lee, Won-Chan

2011-01-01

This paper illustrates that the psychometric properties of scores and scales that are used with mixed-format educational tests can impact the use and interpretation of the scores that are reported to examinees. Psychometric properties that include reliability and conditional standard errors of measurement are considered in this paper. The focus is…
A General Approach for Estimating Scale Score Reliability for Panel Survey Data

ERIC Educational Resources Information Center

Biemer, Paul P.; Christ, Sharon L.; Wiesen, Christopher A.

2009-01-01

Scale score measures are ubiquitous in the psychological literature and can be used as both dependent and independent variables in data analysis. Poor reliability of scale score measures leads to inflated standard errors and/or biased estimates, particularly in multivariate analysis. Reliability estimation is usually an integral step to assess…
The Landing Error Scoring System as a Screening Tool for an Anterior Cruciate Ligament Injury–Prevention Program in Elite-Youth Soccer Athletes

PubMed Central

Padua, Darin A.; DiStefano, Lindsay J.; Beutler, Anthony I.; de la Motte, Sarah J.; DiStefano, Michael J.; Marshall, Steven W.

2015-01-01

Context: Identifying neuromuscular screening factors for anterior cruciate ligament (ACL) injury is a critical step toward large-scale deployment of effective ACL injury-prevention programs. The Landing Error Scoring System (LESS) is a valid and reliable clinical assessment of jump-landing biomechanics. Objective: To investigate the ability of the LESS to identify individuals at risk for ACL injury in an elite-youth soccer population. Design: Cohort study. Setting: Field-based functional movement screening performed at soccer practice facilities. Patients or Other Participants: A total of 829 elite-youth soccer athletes (348 boys, 481 girls; age = 13.9 ± 1.8 years, age range = 11 to 18 years), of whom 25% (n = 207) were less than 13 years of age. Intervention(s): Baseline preseason testing for all participants consisted of a jump-landing task (3 trials). Participants were followed prospectively throughout their soccer seasons for diagnosis of ACL injuries (1217 athlete-seasons of follow-up). Main Outcome Measure(s): Landings were scored for “errors” in technique using the LESS. We used receiver operator characteristic curves to determine a cutpoint on the LESS. Sensitivity and specificity of the LESS in predicting ACL injury were assessed. Results: Seven participants sustained ACL injuries during the follow-up period; the mechanism of injury was noncontact or indirect contact for all injuries. Uninjured participants had lower LESS scores (4.43 ± 1.71) than injured participants (6.24 ± 1.75; t(1215) = −2.784, P = .005). The receiver operator characteristic curve analyses suggested that 5 was the optimal cutpoint for the LESS, generating a sensitivity of 86% and a specificity of 64%. Conclusions: Despite sample-size limitations, the LESS showed potential as a screening tool to determine ACL injury risk in elite-youth soccer athletes. PMID:25811846
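The cutpoint analysis described above can be sketched as follows: given a screening score per athlete and a record of who was later injured, sensitivity and specificity are computed for one candidate cutpoint. Treating a score at or above the cutpoint as a positive screen is an assumption, since the abstract does not say whether the threshold is inclusive, and the data in the example are invented for illustration.

```python
def sensitivity_specificity(scores, injured, cutpoint):
    """Return (sensitivity, specificity) when score >= cutpoint flags risk."""
    tp = fn = tn = fp = 0
    for score, was_injured in zip(scores, injured):
        flagged = score >= cutpoint  # assumed inclusive threshold
        if was_injured and flagged:
            tp += 1
        elif was_injured:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)

# Invented scores and outcomes, not data from the study
scores = [6, 7, 4, 5, 3, 2, 8, 4]
injured = [True, True, False, False, False, False, True, True]
sens, spec = sensitivity_specificity(scores, injured, cutpoint=5)
```

Repeating this for every candidate cutpoint and plotting sensitivity against (1 - specificity) gives the receiver operator characteristic curve from which a cutpoint is chosen.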
Multimodal impairment-based physical therapy for the treatment of patients with post-concussion syndrome: A retrospective analysis on safety and feasibility.

PubMed

Grabowski, Patrick; Wilson, John; Walker, Alyssa; Enz, Dan; Wang, Sijian

2017-01-01

Demonstrate implementation, safety and feasibility of multimodal, impairment-based physical therapy (PT) combining vestibular/oculomotor and cervical rehabilitation with sub-symptom threshold exercise for the treatment of patients with post-concussion syndrome (PCS). University hospital outpatient sports medicine facility. Twenty-five patients (12-20 years old) meeting World Health Organization criteria for PCS following sport-related concussion referred for supervised PT consisting of sub-symptom cardiovascular exercise, vestibular/oculomotor and cervical spine rehabilitation. Retrospective cohort. Post-Concussion Symptom Scale (PCSS) total score, maximum symptom-free heart rate (SFHR) during graded exercise testing (GXT), GXT duration, balance error scoring system (BESS) score, and number of adverse events. Patients demonstrated a statistically significant decreasing trend (p < 0.01) for total PCSS scores (pre-PT M = 18.2 (SD = 14.2), post-PT M = 9.1 (SD = 10.8), n = 25). Maximum SFHR achieved on GXT increased 23% (p < 0.01, n = 14), and BESS errors decreased 52% (p < 0.01, n = 13). Two patients reported mild symptom exacerbation with aerobic exercise at home, attenuated by adjustment of the home exercise program. Multimodal, impairment-based PT is safe and associated with diminishing PCS symptoms. This establishes feasibility for future clinical trials to determine viable treatment approaches to reduce symptoms and improve function while avoiding negative repercussions of physical inactivity and premature return to full activity. Copyright © 2016 Elsevier Ltd. All rights reserved.
Estimating genotype error rates from high-coverage next-generation sequence data.

PubMed

Wall, Jeffrey D; Tang, Ling Fung; Zerbe, Brandon; Kvale, Mark N; Kwok, Pui-Yan; Schaefer, Catherine; Risch, Neil

2014-11-01

Exome and whole-genome sequencing studies are becoming increasingly common, but little is known about the accuracy of the genotype calls made by the commonly used platforms. Here we use replicate high-coverage sequencing of blood and saliva DNA samples from four European-American individuals to estimate lower bounds on the error rates of Complete Genomics and Illumina HiSeq whole-genome and whole-exome sequencing. Error rates for nonreference genotype calls range from 0.1% to 0.6%, depending on the platform and the depth of coverage. Additionally, we found (1) no difference in the error profiles or rates between blood and saliva samples; (2) Complete Genomics sequences had substantially higher error rates than Illumina sequences had; (3) error rates were higher (up to 6%) for rare or unique variants; (4) error rates generally declined with genotype quality (GQ) score, but in a nonlinear fashion for the Illumina data, likely due to loss of specificity of GQ scores greater than 60; and (5) error rates increased with increasing depth of coverage for the Illumina data. These findings, especially (3)-(5), suggest that caution should be taken in interpreting the results of next-generation sequencing-based association studies, and even more so in clinical application of this technology in the absence of validation by other more robust sequencing or genotyping methods. © 2014 Wall et al.; Published by Cold Spring Harbor Laboratory Press.
Patient safety awareness among Undergraduate Medical Students in Pakistani Medical School.

PubMed

Kamran, Rizwana; Bari, Attia; Khan, Rehan Ahmed; Al-Eraky, Mohamed

2018-01-01

To measure the level of awareness of patient safety among undergraduate medical students in a Pakistani medical school and to find the difference with respect to gender and prior experience with medical error. This cross-sectional study was conducted at the University of Lahore (UOL), Pakistan, from January to March 2017, and comprised final-year medical students. Data were collected using the 'APSQ-III' questionnaire on a 7-point Likert scale. Eight questions were reverse coded. The survey was anonymous. SPSS package 20 was used for statistical analysis. The questionnaire was completed by 122 students, an 81% response rate. The best score, 6.17, was given for 'team functioning', followed by 6.04 for 'long working hours as a cause of medical error'. The domains regarding involvement of the patient, confidence to report medical errors, and the role of training and learning on patient safety scored high in the agreed range of >5. Reverse-coded questions about 'professional incompetence as an error cause' and 'disclosure of errors' showed negative perception. No significant differences in perceptions were found with respect to gender and prior experience with medical error (p > 0.05). Undergraduate medical students at UOL had a positive attitude towards patient safety. However, there were misconceptions about causes of medical errors and error disclosure among students, and patient safety education needs to be incorporated into the medical curriculum of Pakistan.
Application Bayesian Model Averaging method for ensemble system for Poland

NASA Astrophysics Data System (ADS)

Guzikowski, Jakub; Czerwinska, Agnieszka

2014-05-01

The aim of the project is to evaluate methods for generating numerical ensemble weather predictions using meteorological data from the Weather Research & Forecasting (WRF) Model and calibrating these data by means of the Bayesian Model Averaging (WRF BMA) approach. We are constructing high-resolution short-range ensemble forecasts using meteorological data (temperature) generated by nine WRF models. The WRF models have 35 vertical levels and 2.5 km x 2.5 km horizontal resolution. The main emphasis is that the ensemble members use different parameterizations of the physical phenomena occurring in the boundary layer. To calibrate an ensemble forecast we use the Bayesian Model Averaging (BMA) approach. The BMA predictive Probability Density Function (PDF) is a weighted average of predictive PDFs associated with each individual ensemble member, with weights that reflect the member's relative skill. As a test we chose a case with heat wave and convective weather conditions in the area of Poland from 23rd July to 1st August 2013. From 23rd July to 29th July 2013 the temperature oscillated around 30 degrees Celsius at many meteorological stations and new temperature records were set. During this time an increase in the number of hospitalized patients with cardiovascular system problems was registered. On 29th July 2013 an advection of moist tropical air masses was recorded over Poland, causing a strong convection event with a mesoscale convective system (MCS). The MCS caused local flooding, damage to the transport infrastructure, destroyed buildings and trees, injuries and direct threats to life. The meteorological data from the ensemble system are compared with data recorded at 74 weather stations located in Poland. We prepare a set of model-observation pairs. Then, the data obtained from single ensemble members and the median from the WRF BMA system are evaluated on the basis of the deterministic statistical errors Root Mean Square Error (RMSE) and Mean Absolute Error (MAE). To evaluate probabilistic data, the Brier Score (BS) and Continuous Ranked Probability Score (CRPS) were used. Finally, a comparison between BMA calibrated data and data from ensemble members will be displayed.
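The scores named in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names, the Gaussian member kernels with a shared spread `sd`, and all numbers are assumptions made for the example; the abstract does not state which kernel or fitting method was used.

```python
import math


def rmse(forecasts, observations):
    """Root mean square error over paired forecasts and observations."""
    n = len(forecasts)
    return math.sqrt(sum((f - o) ** 2 for f, o in zip(forecasts, observations)) / n)


def mae(forecasts, observations):
    """Mean absolute error over paired forecasts and observations."""
    n = len(forecasts)
    return sum(abs(f - o) for f, o in zip(forecasts, observations)) / n


def brier_score(probabilities, outcomes):
    """Mean squared difference between forecast probability and binary outcome (0 or 1)."""
    n = len(probabilities)
    return sum((p - y) ** 2 for p, y in zip(probabilities, outcomes)) / n


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def bma_pdf(x, member_forecasts, weights, sd):
    """BMA predictive density: weighted average of member PDFs.

    Assumption: each member PDF is Gaussian, centred on the member forecast,
    with a common spread `sd`; weights are expected to sum to 1.
    """
    return sum(w * normal_pdf(x, m, sd) for w, m in zip(weights, member_forecasts))


if __name__ == "__main__":
    # Hypothetical temperatures (degrees Celsius) at three stations.
    forecasts = [31.0, 29.5, 33.0]
    observations = [30.0, 30.5, 32.0]
    print("RMSE:", rmse(forecasts, observations))
    print("MAE:", mae(forecasts, observations))
    # Hypothetical probabilities of exceeding 30 degrees and what happened.
    print("Brier score:", brier_score([0.8, 0.3], [1, 0]))
    # Hypothetical three-member ensemble with skill-based weights.
    print("BMA density at 30.5:", bma_pdf(30.5, [30.0, 31.0, 32.5], [0.5, 0.3, 0.2], 1.2))
```

Lower values are better for RMSE, MAE and the Brier score; the BMA density is what a probabilistic score such as CRPS would then be evaluated against.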
Measurement properties and usability of non-contact scanners for measuring transtibial residual limb volume.

PubMed

Kofman, Rianne; Beekman, Anna M; Emmelot, Cornelis H; Geertzen, Jan H B; Dijkstra, Pieter U

2018-06-01

Non-contact scanners may have potential for measurement of residual limb volume. Different non-contact scanners have been introduced during the last decades. Reliability and usability (practicality and user friendliness) should be assessed before introducing these systems in clinical practice. The aim of this study was to analyze the measurement properties and usability of four non-contact scanners (TT Design, Omega Scanner, BioSculptor Bioscanner, and Rodin4D Scanner). Quasi-experimental. Nine (geometric and residual limb) models were measured on two occasions, each consisting of two sessions, thus four sessions in total. In each session, four observers used the four systems for volume measurement. The mean for each model, repeatability coefficients for each system, variance components, and their two-way interactions of measurement conditions were calculated. User satisfaction was evaluated with the Post-Study System Usability Questionnaire. Systematic differences between the systems were found in volume measurements. Most of the variance was explained by the model (97%), while error variance was 3%. Measurement system and the interaction between system and model explained 44% of the error variance. Repeatability coefficients of the systems ranged from 0.101 L (Omega Scanner) to 0.131 L (Rodin4D). Differences in Post-Study System Usability Questionnaire scores between the systems were small and not significant. The systems were reliable in determining residual limb volume. Measurement systems and the interaction between system and residual limb model explained most of the error variances. The differences in repeatability coefficient and usability between the four CAD/CAM systems were small. Clinical relevance: If accurate measurements of residual limb volume are required (as in research), modern non-contact scanners should be taken into consideration.
Development of a partial least squares-artificial neural network (PLS-ANN) hybrid model for the prediction of consumer liking scores of ready-to-drink green tea beverages.

PubMed

Yu, Peigen; Low, Mei Yin; Zhou, Weibiao

2018-01-01

In order to develop products that would be preferred by consumers, the effects of the chemical compositions of ready-to-drink green tea beverages on consumer liking were studied through regression analyses. Green tea model systems were prepared by dosing solutions of 0.1% green tea extract with differing concentrations of eight flavour keys deemed to be important for green tea aroma and taste, based on a D-optimal experimental design, before undergoing commercial sterilisation. Sensory evaluation of the green tea model system was carried out using an untrained consumer panel to obtain hedonic liking scores of the samples. Regression models were subsequently trained to objectively predict the consumer liking scores of the green tea model systems. A linear partial least squares (PLS) regression model was developed to describe the effects of the eight flavour keys on consumer liking, with a coefficient of determination (R²) of 0.733, and a root-mean-square error (RMSE) of 3.53%. The PLS model was further augmented with an artificial neural network (ANN) to establish a PLS-ANN hybrid model. The established hybrid model was found to give a better prediction of consumer liking scores, based on its R² (0.875) and RMSE (2.41%). Copyright © 2017 Elsevier Ltd. All rights reserved.
A five-step procedure for the clinical use of the MPD in neuropsychological assessment of children.

PubMed

Wallbrown, F H; Fuller, G B

1984-01-01

Described a five-step procedure that can be used to detect organicity on the basis of children's performance on the Minnesota Percepto Diagnostic Test (MPD). The first step consists of examining the T score for rotations to determine whether it is below the cut-off score, which has been established empirically as an indicator of organicity. The second step consists of matching the examinee's configuration of error scores, separation of circle-diamond (SpCD), distortion of circle-diamond (DCD), and distortion of dots (DD), with empirically derived tables. The third step consists of considering the T score for rotations and error configuration jointly. The fourth step consists of using empirically established discriminant equations, and the fifth step involves using data from limits testing and other data sources. The clinical and empirical bases for the five-step procedure also are discussed.
Identification of priorities for medication safety in neonatal intensive care.

PubMed

Kunac, Desireé L; Reith, David M

2005-01-01

Although neonates are reported to be at greater risk of medication error than infants and older children, little is known about the causes and characteristics of error in this patient group. Failure mode and effects analysis (FMEA) is a technique used in industry to evaluate system safety and identify potential hazards in advance. The aim of this study was to identify and prioritize potential failures in the neonatal intensive care unit (NICU) medication use process through application of FMEA. Using the FMEA framework and a systems-based approach, an eight-member multidisciplinary panel worked as a team to create a flow diagram of the neonatal unit medication use process. Then by brainstorming, the panel identified all potential failures, their causes and their effects at each step in the process. Each panel member independently rated failures based on occurrence, severity and likelihood of detection to allow calculation of a risk priority score (RPS). The panel identified 72 failures, with 193 associated causes and effects. Vulnerabilities were found to be distributed across the entire process, but multiple failures and associated causes were possible when prescribing the medication and when preparing the drug for administration. The top ranking issue was a perceived lack of awareness of medication safety issues (RPS score 273), due to a lack of medication safety training. The next highest ranking issues were found to occur at the administration stage. Common potential failures related to errors in the dose, timing of administration, infusion pump settings and route of administration. Perceived causes were multiple, but were largely associated with unsafe systems for medication preparation and storage in the unit, variable staff skill level and lack of computerised technology. Interventions to decrease medication-related adverse events in the NICU should aim to increase staff awareness of medication safety issues and focus on medication administration processes.
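The scoring step described above can be illustrated with a short Python sketch. It assumes the conventional FMEA calculation, in which the score is the product of the occurrence, severity and detection ratings; the abstract does not give the formula, the rating scale, or how the eight panel members' ratings were combined, so the class, the averaging of panel ratings, and the example failures below are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class FailureMode:
    """One potential failure with a rating triple from each panel member."""
    description: str
    # Each entry is (occurrence, severity, detection) from one panel member.
    panel_ratings: list

    @property
    def risk_priority_score(self) -> float:
        # Assumption: score each member's ratings as a product, then average
        # across the panel. The study may have aggregated differently.
        return mean(o * s * d for o, s, d in self.panel_ratings)


def rank_failures(failures):
    """Return failures sorted from highest to lowest risk priority score."""
    return sorted(failures, key=lambda f: f.risk_priority_score, reverse=True)


if __name__ == "__main__":
    # Hypothetical failures and ratings, for illustration only.
    failures = [
        FailureMode("Wrong infusion pump rate", [(4, 9, 5), (5, 9, 4)]),
        FailureMode("Dose calculation error", [(6, 8, 6), (7, 8, 5)]),
        FailureMode("Label not checked", [(3, 5, 2), (3, 6, 2)]),
    ]
    for failure in rank_failures(failures):
        print(f"{failure.risk_priority_score:6.1f}  {failure.description}")
```

Ranking by the resulting score is what lets a panel prioritize which steps of the process to redesign first.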
The development of an automatic recognition system for earmark and earprint comparisons.

PubMed

Junod, Stéphane; Pasquier, Julien; Champod, Christophe

2012-10-10

The value of earmarks as an efficient means of personal identification is still subject to debate. It has been argued that the field is lacking a firm systematic and structured data basis to help practitioners to form their conclusions. Typically, there is a paucity of research guiding as to the selectivity of the features used in the comparison process between an earmark and reference earprints taken from an individual. This study proposes a system for the automatic comparison of earprints and earmarks, operating without any manual extraction of key-points or manual annotations. For each donor, a model is created using multiple reference prints, hence capturing the donor within source variability. For each comparison between a mark and a model, images are automatically aligned and a proximity score, based on a normalized 2D correlation coefficient, is calculated. Appropriate use of this score allows deriving a likelihood ratio that can be explored under known state of affairs (both in cases where it is known that the mark has been left by the donor that gave the model and conversely in cases when it is established that the mark originates from a different source). To assess the system performance, a first dataset containing 1229 donors elaborated during the FearID research project was used. Based on these data, for mark-to-print comparisons, the system performed with an equal error rate (EER) of 2.3% and about 88% of marks are found in the first 3 positions of a hitlist. When performing print-to-print transactions, results show an equal error rate of 0.5%. The system was then tested using real-case data obtained from police forces. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
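The equal error rate reported above is the operating point at which the false acceptance rate equals the false rejection rate. A minimal Python sketch of one common way to estimate it from two lists of comparison scores follows; the threshold sweep, the function name and the example scores are assumptions, since the abstract does not describe how the EER was computed.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Estimate the EER by sweeping a decision threshold over all observed scores.

    genuine_scores: proximity scores for same-source comparisons.
    impostor_scores: proximity scores for different-source comparisons.
    A comparison is accepted when its score is >= the threshold.
    """
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    best_gap = None
    best_eer = None
    for t in thresholds:
        # False rejection: a same-source pair scoring below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        # False acceptance: a different-source pair scoring at or above it.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap = gap
            best_eer = (far + frr) / 2.0
    return best_eer


if __name__ == "__main__":
    # Hypothetical correlation-based proximity scores.
    genuine = [0.9, 0.8, 0.7, 0.4]
    impostor = [0.1, 0.2, 0.3, 0.5]
    print("EER:", equal_error_rate(genuine, impostor))
```

With finite data the two error rates rarely cross exactly, so the sketch reports the midpoint at the threshold where they are closest.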
[Attention system functions and their relationship with self-reported health in patients with brain damage due to tumor].

PubMed

Egorov, V N; Razumnikova, O M; Perfil'ev, A M; Stupak, V V

2015-01-01

To compare parameters of attention in healthy people and patients with neoplasms in different regions of the cerebral cortex and to evaluate quality of life (QoL) indices with regard to impairment of different attention systems. Twenty patients with oncological lesions of the brain (mean age 56.5±8.8 years) who did not undergo surgery were studied. Tumor localization was confirmed using contrast-enhanced computed tomography, and the tumor type was histologically verified. A control group included 18 healthy people matched for age, sex and education level. To determine attention system functions, we developed a computerized version of the Attention Network Test. Error rate and reaction time for correct responses to the target stimulus, displayed along with neutral, congruent and incongruent signals, were the indicators of the efficacy of selective processes. QoL indices were assessed using the SF-36 health survey questionnaire. The readiness to respond to incoming stimuli was mostly impaired in patients with brain tumors. Efficacy of executive attention, assessed as the increase in the number of errors in selection of visual stimuli, was decreased while temporal parameters of the functions of this system were not changed in patients compared to controls. The SF-36 total score was stable in patients with marked reduction in scores on the Role and Emotional Functioning scales. The most severe health impairment measured on the SF-36 scales of role/social emotional functioning and viability was recorded in patients with lesions of frontal cortical areas compared to temporal/parietal areas. A relationship between SF-36 health self-rating and attention systems was found. This finding raises the question of the importance of attention characteristics and QoL for survival prognosis of patients with brain tumors.
Evaluation of voice codecs for the Australian mobile satellite system

NASA Technical Reports Server (NTRS)

Bundrock, Tony; Wilkinson, Mal

1990-01-01

The evaluation procedure to choose a low bit rate voice coding algorithm is described for the Australian land mobile satellite system. The procedure is designed to assess both the inherent quality of the codec under 'normal' conditions and its robustness under 'severe' conditions. For the assessment, normal conditions were chosen to be random bit error rate with added background acoustic noise and the severe condition is designed to represent burst error conditions when mobile satellite channel suffers from signal fading due to roadside vegetation. The assessment is divided into two phases. First, a reduced set of conditions is used to determine a short list of candidate codecs for more extensive testing in the second phase. The first phase conditions include quality and robustness and codecs are ranked with a 60:40 weighting on the two. Second, the short listed codecs are assessed over a range of input voice levels, BERs, background noise conditions, and burst error distributions. Assessment is by subjective rating on a five level opinion scale and all results are then used to derive a weighted Mean Opinion Score using appropriate weights for each of the test conditions.
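The two weighting steps in this procedure can be sketched in Python. This is an illustrative reading only: the abstract does not say which of quality and robustness receives the 60% weight (quality is assumed here because it is listed first), nor what the per-condition weights were, so the function names, weights and ratings below are hypothetical.

```python
def phase_one_score(quality, robustness, quality_weight=0.6):
    """Combine quality and robustness scores with a 60:40 weighting.

    Assumption: the 60% weight applies to quality.
    """
    return quality_weight * quality + (1.0 - quality_weight) * robustness


def weighted_mos(ratings_by_condition, weights):
    """Weighted Mean Opinion Score across test conditions.

    ratings_by_condition: condition name -> list of ratings on a 1-5 scale.
    weights: condition name -> relative weight of that condition.
    """
    total_weight = sum(weights[c] for c in ratings_by_condition)
    weighted_sum = sum(
        weights[c] * (sum(r) / len(r)) for c, r in ratings_by_condition.items()
    )
    return weighted_sum / total_weight


if __name__ == "__main__":
    # Hypothetical listener ratings for one codec under two conditions.
    ratings = {"random_ber": [4, 5, 3], "burst_errors": [2, 3, 1]}
    weights = {"random_ber": 2.0, "burst_errors": 1.0}
    print("Weighted MOS:", weighted_mos(ratings, weights))
    print("Phase-one score:", phase_one_score(quality=4.0, robustness=3.0))
```

Dividing by the total weight keeps the result on the same five-level scale as the individual ratings.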
An embedded checklist in the Anesthesia Information Management System improves pre-anaesthetic induction setup: a randomised controlled trial in a simulation setting.

PubMed

Wetmore, Douglas; Goldberg, Andrew; Gandhi, Nishant; Spivack, John; McCormick, Patrick; DeMaria, Samuel

2016-10-01

Anaesthesiologists work in a high stress, high consequence environment in which missed steps in preparation may lead to medical errors and potential patient harm. The pre-anaesthetic induction period has been identified as a time in which medical errors can occur. The Anesthesia Patient Safety Foundation has developed a Pre-Anesthetic Induction Patient Safety (PIPS) checklist. We conducted this study to test the effectiveness of this checklist, when embedded in our institutional Anesthesia Information Management System (AIMS), on resident performance in a simulated environment. Using a randomised, controlled, observer-blinded design, we compared performance of anaesthesiology residents in a simulated operating room under production pressure using a checklist in completing a thorough pre-anaesthetic induction evaluation and setup with that of residents with no checklist. The checklist was embedded in the simulated operating room's electronic medical record. Data for 38 anaesthesiology residents show a statistically significant difference in performance in pre-anaesthetic setup and evaluation as scored by blinded raters (maximum score 22 points), with the checklist group performing better by 7.8 points (p<0.01). The effects of gender and year of residency on total score were not significant. Simulation duration (time to anaesthetic agent administration) was increased significantly by the use of the checklist. Required use of a pre-induction checklist improves anaesthesiology resident performance in a simulated environment. The PIPS checklist as an integrated part of a departmental AIMS warrants further investigation as a quality measure. Published by the BMJ Publishing Group Limited.
Assessment and Rehabilitation of Central Sensory Impairments for Balance in mTBI

DTIC Science & Technology

2016-10-01

place; 95% complete. ● Purchasing and testing software of Opals ; awaiting release of newer, updated sensor from APDM to determine need for more sensors...2016. ● Develop new algorithm to automatically quantify head movements from Opal sensor; 100% complete 23-Sep-2016. ● Set up and test gait paradigm...Interaction in Balance (mCTSIB), Modified Balance Error Scoring System (mBESS) and walking tests, subjects wear five Opal inertial sensors (APDM, Inc
Attentional Control and Subjective Executive Function in Treatment-Naive Adults with Attention Deficit Hyperactivity Disorder

PubMed Central

Grane, Venke Arntsberg; Endestad, Tor; Pinto, Arnfrid Farbu; Solbakk, Anne-Kristin

2014-01-01

We investigated performance-derived measures of executive control, and their relationship with self- and informant reported executive functions in everyday life, in treatment-naive adults with newly diagnosed Attention Deficit Hyperactivity Disorder (ADHD; n = 36) and in healthy controls (n = 35). Sustained attentional control and response inhibition were examined with the Test of Variables of Attention (T.O.V.A.). Delayed responses, increased reaction time variability, and higher omission error rate to Go signals in ADHD patients relative to controls indicated fluctuating levels of attention in the patients. Furthermore, an increment in NoGo commission errors when Go stimuli increased relative to NoGo stimuli suggests reduced inhibition of task-irrelevant stimuli in conditions demanding frequent responding. The ADHD group reported significantly more cognitive and behavioral executive problems than the control group on the Behavior Rating Inventory of Executive Function-Adult Version (BRIEF-A). There were overall not strong associations between task performance and ratings of everyday executive function. However, for the ADHD group, T.O.V.A. omission errors predicted self-reported difficulties on the Organization of Materials scale, and commission errors predicted informant reported difficulties on the same scale. Although ADHD patients endorsed more symptoms of depression and anxiety on the Achenbach System of Empirically Based Assessment (ASEBA) than controls, ASEBA scores were not significantly associated with T.O.V.A. performance scores. Altogether, the results indicate multifaceted alteration of attentional control in adult ADHD, and accompanying subjective difficulties with several aspects of executive function in everyday living. The relationships between the two sets of data were modest, indicating that the measures represent non-redundant features of adult ADHD. PMID:25545156
Parametric decadal climate forecast recalibration (DeFoReSt 1.0)

NASA Astrophysics Data System (ADS)

Pasternack, Alexander; Bhend, Jonas; Liniger, Mark A.; Rust, Henning W.; Müller, Wolfgang A.; Ulbrich, Uwe

2018-01-01

Near-term climate predictions such as decadal climate forecasts are increasingly being used to guide adaptation measures. For near-term probabilistic predictions to be useful, systematic errors of the forecasting systems have to be corrected. While methods for the calibration of probabilistic forecasts are readily available, these have to be adapted to the specifics of decadal climate forecasts including the long time horizon of decadal climate forecasts, lead-time-dependent systematic errors (drift) and the errors in the representation of long-term changes and variability. These features are compounded by small ensemble sizes to describe forecast uncertainty and a relatively short period for which typically pairs of reforecasts and observations are available to estimate calibration parameters. We introduce the Decadal Climate Forecast Recalibration Strategy (DeFoReSt), a parametric approach to recalibrate decadal ensemble forecasts that takes the above specifics into account. DeFoReSt optimizes forecast quality as measured by the continuous ranked probability score (CRPS). Using a toy model to generate synthetic forecast observation pairs, we demonstrate the positive effect on forecast quality in situations with pronounced and limited predictability. Finally, we apply DeFoReSt to decadal surface temperature forecasts from the MiKlip prototype system and find consistent, and sometimes considerable, improvements in forecast quality compared with a simple calibration of the lead-time-dependent systematic errors.
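For reference, the continuous ranked probability score can be estimated directly from a finite ensemble. The Python sketch below uses the standard empirical estimator, mean |x_i - y| minus half the mean |x_i - x_j|; it is not DeFoReSt itself, which is described as a parametric recalibration, and the function name and numbers are assumptions for illustration.

```python
def ensemble_crps(members, observation):
    """Empirical CRPS of an ensemble forecast for a single observation.

    Uses CRPS = mean|x_i - y| - 0.5 * mean|x_i - x_j|, where the second mean
    runs over all ordered pairs (i, j) of ensemble members. Lower is better;
    for a one-member ensemble it reduces to the absolute error.
    """
    n = len(members)
    accuracy = sum(abs(x - observation) for x in members) / n
    spread = sum(abs(xi - xj) for xi in members for xj in members) / (n * n)
    return accuracy - 0.5 * spread


if __name__ == "__main__":
    # Hypothetical surface temperature anomalies from a small ensemble.
    members = [0.2, 0.5, 0.9]
    print("CRPS:", ensemble_crps(members, observation=0.6))
```

Averaging this quantity over many forecast-observation pairs gives the kind of score that a recalibration scheme would try to minimize.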
The Gulliver Effect: The Impact of Error in an Elephantine Subpopulation on Estimates for Lilliputian Subpopulations

ERIC Educational Resources Information Center

Micceri, Theodore; Parasher, Pradnya; Waugh, Gordon W.; Herreid, Charlene

2009-01-01

An extensive review of the research literature and a study comparing over 36,000 survey responses with archival true scores indicated that one should expect a minimum of at least three percent random error for the least ambiguous of self-report measures. The Gulliver Effect occurs when a small proportion of error in a sizable subpopulation exerts…
SU-C-BRD-02: A Team Focused Clinical Implementation and Failure Mode and Effects Analysis of HDR Skin Brachytherapy Using Valencia and Leipzig Surface Applicators

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sayler, E; Harrison, A; Eldredge-Hindy, H

Purpose: Valencia and Leipzig applicators (VLAs) are single-channel brachytherapy surface applicators used to treat skin lesions up to 2 cm in diameter. Source dwell times can be calculated and entered manually after clinical set-up or ultrasound. This procedure differs dramatically from CT-based planning; the novelty and unfamiliarity could lead to severe errors. To build layers of safety and ensure quality, a multidisciplinary team created a protocol and applied Failure Modes and Effects Analysis (FMEA) to the clinical procedure for HDR VLA skin treatments. Methods: A team including physicists, physicians, nurses, therapists, residents, and administration developed a clinical procedure for VLA treatment. The procedure was evaluated using FMEA. Failure modes were identified and scored by severity, occurrence, and detection. The clinical procedure was revised to address high-scoring process nodes. Results: Several key components were added to the clinical procedure to minimize risk probability numbers (RPN): - Treatments are reviewed at weekly QA rounds, where physicians discuss diagnosis, prescription, applicator selection, and set-up. Peer review reduces the likelihood of an inappropriate treatment regime. - A template for HDR skin treatments was established in the clinical EMR system to standardize treatment instructions. This reduces the chances of miscommunication between the physician and planning physicist, and increases the detectability of an error during the physics second check. - A screen check was implemented during the second check to increase detectability of an error. - To reduce error probability, the treatment plan worksheet was designed to display plan parameters in a format visually similar to the treatment console display. This facilitates data entry and verification. - VLAs are color-coded and labeled to match the EMR prescriptions, which simplifies in-room selection and verification. Conclusion: Multidisciplinary planning and FMEA increased detectability and reduced error probability during VLA HDR brachytherapy. This clinical model may be useful to institutions implementing similar procedures.
Effects of Differential Item Functioning on Examinees' Test Performance and Reliability of Test

ERIC Educational Resources Information Center

Lee, Yi-Hsuan; Zhang, Jinming

2017-01-01

Simulations were conducted to examine the effect of differential item functioning (DIF) on measurement consequences such as total scores, item response theory (IRT) ability estimates, and test reliability in terms of the ratio of true-score variance to observed-score variance and the standard error of estimation for the IRT ability parameter. The…
Optimal Scoring Methods of Hand-Strength Tests in Patients with Stroke

ERIC Educational Resources Information Center

Huang, Sheau-Ling; Hsieh, Ching-Lin; Lin, Jau-Hong; Chen, Hui-Mei

2011-01-01

The purpose of this study was to determine the optimal scoring methods for measuring strength of the more-affected hand in patients with stroke by examining the effect of reducing measurement errors. Three hand-strength tests of grip, palmar pinch, and lateral pinch were administered at two sessions in 56 patients with stroke. Five scoring methods…
The impact of video games on training surgeons in the 21st century.

PubMed

Rosser, James C; Lynch, Paul J; Cuddihy, Laurie; Gentile, Douglas A; Klonsky, Jonathan; Merrell, Ronald

2007-02-01

Video games have become extensively integrated into popular culture. Anecdotal observations of young surgeons suggest that video game play contributes to performance excellence in laparoscopic surgery. Training benefits for surgeons who play video games should be quantifiable. There is a potential link between video game play and laparoscopic surgical skill and suturing. Cross-sectional analysis of the performance of surgical residents and attending physicians participating in the Rosser Top Gun Laparoscopic Skills and Suturing Program (Top Gun). Three different video game exercises were performed, and surveys were completed to assess past experience with video games and current level of play, and each subject's level of surgical training, number of laparoscopic cases performed, and number of years in medical practice. Academic medical center and surgical training program. Thirty-three residents and attending physicians participating in Top Gun from May 10 to August 24, 2002. The primary outcome measures were compared between participants' laparoscopic skills and suturing capability, video game scores, and video game experience. Past video game play in excess of 3 h/wk correlated with 37% fewer errors (P<.02) and 27% faster completion (P<.03). Overall Top Gun score (time and errors) was 33% better (P<.005) for video game players and 42% better (P<.01) if they played more than 3 h/wk. Current video game players made 32% fewer errors (P=.04), performed 24% faster (P<.04), and scored 26% better overall (time and errors) (P<.005) than their nonplaying colleagues. When comparing demonstrated video gaming skills, those in the top tertile made 47% fewer errors, performed 39% faster, and scored 41% better (P<.001 for all) on the overall Top Gun score. Regression analysis also indicated that video game skill and past video game experience are significant predictors of demonstrated laparoscopic skills. Video game skill correlates with laparoscopic surgical skills. 
Training curricula that include video games may help thin the technical interface between surgeons and screen-mediated applications, such as laparoscopic surgery. Video games may be a practical teaching tool to help train surgeons.
A new method for the assessment of patient safety competencies during a medical school clerkship using an objective structured clinical examination

PubMed Central

Daud-Gallotti, Renata Mahfuz; Morinaga, Christian Valle; Arlindo-Rodrigues, Marcelo; Velasco, Irineu Tadeu; Arruda Martins, Milton; Tiberio, Iolanda Calvo

2011-01-01

INTRODUCTION: Patient safety is seldom assessed using objective evaluations during undergraduate medical education. OBJECTIVE: To evaluate the performance of fifth-year medical students using an objective structured clinical examination focused on patient safety after implementation of an interactive program based on adverse events recognition and disclosure. METHODS: In 2007, a patient safety program was implemented in the internal medicine clerkship of our hospital. The program focused on human error theory, epidemiology of incidents, adverse events, and disclosure. Upon completion of the program, students completed an objective structured clinical examination with five stations and standardized patients. One station focused on patient safety issues, including medical error recognition/disclosure, the patient-physician relationship and humanism issues. A standardized checklist was completed by each standardized patient to assess the performance of each student. The student's global performance at each station and performance in the domains of medical error, the patient-physician relationship and humanism were determined. The correlations between the student performances in these three domains were calculated. RESULTS: A total of 95 students participated in the objective structured clinical examination. The mean global score at the patient safety station was 87.59±1.24 points. Students' performance in the medical error domain was significantly lower than their performance on patient-physician relationship and humanistic issues. Less than 60% of students (n = 54) offered the simulated patient an apology after a medical error occurred. A significant correlation was found between scores obtained in the medical error domains and scores related to both the patient-physician relationship and humanistic domains. CONCLUSIONS: An objective structured clinical examination is a useful tool to evaluate patient safety competencies during the medical student clerkship. 
PMID:21876976
Improving Histopathology Laboratory Productivity: Process Consultancy and A3 Problem Solving.

PubMed

Yörükoğlu, Kutsal; Özer, Erdener; Alptekin, Birsen; Öcal, Cem

2017-01-01

The ISO 17020 quality program has been run in our pathology laboratory for four years to establish an action plan for correction and prevention of identified errors. In this study, we aimed to evaluate the errors that we could not identify through ISO 17020 and/or solve by means of process consulting. Process consulting is carefully intervening in a group or team to help it to accomplish its goals. The A3 problem solving process was run under the leadership of a 'workflow, IT and consultancy manager'. An action team was established consisting of technical staff. A root cause analysis was applied for target conditions, and the 6-S method was implemented for solution proposals. Applicable proposals were activated and the results were rated by six-sigma analysis. Non-applicable proposals were reported to the laboratory administrator. Mislabelling was the most frequently reported issue, triggering all pre-analytical errors. There were 21 non-value added steps grouped in 8 main targets on the fish bone graphic (transporting, recording, moving, individual, waiting, over-processing, over-transaction and errors). Unnecessary redundant requests, missing slides, archiving issues, redundant activities, and mislabelling errors were proposed to be solved by improving visibility and fixing spaghetti problems. Spatial re-organization, organizational marking, re-defining some operations, and labeling activities raised the six sigma score from 24% to 68% for all phases. Operational transactions such as implementation of a pathology laboratory system were suggested for long-term improvement. Laboratory management is a complex process. Quality control is an effective method to improve productivity. Systematic checking in a quality program may not always find and/or solve the problems. External observation may reveal crucial indicators about the system failures, providing very simple solutions.
Traditional Nurse Triage vs. Physician Tele-Presence in a Pediatric Emergency Department

PubMed Central

Marconi, Greg P.; Chang, Todd; Pham, Phung K.; Grajower, Daniel N.; Nager, Alan L.

2014-01-01

Objectives: To compare traditional nurse triage (TNT) in a Pediatric Emergency Department (PED) to physician tele-presence (PTP). Methods: Prospective, 2×2 crossover study with random assignment using a sample of walk-in patients seeking care in a PED at a large, tertiary care children’s hospital, from May 2012 to January 2013. Outcomes of triage times, documentation errors, triage scores, and survey responses were compared between TNT and PTP. A comparison between PTP and the actual treating PED physicians regarding the accuracy of ordering blood and urine tests, throat cultures, and radiologic imaging was also studied. Results: Paired samples t-tests showed a statistically significant difference in triage time between TNT and PTP (p=0.03), but no significant difference in documentation errors (p=0.10). Triage scores of TNT were 71% accurate, compared to PTP, which were 95% accurate. Both parents and children had favorable scores regarding PTP and the majority indicated they would prefer PTP again at their next PED visit. PTP diagnostic ordering was comparable to the actual PED physician ordering, showing no statistical differences. Conclusions: Utilizing physician tele-presence technology to remotely perform triage is a feasible alternative to traditional nurse triage, with no clinically significant differences in time, triage scores, errors and patient and parent satisfaction. PMID:24445223
Sustained attention deficits among HIV-positive individuals with comorbid bipolar disorder.

PubMed

Posada, Carolina; Moore, David J; Deutsch, Reena; Rooney, Alexandra; Gouaux, Ben; Letendre, Scott; Grant, Igor; Atkinson, J Hampton

2012-01-01

Difficulties with sustained attention have been found among both persons with HIV infection (HIV+) and bipolar disorder (BD). The authors examined sustained attention among 39 HIV+ individuals with BD (HIV+/BD+) and 33 HIV-infected individuals without BD (HIV+/BD-), using the Conners' Continuous Performance Test-II (CPT-II). A Global Assessment of Functioning (GAF) score was also assigned to each participant as an overall indicator of daily functioning abilities. HIV+/BD+ participants had significantly worse performance on CPT-II omission errors, hit reaction time SE (Hit RT SE), variability of SE, and perseverations than HIV+/BD- participants. When examining CPT-II performance over the six study blocks, both HIV+/BD+ and HIV+/BD- participants evidenced worse performance on scores of commission errors and reaction times as the test progressed. The authors also examined the effect of current mood state (i.e., manic, depressive, euthymic) on CPT-II performance, but no significant differences were observed across the various mood states. HIV+/BD+ participants had significantly worse GAF scores than HIV+/BD- participants, which indicates poorer overall functioning in the dually-affected group; among HIV+/BD+ persons, significant negative correlations were found between GAF scores and CPT-II omission and commission errors, detectability, and perseverations, indicating a possible relationship between decrements in sustained attention and worse daily-functioning outcomes.
Measurement error in the Liebowitz Social Anxiety Scale: results from a general adult population in Japan.

PubMed

Takada, Koki; Takahashi, Kana; Hirao, Kazuki

2018-01-17

Although the self-report version of the Liebowitz Social Anxiety Scale (LSAS) is frequently used to measure social anxiety, data are lacking on the smallest detectable change (SDC), an important index of measurement error. We therefore aimed to determine the SDC of the LSAS. Japanese adults aged 20-69 years were invited from a panel managed by a nationwide internet research agency. We then conducted a test-retest internet survey with a two-week interval to estimate the SDC at the individual (SDC_ind) and group (SDC_group) levels. The analysis included 1300 participants. The SDC_ind and SDC_group for the total fear subscale (scoring range: 0-72) were 23.52 points (32.7%) and 0.65 points (0.9%), respectively. The SDC_ind and SDC_group for the total avoidance subscale (scoring range: 0-72) were 32.43 points (45.0%) and 0.90 points (1.2%), respectively. The SDC_ind and SDC_group for the overall total score (scoring range: 0-144) were 45.90 points (31.9%) and 1.27 points (0.9%), respectively. The measurement error is large and indicates the potential for major problems when attempting to use the LSAS to detect changes at the individual level. These results should be considered when using the LSAS as a measure of treatment change.
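
The group-level values reported above are numerically consistent with the common convention SDC_group = SDC_ind / sqrt(n) for n = 1300 participants. The abstract does not state the formula, so the following Python sketch treats that convention as an assumption; the function names are illustrative only.

```python
import math

def sdc_group(sdc_ind: float, n: int) -> float:
    """Group-level SDC, ASSUMING the convention SDC_group = SDC_ind / sqrt(n)."""
    return sdc_ind / math.sqrt(n)

def as_percent_of_range(sdc: float, scale_range: float) -> float:
    """Express an SDC as a percentage of the subscale's scoring range."""
    return 100.0 * sdc / scale_range

n = 1300  # participants reported in the abstract
# (label, SDC_ind from the abstract, width of the scoring range)
subscales = [("fear", 23.52, 72), ("avoidance", 32.43, 72), ("total", 45.90, 144)]

for label, sdc_ind, scale_range in subscales:
    group = sdc_group(sdc_ind, n)
    print(label,
          round(group, 2),
          round(as_percent_of_range(sdc_ind, scale_range), 1),
          round(as_percent_of_range(group, scale_range), 1))
```

Under this assumption the sketch gives 0.65, 0.90 and 1.27 points for the three group-level values, matching the abstract, which illustrates why a change that is undetectable for one person can still be detectable as a group mean.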
Post-processing of a low-flow forecasting system in the Thur basin (Switzerland)

NASA Astrophysics Data System (ADS)

Bogner, Konrad; Joerg-Hess, Stefanie; Bernhard, Luzi; Zappa, Massimiliano

2015-04-01

Low-flows and droughts are natural hazards with potentially severe impacts and economic loss or damage in a number of environmental and socio-economic sectors. As droughts develop slowly there is time to prepare and pre-empt some of these impacts. Real-time information and forecasting of a drought situation can therefore be an effective component of drought management. Although Switzerland has traditionally been more concerned with problems related to floods, in recent years some unprecedented low-flow situations have been experienced. Driven by the climate change debate a drought information platform has been developed to guide water resources management during situations where water resources drop below critical low-flow levels characterised by the indices duration (time between onset and offset), severity (cumulative water deficit) and magnitude (severity/duration). However to gain maximum benefit from such an information system it is essential to remove the bias from the meteorological forecast, to derive optimal estimates of the initial conditions, and to post-process the stream-flow forecasts. Quantile mapping methods for pre-processing the meteorological forecasts and improved data assimilation methods of snow measurements, which accounts for much of the seasonal stream-flow predictability for the majority of the basins in Switzerland, have been tested previously. The objective of this study is the testing of post-processing methods in order to remove bias and dispersion errors and to derive the predictive uncertainty of a calibrated low-flow forecast system. Therefore various stream-flow error correction methods with different degrees of complexity have been applied and combined with the Hydrological Uncertainty Processor (HUP) in order to minimise the differences between the observations and model predictions and to derive posterior probabilities. 
The complexity of the analysed error correction methods ranges from simple AR(1) models to methods including wavelet transformations and support vector machines. These methods have been combined with forecasts driven by Numerical Weather Prediction (NWP) systems with different temporal and spatial resolutions, lead-times and different numbers of ensembles covering short to medium to extended range forecasts (COSMO-LEPS, 10-15 days, monthly and seasonal ENS) as well as climatological forecasts. Additionally the suitability of various skill scores and efficiency measures regarding low-flow predictions will be tested. Amongst others the novel 2afc (2 alternatives forced choices) score and the quantile skill score and its decompositions will be applied to evaluate the probabilistic forecasts and the effects of post-processing. First results of the performance of the low-flow predictions of the hydrological model PREVAH initialised with different NWP's will be shown.
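
The simplest method named above, an AR(1) error correction, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the least-squares fit of the coefficient, the function names `fit_ar1` and `correct_forecast`, and the toy flow values are all hypothetical.

```python
def fit_ar1(errors):
    """Least-squares estimate of phi in e[t] = phi * e[t-1] + noise."""
    num = sum(e1 * e0 for e0, e1 in zip(errors[:-1], errors[1:]))
    den = sum(e0 * e0 for e0 in errors[:-1])
    if den == 0:
        raise ValueError("past errors are all zero; phi is undefined")
    return num / den

def correct_forecast(simulated, last_error, phi):
    """Add the decaying expected error phi**k * last_error at lead time k."""
    return [s + last_error * phi ** (k + 1) for k, s in enumerate(simulated)]

# Toy data: observed and simulated low flows over four past time steps.
past_obs = [5.0, 4.5, 4.25, 4.125]
past_sim = [4.0, 4.0, 4.0, 4.0]
errors = [o - s for o, s in zip(past_obs, past_sim)]

phi = fit_ar1(errors)
corrected = correct_forecast([4.0, 4.0], errors[-1], phi)
print(phi, corrected)
```

The correction shrinks towards zero with lead time, so an AR(1) model mainly helps at short ranges; the more complex methods listed in the abstract target structure that a single lag cannot capture.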
Evaluating Professionalism, Practice-Based Learning and Improvement, and Systems-Based Practice: Utilization of a Compliance Form and Correlation with Conflict Styles

PubMed Central

Ogunyemi, Dotun; Eno, Michelle; Rad, Steve; Fong, Alex; Alexander, Carolyn; Azziz, Ricardo

2010-01-01

Objective: The purpose of this article was to develop and determine the utility of a compliance form in evaluating and teaching the Accreditation Council for Graduate Medical Education competencies of professionalism, practice-based learning and improvement, and systems-based practice. Methods: In 2006, we introduced a 17-item compliance form in an obstetrics and gynecology residency program. The form prospectively monitored residents on attendance at required activities (5 items), accountability of required obligations (9 items), and completion of assigned projects (3 items). Scores were compared to faculty evaluations of residents, resident status as a contributor or a concerning resident, and to the residents' conflict styles, using the Thomas-Kilmann Conflict MODE Instrument. Results: Our analysis of 18 residents for academic year 2007–2008 showed a mean (standard error of mean) of 577 (65.3) for postgraduate year (PGY)-1, 692 (42.4) for PGY-2, 535 (23.3) for PGY-3, and 651.6 (37.4) for PGY-4. Non-Hispanic white residents had significantly higher scores on compliance, faculty evaluations on interpersonal and communication skills, and competence in systems-based practice. Contributing residents had significantly higher scores on compliance compared with concerning residents. Senior residents had significantly higher accountability scores compared with junior residents, and junior residents had increased project completion scores. Attendance scores increased and accountability scores decreased significantly between the first and second 6 months of the academic year. There were positive correlations between compliance scores with competing and collaborating conflict styles, and significant negative correlations between compliance with avoiding and accommodating conflict styles. Conclusions: Maintaining a compliance form allows residents and residency programs to focus on issues that affect performance and facilitate assessment of the ACGME competencies.
Postgraduate year, behavior, and conflict styles appear to be associated with compliance. A lack of association with faculty evaluations suggests measurement of different perceptions of residents' behavior. PMID:21976093
Evaluating professionalism, practice-based learning and improvement, and systems-based practice: utilization of a compliance form and correlation with conflict styles.

PubMed

Ogunyemi, Dotun; Eno, Michelle; Rad, Steve; Fong, Alex; Alexander, Carolyn; Azziz, Ricardo

2010-09-01

The purpose of this article was to develop and determine the utility of a compliance form in evaluating and teaching the Accreditation Council for Graduate Medical Education competencies of professionalism, practice-based learning and improvement, and systems-based practice. In 2006, we introduced a 17-item compliance form in an obstetrics and gynecology residency program. The form prospectively monitored residents on attendance at required activities (5 items), accountability of required obligations (9 items), and completion of assigned projects (3 items). Scores were compared to faculty evaluations of residents, resident status as a contributor or a concerning resident, and to the residents' conflict styles, using the Thomas-Kilmann Conflict MODE Instrument. Our analysis of 18 residents for academic year 2007-2008 showed a mean (standard error of mean) of 577 (65.3) for postgraduate year (PGY)-1, 692 (42.4) for PGY-2, 535 (23.3) for PGY-3, and 651.6 (37.4) for PGY-4. Non-Hispanic white residents had significantly higher scores on compliance, faculty evaluations on interpersonal and communication skills, and competence in systems-based practice. Contributing residents had significantly higher scores on compliance compared with concerning residents. Senior residents had significantly higher accountability scores compared with junior residents, and junior residents had increased project completion scores. Attendance scores increased and accountability scores decreased significantly between the first and second 6 months of the academic year. There were positive correlations between compliance scores with competing and collaborating conflict styles, and significant negative correlations between compliance with avoiding and accommodating conflict styles. Maintaining a compliance form allows residents and residency programs to focus on issues that affect performance and facilitate assessment of the ACGME competencies. 
Postgraduate year, behavior, and conflict styles appear to be associated with compliance. A lack of association with faculty evaluations suggests measurement of different perceptions of residents' behavior.
Recommendations, evaluation and validation of a semi-automated, fluorescent-based scoring protocol for micronucleus testing in human cells.

PubMed

Seager, Anna L; Shah, Ume-Kulsoom; Brüsehafer, Katja; Wills, John; Manshian, Bella; Chapman, Katherine E; Thomas, Adam D; Scott, Andrew D; Doherty, Ann T; Doak, Shareen H; Johnson, George E; Jenkins, Gareth J S

2014-05-01

Micronucleus (MN) induction is an established cytogenetic end point for evaluating structural and numerical chromosomal alterations in genotoxicity testing. A semi-automated scoring protocol for the assessment of MN preparations from human cell lines and a 3D skin cell model has been developed and validated. Following exposure to a range of test agents, slides were stained with 4'-6-diamidino-2-phenylindole (DAPI) and scanned by use of the MicroNuc module of metafer 4, after the development of a modified classifier for selecting MN in binucleate cells. A common difficulty observed with automated systems is an artefactual output of high false positives; in the case of the metafer system this is mainly due to the loss of cytoplasmic boundaries during slide preparation. Slide quality is paramount to obtain accurate results. We show here that to avoid elevated artefactual-positive MN outputs, diffuse cell density and low-intensity nuclear staining are critical. Comparisons between visual (Giemsa stained) and automated (DAPI stained) MN frequencies and dose-response curves were highly correlated (R² = 0.70 for hydrogen peroxide, R² = 0.98 for menadione, R² = 0.99 for mitomycin C, R² = 0.89 for potassium bromate and R² = 0.68 for quantum dots), indicating the system is adequate to produce biologically relevant and reliable results. Metafer offers many advantages over conventional scoring including increased output and statistical power, and reduced scoring subjectivity, labour and costs. Further, the metafer system is easily adaptable for use with a range of different cells, both suspension and adherent human cell lines. Awareness of the points raised here reduces the automatic positive errors flagged and drastically reduces slide scoring time, making metafer an ideal candidate for genotoxic biomonitoring and population studies and regulatory genotoxic testing.
Predicting Fatigue and Psychophysiological Test Performance from Speech for Safety-Critical Environments.

PubMed

Baykaner, Khan Richard; Huckvale, Mark; Whiteley, Iya; Andreeva, Svetlana; Ryumin, Oleg

2015-01-01

Automatic systems for estimating operator fatigue have application in safety-critical environments. A system which could estimate level of fatigue from speech would have application in domains where operators engage in regular verbal communication as part of their duties. Previous studies on the prediction of fatigue from speech have been limited because of their reliance on subjective ratings and because they lack comparison to other methods for assessing fatigue. In this paper, we present an analysis of voice recordings and psychophysiological test scores collected from seven aerospace personnel during a training task in which they remained awake for 60 h. We show that voice features and test scores are affected by both the total time spent awake and the time position within each subject's circadian cycle. However, we show that time spent awake and time-of-day information are poor predictors of the test results, while voice features can give good predictions of the psychophysiological test scores and sleep latency. Mean absolute errors of prediction are possible within about 17.5% for sleep latency and 5-12% for test scores. We discuss the implications for the use of voice as a means to monitor the effects of fatigue on cognitive performance in practical applications.
Predicting Fatigue and Psychophysiological Test Performance from Speech for Safety-Critical Environments

PubMed Central

Baykaner, Khan Richard; Huckvale, Mark; Whiteley, Iya; Andreeva, Svetlana; Ryumin, Oleg

2015-01-01

Automatic systems for estimating operator fatigue have application in safety-critical environments. A system which could estimate level of fatigue from speech would have application in domains where operators engage in regular verbal communication as part of their duties. Previous studies on the prediction of fatigue from speech have been limited because of their reliance on subjective ratings and because they lack comparison to other methods for assessing fatigue. In this paper, we present an analysis of voice recordings and psychophysiological test scores collected from seven aerospace personnel during a training task in which they remained awake for 60 h. We show that voice features and test scores are affected by both the total time spent awake and the time position within each subject’s circadian cycle. However, we show that time spent awake and time-of-day information are poor predictors of the test results, while voice features can give good predictions of the psychophysiological test scores and sleep latency. Mean absolute errors of prediction are possible within about 17.5% for sleep latency and 5–12% for test scores. We discuss the implications for the use of voice as a means to monitor the effects of fatigue on cognitive performance in practical applications. PMID:26380259
Approximating frustration scores in complex networks via perturbed Laplacian spectra

NASA Astrophysics Data System (ADS)

Savol, Andrej J.; Chennubhotla, Chakra S.

2015-12-01

Systems of many interacting components, as found in physics, biology, infrastructure, and the social sciences, are often modeled by simple networks of nodes and edges. The real-world systems frequently confront outside intervention or internal damage whose impact must be predicted or minimized, and such perturbations are then mimicked in the models by altering nodes or edges. This leads to the broad issue of how to best quantify changes in a model network after some type of perturbation. In the case of node removal there are many centrality metrics which associate a scalar quantity with the removed node, but it can be difficult to associate the quantities with some intuitive aspect of physical behavior in the network. This presents a serious hurdle to the application of network theory: real-world utility networks are rarely altered according to theoretic principles unless the kinetic impact on the network's users is fully appreciated beforehand. In pursuit of a kinetically interpretable centrality score, we discuss the f-score, or frustration score. Each f-score quantifies whether a selected node accelerates or inhibits global mean first passage times to a second, independently selected target node. We show that this is a natural way of revealing the dynamical importance of a node in some networks. After discussing merits of the f-score metric, we combine spectral and Laplacian matrix theory in order to quickly approximate the exact f-score values, which can otherwise be expensive to compute. Following tests on both synthetic and real medium-sized networks, we report f-score runtime improvements over exact brute force approaches in the range of 0 to 400% with low error (<3%).
Psoriasis image representation using patch-based dictionary learning for erythema severity scoring.

PubMed

George, Yasmeen; Aldeen, Mohammad; Garnavi, Rahil

2018-06-01

Psoriasis is a chronic skin disease which can be life-threatening. Accurate severity scoring helps dermatologists to decide on the treatment. In this paper, we present a semi-supervised computer-aided system for automatic erythema severity scoring in psoriasis images. Firstly, the unsupervised stage includes a novel image representation method. We construct a dictionary, which is then used in the sparse representation for local feature extraction. To acquire the final image representation vector, an aggregation method is exploited over the local features. Secondly, the supervised phase is where various multi-class machine learning (ML) classifiers are trained for erythema severity scoring. Finally, we compare the proposed system with two popular unsupervised feature extractor methods, namely: bag of visual words model (BoVWs) and AlexNet pretrained model. Root mean square error (RMSE) and F1 score are used as performance measures for the learned dictionaries and the trained ML models, respectively. A psoriasis image set consisting of 676 images, is used in this study. Experimental results demonstrate that the use of the proposed procedure can provide a setup where erythema scoring is accurate and consistent. Also, it is revealed that dictionaries with large number of atoms and small patch sizes yield the best representative erythema severity features. Further, random forest (RF) outperforms other classifiers with F1 score 0.71, followed by support vector machine (SVM) and boosting with 0.66 and 0.64 scores, respectively. Furthermore, the conducted comparative studies confirm the effectiveness of the proposed approach with improvement of 9% and 12% over BoVWs and AlexNet based features, respectively. Crown Copyright © 2018. Published by Elsevier Ltd. All rights reserved.
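
The two performance measures named above can be computed as in the following Python sketch. The abstract does not say how the F1 score was averaged across severity classes, so macro-averaging is an assumption here, and the severity labels in the usage lines are made-up toy data.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between two equal-length numeric sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro-averaging is ASSUMED)."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)

# Toy erythema severity labels (0 = none, 1 = mild, 2 = severe); hypothetical.
truth = [0, 1, 1, 2]
preds = [0, 1, 2, 2]
print(rmse(truth, preds), macro_f1(truth, preds))
```

RMSE treats severity as a number and penalises large misses more, whereas F1 treats each grade as an unordered class, which is one reason a study might report both.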
Development and validation of a prognostic score to predict mortality in patients with acute-on-chronic liver failure.

PubMed

Jalan, Rajiv; Saliba, Faouzi; Pavesi, Marco; Amoros, Alex; Moreau, Richard; Ginès, Pere; Levesque, Eric; Durand, Francois; Angeli, Paolo; Caraceni, Paolo; Hopf, Corinna; Alessandria, Carlo; Rodriguez, Ezequiel; Solis-Muñoz, Pablo; Laleman, Wim; Trebicka, Jonel; Zeuzem, Stefan; Gustot, Thierry; Mookerjee, Rajeshwar; Elkrief, Laure; Soriano, German; Cordoba, Joan; Morando, Filippo; Gerbes, Alexander; Agarwal, Banwari; Samuel, Didier; Bernardi, Mauro; Arroyo, Vicente

2014-11-01

Acute-on-chronic liver failure (ACLF) is a frequent syndrome (30% prevalence), characterized by acute decompensation of cirrhosis, organ failure(s) and high short-term mortality. This study develops and validates a specific prognostic score for ACLF patients. Data from 1349 patients included in the CANONIC study were used. First, a simplified organ function scoring system (CLIF Consortium Organ Failure score, CLIF-C OFs) was developed to diagnose ACLF using data from all patients. Subsequently, in 275 patients with ACLF, CLIF-C OFs and two other independent predictors of mortality (age and white blood cell count) were combined to develop a specific prognostic score for ACLF (CLIF Consortium ACLF score [CLIF-C ACLFs]). A concordance index (C-index) was used to compare the discrimination abilities of CLIF-C ACLF, MELD, MELD-sodium (MELD-Na), and Child-Pugh (CPs) scores. The CLIF-C ACLFs was validated in an external cohort and assessed for sequential use. The CLIF-C ACLFs showed a significantly higher predictive accuracy than MELDs, MELD-Nas, and CPs, reducing (19-28%) the corresponding prediction error rates at all main time points after ACLF diagnosis (28, 90, 180, and 365 days) in both the CANONIC and the external validation cohort. CLIF-C ACLFs computed at 48 h, 3-7 days, and 8-15 days after ACLF diagnosis predicted the 28-day mortality significantly better than at diagnosis. The CLIF-C ACLFs at ACLF diagnosis is superior to the MELDs and MELD-Nas in predicting mortality. The CLIF-C ACLFs is a clinically relevant, validated scoring system that can be used sequentially to stratify the risk of mortality in ACLF patients. Copyright © 2014 European Association for the Study of the Liver. Published by Elsevier B.V. All rights reserved.
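
The concordance index used above to compare scores can be illustrated with a simplified Python sketch. It assumes a binary outcome at a fixed time point (died or survived) and ignores censoring, which a full survival-analysis C-index would handle; the function name and the toy scores are hypothetical.

```python
def c_index(scores, died):
    """Fraction of (died, survived) pairs in which the patient who died
    had the higher risk score; ties in score count as half-concordant."""
    concordant = 0.0
    usable = 0
    n = len(scores)
    for i in range(n):
        for j in range(i + 1, n):
            if died[i] == died[j]:
                continue  # pair is uninformative: same outcome
            usable += 1
            dead, alive = (i, j) if died[i] else (j, i)
            if scores[dead] > scores[alive]:
                concordant += 1.0
            elif scores[dead] == scores[alive]:
                concordant += 0.5
    if usable == 0:
        raise ValueError("need at least one death and one survivor")
    return concordant / usable

# Toy risk scores and 28-day outcomes (1 = died); not data from the study.
print(c_index([40, 45, 52, 30], [1, 0, 1, 0]))
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect ranking, so comparing two scores on the same patients shows which one orders them better by risk.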

Four Bootstrap Confidence Intervals for the Binomial-Error Model.

ERIC Educational Resources Information Center

Lin, Miao-Hsiang; Hsiung, Chao A.

1992-01-01

Four bootstrap methods are identified for constructing confidence intervals for the binomial-error model. The extent to which similar results are obtained and the theoretical foundation of each method and its relevance and ranges of modeling the true score uncertainty are discussed. (SLD)
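
The abstract does not name the four methods. As a generic illustration only, the following Python sketch builds a percentile bootstrap interval for a proportion-correct score by resampling item responses, which is in the spirit of a binomial-error model but is not claimed to be one of the four methods studied. The function name, the fixed seed and the toy 40-item test are assumptions.

```python
import random

def percentile_bootstrap_ci(items, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of 0/1 item responses."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(items)
    # Resample the items with replacement and recompute the proportion correct.
    stats = sorted(sum(rng.choices(items, k=n)) / n for _ in range(n_boot))
    lower = stats[round((alpha / 2) * n_boot)]
    upper = stats[round((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Toy test: 30 of 40 items answered correctly (observed proportion 0.75).
responses = [1] * 30 + [0] * 10
low, high = percentile_bootstrap_ci(responses)
print(low, high)
```

The interval describes uncertainty about the examinee's true proportion-correct given a finite number of items; other bootstrap variants differ mainly in how they correct the percentile interval for bias and skew.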
Performance of biometric quality measures.

PubMed

Grother, Patrick; Tabassi, Elham

2007-04-01

We document methods for the quantitative evaluation of systems that produce a scalar summary of a biometric sample's quality. We are motivated by a need to test claims that quality measures are predictive of matching performance. We regard a quality measurement algorithm as a black box that converts an input sample to an output scalar. We evaluate it by quantifying the association between those values and observed matching results. We advance detection error trade-off and error versus reject characteristics as metrics for the comparative evaluation of sample quality measurement algorithms. We precede this with a definition of sample quality and a description of the operational use of quality measures. We emphasize the performance goal by including a procedure for annotating the samples of a reference corpus with quality values derived from empirical recognition scores.
Two-step estimation in ratio-of-mediator-probability weighted causal mediation analysis.

PubMed

Bein, Edward; Deutsch, Jonah; Hong, Guanglei; Porter, Kristin E; Qin, Xu; Yang, Cheng

2018-04-15

This study investigates appropriate estimation of estimator variability in the context of causal mediation analysis that employs propensity score-based weighting. Such an analysis decomposes the total effect of a treatment on the outcome into an indirect effect transmitted through a focal mediator and a direct effect bypassing the mediator. Ratio-of-mediator-probability weighting estimates these causal effects by adjusting for the confounding impact of a large number of pretreatment covariates through propensity score-based weighting. In step 1, a propensity score model is estimated. In step 2, the causal effects of interest are estimated using weights derived from the prior step's regression coefficient estimates. Statistical inferences obtained from this 2-step estimation procedure are potentially problematic if the estimated standard errors of the causal effect estimates do not reflect the sampling uncertainty in the estimation of the weights. This study extends to ratio-of-mediator-probability weighting analysis a solution to the 2-step estimation problem by stacking the score functions from both steps. We derive the asymptotic variance-covariance matrix for the indirect effect and direct effect 2-step estimators, provide simulation results, and illustrate with an application study. Our simulation results indicate that the sampling uncertainty in the estimated weights should not be ignored. The standard error estimation using the stacking procedure offers a viable alternative to bootstrap standard error estimation. We discuss broad implications of this approach for causal analysis involving propensity score-based weighting. Copyright © 2018 John Wiley & Sons, Ltd.
Using meta-quality to assess the utility of volunteered geographic information for science.

PubMed

Langley, Shaun A; Messina, Joseph P; Moore, Nathan

2017-11-06

Volunteered geographic information (VGI) has strong potential to be increasingly valuable to scientists in collaboration with non-scientists. The abundance of mobile phones and other wireless forms of communication open up significant opportunities for the public to get involved in scientific research. As these devices and activities become more abundant, questions of uncertainty and error in volunteer data are emerging as critical components for using volunteer-sourced spatial data. Here we present a methodology for using VGI and assessing its sensitivity to three types of error. More specifically, this study evaluates the reliability of data from volunteers based on their historical patterns. The specific context is a case study in surveillance of tsetse flies, a health concern for being the primary vector of African Trypanosomiasis. Reliability, as measured by a reputation score, determines the threshold for accepting the volunteered data for inclusion in a tsetse presence/absence model. Higher reputation scores are successful in identifying areas of higher modeled tsetse prevalence. A dynamic threshold is needed but the quality of VGI will improve as more data are collected and the errors in identifying reliable participants will decrease. This system allows for two-way communication between researchers and the public, and a way to evaluate the reliability of VGI. Boosting the public's ability to participate in such work can improve disease surveillance and promote citizen science. In the absence of active surveillance, VGI can provide valuable spatial information given that the data are reliable.
Patient Safety Culture in Intensive Care Units from the Perspective of Nurses: A Cross-Sectional Study.

PubMed

Farzi, Sedigheh; Moladoost, Azam; Bahrami, Masoud; Farzi, Saba; Etminani, Reza

2017-01-01

One of the goals of nursing is providing safe care, prevention of injury, and health promotion of patients. Patient safety in intensive care units is threatened for various reasons. This study aimed to survey patient safety culture from the perspective of nurses in intensive care units. This cross-sectional study was conducted in 2016. Sampling was done using the convenience method. The sample consisted of 367 nurses working in intensive care units of teaching hospitals affiliated to Isfahan University of Medical Sciences. Data collection was performed using a two-part questionnaire that included demographic and hospital survey on Patient Safety Culture (HSOPSC) questionnaire. Data analysis was done using descriptive statistics (mean and standard deviation). Among the 12 dimensions of safety culture, the nurses assigned the highest score to "team work within units" (97.3%) and "Organizational learning-continuous improvement" (84%). They assigned the least score to "handoffs and transitions"(21.1%), "non-punitive response to errors" (24.7%), "Staffing" (35.6%), "Communication openness" (47.5%), and "Teamwork across units" (49.4%). The patient safety culture dimensions have low levels that require adequate attention and essential measures of health care centers including facilitating teamwork, providing adequate staff, and developing a checklist of handoffs and transitions. Furthermore, to increase reporting error and to promote a patient safety culture in intensive care units, some strategies should be adopted including a system-based approach to deal with the error.
SU-F-J-47: Inherent Uncertainty in the Positional Shifts Determined by a Volumetric Cone Beam Imaging System

DOE Office of Scientific and Technical Information (OSTI.GOV)

Giri, U; Ganesh, T; Saini, V

2016-06-15

Purpose: To quantify inherent uncertainty associated with a volumetric imaging system in its determination of positional shifts. Methods: The study was performed on an Elekta Axesse™ linac’s XVI cone beam computed tomography (CBCT) system. A CT image data set of a Penta-Guide phantom was used as reference image by placing isocenter at the center of the phantom. The phantom was placed arbitrarily on the couch close to isocenter and CBCT images were obtained. The CBCT dataset was matched with the reference image using XVI software and the shifts were determined in 6-dimensions. Without moving the phantom, this process was repeated 20 times consecutively within 30 minutes on a single day. Mean shifts and their standard deviations in all 6-dimensions were determined for all the 20 instances of imaging. For any given day, the first set of shifts obtained was kept as reference and the deviations of the subsequent 19 sets from the reference set were scored. Mean differences and their standard deviations were determined. In this way, data were obtained for 30 consecutive working days. Results: Tabulating the mean deviations and their standard deviations observed on each day for the 30 measurement days, systematic and random errors in the determination of shifts by XVI software were calculated. The systematic errors were found to be 0.03, 0.04 and 0.03 mm while random errors were 0.05, 0.06 and 0.06 mm in lateral, craniocaudal and anterio-posterior directions respectively. For rotational shifts, the systematic errors were 0.02°, 0.03° and 0.03° and random errors were 0.06°, 0.05° and 0.05° in pitch, roll and yaw directions respectively. Conclusion: The inherent uncertainties in every image guidance system should be assessed and baseline values established at the time of its commissioning. These shall be periodically tested as part of the QA protocol.
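
The abstract does not give the formulas used to turn the daily means and standard deviations into systematic and random errors. One widely used convention in radiotherapy takes the systematic error as the standard deviation of the daily means and the random error as the root mean square of the daily standard deviations. The Python sketch below assumes that convention; the function name and the toy deviations are hypothetical.

```python
import math
import statistics

def systematic_and_random(daily_deviations):
    """daily_deviations: one list of shift deviations (mm or degrees) per day.

    ASSUMED convention: systematic error = SD of the daily means,
    random error = root mean square of the daily SDs.
    """
    daily_means = [statistics.mean(day) for day in daily_deviations]
    daily_sds = [statistics.stdev(day) for day in daily_deviations]
    systematic = statistics.stdev(daily_means)
    random_error = math.sqrt(statistics.mean([sd * sd for sd in daily_sds]))
    return systematic, random_error

# Toy data: two days with two scored deviations each (the study used 30 x 19).
days = [[0.0, 0.2], [0.1, 0.3]]
print(systematic_and_random(days))
```

Separating the two components matters because a systematic offset repeats on every measurement, while random scatter averages out over repeated imaging.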
A comparison of breeding and ensemble transform vectors for global ensemble generation

NASA Astrophysics Data System (ADS)

Deng, Guo; Tian, Hua; Li, Xiaoli; Chen, Jing; Gong, Jiandong; Jiao, Meiyan

2012-02-01

To compare the initial perturbation techniques using breeding vectors and ensemble transform vectors, three ensemble prediction systems using both initial perturbation methods but with different ensemble member sizes based on the spectral model T213/L31 are constructed at the National Meteorological Center, China Meteorological Administration (NMC/CMA). A series of ensemble verification scores such as forecast skill of the ensemble mean, ensemble resolution, and ensemble reliability are introduced to identify the most important attributes of ensemble forecast systems. The results indicate that the ensemble transform technique is superior to the breeding vector method in terms of the anomaly correlation coefficient (ACC), which is a deterministic attribute of the ensemble mean; the root-mean-square error (RMSE) and spread, which are probabilistic attributes; and the continuous ranked probability score (CRPS) and its decomposition. The advantage of the ensemble transform approach is attributed to the orthogonality among its ensemble perturbations as well as its consistency with the data assimilation system. Therefore, this study may serve as a reference for configuration of the best ensemble prediction system to be used in operation.
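For readers unfamiliar with the probabilistic scores named above, the following sketch computes the ensemble spread and the standard empirical (ensemble) estimator of the CRPS for a single forecast and observation. It is an illustration only: the function names and the three-member ensemble are hypothetical, and the abstract does not say which CRPS estimator or decomposition the authors used.

```python
import math


def ensemble_spread(members):
    """Sample standard deviation of the members about the ensemble mean."""
    m = len(members)
    mean = sum(members) / m
    return math.sqrt(sum((x - mean) ** 2 for x in members) / (m - 1))


def ensemble_crps(members, observation):
    """Empirical CRPS: mean |x_i - y| minus half the mean |x_i - x_j|."""
    m = len(members)
    error_term = sum(abs(x - observation) for x in members) / m
    pair_term = sum(abs(a - b) for a in members for b in members) / (2 * m * m)
    return error_term - pair_term


# Hypothetical three-member ensemble and verifying observation.
members = [1.0, 2.0, 3.0]
crps = ensemble_crps(members, 2.0)
spread = ensemble_spread(members)
```

Lower CRPS is better, and for a one-member ensemble the score reduces to the absolute error of that member.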
Robust LOD scores for variance component-based linkage analysis.

PubMed

Blangero, J; Williams, J T; Almasy, L

2000-01-01

The variance component method is now widely used for linkage analysis of quantitative traits. Although this approach offers many advantages, the importance of the underlying assumption of multivariate normality of the trait distribution within pedigrees has not been studied extensively. Simulation studies have shown that traits with leptokurtic distributions yield linkage test statistics that exhibit excessive Type I error when analyzed naively. We derive analytical formulae relating the deviation from the expected asymptotic distribution of the lod score to the kurtosis and total heritability of the quantitative trait. A simple correction constant yields a robust lod score for any deviation from normality and for any pedigree structure, and effectively eliminates the problem of inflated Type I error due to misspecification of the underlying probability model in variance component-based linkage analysis.
Modification of the Mantel-Haenszel and Logistic Regression DIF Procedures to Incorporate the SIBTEST Regression Correction

ERIC Educational Resources Information Center

DeMars, Christine E.

2009-01-01

The Mantel-Haenszel (MH) and logistic regression (LR) differential item functioning (DIF) procedures have inflated Type I error rates when there are large mean group differences, short tests, and large sample sizes. When there are large group differences in mean score, groups matched on the observed number-correct score differ on true score,…
Graduate Student WAIS-III Scoring Accuracy Is a Function of Full Scale IQ and Complexity of Examiner Tasks

ERIC Educational Resources Information Center

Hopwood, Christopher J.; Richard, David C. S.

2005-01-01

Research on the Wechsler Adult Intelligence Scale-Revised and Wechsler Adult Intelligence Scale-Third Edition (WAIS-III) suggests that practicing clinical psychologists and graduate students make item-level scoring errors that affect IQ, index, and subtest scores. Studies have been limited in that Full-Scale IQ (FSIQ) and examiner administration,…
The Mote In Thy Brother's Eye, and The Beam in Thine Own: Predicting One's Own and Others' Personality Test Scores.

ERIC Educational Resources Information Center

Furnham, Adrian; Henderson, Monika

1983-01-01

Examined the similarity between subjects' (N=63) ratings of themselves and others, on various tests of personality. Results revealed that subjects correctly estimated several of their own scores, but only two scores of another person. They believed themselves to be similar to their friend, thereby showing attributional errors. (JAC)
A TDM link with channel coding and digital voice.

NASA Technical Reports Server (NTRS)

Jones, M. W.; Tu, K.; Harton, P. L.

1972-01-01

The features of a TDM (time-division multiplexed) link model are described. A PCM telemetry sequence was coded for error correction and multiplexed with a digitized voice channel. An all-digital implementation of a variable-slope delta modulation algorithm was used to digitize the voice channel. The results of extensive testing are reported. The measured coding gain and the system performance over a Gaussian channel are compared with theoretical predictions and computer simulations. Word intelligibility scores are reported as a measure of voice channel performance.
Correction to: CASPer, an online pre-interview screen for personal/professional characteristics: prediction of national licensure scores.

PubMed

Dore, Kelly L; Reiter, Harold I; Kreuger, Sharyn; Norman, Geoffrey R

2017-12-01

In re-examining the paper "CASPer, an online pre-interview screen for personal/professional characteristics: prediction of national licensure scores" published in AHSE (22(2), 327-336), we recognized two errors of interpretation.
Measuring Reading Performance Informally.

ERIC Educational Resources Information Center

Powell, William R.

To improve the accuracy of the informal reading inventory (IRI), a differential set of criteria is necessary for both word recognition and comprehension scores for different levels and reading conditions. In initial evaluation, word recognition scores should reflect only errors of insertions, omissions, mispronunciations, substitutions, unknown…
Validation of automatic joint space width measurements in hand radiographs in rheumatoid arthritis.

PubMed

Schenk, Olga; Huo, Yinghe; Vincken, Koen L; van de Laar, Mart A; Kuper, Ina H H; Slump, Kees C H; Lafeber, Floris P J G; Bernelot Moens, Hein J

2016-10-01

Computerized methods promise quick, objective, and sensitive tools to quantify progression of radiological damage in rheumatoid arthritis (RA). Measurement of joint space width (JSW) in finger and wrist joints with these systems performed comparable to the Sharp-van der Heijde score (SHS). A next step toward clinical use, validation of precision and accuracy in hand joints with minimal damage, is described with a close scrutiny of sources of error. A recently developed system to measure metacarpophalangeal (MCP) and proximal interphalangeal (PIP) joints was validated in consecutive hand images of RA patients. To assess the impact of image acquisition, measurements on radiographs from a multicenter trial and from a recent prospective cohort in a single hospital were compared. Precision of the system was tested by comparing the joint space in mm in pairs of subsequent images with a short interval without progression of SHS. In case of incorrect measurements, the source of error was analyzed with a review by human experts. Accuracy was assessed by comparison with reported measurements with other systems. In the two series of radiographs, the system could automatically locate and measure 1003/1088 (92.2%) and 1143/1200 (95.3%) individual joints, respectively. In joints with a normal SHS, the average (SD) size of MCP joints was [Formula: see text] and [Formula: see text] in the two series of radiographs, and of PIP joints [Formula: see text] and [Formula: see text]. The difference in JSW between two serial radiographs with an interval of 6 to 12 months and unchanged SHS was [Formula: see text], indicating very good precision. Errors occurred more often in radiographs from the multicenter cohort than in a more recent series from a single hospital. 
Detailed analysis of the 55/1125 (4.9%) measurements that had a discrepant paired measurement revealed that variation in the process of image acquisition (exposure in 15% and repositioning in 57%) was a more frequent source of error than incorrect delineation by the software (25%). Various steps in the validation of an automated measurement system for JSW of MCP and PIP joints are described. The use of serial radiographs from different sources, with a short interval and limited damage, is helpful to detect sources of error. Image acquisition, in particular repositioning, is a dominant source of error.
Kernel Equating Under the Non-Equivalent Groups With Covariates Design

PubMed Central

Bränberg, Kenny

2015-01-01

When equating two tests, the traditional approach is to use common test takers and/or common items. Here, the idea is to use variables correlated with the test scores (e.g., school grades and other test scores) as a substitute for common items in a non-equivalent groups with covariates (NEC) design. This is performed in the framework of kernel equating and with an extension of the method developed for post-stratification equating in the non-equivalent groups with anchor test design. Real data from a college admissions test were used to illustrate the use of the design. The equated scores from the NEC design were compared with equated scores from the equivalent group (EG) design, that is, equating with no covariates as well as with equated scores when a constructed anchor test was used. The results indicate that the NEC design can produce lower standard errors compared with an EG design. When covariates were used together with an anchor test, the smallest standard errors were obtained over a large range of test scores. The results obtained, that an EG design equating can be improved by adjusting for differences in test score distributions caused by differences in the distribution of covariates, are useful in practice because not all standardized tests have anchor tests. PMID:29881012
Kernel Equating Under the Non-Equivalent Groups With Covariates Design.

PubMed

Wiberg, Marie; Bränberg, Kenny

2015-07-01

When equating two tests, the traditional approach is to use common test takers and/or common items. Here, the idea is to use variables correlated with the test scores (e.g., school grades and other test scores) as a substitute for common items in a non-equivalent groups with covariates (NEC) design. This is performed in the framework of kernel equating and with an extension of the method developed for post-stratification equating in the non-equivalent groups with anchor test design. Real data from a college admissions test were used to illustrate the use of the design. The equated scores from the NEC design were compared with equated scores from the equivalent group (EG) design, that is, equating with no covariates as well as with equated scores when a constructed anchor test was used. The results indicate that the NEC design can produce lower standard errors compared with an EG design. When covariates were used together with an anchor test, the smallest standard errors were obtained over a large range of test scores. The results obtained, that an EG design equating can be improved by adjusting for differences in test score distributions caused by differences in the distribution of covariates, are useful in practice because not all standardized tests have anchor tests.
Vision and academic performance of learning disabled children.

PubMed

Wharry, R E; Kirkpatrick, S W

1986-02-01

The purpose of this study was to assess differences in academic performance among myopic, hyperopic, and emmetropic children who were learning disabled. More specifically, myopic children were expected to perform better on mathematical and spatial tasks than hyperopic ones, and hyperopic and emmetropic children were expected to perform better on verbal measures than myopic ones. For 439 learning disabled students visual anomalies were determined via a Generated Retinal Reflex Image Screening System. Test data were obtained from school files. Partial support for the hypothesis was obtained. Myopic learning disabled children outperformed hyperopic and emmetropic children on the Key Math test. Myopic children scored better than hyperopic children on the WRAT Reading subtest and on the Durrell Analysis of Reading Difficulty Oral Reading Comprehension, Oral Rate, Flashword, and Spelling subtests, and on the Key Math Measurement and Total Scores. Severity of refractive error significantly affected the Wechsler Intelligence Scale for Children--Revised Full Scale, Performance Scale, Verbal Scale, and Digit Span scores but did not affect any academic test scores. Several other findings were also reported. Those with nonametropic problems scored higher than those without problems on the Key Math Time subtest. Implications supportive of the theories of Benbow and Benbow and Geschwind and Behan were stated.
Factors associated with disclosure of medical errors by housestaff.

PubMed

Kronman, Andrea C; Paasche-Orlow, Michael; Orlander, Jay D

2012-04-01

Attributes of the organisational culture of residency training programmes may impact patient safety. Training environments are complex, composed of clinical teams, residency programmes, and clinical units. We examined the relationship between residents' perceptions of their training environment and disclosure of or apology for their worst error. Anonymous, self-administered surveys were distributed to Medicine and Surgery residents at Boston Medical Center in 2005. Surveys asked residents to describe their worst medical error, and to answer selected questions from validated surveys measuring elements of working environments that promote learning from error. Subscales measured the microenvironments of the clinical team, residency programme, and clinical unit. Univariate and bivariate statistical analyses examined relationships between trainee characteristics, their perceived learning environment(s), and their responses to the error. Out of 109 surveys distributed to residents, 99 surveys were returned (91% overall response rate), two incomplete surveys were excluded, leaving 97: 61% internal medicine, 39% surgery, 59% male residents. While 31% reported apologising for the situation associated with the error, only 17% reported disclosing the error to patients and/or family. More male residents disclosed the error than female residents (p=0.04). Surgery residents scored higher on the subscales of safety culture pertaining to the residency programme (p=0.02) and managerial commitment to safety (p=0.05). Our Medical Culture Summary score was positively associated with disclosure (p=0.04) and apology (p=0.05). Factors in the learning environments of residents are associated with responses to medical errors. Organisational safety culture can be measured, and used to evaluate environmental attributes of clinical training that are associated with disclosure of, and apology for, medical error.
Implementation of a quality management system according to 9001 standard in a hospital in the home unit: changes and achievements.

PubMed

Rodríguez-Cerrillo, Matilde; Fernández-Diaz, Eddita; Iñurrieta-Romero, Amaia; Poza-Montoro, Ana

2012-01-01

The purpose of this paper is to describe changes and results obtained after implementation of a quality management system (QMS) according to ISO standards in a Hospital in the Home (HIH) Unit. The paper describes changes made and outcomes achieved. The study took place in the HIH Unit, Clinico Hospital, Madrid, Spain, and looked at admissions, mean stay, patient satisfaction, adverse events, returns to hospital, non-admitted referrals, complaints, compliance with protocols, equipment failures and resolution of urgent consultations. In June 2008, the HIH Unit, Clinico Hospital obtained ISO certification. The main results achieved are as follows. There was an increase in patients' satisfaction: in June 2008, assessment of the quality of care provided by staff was scored at 4.7 (on a scale of 1 to 5); in 2010 it was scored at 4.96. Patient satisfaction rate has increased from 92 percent to 98.8 percent. No complaints from patients were received. Unscheduled returns to hospital have decreased from 7 percent to 3 percent. There were no medical equipment failures. External suppliers' performance has improved. Material and medication needed by staff were available when necessary. The number of admissions has increased. Compliance with protocols has reached 97 percent. Inappropriate referrals have decreased by 8 percent. Six medication-related incidents were detected; in two cases the incident was not due to an error. In the other four cases the error could have been detected before reaching the patient. Implementation of an ISO quality management system allows improved quality of care and patient satisfaction in an HIH Unit.

The performance of the standard rate turn (SRT) by student naval helicopter pilots.

PubMed

Chapman, F; Temme, L A; Still, D L

2001-04-01

During flight training, student naval helicopter pilots learn the use of flight instruments through a prescribed series of simulator training events. The training simulator is a 6-degrees-of-freedom, motion-based, high-fidelity instrument trainer. From the final basic instrument simulator flights of student pilots, we selected for evaluation and analysis their performance of the Standard Rate Turn (SRT), a routine flight maneuver. The performance of the SRT was scored with air speed, altitude and heading average error from target values and standard deviations. These average errors and standard deviations were used in a Multiple Analysis of Variance (MANOVA) to evaluate the effects of three independent variables: 1) direction of turn (left vs. right), 2) degree of turn (180 vs. 360 degrees); and 3) segment of turn (roll-in, first 30 s, last 30 s, and roll-out of turn). Only the main effects of the three independent variables were significant; there were no significant interactions. This result greatly reduces the number of different conditions that should be scored separately for the evaluation of SRT performance. The results also showed that the magnitude of the heading and altitude errors at the beginning of the SRT correlated with the magnitude of the heading and altitude errors throughout the turn. This result suggests that for the turn to be well executed, it is important for it to begin with little error in these two response parameters. The observations reported here should be considered when establishing SRT performance norms and comparing student scores. Furthermore, it seems easier for pilots to maintain good performance than to correct poor performance.
Refractive Errors and Academic Achievements of Primary School Children.

PubMed

Joseph, Lucyamma

2014-01-01

The current study was conducted among school children of selected schools of Thiruvananthapuram district of Kerala. It was designed to investigate the effect of refractive errors on academic achievement of primary school children. An experimental method was used, with a sample of 185 children. An equated sample without myopia was selected as the control group. Academic achievement tests based on the study syllabus were prepared and administered to both groups. The children with myopia were given corrective devices such as glasses prescribed by the ophthalmologist. After five months academic achievement tests were again given to both groups and the results of the scores between two groups as well as the scores before and after correction of errors were compared, which showed a significant influence of myopia on academic achievement and examination anxiety of children.
Mapping from disease-specific measures to health-state utility values in individuals with migraine.

PubMed

Gillard, Patrick J; Devine, Beth; Varon, Sepideh F; Liu, Lei; Sullivan, Sean D

2012-05-01

The objective of this study was to develop empirical algorithms that estimate health-state utility values from disease-specific quality-of-life scores in individuals with migraine. Data from a cross-sectional, multicountry study were used. Individuals with episodic and chronic migraine were randomly assigned to training or validation samples. Spearman's correlation coefficients between paired EuroQol five-dimensional (EQ-5D) questionnaire utility values and both Headache Impact Test (HIT-6) scores and Migraine-Specific Quality-of-Life Questionnaire version 2.1 (MSQ) domain scores (role restrictive, role preventive, and emotional function) were examined. Regression models were constructed to estimate EQ-5D questionnaire utility values from the HIT-6 score or the MSQ domain scores. Preferred algorithms were confirmed in the validation samples. In episodic migraine, the preferred HIT-6 and MSQ algorithms explained 22% and 25% of the variance (R(2)) in the training samples, respectively, and had similar prediction errors (root mean square errors of 0.30). In chronic migraine, the preferred HIT-6 and MSQ algorithms explained 36% and 45% of the variance in the training samples, respectively, and had similar prediction errors (root mean square errors 0.31 and 0.29). In episodic and chronic migraine, no statistically significant differences were observed between the mean observed and the mean estimated EQ-5D questionnaire utility values for the preferred HIT-6 and MSQ algorithms in the validation samples. The relationship between the EQ-5D questionnaire and the HIT-6 or the MSQ is adequate to use regression equations to estimate EQ-5D questionnaire utility values. The preferred HIT-6 and MSQ algorithms will be useful in estimating health-state utilities in migraine trials in which no preference-based measure is present. Copyright © 2012 International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc. All rights reserved.
Improving the quality of child anthropometry: Manual anthropometry in the Body Imaging for Nutritional Assessment Study (BINA)

PubMed Central

2017-01-01

Anthropometric data collected in clinics and surveys are often inaccurate and unreliable due to measurement error. The Body Imaging for Nutritional Assessment Study (BINA) evaluated the ability of 3D imaging to correctly measure stature, head circumference (HC) and arm circumference (MUAC) for children under five years of age. This paper describes the protocol for and the quality of manual anthropometric measurements in BINA, a study conducted in 2016–17 in Atlanta, USA. Quality was evaluated by examining digit preference, biological plausibility of z-scores, z-score standard deviations, and reliability. We calculated z-scores and analyzed plausibility based on the 2006 WHO Child Growth Standards (CGS). For reliability, we calculated intra- and inter-observer Technical Error of Measurement (TEM) and Intraclass Correlation Coefficient (ICC). We found low digit preference; 99.6% of z-scores were biologically plausible, with z-score standard deviations ranging from 0.92 to 1.07. Total TEM was 0.40 for stature, 0.28 for HC, and 0.25 for MUAC in centimeters. ICC ranged from 0.99 to 1.00. The quality of manual measurements in BINA was high and similar to that of the anthropometric data used to develop the WHO CGS. We attributed high quality to vigorous training, motivated and competent field staff, reduction of non-measurement error through the use of technology, and reduction of measurement error through adequate monitoring and supervision. Our anthropometry measurement protocol, which builds on and improves upon the protocol used for the WHO CGS, can be used to improve anthropometric data quality. The discussion illustrates the need to standardize anthropometric data quality assessment, and we conclude that BINA can provide a valuable evaluation of 3D imaging for child anthropometry because there is comparison to gold-standard, manual measurements. PMID:29240796
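The Technical Error of Measurement reported above is conventionally computed from paired repeat measurements as the square root of the summed squared differences divided by twice the number of pairs. The sketch below applies that textbook formula. The function name and the sample readings are hypothetical and are not BINA data, and the abstract does not specify how intra- and inter-observer TEM were combined into the total TEM.

```python
import math


def technical_error_of_measurement(first, second):
    """TEM for paired repeat measurements: sqrt(sum(d^2) / (2 * n))."""
    if len(first) != len(second) or not first:
        raise ValueError("need two equal-length, non-empty series")
    squared = sum((a - b) ** 2 for a, b in zip(first, second))
    return math.sqrt(squared / (2 * len(first)))


# Hypothetical repeat stature readings in cm for four children.
first_pass = [85.2, 90.1, 77.4, 102.3]
second_pass = [85.6, 89.9, 77.4, 102.9]
tem_cm = technical_error_of_measurement(first_pass, second_pass)
```

The result is in the same unit as the measurements, which is why the abstract reports TEM in centimeters.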
Fusing face-verification algorithms and humans.

PubMed

O'Toole, Alice J; Abdi, Hervé; Jiang, Fang; Phillips, P Jonathon

2007-10-01

It has been demonstrated recently that state-of-the-art face-recognition algorithms can surpass human accuracy at matching faces over changes in illumination. The ranking of algorithms and humans by accuracy, however, does not provide information about whether algorithms and humans perform the task comparably or whether algorithms and humans can be fused to improve performance. In this paper, we fused humans and algorithms using partial least square regression (PLSR). In the first experiment, we applied PLSR to face-pair similarity scores generated by seven algorithms participating in the Face Recognition Grand Challenge. The PLSR produced an optimal weighting of the similarity scores, which we tested for generality with a jackknife procedure. Fusing the algorithms' similarity scores using the optimal weights produced a twofold reduction of error rate over the most accurate algorithm. Next, human-subject-generated similarity scores were added to the PLSR analysis. Fusing humans and algorithms increased the performance to near-perfect classification accuracy. These results are discussed in terms of maximizing face-verification accuracy with hybrid systems consisting of multiple algorithms and humans.
Goal or gold: overlapping reward processes in soccer players upon scoring and winning money.

PubMed

Häusler, Alexander Niklas; Becker, Benjamin; Bartling, Marcel; Weber, Bernd

2015-01-01

Social rewards are important incentives for human behavior. This is especially true in team sports such as the most popular one worldwide: soccer. We investigated reward processing upon scoring a soccer goal in a standard two-versus-one situation and in comparison to winning in a monetary incentive task. The results show a strong overlap in brain activity between the two conditions in established reward regions of the mesolimbic dopaminergic system, including the ventral striatum and ventromedial pre-frontal cortex. The three main components of reward-associated learning, i.e., reward probability (RP), reward reception (RR) and reward prediction errors (RPE) showed highly similar activation in both contexts, with only the RR and RPE components displaying overlapping reward activity. Passing and shooting behavior did not correlate with individual egoism scores, but we observe a positive correlation between egoism and activity in the left middle frontal gyrus upon scoring after a pass versus a direct shot. Our findings suggest that rewards in the context of soccer and monetary incentives are based on similar neural processes.
Goal or Gold: Overlapping Reward Processes in Soccer Players upon Scoring and Winning Money

PubMed Central

Häusler, Alexander Niklas; Becker, Benjamin; Bartling, Marcel; Weber, Bernd

2015-01-01

Social rewards are important incentives for human behavior. This is especially true in team sports such as the most popular one worldwide: soccer. We investigated reward processing upon scoring a soccer goal in a standard two-versus-one situation and in comparison to winning in a monetary incentive task. The results show a strong overlap in brain activity between the two conditions in established reward regions of the mesolimbic dopaminergic system, including the ventral striatum and ventromedial pre-frontal cortex. The three main components of reward-associated learning, i.e., reward probability (RP), reward reception (RR) and reward prediction errors (RPE) showed highly similar activation in both contexts, with only the RR and RPE components displaying overlapping reward activity. Passing and shooting behavior did not correlate with individual egoism scores, but we observe a positive correlation between egoism and activity in the left middle frontal gyrus upon scoring after a pass versus a direct shot. Our findings suggest that rewards in the context of soccer and monetary incentives are based on similar neural processes. PMID:25875594
Writing Evaluation: Rater and Task Effects on the Reliability of Writing Scores for Children in Grades 3 and 4

PubMed Central

Kim, Grace Young-Suk; Schatschneider, Christopher; Wanzek, Jeanne; Gatlin, Brandy; Al Otaiba, Stephanie

2017-01-01

We examined how raters and tasks influence measurement error in writing evaluation and how many raters and tasks are needed to reach a desirable level of .90 and .80 reliabilities for children in Grades 3 and 4. A total of 211 children (102 boys) were administered three tasks in narrative and expository genres, respectively, and their written compositions were evaluated in widely used evaluation methods for developing writers: holistic scoring, productivity, and curriculum-based writing scores. Results showed that 54% and 52% of variance in narrative and expository compositions were attributable to true individual differences in writing. Students’ scores varied largely by tasks (30.44% and 28.61% of variance), but not by raters. To reach the reliability of .90, multiple tasks and raters were needed, and for the reliability of .80, a single rater and multiple tasks were needed. These findings offer important implications about reliably evaluating children’s writing skills, given that writing is typically evaluated by a single task and a single rater in classrooms and even in state accountability systems. PMID:29075050
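The question of how many raters and tasks are needed for a target reliability is typically answered with a decision study in generalizability theory. The sketch below assumes a fully crossed person × task × rater design and the standard relative generalizability coefficient. The variance components are hypothetical placeholders chosen only to resemble the proportions quoted in the abstract, the function names are illustrative, and the study's actual design and estimates may differ.

```python
def relative_g_coefficient(var_person, var_person_task, var_person_rater,
                           var_residual, n_tasks, n_raters):
    """Relative G coefficient for a crossed person x task x rater design."""
    error = (var_person_task / n_tasks
             + var_person_rater / n_raters
             + var_residual / (n_tasks * n_raters))
    return var_person / (var_person + error)


def tasks_needed(target, var_person, var_person_task, var_person_rater,
                 var_residual, n_raters, max_tasks=50):
    """Smallest number of tasks reaching the target, or None if not found."""
    for n_tasks in range(1, max_tasks + 1):
        g = relative_g_coefficient(var_person, var_person_task,
                                   var_person_rater, var_residual,
                                   n_tasks, n_raters)
        if g >= target:
            return n_tasks
    return None


# Hypothetical variance components (proportions of total variance).
components = dict(var_person=0.54, var_person_task=0.30,
                  var_person_rater=0.02, var_residual=0.14)
single_task_rater = relative_g_coefficient(n_tasks=1, n_raters=1, **components)
tasks_for_080 = tasks_needed(0.80, n_raters=1, **components)
```

Because the task-related components dominate in this illustration, adding tasks raises the coefficient much faster than adding raters, which matches the pattern the abstract describes.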
Computational technique for stepwise quantitative assessment of equation correctness

NASA Astrophysics Data System (ADS)

Othman, Nuru'l Izzah; Bakar, Zainab Abu

2017-04-01

Many of the computer-aided mathematics assessment systems that are available today possess the capability to implement stepwise correctness checking of a working scheme for solving equations. The computational technique for assessing the correctness of each response in the scheme mainly involves checking the mathematical equivalence and providing qualitative feedback. This paper presents a technique, known as the Stepwise Correctness Checking and Scoring (SCCS) technique, that checks the correctness of each equation in terms of structural equivalence and provides quantitative feedback. The technique, which is based on the Multiset framework, adapts certain techniques from textual information retrieval involving tokenization, document modelling and similarity evaluation. The performance of the SCCS technique was tested using worked solutions on solving linear algebraic equations in one variable. 350 working schemes comprising 1385 responses were collected using a marking engine prototype, which has been developed based on the technique. The results show that both the automated analytical scores and the automated overall scores generated by the marking engine exhibit high percent agreement, high correlation and high degree of agreement with manual scores with small average absolute and mixed errors.
Dysfunctional error-related processing in incarcerated youth with elevated psychopathic traits

PubMed Central

Maurer, J. Michael; Steele, Vaughn R.; Cope, Lora M.; Vincent, Gina M.; Stephen, Julia M.; Calhoun, Vince D.; Kiehl, Kent A.

2016-01-01

Adult psychopathic offenders show an increased propensity towards violence, impulsivity, and recidivism. A subsample of youth with elevated psychopathic traits represent a particularly severe subgroup characterized by extreme behavioral problems and comparable neurocognitive deficits as their adult counterparts, including perseveration deficits. Here, we investigate response-locked event-related potential (ERP) components (the error-related negativity [ERN/Ne] related to early error-monitoring processing and the error-related positivity [Pe] involved in later error-related processing) in a sample of incarcerated juvenile male offenders (n = 100) who performed a response inhibition Go/NoGo task. Psychopathic traits were assessed using the Hare Psychopathy Checklist: Youth Version (PCL:YV). The ERN/Ne and Pe were analyzed with classic windowed ERP components and principal component analysis (PCA). Using linear regression analyses, PCL:YV scores were unrelated to the ERN/Ne, but were negatively related to Pe mean amplitude. Specifically, the PCL:YV Facet 4 subscale reflecting antisocial traits emerged as a significant predictor of reduced amplitude of a subcomponent underlying the Pe identified with PCA. This is the first evidence to suggest a negative relationship between adolescent psychopathy scores and Pe mean amplitude. PMID:26930170
Local alignment of two-base encoded DNA sequence

PubMed Central

Homer, Nils; Merriman, Barry; Nelson, Stanley F

2009-01-01

Background: DNA sequence comparison is based on optimal local alignment of two sequences using a similarity score. However, some new DNA sequencing technologies do not directly measure the base sequence, but rather an encoded form, such as the two-base encoding considered here. In order to compare such data to a reference sequence, the data must be decoded into sequence. The decoding is deterministic, but the possibility of measurement errors requires searching among all possible error modes and resulting alignments to achieve an optimal balance of fewer errors versus greater sequence similarity. Results: We present an extension of the standard dynamic programming method for local alignment, which simultaneously decodes the data and performs the alignment, maximizing a similarity score based on a weighted combination of errors and edits, and allowing an affine gap penalty. We also present simulations that demonstrate the performance characteristics of our two-base encoded alignment method and contrast those with standard DNA sequence alignment under the same conditions. Conclusion: The new local alignment algorithm for two-base encoded data has substantial power to properly detect and correct measurement errors while identifying underlying sequence variants, and facilitating genome re-sequencing efforts based on this form of sequence data. PMID:19508732
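To see why decoding and alignment are hard to separate, a short sketch of a two-base encoding may help. The abstract does not spell out the encoding, so the XOR-based colour code below (A=0, C=1, G=2, T=3, each colour being the XOR of two adjacent bases) is an assumption, as are the names `encode` and `decode`. The sketch does not implement the authors' alignment algorithm.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = "ACGT"


def encode(seq: str) -> list[int]:
    """Encode each adjacent pair of bases as one colour (0-3)."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]


def decode(first_base: str, colors: list[int]) -> str:
    """Deterministically rebuild the bases from a known first base."""
    out = [first_base]
    for c in colors:
        out.append(BASE[CODE[out[-1]] ^ c])
    return "".join(out)


clean = encode("ACGT")
# Corrupt a single colour measurement and decode naively.
noisy = clean.copy()
noisy[1] = 0
decoded_noisy = decode("A", noisy)
```

Under this assumed code, one wrong colour changes every base decoded after it, which is consistent with the abstract's point that measurement errors must be considered together with the alignment rather than after a naive decode.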
A regularization corrected score method for nonlinear regression models with covariate error.

PubMed

Zucker, David M; Gorfine, Malka; Li, Yi; Tadesse, Mahlet G; Spiegelman, Donna

2013-03-01

Many regression analyses involve explanatory variables that are measured with error, and failing to account for this error is well known to lead to biased point and interval estimates of the regression coefficients. We present here a new general method for adjusting for covariate error. Our method consists of an approximate version of the Stefanski-Nakamura corrected score approach, using the method of regularization to obtain an approximate solution of the relevant integral equation. We develop the theory in the setting of classical likelihood models; this setting covers, for example, linear regression, nonlinear regression, logistic regression, and Poisson regression. The method is extremely general in terms of the types of measurement error models covered, and is a functional method in the sense of not involving assumptions on the distribution of the true covariate. We discuss the theoretical properties of the method and present simulation results in the logistic regression setting (univariate and multivariate). For illustration, we apply the method to data from the Harvard Nurses' Health Study concerning the relationship between physical activity and breast cancer mortality in the period following a diagnosis of breast cancer. Copyright © 2013, The International Biometric Society.
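For orientation, the defining property of a corrected score can be written compactly. The notation is chosen here and is not taken from the abstract: X is the true covariate, W its error-prone measurement, Y the response, θ the regression parameter, ψ the score that would be used if X were observed, and ψ* the corrected score built from W.

```latex
\mathbb{E}\left[\, \psi^{*}(Y, W, \theta) \mid Y, X \,\right] = \psi(Y, X, \theta)
```

Finding a ψ* that satisfies this relation amounts to solving an integral equation; the abstract describes obtaining an approximate solution of that equation by regularization.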
Retention-error patterns in complex alphanumeric serial-recall tasks.

PubMed

Mathy, Fabien; Varré, Jean-Stéphane

2013-01-01

We propose a new method based on an algorithm usually dedicated to DNA sequence alignment in order to both reliably score short-term memory performance on immediate serial-recall tasks and analyse retention-error patterns. There can be considerable confusion on how performance on immediate serial list recall tasks is scored, especially when the to-be-remembered items are sampled with replacement. We discuss the utility of sequence-alignment algorithms to compare the stimuli to the participants' responses. The idea is that deletion, substitution, translocation, and insertion errors, which are typical in DNA, are also typical putative errors in short-term memory (respectively omission, confusion, permutation, and intrusion errors). We analyse four data sets in which alphanumeric lists included a few (or many) repetitions. After examining the method on two simple data sets, we show that sequence alignment offers 1) a compelling method for measuring capacity in terms of chunks when many regularities are introduced in the material (third data set) and 2) a reliable estimator of individual differences in short-term memory capacity. This study illustrates the difficulty of arriving at a good measure of short-term memory performance, and also attempts to characterise the primary factors underpinning remembering and forgetting.
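The idea of scoring recall by alignment can be sketched with a plain edit-distance alignment. This is not the authors' algorithm: it uses unit costs, does not detect translocations (permutation errors), and the function name `align_errors` and the category labels are assumptions for illustration.

```python
def align_errors(stimulus: str, response: str) -> dict[str, int]:
    """Align a response to the stimulus list and count error types."""
    n, m = len(stimulus), len(response)
    # dist[i][j] = minimum edits turning stimulus[:i] into response[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if stimulus[i - 1] == response[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j - 1] + cost,  # match or confusion
                dist[i - 1][j] + 1,         # omission
                dist[i][j - 1] + 1,         # intrusion
            )

    # Walk back through the table to label each aligned position.
    counts = {"correct": 0, "confusion": 0, "omission": 0, "intrusion": 0}
    i, j = n, m
    while i > 0 or j > 0:
        same = i > 0 and j > 0 and stimulus[i - 1] == response[j - 1]
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + (0 if same else 1):
            counts["correct" if same else "confusion"] += 1
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            counts["omission"] += 1
            i -= 1
        else:
            counts["intrusion"] += 1
            j -= 1
    return counts


result = align_errors("ABCDE", "ABDE")
```

Position-by-position scoring would mark every item after the omitted "C" as wrong, whereas the alignment credits the four recalled items and records a single omission.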
Risk of error estimated from Palestine pharmacists' knowledge and certainty on the adverse effects and contraindications of active pharmaceutical ingredients and excipients.

PubMed

Shawahna, Ramzi; Al-Rjoub, Mohammed; Al-Horoub, Mohammed M; Al-Hroub, Wasif; Al-Rjoub, Bisan; Al-Nabi, Bashaaer Abd

2016-01-01

This study aimed to investigate community pharmacists' knowledge and certainty of adverse effects and contraindications of pharmaceutical products to estimate the risk of error. Factors influencing their knowledge and certainty were also investigated. The knowledge of community pharmacists was assessed in a cross-sectional design using a multiple-choice questions test on the adverse effects and contraindications of active pharmaceutical ingredients and excipients from May 2014 to March 2015. Self-rated certainty scores were also recorded for each question. Knowledge and certainty scores were combined to estimate the risk of error. Out of 315 subjects, 129 community pharmacists (41.0%) completed the 30 multiple-choice questions test on active ingredients and excipients. Knowledge on active ingredients was associated with the year of graduation and obtaining a licence to practice pharmacy. Knowledge on excipients was associated with the degree obtained. There was higher risk of error in items on excipients than those on ingredients (P<0.01). The knowledge of community pharmacists in Palestine was insufficient with high risk of errors. Knowledge of community pharmacists on the safety issues of active ingredients and excipients needs to be improved.
Experimental investigation of false positive errors in auditory species occurrence surveys

USGS Publications Warehouse

Miller, David A.W.; Weir, Linda A.; McClintock, Brett T.; Grant, Evan H. Campbell; Bailey, Larissa L.; Simons, Theodore R.

2012-01-01

False positive errors are a significant component of many ecological data sets, which in combination with false negative errors, can lead to severe biases in conclusions about ecological systems. We present results of a field experiment where observers recorded observations for known combinations of electronically broadcast calling anurans under conditions mimicking field surveys to determine species occurrence. Our objectives were to characterize false positive error probabilities for auditory methods based on a large number of observers, to determine if targeted instruction could be used to reduce false positive error rates, and to establish useful predictors of among-observer and among-species differences in error rates. We recruited 31 observers, ranging in abilities from novice to expert, that recorded detections for 12 species during 180 calling trials (66,960 total observations). All observers made multiple false positive errors and on average 8.1% of recorded detections in the experiment were false positive errors. Additional instruction had only minor effects on error rates. After instruction, false positive error probabilities decreased by 16% for treatment individuals compared to controls with broad confidence interval overlap of 0 (95% CI: -46 to 30%). This coincided with an increase in false negative errors due to the treatment (26%; -3 to 61%). Differences among observers in false positive and in false negative error rates were best predicted by scores from an online test and a self-assessment of observer ability completed prior to the field experiment. In contrast, years of experience conducting call surveys was a weak predictor of error rates. False positive errors were also more common for species that were played more frequently, but were not related to the dominant spectral frequency of the call. 
Our results corroborate other work that demonstrates false positives are a significant component of species occurrence data collected by auditory methods. Instructing observers to only report detections they are completely certain are correct is not sufficient to eliminate errors. As a result, analytical methods that account for false positive errors will be needed, and independent testing of observer ability is a useful predictor for among-observer variation in observation error rates.
Predictors of driving safety in early Alzheimer disease

PubMed Central

Dawson, J D.; Anderson, S W.; Uc, E Y.; Dastrup, E; Rizzo, M

2009-01-01

Objective: To measure the association of cognition, visual perception, and motor function with driving safety in Alzheimer disease (AD). Methods: Forty drivers with probable early AD (mean Mini-Mental State Examination score 26.5) and 115 elderly drivers without neurologic disease underwent a battery of cognitive, visual, and motor tests, and drove a standardized 35-mile route in urban and rural settings in an instrumented vehicle. A composite cognitive score (COGSTAT) was calculated for each subject based on eight neuropsychological tests. Driving safety errors were noted and classified by a driving expert based on video review. Results: Drivers with AD committed an average of 42.0 safety errors/drive (SD = 12.8), compared to an average of 33.2 (SD = 12.2) for drivers without AD (p < 0.0001); the most common errors were lane violations. Increased age was predictive of errors, with a mean of 2.3 more errors per drive observed for each 5-year age increment. After adjustment for age and gender, COGSTAT was a significant predictor of safety errors in subjects with AD, with a 4.1 increase in safety errors observed for a 1 SD decrease in cognitive function. Significant increases in safety errors were also found in subjects with AD with poorer scores on Benton Visual Retention Test, Complex Figure Test-Copy, Trail Making Subtest-A, and the Functional Reach Test. Conclusion: Drivers with Alzheimer disease (AD) exhibit a range of performance on tests of cognition, vision, and motor skills. Since these tests provide additional predictive value of driving performance beyond diagnosis alone, clinicians may use these tests to help predict whether a patient with AD can safely operate a motor vehicle. 
GLOSSARY AD = Alzheimer disease; AVLT = Auditory Verbal Learning Test; Blocks = Block Design subtest; BVRT = Benton Visual Retention Test; CFT = Complex Figure Test; CI = confidence interval; COWA = Controlled Oral Word Association; CS = contrast sensitivity; FVA = far visual acuity; JLO = Judgment of Line Orientation; MCI = mild cognitive impairment; MMSE = Mini-Mental State Examination; NVA = near visual acuity; SFM = structure from motion; TMT = Trail-Making Test; UFOV = Useful Field of View. PMID:19204261
Using Microcomputers for Assessment and Error Analysis. Monograph #23.

ERIC Educational Resources Information Center

Hasselbring, Ted S.; And Others

This monograph provides an overview of computer-based assessment and error analysis in the instruction of elementary students with complex medical, learning, and/or behavioral problems. Information on generating and scoring tests using the microcomputer is offered, as are ideas for using computers in the analysis of mathematical strategies and…
Sex Differences in Vestibular/Ocular and Neurocognitive Outcomes After Sport-Related Concussion.

PubMed

Sufrinko, Alicia M; Mucha, Anne; Covassin, Tracey; Marchetti, Greg; Elbin, R J; Collins, Michael W; Kontos, Anthony P

2017-03-01

To examine sex differences in vestibular and oculomotor symptoms and impairment in athletes with sport-related concussion (SRC). The secondary purpose was to replicate previously reported sex differences in total concussion symptoms, and performance on neurocognitive and balance testing. Prospective cross-sectional study of consecutively enrolled clinic patients within 21 days of a SRC. Specialty Concussion Clinic. Included male (n = 36) and female (n = 28) athletes ages 9 to 18 years. Vestibular symptoms and impairment were measured with the Vestibular/Ocular Motor Screening (VOMS). Participants completed the Immediate Post-concussion Assessment and Cognitive Test (ImPACT), Post-concussion Symptom Scale (PCSS), and Balance Error Scoring System (BESS). Sex differences on clinical measures. Females had higher PCSS scores (P = 0.01) and greater VOMS vestibular ocular reflex (VOR) score (P = 0.01) compared with males. There were no sex differences on BESS or ImPACT. Total PCSS scores together with female sex accounted for 45% of the variance in VOR scores. Findings suggest higher VOR scores after SRC in female compared with male athletes. Findings did not extend to other components of the VOMS tool, suggesting that sex differences may be specific to certain types of vestibular impairment after SRC. Additional research on the clinical significance of the current findings is needed.
Score-level fusion of two-dimensional and three-dimensional palmprint for personal recognition systems

NASA Astrophysics Data System (ADS)

Chaa, Mourad; Boukezzoula, Naceur-Eddine; Attia, Abdelouahab

2017-01-01

Two types of scores extracted from two-dimensional (2-D) and three-dimensional (3-D) palmprint for personal recognition systems are merged, introducing a local image descriptor for 2-D palmprint-based recognition systems, named bank of binarized statistical image features (B-BSIF). The main idea of B-BSIF is that the extracted histograms from the binarized statistical image features (BSIF) code images (the results of applying the different BSIF descriptor size with the length 12) are concatenated into one to produce a large feature vector. 3-D palmprint contains the depth information of the palm surface. The self-quotient image (SQI) algorithm is applied for reconstructing illumination-invariant 3-D palmprint images. To extract discriminative Gabor features from SQI images, Gabor wavelets are defined and used. Indeed, the dimensionality reduction methods have shown their ability in biometrics systems. Given this, a principal component analysis (PCA)+linear discriminant analysis (LDA) technique is employed. For the matching process, the cosine Mahalanobis distance is applied. Extensive experiments were conducted on a 2-D and 3-D palmprint database with 10,400 range images from 260 individuals. Then, a comparison was made between the proposed algorithm and other existing methods in the literature. Results clearly show that the proposed framework provides a higher correct recognition rate. Furthermore, the best results were obtained by merging the score of B-BSIF descriptor with the score of the SQI+Gabor wavelets+PCA+LDA method, yielding an equal error rate of 0.00% and a recognition rate of rank-1=100.00%.
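Score-level fusion, as referred to in the abstract above, can be sketched in a few lines. The abstract does not state the normalization or the fusion rule, so min-max normalization and a weighted sum are assumptions here, as are the names `min_max` and `fuse` and the weight `w`.

```python
def min_max(scores: list[float]) -> list[float]:
    """Rescale a list of matcher scores to the range [0, 1]."""
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def fuse(scores_2d: list[float], scores_3d: list[float], w: float = 0.5) -> list[float]:
    """Weighted-sum fusion of two matchers' scores for the same candidates."""
    a, b = min_max(scores_2d), min_max(scores_3d)
    return [w * x + (1 - w) * y for x, y in zip(a, b)]


# Hypothetical scores from a 2-D matcher and a 3-D matcher for three candidates.
fused = fuse([0.2, 0.8, 0.5], [10.0, 30.0, 20.0])
```

Normalizing first matters because the two matchers may report scores on unrelated scales; without it the matcher with the larger numeric range would dominate the sum.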
Inheriting the Learner's View: A Google Glass-Based Wearable Computing Platform for Improving Surgical Trainee Performance.

PubMed

Brewer, Zachary E; Fann, Hutchinson C; Ogden, W David; Burdon, Thomas A; Sheikh, Ahmad Y

2016-01-01

It is speculated that, in operative environments, real-time visualization of the trainee's viewpoint by the instructor may improve performance and teaching efficacy. We hypothesized that introduction of a wearable surgical visualization system allowing the instructor to visualize otherwise "blind" areas in the operative field could improve trainee performance in a simulated operative setting. A total of 11 surgery residents (4 in general surgery training and 7 in an integrated 6-year cardiothoracic surgery program) participated in the study. Google (Mountain View, CA) Glass hardware running proprietary software from CrowdOptic (San Francisco, CA) was utilized for creation of the wearable surgical visualization system. Both the learner and trainer wore the system, and video was streamed from the learner's system in real time to the trainer, who directed the learner to place needles in a simulated operative field. Subjects placed a total of 5 needles in each of 4 quadrants. A composite error score was calculated based on the accuracy of needle placement in relation to the intended needle trajectories as described by the trainer. Time to task completion (TTC) was also measured and participants completed an exit questionnaire. All residents completed the protocol tasks and the survey. Introduction of the wearable surgical visualization system did not affect mean time to task completion (278 ± 50 vs. 282 ± 69 seconds, p = NS). However, mean composite error score fell significantly once the wearable system was deployed (18 ± 5 vs. 15 ± 4, p < 0.05), demonstrating improved accuracy of needle placement. Most of the participants deemed the device unobtrusive, easy to operate, and useful for communication and instruction. This study suggests that wearable surgical visualization systems allowing for adoption of the learner's perspective may be a useful educational adjunct in the training of surgeons. 
Further evaluations of the efficacy of wearable technology in the operating room environment are warranted. Copyright © 2016 Association of Program Directors in Surgery. Published by Elsevier Inc. All rights reserved.

Supporting diagnosis of attention-deficit hyperactive disorder with novelty detection.

PubMed

Lee, Hyoung-Joo; Cho, Sungzoon; Shin, Min-Sup

2008-03-01

Computerized continuous performance test (CPT) is a widely used diagnostic tool for attention-deficit hyperactivity disorder (ADHD). It measures the number of correctly detected stimuli as well as response times. Typically, when calculating a cut-off score for discriminating between normal and abnormal, only the normal children's data are collected. Then the average and standard deviation of each measure or variable are computed. If any of the variables is more than 2 sigma above the average, that child is diagnosed as abnormal. We call this approach the "T-score 70" classifier. However, its performance leaves a lot to be desired due to a high false negative error. In order to improve the classification accuracy, we propose to use novelty detection approaches for supporting ADHD diagnosis. Novelty detection is a model building framework where a classifier is constructed using only one class of training data and a new input pattern is classified according to its similarity to the training data. A total of eight novelty detectors are introduced and applied to our ADHD datasets collected from two modes of tests, visual and auditory. They are evaluated and compared with the T-score model on validation datasets in terms of false positive and negative error rates, and area under receiver operating characteristics curve (AuROC). Experimental results show that the cut-off score of 70 is suboptimal, leading to a low false positive error but a very high false negative error. A few novelty detectors such as Parzen density estimators yield much more balanced classification performances. Moreover, most novelty detectors outperform the T-score method for most age groups statistically with a significance level of 1% in terms of AuROC. 
In particular, we recommend the Parzen and Gaussian density estimators, kernel principal component analysis, one-class support vector machine, and K-means clustering novelty detector which can improve upon the T-score method on average by at least 30% for the visual test and 40% for the auditory test. In addition, their performances are relatively stable over various parameter values as long as they are within reasonable ranges. The proposed novelty detection approaches can replace the T-score method which has been considered the "gold standard" for supporting ADHD diagnosis. Furthermore, they can be applied to other psychological tests where only normal data are available.
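The "T-score 70" rule and one of the novelty detectors named above can be sketched as follows. This assumes the usual T-score scaling (mean 50, standard deviation 10), under which 2 sigma above the mean equals 70. The one-dimensional Gaussian Parzen estimator, the bandwidth `h`, and all function names and data are illustrative assumptions, not the authors' implementation.

```python
import math
from statistics import mean, stdev


def t_score(x: float, normal_values: list[float]) -> float:
    """T-score of x relative to the normal group (mean 50, SD 10)."""
    return 50 + 10 * (x - mean(normal_values)) / stdev(normal_values)


def t_score_70_abnormal(child: list[float], normal_data: list[list[float]]) -> bool:
    """Flag the child if any measure lies more than 2 SD above the normal mean."""
    return any(t_score(x, col) > 70 for x, col in zip(child, normal_data))


def parzen_density(x: float, train: list[float], h: float) -> float:
    """1-D Gaussian Parzen window density estimate at x with bandwidth h."""
    k = sum(math.exp(-0.5 * ((x - t) / h) ** 2) for t in train)
    return k / (len(train) * h * math.sqrt(2 * math.pi))


# Hypothetical values of one CPT measure for normal children.
normal = [10, 12, 11, 9, 13, 10, 12, 11]
flagged = t_score_70_abnormal([14], [normal])
```

A density-based detector would flag a child when `parzen_density` falls below a chosen threshold, so the decision depends on how typical the whole pattern is rather than on a fixed cut-off for each variable.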
TEST-RETEST RELIABILITY OF THE CLOSED KINETIC CHAIN UPPER EXTREMITY STABILITY TEST (CKCUEST) IN ADOLESCENTS: RELIABILITY OF CKCUEST IN ADOLESCENTS.

PubMed

de Oliveira, Valéria M A; Pitangui, Ana C R; Nascimento, Vinícius Y S; da Silva, Hítalo A; Dos Passos, Muana H P; de Araújo, Rodrigo C

2017-02-01

The Closed Kinetic Chain Upper Extremity Stability Test (CKCUEST) has been proposed as an option to assess upper limb function and stability; however, there are few studies that support the use of this test in adolescents. The purpose of the present study was to investigate the intersession reliability and agreement of three CKCUEST scores in adolescents and establish clinimetric values for this test. Test-retest reliability. Twenty-five healthy adolescents of both sexes were evaluated. The subjects performed two CKCUEST with an interval of one week between the tests. An intraclass correlation coefficient (ICC 3,3) two-way mixed model with a 95% interval of confidence was utilized to determine intersession reliability. A Bland-Altman graph was plotted to analyze the agreement between assessments. The presence of systematic error was evaluated by a one-sample t test. The difference between the evaluation and reevaluation was observed using a paired-sample t test. The level of significance was set at 0.05. Standard error of measurements and minimum detectable changes were calculated. The intersession reliability of the average touches score, normalized score, and power score were 0.68, 0.68 and 0.87, the standard error of measurement were 2.17, 1.35 and 6.49, and the minimal detectable change was 6.01, 3.74 and 17.98, respectively. The presence of systematic error (p < 0.014), the significant difference between the measurements (p < 0.05), and the analysis of the Bland-Altman graph infer that CKCUEST is a discordant test with moderate to excellent reliability when used with adolescents. The CKCUEST is a measurement with moderate to excellent reliability for adolescents. 2b.
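The relation between the reported standard errors of measurement (SEM) and minimal detectable changes (MDC) can be checked with a short calculation. The abstract does not state the formulas used. The reported values are consistent with the common convention MDC95 = 1.96 × √2 × SEM, and SEM = SD × √(1 − ICC) is the usual companion formula; both are assumptions here, as are the function names.

```python
import math


def sem_from_icc(sd: float, icc: float) -> float:
    """Standard error of measurement from the score SD and the ICC."""
    return sd * math.sqrt(1 - icc)


def mdc95(sem: float) -> float:
    """Minimal detectable change at the 95% confidence level."""
    return 1.96 * math.sqrt(2) * sem


# SEM values reported in the abstract (touches, normalized, power).
reported_sem = [2.17, 1.35, 6.49]
computed_mdc = [round(mdc95(s), 2) for s in reported_sem]
```

Small differences in the last decimal place relative to the reported MDC values are to be expected, because the SEM inputs used here are already rounded.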
Identification of Outliers in Grace Data for Indo-Gangetic Plain Using Various Methods (Z-Score, Modified Z-score and Adjusted Boxplot) and Its Removal

NASA Astrophysics Data System (ADS)

Srivastava, S.

2015-12-01

Gravity Recovery and Climate Experiment (GRACE) data are widely used for hydrological studies of large-scale basins (≥100,000 sq km). GRACE data (Stokes Coefficients or Equivalent Water Height) used for hydrological studies are not direct observations but result from high-level processing of raw data from the GRACE mission. Different partner agencies like CSR, GFZ and JPL implement their own methodology and their processing methods are independent from each other. The primary sources of error in GRACE data are measurement and modeling errors and the processing strategies of these agencies. Because of different processing methods, the final data from all the partner agencies are inconsistent with each other at some epochs. GRACE data provide spatio-temporal variations in Earth's gravity, which are mainly attributed to the seasonal fluctuations in water level on the Earth's surface and subsurface. During the quantification of error/uncertainties, several high positive and negative peaks were observed which do not correspond to any hydrological processes but may emanate from a combination of primary error sources, or some other geophysical processes (e.g. earthquakes, landslides, etc.) resulting in redistribution of Earth's mass. Such peaks can be considered as outliers for hydrological studies. In this work, an algorithm has been designed to extract outliers from the GRACE data for the Indo-Gangetic plain, which considers the seasonal variations and the trend in data. Different outlier detection methods have been used, such as Z-score, modified Z-score and adjusted boxplot. For verification, assimilated hydrological (GLDAS) and hydro-meteorological data are used as the reference. The results have shown that the consistency amongst all data sets improved significantly after the removal of outliers.
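Two of the detection methods named above can be compared on a toy series. This is a sketch only: it uses the conventional cut-offs of 3 for the Z-score and 3.5 for the modified Z-score, with the 0.6745 scaling of the median absolute deviation. The data are invented, and no seasonal or trend removal is performed, unlike the algorithm described in the abstract.

```python
from statistics import mean, median, pstdev


def z_scores(xs: list[float]) -> list[float]:
    """Classical Z-scores using the mean and (population) standard deviation."""
    mu, sigma = mean(xs), pstdev(xs)
    return [(x - mu) / sigma for x in xs]


def modified_z_scores(xs: list[float]) -> list[float]:
    """Robust Z-scores using the median and median absolute deviation (MAD)."""
    med = median(xs)
    mad = median([abs(x - med) for x in xs])
    return [0.6745 * (x - med) / mad for x in xs]


# Hypothetical residual series with one obvious spike at the end.
series = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.3, 9.0]
flag_z = [i for i, z in enumerate(z_scores(series)) if abs(z) > 3]
flag_mod = [i for i, z in enumerate(modified_z_scores(series)) if abs(z) > 3.5]
```

On this short series the spike inflates the mean and standard deviation enough that the classical Z-score never exceeds 3, while the median-based version still flags it; this masking effect is one reason robust variants are often preferred for short records.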
Partially supervised P300 speller adaptation for eventual stimulus timing optimization: target confidence is superior to error-related potential score as an uncertain label

NASA Astrophysics Data System (ADS)

Zeyl, Timothy; Yin, Erwei; Keightley, Michelle; Chau, Tom

2016-04-01

Objective. Error-related potentials (ErrPs) have the potential to guide classifier adaptation in BCI spellers, for addressing non-stationary performance as well as for online optimization of system parameters, by providing imperfect or partial labels. However, the usefulness of ErrP-based labels for BCI adaptation has not been established in comparison to other partially supervised methods. Our objective is to make this comparison by retraining a two-step P300 speller on a subset of confident online trials using naïve labels taken from speller output, where confidence is determined either by (i) ErrP scores, (ii) posterior target scores derived from the P300 potential, or (iii) a hybrid of these scores. We further wish to evaluate the ability of partially supervised adaptation and retraining methods to adjust to a new stimulus-onset asynchrony (SOA), a necessary step towards online SOA optimization. Approach. Eleven consenting able-bodied adults attended three online spelling sessions on separate days with feedback in which SOAs were set at 160 ms (sessions 1 and 2) and 80 ms (session 3). A post hoc offline analysis and a simulated online analysis were performed on sessions two and three to compare multiple adaptation methods. Area under the curve (AUC) and symbols spelled per minute (SPM) were the primary outcome measures. Main results. Retraining using supervised labels confirmed improvements of 0.9 percentage points (session 2, p < 0.01) and 1.9 percentage points (session 3, p < 0.05) in AUC using same-day training data over using data from a previous day, which supports classifier adaptation in general. Significance. Using posterior target score alone as a confidence measure resulted in the highest SPM of the partially supervised methods, indicating that ErrPs are not necessary to boost the performance of partially supervised adaptive classification. 
Partial supervision significantly improved SPM at a novel SOA, showing promise for eventual online SOA optimization.
Use of Ganga Hospital Open Injury Severity Scoring for determination of salvage versus amputation in open type IIIB injuries of lower limbs in children-An analysis of 52 type IIIB open fractures.

PubMed

Venkatadass, K; Grandhi, Tarani Sai Prasanth; Rajasekaran, S

2017-11-01

Open injuries in children are rare compared to adults. In children with major open injuries, there is no specific scoring system to guide when to amputate or salvage the limb. The use of available adult scoring systems may lead to errors in management. The role of Ganga Hospital Open Injury Severity Scoring (GHOISS) for open injuries in adults is well established and its applicability for pediatric open injuries has not been studied. This study was done to analyse the usefulness of GHOISS in pediatric open injuries and to compare it with MESS (Mangled Extremity Severity Score). All children (0-18 years) who were admitted with Open type IIIB injuries of lower limbs between January 2008 and March 2015 were included. MESS and GHOISS were calculated for all the patients. There were 50 children with 52 type IIIB Open injuries of which 39 had open tibial fractures and 13 had open femur fractures. Out of 52 type IIIB open injuries, 48 were salvaged and 4 were amputated. A MESS score of 7 and above had sensitivity of 25% for amputation while GHOISS of 17 and above was found to be more accurate for determining amputation with sensitivity of 75% and specificity of 93.75%. GHOISS is a reliable predictor of injury severity in type IIIB open fractures in children and can be used as a guide for decision-making. The use of MESS score in children has a lower predictive value compared to GHOISS in deciding amputation versus salvage. A GHOISS of 17 or more has the highest sensitivity and specificity to predict amputation. Copyright © 2017 Elsevier Ltd. All rights reserved.
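The reported sensitivity and specificity can be related back to the 4 amputated and 48 salvaged limbs. The cell counts below are not stated in the abstract; they are inferred here as the only whole numbers consistent with the reported percentages (3 of 4 and 45 of 48 for GHOISS, 1 of 4 for MESS) and should be read as an assumption.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Share of amputated limbs that the score threshold identified."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of salvaged limbs that fell below the score threshold."""
    return tn / (tn + fp)


# Inferred counts (assumption): GHOISS >= 17 flags 3 of 4 amputations
# and stays below threshold for 45 of 48 salvaged limbs.
ghoiss_sens = sensitivity(tp=3, fn=1)
ghoiss_spec = specificity(tn=45, fp=3)
# Inferred counts (assumption): MESS >= 7 flags 1 of 4 amputations.
mess_sens = sensitivity(tp=1, fn=3)
```

With only four amputations in the sample, each single case shifts sensitivity by 25 percentage points, which is worth keeping in mind when comparing the two scores.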
An NCME Instructional Module on Quality Control Procedures in the Scoring, Equating, and Reporting of Test Scores

ERIC Educational Resources Information Center

Allalouf, Avi

2007-01-01

There is significant potential for error in long production processes that consist of sequential stages, each of which is heavily dependent on the previous stage, such as the SER (Scoring, Equating, and Reporting) process. Quality control procedures are required in order to monitor this process and to reduce the number of mistakes to a minimum. In…
The Exchangeability of Brief Intelligence Tests for Children with Intellectual Giftedness: Illuminating Error Variance Components' Influence on IQs

ERIC Educational Resources Information Center

Irby, Sarah M.; Floyd, Randy G.

2017-01-01

This study examined the exchangeability of total scores (i.e., intelligent quotients [IQs]) from three brief intelligence tests. Tests were administered to 36 children with intellectual giftedness, scored live by one set of primary examiners and later scored by a secondary examiner. For each student, six IQs were calculated, and all 216 values…
Development and initial validation of an endoscopic part-task training box.

PubMed

Thompson, Christopher C; Jirapinyo, Pichamol; Kumar, Nitin; Ou, Amy; Camacho, Andrew; Lengyel, Balazs; Ryan, Michele B

2014-09-01

There is currently no objective and validated methodology available to assess the progress of endoscopy trainees or to determine when technical competence has been achieved. The aims of the current study were to develop an endoscopic part-task simulator and to assess scoring system validity. Fundamental endoscopic skills were determined via kinematic analysis, literature review, and expert interviews. Simulator prototypes and scoring systems were developed to reflect these skills. Validity evidence for content, internal structure, and response process was evaluated. The final training box consisted of five modules (knob control, torque, retroflexion, polypectomy, and navigation and loop reduction). A total of 5 minutes were permitted per module with extra points for early completion. Content validity index (CVI)-realism was 0.88, CVI-relevance was 1.00, and CVI-representativeness was 0.88, giving a composite CVI of 0.92. Overall, 82% of participants considered the simulator to be capable of differentiating between ability levels, and 93% thought the simulator should be used to assess ability prior to performing procedures in patients. Inter-item assessment revealed correlations from 0.67 to 0.93, suggesting that tasks were sufficiently correlated to assess the same underlying construct, with each task remaining independent. Each module represented 16.0% to 26.1% of the total score, suggesting that no module contributed disproportionately to the composite score. Average box scores were 272.6 and 284.4 (P = 0.94) when performed sequentially, and average score for all participants with proctor 1 was 297.6 and 308.1 with proctor 2 (P = 0.94), suggesting reproducibility and minimal error associated with test administration. A part-task training box and scoring system were developed to assess fundamental endoscopic skills, and validity evidence regarding content, internal structure, and response process was demonstrated. © Georg Thieme Verlag KG Stuttgart · New York.
Construct validity and expert benchmarking of the haptic virtual reality dental simulator.

PubMed

Suebnukarn, Siriwan; Chaisombat, Monthalee; Kongpunwijit, Thanapohn; Rhienmora, Phattanapon

2014-10-01

The aim of this study was to demonstrate construct validation of the haptic virtual reality (VR) dental simulator and to define expert benchmarking criteria for skills assessment. Thirty-four self-selected participants (fourteen novices, fourteen intermediates, and six experts in endodontics) at one dental school performed ten repetitions of three mode tasks of endodontic cavity preparation: easy (mandibular premolar with one canal), medium (maxillary premolar with two canals), and hard (mandibular molar with three canals). The virtual instrument's path length was registered by the simulator. The outcomes were assessed by an expert. The error scores in easy and medium modes accurately distinguished the experts from novices and intermediates at the onset of training, when there was a significant difference between groups (ANOVA, p<0.05). The trend was consistent until trial 5. From trial 6 on, the three groups achieved similar scores. No significant difference was found between groups at the end of training. Error score analysis was not able to distinguish any group at the hard level of training. Instrument path length showed a difference in performance according to groups at the onset of training (ANOVA, p<0.05). This study established construct validity for the haptic VR dental simulator by demonstrating its discriminant capabilities between that of experts and non-experts. The experts' error scores and path length were used to define benchmarking criteria for optimal performance.
Accurate template-based modeling in CASP12 using the IntFOLD4-TS, ModFOLD6, and ReFOLD methods.

PubMed

McGuffin, Liam J; Shuid, Ahmad N; Kempster, Robert; Maghrabi, Ali H A; Nealon, John O; Salehe, Bajuna R; Atkins, Jennifer D; Roche, Daniel B

2018-03-01

Our aim in CASP12 was to improve our Template-Based Modeling (TBM) methods through better model selection, accuracy self-estimate (ASE) scores and refinement. To meet this aim, we developed two new automated methods, which we used to score, rank, and improve upon the provided server models. Firstly, the ModFOLD6_rank method, for improved global Quality Assessment (QA), model ranking and the detection of local errors. Secondly, the ReFOLD method for fixing errors through iterative QA guided refinement. For our automated predictions we developed the IntFOLD4-TS protocol, which integrates the ModFOLD6_rank method for scoring the multiple-template models that were generated using a number of alternative sequence-structure alignments. Overall, our selection of top models and ASE scores using ModFOLD6_rank was an improvement on our previous approaches. In addition, it was worthwhile attempting to repair the detected errors in the top selected models using ReFOLD, which gave us an overall gain in performance. According to the assessors' formula, the IntFOLD4 server ranked 3rd/5th (average Z-score > 0.0/-2.0) on the server only targets, and our manual predictions (McGuffin group) ranked 1st/2nd (average Z-score > -2.0/0.0) compared to all other groups. © 2017 Wiley Periodicals, Inc.
Factors Associated With Negative Attitudes Toward Speaking in Preschool-Age Children Who Do and Do Not Stutter.

PubMed

Groner, Stephen; Walden, Tedra; Jones, Robin

2016-01-01

This study explored relations between the negativity of children's speech-related attitudes as measured by the Communication Attitude Test for Preschool and Kindergarten Children Who Stutter (KiddyCAT; Vanryckeghem & Brutten, 2007) and (a) age; (b) caregiver reports of stuttering and its social consequences; (c) types of disfluencies; and (d) standardized speech, vocabulary, and language scores. Participants were 46 preschool-age children who stutter (CWS; 12 females, 34 males) and 66 preschool-age children who do not stutter (CWNS; 35 females, 31 males). After a conversation, children completed standardized tests and the KiddyCAT while their caregivers completed scales on observed stuttering behaviors and their consequences. The KiddyCAT scores of both the CWS and the CWNS were significantly negatively correlated with age. Both groups' KiddyCAT scores increased with higher scores on the Speech Fluency Rating Scale of the Test of Childhood Stuttering (Gillam, Logan, & Pearson, 2009). Repetitions were a significant contributor to the CWNS's KiddyCAT scores, but no specific disfluency significantly contributed to the CWS's KiddyCAT scores. Greater articulation errors were associated with higher KiddyCAT scores in the CWNS. No standardized test scores were associated with KiddyCAT scores in the CWS. Attitudes that speech is difficult are not associated with similar aspects of communication for CWS and CWNS. Age significantly contributed to negative speech attitudes for CWS, whereas age, repetitions, and articulation errors contributed to negative speech attitudes for CWNS.
Developing a Weighted Measure of Speech Sound Accuracy

PubMed Central

Preston, Jonathan L.; Ramsdell, Heather L.; Oller, D. Kimbrough; Edwards, Mary Louise; Tobin, Stephen J.

2010-01-01

Purpose The purpose is to develop a system for numerically quantifying a speaker’s phonetic accuracy through transcription-based measures. With a focus on normal and disordered speech in children, we describe a system for differentially weighting speech sound errors based on various levels of phonetic accuracy with a Weighted Speech Sound Accuracy (WSSA) score. We then evaluate the reliability and validity of this measure. Method Phonetic transcriptions are analyzed from several samples of child speech, including preschoolers and young adolescents with and without speech sound disorders and typically developing toddlers. The new measure of phonetic accuracy is compared to existing measures, is used to discriminate typical and disordered speech production, and is evaluated to determine whether it is sensitive to changes in phonetic accuracy over time. Results Initial psychometric data indicate that WSSA scores correlate with other measures of phonetic accuracy as well as listeners’ judgments of severity of a child’s speech disorder. The measure separates children with and without speech sound disorders. WSSA scores also capture growth in phonetic accuracy in toddler’s speech over time. Conclusion Results provide preliminary support for the WSSA as a valid and reliable measure of phonetic accuracy in children’s speech. PMID:20699344
The Impact of British Airways Wind Observations on the Goddard Earth Observing System Analyses and Forecasts

NASA Technical Reports Server (NTRS)

Rukhovets, Leonid; Sienkiewicz, M.; Tenenbaum, J.; Kondratyeva, Y.; Owens, T.; Oztunali, M.; Atlas, Robert (Technical Monitor)

2001-01-01

British Airways flight data recorders can provide valuable meteorological information, but they are not available in real-time on the Global Telecommunication System. Information from the flight recorders was used in the Global Aircraft Data Set (GADS) experiment as independent observations to estimate errors in wind analyses produced by major operational centers. The GADS impact on the Goddard Earth Observing System Data Assimilation System (GEOS DAS) analyses was investigated using the GEOS-1 DAS version. Recently, a new Data Assimilation System (fvDAS) has been developed at the Data Assimilation Office, NASA Goddard. Using fvDAS, the GADS impact on analyses and forecasts was investigated. It was shown that the GADS data intensify wind speed analyses of jet streams in some cases. Five-day forecast anomaly correlations and root mean squares were calculated for 300, 500 hPa and SLP for six different areas: Northern and Southern Hemispheres, North America, Europe, Asia, and the USA. These scores were obtained as averages over 21 forecasts from January 1998. Comparisons with scores for control experiments without GADS showed a positive impact of the GADS data on forecasts beyond 2-3 days for all levels in most areas.
Quality of internet-based decision aids for shoulder arthritis: what are patients reading?

PubMed

Somerson, Jeremy S; Bois, Aaron J; Jeng, Jeffrey; Bohsali, Kamal I; Hinchey, John W; Wirth, Michael A

2018-04-11

The objective of this study was to assess the source, quality, accuracy, and completeness of Internet-based information for shoulder arthritis. A web search was performed using three common Internet search engines and the top 50 sites from each search were analyzed. Information sources were categorized into academic, commercial, non-profit, and physician sites. Information quality was measured using the Health On the Net (HON) Foundation principles, content accuracy by counting factual errors and completeness using a custom template. After removal of duplicates and sites that did not provide an overview of shoulder arthritis, 49 websites remained for analysis. The majority of sites were from commercial (n = 16, 33%) and physician (n = 16, 33%) sources. An additional 12 sites (24%) were from an academic institution and five sites (10%) were from a non-profit organization. Commercial sites had the highest number of errors, with a five-fold likelihood of containing an error compared to an academic site. Non-profit sites had the highest HON scores, with an average of 9.6 points on a 16-point scale. The completeness score was highest for academic sites, with an average score of 19.2 ± 6.7 (maximum score of 49 points); other information sources had lower scores (commercial, 15.2 ± 2.9; non-profit, 18.7 ± 6.8; physician, 16.6 ± 6.3). Patient information on the Internet regarding shoulder arthritis is of mixed accuracy, quality, and completeness. Surgeons should actively direct patients to higher-quality Internet sources.
Is adult gait less susceptible than paediatric gait to hip joint centre regression equation error?

PubMed

Kiernan, D; Hosking, J; O'Brien, T

2016-03-01

Hip joint centre (HJC) regression equation error during paediatric gait has recently been shown to have clinical significance. In relation to adult gait, it has been inferred that comparable errors with children in absolute HJC position may in fact result in less significant kinematic and kinetic error. This study investigated the clinical agreement of three commonly used regression equation sets (Bell et al., Davis et al. and Orthotrak) for adult subjects against the equations of Harrington et al. The relationship between HJC position error and subject size was also investigated for the Davis et al. set. Full 3-dimensional gait analysis was performed on 12 healthy adult subjects with data for each set compared to Harrington et al. The Gait Profile Score, Gait Variable Score and GDI-kinetic were used to assess clinical significance while differences in HJC position between the Davis and Harrington sets were compared to leg length and subject height using regression analysis. A number of statistically significant differences were present in absolute HJC position. However, all sets fell below the clinically significant thresholds (GPS <1.6°, GDI-Kinetic <3.6 points). Linear regression revealed a statistically significant relationship for both increasing leg length and increasing subject height with decreasing error in anterior/posterior and superior/inferior directions. Results confirm a negligible clinical error for adult subjects suggesting that any of the examined sets could be used interchangeably. Decreasing error with both increasing leg length and increasing subject height suggests that the Davis set should be used cautiously on smaller subjects. Copyright © 2016 Elsevier B.V. All rights reserved.
Team safety and innovation by learning from errors in long-term care settings.

PubMed

Buljac-Samardžić, Martina; van Woerkom, Marianne; Paauwe, Jaap

2012-01-01

Team safety and team innovation are underexplored in the context of long-term care. Understanding the issues requires attention to how teams cope with error. Team managers could have an important role in developing a team's error orientation and managing team membership instabilities. The aim of this study was to examine the impact of team member stability, team coaching, and a team's error orientation on team safety and innovation. A cross-sectional survey method was employed within 2 long-term care organizations. Team members and team managers received a survey that measured safety and innovation. Team members assessed member stability, team coaching, and team error orientation (i.e., problem-solving and blaming approach). The final sample included 933 respondents from 152 teams. Stable teams and teams with managers who take on the role of coach are more likely to adopt a problem-solving approach and less likely to adopt a blaming approach toward errors. Both error orientations are related to team member ratings of safety and innovation, but only the blaming approach is (negatively) related to manager ratings of innovation. Differences between members' and managers' ratings of safety are greater in teams with relatively high scores for the blaming approach and relatively low scores for the problem-solving approach. Team coaching was found to be positively related to innovation, especially in unstable teams. Long-term care organizations that wish to enhance team safety and innovation should encourage a problem-solving approach and discourage a blaming approach. Team managers can play a crucial role in this by coaching team members to see errors as sources of learning and improvement and ensuring that individuals will not be blamed for errors.
The Effect of an Electronic Checklist on Critical Care Provider Workload, Errors, and Performance.

PubMed

Thongprayoon, Charat; Harrison, Andrew M; O'Horo, John C; Berrios, Ronaldo A Sevilla; Pickering, Brian W; Herasevich, Vitaly

2016-03-01

The strategy used to improve effective checklist use in the intensive care unit (ICU) setting is essential for checklist success. This study aimed to test the hypothesis that an electronic checklist could reduce ICU provider workload, errors, and time to checklist completion, as compared to a paper checklist. This was a simulation-based study conducted at an academic tertiary hospital. All participants completed checklists for 6 ICU patients: 3 using an electronic checklist and 3 using an identical paper checklist. In both scenarios, participants had full access to the existing electronic medical record system. The outcomes measured were workload (defined using the National Aeronautics and Space Administration task load index [NASA-TLX]), the number of checklist errors, and time to checklist completion. Two independent clinician reviewers, blinded to participant results, served as the reference standard for checklist error calculation. Twenty-one ICU providers participated in this study. This resulted in the generation of 63 simulated electronic checklists and 63 simulated paper checklists. The median NASA-TLX score was 39 for the electronic checklist and 50 for the paper checklist (P = .005). The median number of checklist errors for the electronic checklist was 5, while the median number of checklist errors for the paper checklist was 8 (P = .003). The time to checklist completion was not significantly different between the 2 checklist formats (P = .76). The electronic checklist significantly reduced provider workload and errors without any measurable difference in the amount of time required for checklist completion. This demonstrates that electronic checklists are feasible and desirable in the ICU setting. © The Author(s) 2014.
Using failure mode and effects analysis to improve the safety of neonatal parenteral nutrition.

PubMed

Arenas Villafranca, Jose Javier; Gómez Sánchez, Araceli; Nieto Guindo, Miriam; Faus Felipe, Vicente

2014-07-15

Failure mode and effects analysis (FMEA) was used to identify potential errors and to enable the implementation of measures to improve the safety of neonatal parenteral nutrition (PN). FMEA was used to analyze the preparation and dispensing of neonatal PN from the perspective of the pharmacy service in a general hospital. A process diagram was drafted, illustrating the different phases of the neonatal PN process. Next, the failures that could occur in each of these phases were compiled and cataloged, and a questionnaire was developed in which respondents were asked to rate the following aspects of each error: incidence, detectability, and severity. The highest scoring failures were considered high risk and identified as priority areas for improvements to be made. The evaluation process detected a total of 82 possible failures. Among the phases with the highest number of possible errors were transcription of the medical order, formulation of the PN, and preparation of material for the formulation. After the classification of these 82 possible failures and of their relative importance, a checklist was developed to achieve greater control in the error-detection process. FMEA demonstrated that use of the checklist reduced the level of risk and improved the detectability of errors. FMEA was useful for detecting medication errors in the PN preparation process and enabling corrective measures to be taken. A checklist was developed to reduce errors in the most critical aspects of the process. Copyright © 2014 by the American Society of Health-System Pharmacists, Inc. All rights reserved.
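The rating step described above can be sketched as follows. The abstract does not say how the three ratings were combined, so the product used here (the conventional FMEA risk priority number) is an assumption. The `FailureMode` class, the `rank_by_risk` helper, and every numeric rating are hypothetical illustrations, not data from the study; only the three phase names come from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """One possible failure with its three questionnaire ratings (hypothetical scale)."""
    name: str
    incidence: int
    detectability: int
    severity: int

    @property
    def rpn(self) -> int:
        # Assumption: combine the ratings as the conventional risk priority number.
        return self.incidence * self.detectability * self.severity


def rank_by_risk(modes):
    """Return failure modes ordered from highest to lowest combined score."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


# Illustrative ratings only; the study's actual scores are not given in the abstract.
modes = [
    FailureMode("preparation of material for the formulation", 2, 2, 3),
    FailureMode("transcription of the medical order", 4, 3, 5),
    FailureMode("formulation of the PN", 3, 2, 4),
]
ranked = rank_by_risk(modes)
for mode in ranked:
    print(mode.name, mode.rpn)
```

Sorting by the combined score puts the highest-scoring failures first, which is how the abstract describes the priority areas being identified.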
Attitudes of veterinary nurses to the assessment of pain and the use of pain scales.

PubMed

Coleman, D L; Slingsby, L S

2007-04-21

In April 2004, a questionnaire was distributed to veterinary nurses across the UK to assess their attitudes towards the assessment and management of pain in practice. During the six-week collection period, a total of 541 questionnaires were returned, of which 24 (4.25 per cent) were discounted due to completion errors. Overall, the pain scores for procedures involving dogs were higher than those for cats; the veterinary nurses' pain scores were higher for all procedures than those of veterinary surgeons in a previous study. Both veterinary nurses and veterinary surgeons were primarily involved with monitoring pain postoperatively, and 96 per cent of veterinary nurses felt that their knowledge of pain management could be enhanced; 8.1 per cent of the practices used a formal pain scoring system, with the simple descriptive scale most commonly used; 80.3 per cent of the veterinary nurses agreed that a pain scale was a useful clinical tool.
Characterisation of false-positive observations in botanical surveys

PubMed Central

2017-01-01

Errors in botanical surveying are a common problem. The presence of a species is easily overlooked, leading to false-absences, while misidentifications and other mistakes lead to false-positive observations. While it is common knowledge that these errors occur, there are few data that can be used to quantify and describe these errors. Here we characterise false-positive errors for a controlled set of surveys conducted as part of a field identification test of botanical skill. Surveys were conducted at sites with a verified list of vascular plant species. The candidates were asked to list all the species they could identify in a defined botanically rich area. They were told beforehand that their final score would be the sum of the correct species they listed, but false-positive errors counted against their overall grade. The number of errors varied considerably between people; some people create a high proportion of false-positive errors, but these are scattered across all skill levels. Therefore, a person’s ability to correctly identify a large number of species is not a safeguard against the generation of false-positive errors. There was no phylogenetic pattern to falsely observed species; however, rare species are more likely to be false-positive, as are species from species-rich genera. Raising the threshold for the acceptance of an observation reduced false-positive observations dramatically, but at the expense of more false-negative errors. False-positive errors are higher in field surveying of plants than many people may appreciate. Greater stringency is required before accepting species as present at a site, particularly for rare species. Combining multiple surveys resolves the problem, but requires a considerable increase in effort to achieve the same sensitivity as a single survey. Therefore, other methods should be used to raise the threshold for the acceptance of a species. For example, digital data input systems that can verify, feedback and inform the user are likely to reduce false-positive errors significantly. PMID:28533972

Feedback on prescribing errors to junior doctors: exploring views, problems and preferred methods.

PubMed

Bertels, Jeroen; Almoudaris, Alex M; Cortoos, Pieter-Jan; Jacklin, Ann; Franklin, Bryony Dean

2013-06-01

Prescribing errors are common in hospital inpatients. However, the literature suggests that doctors are often unaware of their errors as they are not always informed of them. It has been suggested that providing more feedback to prescribers may reduce subsequent error rates. Only a few studies have investigated the views of prescribers towards receiving such feedback, or the views of hospital pharmacists as potential feedback providers. Our aim was to explore the views of junior doctors and hospital pharmacists regarding feedback on individual doctors' prescribing errors. Objectives were to determine how feedback was currently provided and any associated problems, to explore views on other approaches to feedback, and to make recommendations for designing suitable feedback systems. Setting: A large London NHS hospital trust. Method: To explore views on current and possible feedback mechanisms, self-administered questionnaires were given to all junior doctors and pharmacists, combining both 5-point Likert scale statements and open-ended questions. Main outcome measure: Agreement scores for statements regarding perceived prescribing error rates, opinions on feedback, barriers to feedback, and preferences for future practice. Results: Response rates were 49% (37/75) for junior doctors and 57% (57/100) for pharmacists. In general, doctors did not feel threatened by feedback on their prescribing errors. They felt that feedback currently provided was constructive but often irregular and insufficient. Most pharmacists provided feedback in various ways; however, some did not or were inconsistent. They were willing to provide more feedback, but did not feel it was always effective or feasible due to barriers such as communication problems and time constraints. Both professional groups preferred individual feedback with additional regular generic feedback on common or serious errors. Feedback on prescribing errors was valued and acceptable to both professional groups. From the results, several suggested methods of providing feedback on prescribing errors emerged. Addressing barriers such as the identification of individual prescribers would facilitate feedback in practice. Research investigating whether or not feedback reduces the subsequent error rate is now needed.
Judgment of line orientation depends on gender, education, and type of error.

PubMed

Caparelli-Dáquer, Egas M; Oliveira-Souza, Ricardo; Moreira Filho, Pedro F

2009-02-01

Visuospatial tasks are particularly proficient at eliciting gender differences during neuropsychological performance. Here we tested the hypothesis that gender and education are related to different types of visuospatial errors on a task of line orientation that allowed the independent scoring of correct responses ("hits", or H) and one type of incorrect responses ("commission errors", or CE). We studied 343 volunteers of roughly comparable ages and with different levels of education. Education and gender were significantly associated with H scores, which were higher in men and in the groups with higher education. In contrast, the differences between men and women on CE depended on education. We concluded that (I) the ability to find the correct responses differs from the ability to avoid the wrong responses amidst an array of possible alternatives, and that (II) education interacts with gender to promote a stable performance on CE earlier in men than in women.
Correction.

PubMed

2015-03-01

In the January 2015 issue of Cyberpsychology, Behavior, and Social Networking (vol. 18, no. 1, pp. 3–7), the article "Individual Differences in Cyber Security Behaviors: An Examination of Who Is Sharing Passwords." by Prof. Monica Whitty et al., has an error in wording in the abstract. The sentence in question was originally printed as: Contrary to our hypotheses, we found older people and individuals who score high on self-monitoring were more likely to share passwords. It should read: Contrary to our hypotheses, we found younger people and individuals who score high on self-monitoring were more likely to share passwords. The authors wish to apologize for the error.
Fusion of Scores in a Detection Context Based on Alpha Integration.

PubMed

Soriano, Antonio; Vergara, Luis; Ahmed, Bouziane; Salazar, Addisson

2015-09-01

We present a new method for fusing scores corresponding to different detectors (two-hypotheses case). It is based on alpha integration, which we have adapted to the detection context. Three optimization methods are presented: least mean square error, maximization of the area under the ROC curve, and minimization of the probability of error. Gradient algorithms are proposed for the three methods. Different experiments with simulated and real data are included. Simulated data consider the two-detector case to illustrate the factors influencing alpha integration and demonstrate the improvements obtained by score fusion with respect to individual detector performance. Two real data cases have been considered. In the first, multimodal biometric data have been processed. This case is representative of scenarios in which the probability of detection is to be maximized for a given probability of false alarm. The second case is the automatic analysis of electroencephalogram and electrocardiogram records with the aim of reproducing the medical expert detections of arousal during sleeping. This case is representative of scenarios in which probability of error is to be minimized. The general superior performance of alpha integration verifies the interest of optimizing the fusing parameters.
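As a rough illustration of the fusion rule named above, the sketch below implements the generic weighted alpha-mean of positive scores. It assumes the standard form with f(z) = z^((1 - alpha)/2) for alpha != 1 and f(z) = log z for alpha = 1. The abstract does not give the authors' adapted formulation or their three optimization procedures, so none of that is reproduced here; the function name `alpha_integrate` and the example scores and weights are hypothetical.

```python
import math


def alpha_integrate(scores, weights, alpha):
    """Fuse positive detector scores with a weighted alpha-mean.

    Assumed form: f(z) = z ** ((1 - alpha) / 2) for alpha != 1, f(z) = log(z)
    for alpha == 1, and fused = f_inverse(sum(w_i * f(s_i))).
    """
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    if any(s <= 0 for s in scores):
        raise ValueError("scores must be strictly positive")
    if alpha == 1:
        # Limiting case: weighted geometric mean.
        return math.exp(sum(w * math.log(s) for w, s in zip(weights, scores)))
    p = (1 - alpha) / 2
    return sum(w * s ** p for w, s in zip(weights, scores)) ** (1 / p)


# Hypothetical two-detector example with equal weights.
scores, weights = [0.2, 0.8], [0.5, 0.5]
arithmetic = alpha_integrate(scores, weights, alpha=-1)  # weighted arithmetic mean
geometric = alpha_integrate(scores, weights, alpha=1)    # weighted geometric mean
harmonic = alpha_integrate(scores, weights, alpha=3)     # weighted harmonic mean
print(arithmetic, geometric, harmonic)
```

In this form, alpha and the weights are the fusing parameters that a criterion such as mean square error or area under the ROC curve would tune; the fixed values above only show how alpha moves the fused score between familiar means.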
Absolute pitch among students at the Shanghai Conservatory of Music: a large-scale direct-test study.

PubMed

Deutsch, Diana; Li, Xiaonuo; Shen, Jing

2013-11-01

This paper reports a large-scale direct-test study of absolute pitch (AP) in students at the Shanghai Conservatory of Music. Overall note-naming scores were very high, with high scores correlating positively with early onset of musical training. Students who had begun training at age ≤5 yr scored 83% correct not allowing for semitone errors and 90% correct allowing for semitone errors. Performance levels were higher for white key pitches than for black key pitches. This effect was greater for orchestral performers than for pianists, indicating that it cannot be attributed to early training on the piano. Rather, accuracy in identifying notes of different names (C, C#, D, etc.) correlated with their frequency of occurrence in a large sample of music taken from the Western tonal repertoire. There was also an effect of pitch range, so that performance on tones in the two-octave range beginning on Middle C was higher than on tones in the octave below Middle C. In addition, semitone errors tended to be on the sharp side. The evidence also ran counter to the hypothesis, previously advanced by others, that the note A plays a special role in pitch identification judgments.
The Effect Of Different Corrective Feedback Methods on the Outcome and Self Confidence of Young Athletes

PubMed Central

Tzetzis, George; Votsis, Evandros; Kourtessis, Thomas

2008-01-01

This experiment investigated the effects of three corrective feedback methods, using different combinations of correction cues, error cues, and positive feedback, on learning two badminton skills of different difficulty (forehand clear - low difficulty, backhand clear - high difficulty). Outcome and self-confidence scores were used as dependent variables. The 48 participants were randomly assigned to four groups. Group A received correction cues and positive feedback. Group B received cues on errors of execution. Group C received positive feedback, correction cues and error cues. Group D was the control group. A pre-test, a post-test, and a retention test were conducted. A three-way analysis of variance (ANOVA; 4 groups X 2 task difficulty X 3 measures) with repeated measures on the last factor revealed significant interactions for each dependent variable. All the corrective feedback groups increased their outcome scores over time for the easy skill, but only groups A and C did so for the difficult skill. Groups A and B had significantly better outcome scores than group C and the control group for the easy skill on the retention test. However, for the difficult skill, group C was better than groups A, B and D. The self-confidence scores of groups A and C improved over time for the easy skill, but those of groups B and D did not. Again, for the difficult skill, only group C improved over time. Finally, a regression analysis showed that the improvement in performance predicted a proportion of the improvement in self-confidence for both the easy and the difficult skill. It was concluded that when young athletes are taught skills of different difficulty, different types of instruction might be more appropriate in order to improve outcome and self-confidence. A more integrated approach to teaching will assist coaches or physical education teachers to be more efficient and effective.
Key points: The type of the skill is a critical factor in determining the effectiveness of the feedback types. Different instructional methods of corrective feedback could have beneficial effects on the outcome and self-confidence of young athletes. Instructions focusing on the correct cues or errors increase performance of easy skills. Positive feedback or correction cues increase self-confidence of easy skills, but only the combination of error and correction cues increases self-confidence and outcome scores of difficult skills. PMID:24149905
Bootstrap Standard Error Estimates in Dynamic Factor Analysis

ERIC Educational Resources Information Center

Zhang, Guangjian; Browne, Michael W.

2010-01-01

Dynamic factor analysis summarizes changes in scores on a battery of manifest variables over repeated measurements in terms of a time series in a substantially smaller number of latent factors. Algebraic formulae for standard errors of parameter estimates are more difficult to obtain than in the usual intersubject factor analysis because of the…
An Application of Multivariate Generalizability in Selection of Mathematically Gifted Students

ERIC Educational Resources Information Center

Kim, Sungyeun; Berebitsky, Dan

2016-01-01

This study investigates error sources and the effects of each error source to determine optimal weights of the composite score of teacher recommendation letters and self-introduction letters using multivariate generalizability theory. Data were collected from the science education institute for the gifted attached to the university located within…
Judgment of Line Orientation Depends on Gender, Education, and Type of Error

ERIC Educational Resources Information Center

Caparelli-Daquer, Egas M.; Oliveira-Souza, Ricardo; Filho, Pedro F. Moreira

2009-01-01

Visuospatial tasks are particularly proficient at eliciting gender differences during neuropsychological performance. Here we tested the hypothesis that gender and education are related to different types of visuospatial errors on a task of line orientation that allowed the independent scoring of correct responses ("hits", or H) and one type of…
Standard Errors of Equating Differences: Prior Developments, Extensions, and Simulations

ERIC Educational Resources Information Center

Moses, Tim; Zhang, Wenmin

2011-01-01

The purpose of this article was to extend the use of standard errors for equated score differences (SEEDs) to traditional equating functions. The SEEDs are described in terms of their original proposal for kernel equating functions and extended so that SEEDs for traditional linear and traditional equipercentile equating functions can be computed.…
Examiner Errors on the Reynolds Intellectual Assessment Scales Committed by Graduate Student Examiners

ERIC Educational Resources Information Center

Loe, Scott A.

2014-01-01

Protocols from 108 administrations of the Reynolds Intellectual Assessment Scales were evaluated to determine the frequency of examiner errors and their impact on the accuracy of three test composite scores, the Composite Ability Index (CIX), Verbal Ability Index (VIX), and Nonverbal Ability Index (NIX). Students committed at least one…
Quality Control of an OSCE Using Generalizability Theory and Many-Faceted Rasch Measurement

ERIC Educational Resources Information Center

Iramaneerat, Cherdsak; Yudkowsky, Rachel; Myford, Carol M.; Downing, Steven M.

2008-01-01

An Objective Structured Clinical Examination (OSCE) is an effective method for evaluating competencies. However, scores obtained from an OSCE are vulnerable to many potential measurement errors that cases, items, or standardized patients (SPs) can introduce. Monitoring these sources of errors is an important quality control mechanism to ensure…
Estimating the Imputed Social Cost of Errors of Measurement.

DTIC Science & Technology

1983-10-01

social cost of an error of measurement in the score on a unidimensional test, an asymptotic method, based on item response theory, is developed for... RR-83-33-ONR. Estimating the Imputed Social Cost of Errors of Measurement. Frederic M. Lord. This research was sponsored in part by the Personnel and Training Research Programs Psychological…
Patient safety awareness among Undergraduate Medical Students in Pakistani Medical School

PubMed Central

Kamran, Rizwana; Bari, Attia; Khan, Rehan Ahmed; Al-Eraky, Mohamed

2018-01-01

Objective: To measure the level of awareness of patient safety among undergraduate medical students in a Pakistani medical school and to find the difference with respect to gender and prior experience with medical error. Methods: This cross-sectional study was conducted at the University of Lahore (UOL), Pakistan from January to March 2017, and comprised final year medical students. Data were collected using the ‘APSQ-III’ questionnaire on a 7-point Likert scale. Eight questions were reverse coded. The survey was anonymous. SPSS package 20 was used for statistical analysis. Results: The questionnaire was completed by 122 students, an 81% response rate. The best score, 6.17, was given for ‘team functioning’, followed by 6.04 for ‘long working hours as a cause of medical error’. The domains regarding involvement of the patient, confidence to report medical errors, and the role of training and learning on patient safety scored high, in the agreed range of >5. Reverse coded questions about ‘professional incompetence as an error cause’ and ‘disclosure of errors’ showed negative perception. No significant differences in perceptions were found with respect to gender and prior experience with medical error (p > 0.05). Conclusion: Undergraduate medical students at UOL had a positive attitude towards patient safety. However, there were misconceptions among students about the causes of medical errors and about error disclosure, and patient safety education needs to be incorporated into the medical curriculum of Pakistan. PMID:29805398
Local Observed-Score Kernel Equating

ERIC Educational Resources Information Center

Wiberg, Marie; van der Linden, Wim J.; von Davier, Alina A.

2014-01-01

Three local observed-score kernel equating methods that integrate methods from the local equating and kernel equating frameworks are proposed. The new methods were compared with their earlier counterparts with respect to such measures as bias--as defined by Lord's criterion of equity--and percent relative error. The local kernel item response…
Development of a body motion interactive system with a weight voting mechanism and computer vision technology

NASA Astrophysics Data System (ADS)

Lin, Chern-Sheng; Chen, Chia-Tse; Shei, Hung-Jung; Lay, Yun-Long; Chiu, Chuang-Chien

2012-09-01

This study develops a body motion interactive system with computer vision technology. The application combines interactive games, performing arts, and an exercise training system. Multiple image processing and computer vision technologies are used in this study. The system can calculate the characteristics of an object's color and then perform color segmentation. When an action is judged wrongly, the system avoids the error with a weight voting mechanism, which sets a condition score and a weight value for each action judgment and then chooses the best action judgment by vote. Finally, this study estimated the reliability of the system in order to make improvements. The results showed that this method has a good effect on accuracy and stability during operation of the human-machine interface of the sports training system.
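The weight voting step described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the action names, condition scores, and weights are hypothetical, and the rule used here (sum weight times condition score per candidate action, highest total wins) is only one plausible reading of "weight voting".

```python
from collections import defaultdict


def weighted_vote(judgments):
    """Return the action with the highest accumulated weighted score.

    judgments: iterable of (action, condition_score, weight) tuples,
    one per cue. All names and values are hypothetical.
    """
    totals = defaultdict(float)
    for action, score, weight in judgments:
        totals[action] += score * weight
    if not totals:
        raise ValueError("no judgments to vote on")
    # Pick the action whose weighted scores sum to the largest total.
    return max(totals, key=totals.get)


# Hypothetical cues: two agree on "raise_arm", one dissents.
judgments = [
    ("raise_arm", 0.9, 2.0),
    ("squat", 0.8, 1.0),
    ("raise_arm", 0.6, 1.0),
]
best = weighted_vote(judgments)
```

A single dissenting cue is outvoted here because the agreeing cues carry more combined weight, which is the sense in which a voting scheme can absorb an occasional wrong judgment.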
A meta-analysis of inhibitory-control deficits in patients diagnosed with Alzheimer's dementia.

PubMed

Kaiser, Anna; Kuhlmann, Beatrice G; Bosnjak, Michael

2018-05-10

The authors conducted meta-analyses to determine the magnitude of performance impairments in patients diagnosed with Alzheimer's dementia (AD) compared with healthy aging (HA) controls on eight tasks commonly used to measure inhibitory control. Response time (RT) and error rates from a total of 64 studies were analyzed with random-effects models (overall effects) and mixed-effects models (moderator analyses). Large differences between AD patients and HA controls emerged in the basic inhibition conditions of many of the tasks with AD patients often performing slower, overall d = 1.17, 95% CI [0.88-1.45], and making more errors, d = 0.83 [0.63-1.03]. However, comparably large differences were also present in performance on many of the baseline control-conditions, d = 1.01 [0.83-1.19] for RTs and d = 0.44 [0.19-0.69] for error rates. A standardized derived inhibition score (i.e., control-condition score - inhibition-condition score) suggested no significant mean group difference for RTs, d = -0.07 [-0.22-0.08], and only a small difference for errors, d = 0.24 [-0.12-0.60]. Effects systematically varied across tasks and with AD severity. Although the error rate results suggest a specific deterioration of inhibitory-control abilities in AD, further processes beyond inhibitory control (e.g., a general reduction in processing speed and other, task-specific attentional processes) appear to contribute to AD patients' performance deficits observed on a variety of inhibitory-control tasks. Nonetheless, the inhibition conditions of many of these tasks well discriminate between AD patients and HA controls. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
Teamwork and error in the operating room: analysis of skills and roles.

PubMed

Catchpole, K; Mishra, A; Handa, A; McCulloch, P

2008-04-01

To analyze the effects of surgical, anesthetic, and nursing teamwork skills on technical outcomes. The value of team skills in reducing adverse events in the operating room is presently receiving considerable attention. Current work has not yet identified in detail how the teamwork and communication skills of surgeons, anesthetists, and nurses affect the course of an operation. Twenty-six laparoscopic cholecystectomies and 22 carotid endarterectomies were studied using direct observation methods. For each operation, teams' skills were scored for the whole team, and for nursing, surgical, and anesthetic subteams on 4 dimensions (leadership and management [LM]; teamwork and cooperation; problem solving and decision making; and situation awareness). Operating time, errors in surgical technique, and other procedural problems and errors were measured as outcome parameters for each operation. The relationships between teamwork scores and these outcome parameters within each operation were examined using analysis of variance and linear regression. Surgical (F(2,42) = 3.32, P = 0.046) and anesthetic (F(2,42) = 3.26, P = 0.048) LM had significant but opposite relationships with operating time in each operation: operating time increased significantly with higher anesthetic but decreased with higher surgical LM scores. Errors in surgical technique had a strong association with surgical situation awareness (F(2,42) = 7.93, P < 0.001) in each operation. Other procedural problems and errors were related to the intraoperative LM skills of the nurses (F(5,1) = 3.96, P = 0.027). Detailed analysis of team interactions and dimensions is feasible and valuable, yielding important insights into relationships between nontechnical skills, technical performance, and operative duration. These results support the concept that interventions designed to improve teamwork and communication may have beneficial effects on technical performance and patient outcome.
Performance of high school male athletes on the Functional Movement Screen™.

PubMed

Smith, Laura J; Creps, James R; Bean, Ryan; Rodda, Becky; Alsalaheen, Bara

2017-09-01

(1) Describe the performance of the Functional Movement Screen™ (FMS™) by reporting the proportion of adolescents with a score of ≤14 and the frequency of asymmetries in a cross-sectional sample; (2) explore associations between FMS™ to age and body mass, and explore the construct validity of the FMS™ against common postural stability measures; (3) examine the inter-rater and test-retest reliability of the FMS™ in adolescents. Cross-sectional. Field-setting. 94 male high-school athletes. The FMS™, Y-Balance Test (YBT) and Balance Error Scoring System (BESS). The median FMS™ composite score was 16 (9-21), 33% of participants scored below the suggested injury risk cutoff composite score of ≤14, and 62.8% had at least one asymmetry. No relationship was observed between the FMS™ to common static/dynamic balance tests. The inter-rater reliability of the FMS™ composite score suggested good reliability (ICC = 0.88, CI 95%:0.77, 0.94) and test-retest reliability for FMS™ composite scores was good with ICC = 0.83 (CI 95%:0.56, 0.95). FMS™ results should be interpreted cautiously with attention to the asymmetries identified during the screen, regardless of composite score. The lack of relationship between the FMS™ and other balance measures supports the notion that multiple screening tests should be used in order to provide a comprehensive picture of the adolescent athlete. Copyright © 2017 Elsevier Ltd. All rights reserved.
Quantification of errors in ordinal outcome scales using shannon entropy: effect on sample size calculations.

PubMed

Mandava, Pitchaiah; Krumpelman, Chase S; Shah, Jharna N; White, Donna L; Kent, Thomas A

2013-01-01

Clinical trial outcomes often involve an ordinal scale of subjective functional assessments but the optimal way to quantify results is not clear. In stroke, the most commonly used scale, the modified Rankin Score (mRS), a range of scores ("Shift") is proposed as superior to dichotomization because of greater information transfer. The influence of known uncertainties in mRS assessment has not been quantified. We hypothesized that errors caused by uncertainties could be quantified by applying information theory. Using Shannon's model, we quantified errors of the "Shift" compared to dichotomized outcomes using published distributions of mRS uncertainties and applied this model to clinical trials. We identified 35 randomized stroke trials that met inclusion criteria. Each trial's mRS distribution was multiplied with the noise distribution from published mRS inter-rater variability to generate an error percentage for "shift" and dichotomized cut-points. For the SAINT I neuroprotectant trial, considered positive by "shift" mRS while the larger follow-up SAINT II trial was negative, we recalculated sample size required if classification uncertainty was taken into account. Considering the full mRS range, error rate was 26.1%±5.31 (Mean±SD). Error rates were lower for all dichotomizations tested using cut-points (e.g. mRS 1; 6.8%±2.89; overall p<0.001). Taking errors into account, SAINT I would have required 24% more subjects than were randomized. We show when uncertainty in assessments is considered, the lowest error rates are with dichotomization. While using the full range of mRS is conceptually appealing, a gain of information is counter-balanced by a decrease in reliability. The resultant errors need to be considered since sample size may otherwise be underestimated. In principle, we have outlined an approach to error estimation for any condition in which there are uncertainties in outcome assessment. 
We provide the user with programs to calculate and incorporate errors into sample size estimation.
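The comparison between full-scale and dichotomized error rates can be illustrated with a small misclassification calculation. This is a simplified sketch, not the authors' Shannon-based programs: the three-grade scale, the outcome distribution `p`, and the rater confusion matrix `C` are made-up values, and the error rate is taken to be the probability that the recorded category differs from the true one.

```python
def misclassification_rate(p, C, cut=None):
    """Probability that the recorded category differs from the true one.

    p[i]    : probability that the true grade is i (hypothetical values)
    C[i][j] : probability that true grade i is recorded as grade j
    cut     : None for the full ordinal scale; otherwise grades <= cut
              are pooled as one outcome and grades > cut as the other
    """
    error = 0.0
    for i, p_i in enumerate(p):
        for j, c_ij in enumerate(C[i]):
            if cut is None:
                wrong = i != j
            else:
                wrong = (i <= cut) != (j <= cut)
            if wrong:
                error += p_i * c_ij
    return error


# Hypothetical three-grade scale with raters who slip by one grade.
p = [0.5, 0.3, 0.2]
C = [
    [0.8, 0.2, 0.0],
    [0.1, 0.8, 0.1],
    [0.0, 0.2, 0.8],
]
full = misclassification_rate(p, C)          # every grade counted separately
dichot = misclassification_rate(p, C, cut=0)  # grade 0 versus grades 1-2
```

Slips between grades on the same side of the cut-point no longer count as errors after dichotomization, so `dichot` is smaller than `full`. That is the trade-off the abstract describes: less information, but greater reliability.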

Evaluation of a 3D stereophotogrammetric technique to measure the stone casts of patients with unilateral cleft lip and palate.

PubMed

Sforza, Chiarella; De Menezes, Marcio; Bresciani, Elena; Cerón-Zapata, Ana M; López-Palacio, Ana M; Rodriguez-Ardila, Myriam J; Berrio-Gutiérrez, Lina M

2012-07-01

To assess a three-dimensional stereophotogrammetric method for palatal cast digitization of children with unilateral cleft lip and palate. As part of a collaboration between the University of Milan (Italy) and the University CES of Medellin (Colombia), 96 palatal cast models obtained from neonatal patients with unilateral cleft lip and palate were obtained and digitized using a three-dimensional stereophotogrammetric imaging system. Three-dimensional measurements (cleft width, depth, length) were made separately for the longer and shorter cleft segments on the digital dental cast surface between landmarks, previously marked. Seven linear measurements were computed. Systematic and random errors between operators' tracings, and accuracy on geometric objects of known size were calculated. In addition, mean measurements from three-dimensional stereophotographs were compared statistically with those from direct anthropometry. The three-dimensional method presented good accuracy error (<0.9%) on measuring geometric objects. No systematic errors between operators' measurements were found (p > .05). Statistically significant differences (p < 5%) were noted for different methods (caliper versus stereophotogrammetry) for almost all distances analyzed, with mean absolute difference values ranging between 0.22 and 3.41 mm. Therefore, rates for the technical error of measurement and relative error magnitude were scored as moderate for Ag-Am and poor for Ag-Pg and Am-Pm distances. Generally, caliper values were larger than three-dimensional stereophotogrammetric values. Three-dimensional stereophotogrammetric systems have some advantages over direct anthropometry, and therefore the method could be sufficiently precise and accurate on palatal cast digitization with unilateral cleft lip and palate. This would be useful for clinical analyses in maxillofacial, plastic, and aesthetic surgery.
Predicting preference-based SF-6D index scores from the SF-8 health survey.

PubMed

Wang, P; Fu, A Z; Wee, H L; Lee, J; Tai, E S; Thumboo, J; Luo, N

2013-09-01

To develop and test functions for predicting the preference-based SF-6D index scores from the SF-8 health survey. This study was a secondary analysis of data collected in a population health survey in which respondents (n = 7,529) completed both the SF-36 and the SF-8 questionnaires. We examined seven ordinary least-square estimators for their performance in predicting SF-6D scores from the SF-8 at both the individual and the group levels. In general, all functions performed similarly well in predicting SF-6D scores, and the predictions at the group level were better than predictions at the individual level. At the individual level, 42.5-51.5% of prediction errors were smaller than the minimally important difference (MID) of the SF-6D scores, depending on the function specifications, while almost all prediction errors of the tested functions were smaller than the MID of SF-6D at the group level. At both individual and group levels, the tested functions predicted lower than actual scores at the higher end of the SF-6D scale. Our study developed functions to generate preference-based SF-6D index scores from the SF-8 health survey, the first of its kind. Further research is needed to evaluate the performance and validity of the prediction functions.
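The two evaluation levels mentioned in this abstract can be sketched as below. The scores and the MID value are placeholders chosen for illustration and are not taken from the study; the functions only show how an individual-level share of errors within the MID and a group-level error could be computed.

```python
def share_within_mid(actual, predicted, mid):
    """Fraction of individual absolute prediction errors smaller than the MID."""
    errors = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(e < mid for e in errors) / len(errors)


def group_error(actual, predicted):
    """Absolute difference between the mean predicted and mean actual score."""
    mean_pred = sum(predicted) / len(predicted)
    mean_act = sum(actual) / len(actual)
    return abs(mean_pred - mean_act)


# Hypothetical index scores; MID is a placeholder, not the study's value.
actual = [0.60, 0.70, 0.80, 0.90]
predicted = [0.65, 0.71, 0.78, 0.84]
MID = 0.04

individual_share = share_within_mid(actual, predicted, MID)
group_abs_error = group_error(actual, predicted)
```

In this toy data only half of the individual errors fall within the MID, while the group-level error is well inside it, because over- and under-predictions partly cancel when averaged.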
Assessment of individual hand performance in box trainers compared to virtual reality trainers.

PubMed

Madan, Atul K; Frantzides, Constantine T; Shervin, Nina; Tebbit, Christopher L

2003-12-01

Training residents in laparoscopic skills is ideally initiated in an inanimate laboratory with both box trainers and virtual reality trainers. Virtual reality trainers have the ability to score individual hand performance although they are expensive. Here we compared the ability to assess dominant and nondominant hand performance in box trainers with virtual reality trainers. Medical students without laparoscopic experience were utilized in this study (n = 16). Each student performed tasks on the LTS 2000, an inanimate box trainer (placing pegs with both hands and transferring pegs from one hand to another), as well as a task on the MIST-VR, a virtual reality trainer (grasping a virtual object and placing it in a virtual receptacle with alternating hands). A surgeon scored students for the inanimate box trainer exercises (time and errors) while the MIST-VR scored students (time, economy of movements, and errors for each hand). Statistical analysis included Pearson correlations. Errors and time for the one-handed tasks on the box trainer did not correlate with errors, time, or economy measured for each hand by the MIST-VR (r = 0.01 to 0.30; P = NS). Total errors on the virtual reality trainer did correlate with errors on transferring pegs (r = 0.61; P < 0.05). Economy and time of both dominant and nondominant hand from the MIST-VR correlated with time of transferring pegs in the box trainer (r = 0.53 to 0.77; P < 0.05). While individual hand assessment by the box trainer during 2-handed tasks was related to assessment by the virtual reality trainer, individual hand assessment during 1-handed tasks did not correlate with the virtual reality trainer. Virtual reality trainers, such as the MIST-VR, allow assessment of individual hand skills which may lead to improved laparoscopic skill acquisition. It is difficult to assess individual hand performance with box trainers alone.
Impact of variational assimilation using multivariate background error covariances on the simulation of monsoon depressions over India

NASA Astrophysics Data System (ADS)

Dhanya, M.; Chandrasekar, A.

2016-02-01

The background error covariance structure influences a variational data assimilation system immensely. The simulation of a weather phenomenon like monsoon depression can hence be influenced by the background correlation information used in the analysis formulation. The Weather Research and Forecasting Model Data assimilation (WRFDA) system includes an option for formulating multivariate background correlations for its three-dimensional variational (3DVar) system (cv6 option). The impact of using such a formulation in the simulation of three monsoon depressions over India is investigated in this study. Analysis and forecast fields generated using this option are compared with those obtained using the default formulation for regional background error correlations (cv5) in WRFDA and with a base run without any assimilation. The model rainfall forecasts are compared with rainfall observations from the Tropical Rainfall Measurement Mission (TRMM) and the other model forecast fields are compared with a high-resolution analysis as well as with European Centre for Medium-Range Weather Forecasts (ECMWF) ERA-Interim reanalysis. The results of the study indicate that inclusion of additional correlation information in background error statistics has a moderate impact on the vertical profiles of relative humidity, moisture convergence, horizontal divergence and the temperature structure at the depression centre at the analysis time of the cv5/cv6 sensitivity experiments. Moderate improvements are seen in two of the three depressions investigated in this study. An improved thermodynamic and moisture structure at the initial time is expected to provide for improved rainfall simulation. The results of the study indicate that the skill scores of accumulated rainfall are somewhat better for the cv6 option as compared to the cv5 option for at least two of the three depression cases studied, especially at the higher threshold levels. 
Considering the importance of utilising improved flow-dependent correlation structures for efficient data assimilation, the need for more studies on the impact of background error covariances is obvious.
Detecting determinism with improved sensitivity in time series: rank-based nonlinear predictability score.

PubMed

Naro, Daniel; Rummel, Christian; Schindler, Kaspar; Andrzejak, Ralph G

2014-09-01

The rank-based nonlinear predictability score was recently introduced as a test for determinism in point processes. We here adapt this measure to time series sampled from time-continuous flows. We use noisy Lorenz signals to compare this approach against a classical amplitude-based nonlinear prediction error. Both measures show an almost identical robustness against Gaussian white noise. In contrast, when the amplitude distribution of the noise has a narrower central peak and heavier tails than the normal distribution, the rank-based nonlinear predictability score outperforms the amplitude-based nonlinear prediction error. For this type of noise, the nonlinear predictability score has a higher sensitivity for deterministic structure in noisy signals. It also yields a higher statistical power in a surrogate test of the null hypothesis of linear stochastic correlated signals. We show the high relevance of this improved performance in an application to electroencephalographic (EEG) recordings from epilepsy patients. Here the nonlinear predictability score again appears of higher sensitivity to nonrandomness. Importantly, it yields an improved contrast between signals recorded from brain areas where the first ictal EEG signal changes were detected (focal EEG signals) versus signals recorded from brain areas that were not involved at seizure onset (nonfocal EEG signals).
Brain and Music: An Intraoperative Stimulation Mapping Study of a Professional Opera Singer.

PubMed

Riva, Marco; Casarotti, Alessandra; Comi, Alessandro; Pessina, Federico; Bello, Lorenzo

2016-09-01

Music is one of the most sophisticated and fascinating functions of the brain. Yet, how music is instantiated within the brain is not fully characterized. Singing is a peculiar aspect of music, in which both musical and linguistic skills are required to provide a merged vocal output. Identifying the neural correlates of this process is relevant for both clinical and research purposes. An adult white man with a presumed left temporal glioma was studied. He is a professional opera singer. A tailored music evaluation, the Montreal Battery of Evaluation of Amusia, was performed preoperatively and postoperatively, with long-term follow-up. Intraoperative stimulation mapping (ISM) with awake surgery with a specific music evaluation battery was used to identify and preserve the cortical and subcortical structures subserving music, along with standard motor-sensory and language mapping. A total resection of a grade I glioma was achieved. The Montreal Battery of Evaluation of Amusia reported an improvement in musical scores after the surgery. ISM consistently elicited several types of errors in the superior temporal gyrus and, to a lesser extent, in the inferior frontal operculum. Most errors occurred during score reading; fewer errors were elicited during the assessment of rhythm. No spontaneous errors were recorded. These areas did not overlap with eloquent sites for counting or naming. ISM and a tailored music battery enabled better characterization of a specific network within the brain subserving score reading independently from speech with long-term clinical impact. Copyright © 2016 Elsevier Inc. All rights reserved.
Using domain knowledge and domain-inspired discourse model for coreference resolution for clinical narratives

PubMed Central

Roth, Dan

2013-01-01

Objective: This paper presents a coreference resolution system for clinical narratives. Coreference resolution aims at clustering all mentions in a single document to coherent entities. Materials and methods: A knowledge-intensive approach for coreference resolution is employed. The domain knowledge used includes several domain-specific lists, a knowledge intensive mention parsing, and task informed discourse model. Mention parsing allows us to abstract over the surface form of the mention and represent each mention using a higher-level representation, which we call the mention's semantic representation (SR). SR reduces the mention to a standard form and hence provides better support for comparing and matching. Existing coreference resolution systems tend to ignore discourse aspects and rely heavily on lexical and structural cues in the text. The authors break from this tradition and present a discourse model for “person” type mentions in clinical narratives, which greatly simplifies the coreference resolution. Results: This system was evaluated on four different datasets which were made available in the 2011 i2b2/VA coreference challenge. The unweighted average of F1 scores (over B-cubed, MUC and CEAF) varied from 84.2% to 88.1%. These experiments show that domain knowledge is effective for different mention types for all the datasets. Discussion: Error analysis shows that most of the recall errors made by the system can be handled by further addition of domain knowledge. The precision errors, on the other hand, are more subtle and indicate the need to understand the relations in which mentions participate for building a robust coreference system. Conclusion: This paper presents an approach that makes an extensive use of domain knowledge to significantly improve coreference resolution. The authors state that their system and the knowledge sources developed will be made publicly available. PMID:22781192
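The reported summary figure, an unweighted average of F1 scores over three coreference metrics, can be sketched as below. The precision and recall values are invented for illustration and are not results from the paper; the sketch assumes each metric supplies its own precision and recall and that F1 is their harmonic mean.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def unweighted_average_f1(metric_pr):
    """Average the per-metric F1 scores, giving every metric equal weight."""
    scores = [f1(p, r) for p, r in metric_pr.values()]
    return sum(scores) / len(scores)


# Hypothetical (precision, recall) pairs for the three metrics.
metric_pr = {
    "B-cubed": (0.90, 0.90),
    "MUC": (0.80, 0.80),
    "CEAF": (0.70, 0.70),
}
average = unweighted_average_f1(metric_pr)
```

Equal weighting keeps any one metric from dominating the summary, which matters because the three metrics reward different kinds of clustering decisions.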
The Role of Linguistic Modification in Nursing Education.

PubMed

Moore, Brenda S; Clark, Michele C

2016-06-01

English-as-a-second-language (ESL) nursing students fail to graduate from programs at alarming rates. For many of these students, academic failure results from poor performance on multiple choice examinations, which frequently contain linguistic errors. A remedy for these errors is to linguistically modify examination questions. This study assessed the effects of linguistic modification on examination scores. Scores of ESL and non-ESL nursing students were compared on an experimental multiple choice examination and a control examination. After exclusion, 67 ESL and 252 non-ESL students completed the experimental examination; 68 ESL and 257 non-ESL students completed the control examination. Both ESL and non-ESL students scored higher on the experimental examination than on the control examination. For ESL students, the increase in observed means between the experimental and control examination was 0.6%; for non-ESL students, the increase was 0.48%. [J Nurs Educ. 2016;55(6):309-315.]. Copyright 2016, SLACK Incorporated.
Investigating Predictors of Spelling Ability for Adults with Low Literacy Skills

PubMed Central

Talwar, Amani; Cote, Nicole Gilbert; Binder, Katherine S.

2014-01-01

This study examined whether the spelling abilities of adults with low literacy skills could be predicted by their phonological, orthographic, and morphological awareness. Sixty Adult Basic Education (ABE) students completed several literacy tasks. It was predicted that scores on phonological and orthographic tasks would explain variance in spelling scores, whereas scores on morphological tasks may not. Scores on all phonological tasks and on one orthographic task emerged as significant predictors of spelling scores. Additionally, error analyses revealed a limited influence of morphological knowledge in spelling attempts. Implications for ABE instruction are discussed. PMID:25364644
Differences in medication knowledge and risk of errors between graduating nursing students and working registered nurses: comparative study.

PubMed

Simonsen, Bjoerg O; Daehlin, Gro K; Johansson, Inger; Farup, Per G

2014-11-21

Nurses experience insufficient medication knowledge; particularly in drug dose calculations, but also in drug management and pharmacology. The weak knowledge could be a result of deficiencies in the basic nursing education, or lack of continuing maintenance training during working years. The aim of this study was to compare the medication knowledge, certainty and risk of error between graduating bachelor students in nursing and experienced registered nurses. Bachelor students in closing term and registered nurses with at least one year job experience underwent a multiple choice test in pharmacology, drug management and drug dose calculations: 3x14 questions with 3-4 alternative answers (score 0-42). Certainty of each answer was recorded with score 0-3, 0-1 indicating need for assistance. Risk of error was scored 1-3, where 3 expressed high risk: being certain that a wrong answer was correct. The results are presented as mean and (SD). Participants were 243 graduating students (including 29 men), aged 28.2 (7.6) years, and 203 registered nurses (including 16 men), aged 42.0 (9.3) years and with a working experience of 12.4 years (9.2). The knowledge among the nurses was found to be superior to that of the students: 68.9%(8.0) and 61.5%(7.8) correct answers, respectively, (p < 0.001). The difference was largest in drug management and dose calculations. The improvement occurred during the first working year. The nurses expressed higher degree of certainty and the risk of error was lower, both overall and for each topic (p < 0.01). Low risk of error was associated with high knowledge and high sense of coping (p < 0.001). The medication knowledge among experienced nurses was superior to bachelor students in nursing, but nevertheless insufficient. As much as 25% of the answers to the drug management questions would lead to high risk of error. 
More emphasis should be put into the basic nursing education and in the introduction to medication procedures in clinical practice to improve the nurses' medication knowledge and reduce the risk of error.
Comparison of SVM, RF and ELM on an Electronic Nose for the Intelligent Evaluation of Paraffin Samples.

PubMed

Men, Hong; Fu, Songlin; Yang, Jialin; Cheng, Meiqi; Shi, Yan; Liu, Jingjing

2018-01-18

Paraffin odor intensity is an important quality indicator when a paraffin inspection is performed. Currently, paraffin odor level assessment is mainly dependent on an artificial sensory evaluation. In this paper, we developed a paraffin odor analysis system to classify and grade four kinds of paraffin samples. The original feature set was optimized using Principal Component Analysis (PCA) and Partial Least Squares (PLS). Support Vector Machine (SVM), Random Forest (RF), and Extreme Learning Machine (ELM) were applied to three different feature data sets for classification and level assessment of paraffin. For classification, the model based on SVM, with an accuracy rate of 100%, was superior to that based on RF, with an accuracy rate of 98.33-100%, and ELM, with an accuracy rate of 98.01-100%. For level assessment, the R² related to the training set was above 0.97 and the R² related to the test set was above 0.87. Through comprehensive comparison, the generalization of the model based on ELM was superior to those based on SVM and RF. The scoring errors for the three models were 0.0016-0.3494, lower than the error of 0.5-1.0 measured by industry standard experts, meaning these methods have a higher prediction accuracy for scoring paraffin level.
Static Postural Stability in Chronic Ankle Instability, An Ankle Sprain and Healthy Ankles.

PubMed

Kwon, Yong Ung

2018-05-18

To identify the single leg balance (SLB) test that discriminates among healthy, coper, and chronic ankle instability (CAI) groups and to determine effects of ankle muscles on the balance error scoring system (BESS) among the three populations. 60 subjects (20 per group) performed the SLB test with eyes open (EO) and eyes closed (EC). Normalized mean amplitude (NMA) of the tibialis anterior (TA), fibularis longus (FL), and medial gastrocnemius (MG) muscles and BESS were measured while performing the SLB test. The coper group had a lower error score than the CAI group in the EC condition. NMA was greater in the CAI group than in the healthy and coper groups regardless of muscle type. NMA of the TA was less than that of the FL and MG regardless of the group in the EO condition. The CAI group demonstrated greater NMAs of the FL and MG than the healthy and coper groups in the EC condition. The CAI group demonstrated greater NMA of the FL and MG by compensating their ankle muscles in the EO and EC conditions. BESS suggests that the coper group may have coping mechanisms to stabilize static postural control compared to the CAI group. The EC condition may be better to detect static postural instability in the CAI or coper group. © Georg Thieme Verlag KG Stuttgart · New York.
Application of failure mode and effect analysis in an assisted reproduction technology laboratory.

PubMed

Intra, Giulia; Alteri, Alessandra; Corti, Laura; Rabellotti, Elisa; Papaleo, Enrico; Restelli, Liliana; Biondo, Stefania; Garancini, Maria Paola; Candiani, Massimo; Viganò, Paola

2016-08-01

Assisted reproduction technology laboratories have a very high degree of complexity. Mismatches of gametes or embryos can occur, with catastrophic consequences for patients. To minimize the risk of error, a multi-institutional working group applied failure mode and effects analysis (FMEA) to each critical activity/step as a method of risk assessment. This analysis led to the identification of the potential failure modes, together with their causes and effects, using the risk priority number (RPN) scoring system. In total, 11 individual steps and 68 different potential failure modes were identified. The highest ranked failure modes, with an RPN score of 25, encompassed 17 failures and pertained to "patient mismatch" and "biological sample mismatch". The maximum reduction in risk, with RPN reduced from 25 to 5, was mostly related to the introduction of witnessing. The critical failure modes in sample processing were improved by 50% in the RPN by focusing on staff training. Three indicators of FMEA success, based on technical skill, competence and traceability, have been evaluated after FMEA implementation. Witnessing by a second human operator should be introduced in the laboratory to avoid sample mix-ups. These findings confirm that FMEA can effectively reduce errors in assisted reproduction technology laboratories. Copyright © 2016 Reproductive Healthcare Ltd. Published by Elsevier Ltd. All rights reserved.
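The RPN ranking described above can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not give the rating scales, so the sketch assumes a two-factor RPN (severity times likelihood, each rated 1-5), which would be consistent with the reported maximum of 25 but is not confirmed by the source. The failure-mode names and ratings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int    # assumed 1-5 scale (not stated in the abstract)
    likelihood: int  # assumed 1-5 scale (not stated in the abstract)

    def rpn(self) -> int:
        """Risk priority number as the product of the two assumed ratings."""
        for rating in (self.severity, self.likelihood):
            if not 1 <= rating <= 5:
                raise ValueError("ratings must be in 1..5")
        return self.severity * self.likelihood


def rank_by_rpn(modes):
    """Return failure modes ordered from highest to lowest RPN."""
    return sorted(modes, key=lambda m: m.rpn(), reverse=True)


# Hypothetical ratings chosen only to mirror the 25 -> 5 reduction in the text.
modes = [
    FailureMode("patient mismatch", severity=5, likelihood=5),
    FailureMode("labelling slip", severity=3, likelihood=2),
    FailureMode("patient mismatch with witnessing", severity=5, likelihood=1),
]
ranked = rank_by_rpn(modes)
for mode in ranked:
    print(f"{mode.name}: RPN {mode.rpn()}")
```

In this sketch a control such as witnessing lowers the likelihood rating while the severity stays the same, which is one way an RPN could fall from 25 to 5.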
Nurse perceptions of organizational culture and its association with the culture of error reporting: a case of public sector hospitals in Pakistan.

PubMed

Jafree, Sara Rizvi; Zakar, Rubeena; Zakar, Muhammad Zakria; Fischer, Florian

2016-01-05

There is an absence of formal error tracking systems in public sector hospitals of Pakistan and also a lack of literature concerning error reporting culture in the health care sector. Nurse practitioners have front-line knowledge and rich exposure about both the organizational culture and error sharing in hospital settings. The aim of this paper was to investigate the association between organizational culture and the culture of error reporting, as perceived by nurses. The authors used the "Practice Environment Scale-Nurse Work Index Revised" to measure the six dimensions of organizational culture. Seven questions were used from the "Survey to Solicit Information about the Culture of Reporting" to measure error reporting culture in the region. Overall, 309 nurses participated in the survey, including female nurses from all designations such as supervisors, instructors, ward-heads, staff nurses and student nurses. We used SPSS 17.0 to perform a factor analysis. Furthermore, descriptive statistics, mean scores and multivariable logistic regression were used for the analysis. Three areas were ranked unfavorably by nurse respondents, including: (i) the error reporting culture, (ii) staffing and resource adequacy, and (iii) nurse foundations for quality of care. Multivariable regression results revealed that all six categories of organizational culture, including: (1) nurse manager ability, leadership and support, (2) nurse participation in hospital affairs, (3) nurse participation in governance, (4) nurse foundations of quality care, (5) nurse-coworkers relations, and (6) nurse staffing and resource adequacy, were positively associated with higher odds of error reporting culture. In addition, it was found that married nurses and nurses on permanent contract were more likely to report errors at the workplace. 
Public healthcare services of Pakistan can be improved through the promotion of an error reporting culture, reducing staffing and resource shortages and the development of nursing care plans.
Relationship between blood manganese levels and children's attention, cognition, behavior, and academic performance--a nationwide cross-sectional study.

PubMed

Bhang, Soo-Young; Cho, Soo-Churl; Kim, Jae-Won; Hong, Yun-Chul; Shin, Min-Sup; Yoo, Hee Jeong; Cho, In Hee; Kim, Yeni; Kim, Bung-Nyun

2013-10-01

Manganese (Mn) is neurotoxic at high concentrations. However, Mn is an essential element that can protect against oxidative damage; thus, extremely low levels of Mn might be harmful. Our aim was to examine whether either high or low environmental Mn exposure is related to academic and attention function development among school-aged children. This cross-sectional study included 1089 children 8-11 years of age living in five representative areas in South Korea. Blood Mn, blood lead, and urine cotinine were measured. We assessed IQ with the Wechsler Abbreviated Scale of Intelligence; attention with a computerized continuous performance test called the Attention-deficit/hyperactivity disorder (ADHD) Diagnostic System (ADS), the Korean version of the Stroop Color-Word Test, the Children's Color Trails Test (CCTT), and the ADHD Rating Scale; academic functions with the Learning Disability Evaluation Scale (LDES); and emotional and behavioral problems with the Korean version of the Child Behavior Checklist (CBCL). We further assessed the presence of ADHD using a highly structured diagnostic interview, the Diagnostic Interview Schedule for Children Version IV (DISC-IV). The median blood concentration of Mn was 14.14 µg/L. We observed a nonlinear association of blood Mn with the CCTT2 completion time and with the CPT commission error (F=3.14, p=0.03 and F=4.05, p=0.01, respectively). We divided the data into three groups, the lower 5th percentile (<8.154 µg/L), the upper 5th percentile (>21.453 µg/L), and the middle 90%, to determine whether a lack or overload of Mn could cause adverse effects. After adjusting for urine cotinine, blood lead, children's IQ, and other potential confounders, the high Mn group showed lower scores in thinking (B=-0.83, p=0.006), reading (B=-0.93, p=0.004), calculations (B=-0.72, p=0.005), and LQ (B=-4.06, p=0.006) in the LDES and a higher commission error in the CPT (B=8.02, p=0.048). The low Mn group showed lower color scores in the Stroop test (B=-3.24, p=0.040).
We found that excess Mn in children is associated with lower scores of thinking, reading, calculation, and LQ in the LDES and higher scores of commission error in the ADS test. In contrast, lower Mn in children is associated with lower color scores in the Stroop test. The findings of this cross-sectional study suggest that excess exposure or deficiency of Mn can cause harmful effects in children. Copyright © 2013 Elsevier Inc. All rights reserved.
Using implicit association tests in age-heterogeneous samples: The importance of cognitive abilities and quad model processes.

PubMed

Wrzus, Cornelia; Egloff, Boris; Riediger, Michaela

2017-08-01

Implicit association tests (IATs) are increasingly used to indirectly assess people's traits, attitudes, or other characteristics. In addition to measuring traits or attitudes, IAT scores also reflect differences in cognitive abilities because scores are based on reaction times (RTs) and errors. As cognitive abilities change with age, questions arise concerning the usage and interpretation of IATs for people of different age. To address these questions, the current study examined how cognitive abilities and cognitive processes (i.e., quad model parameters) contribute to IAT results in a large age-heterogeneous sample. Participants (N = 549; 51% female) in an age-stratified sample (range = 12-88 years) completed different IATs and 2 tasks to assess cognitive processing speed and verbal ability. From the IAT data, D2-scores were computed based on RTs, and quad process parameters (activation of associations, overcoming bias, detection, guessing) were estimated from individual error rates. Substantial IAT scores and quad processes except guessing varied with age. Quad processes AC and D predicted D2-scores of the content-specific IAT. Importantly, the effects of cognitive abilities and quad processes on IAT scores were not significantly moderated by participants' age. These findings suggest that IATs seem suitable for age-heterogeneous studies from adolescence to old age when IATs are constructed and analyzed appropriately, for example with D-scores and process parameters. We offer further insight into how D-scoring controls for method effects in IATs and what IAT scores capture in addition to implicit representations of characteristics. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
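As a rough illustration of how a D-type score turns reaction times into a standardized effect, the sketch below divides the difference in mean latency between the incompatible and compatible blocks by the standard deviation of all retained trials. It is a simplified form, not the authors' exact D2 procedure: the latency cut-offs (400 ms and 10000 ms) are assumptions, error penalties are left out, and the function name and data are hypothetical.

```python
import statistics


def iat_d_score(compatible_rts, incompatible_rts, min_rt=400, max_rt=10000):
    """Simplified D-score: mean latency difference over the pooled SD.

    The trimming bounds are assumed defaults, not values from the abstract.
    """
    def keep(rts):
        # Drop latencies outside the assumed plausible range (milliseconds).
        return [rt for rt in rts if min_rt <= rt <= max_rt]

    compatible = keep(compatible_rts)
    incompatible = keep(incompatible_rts)
    if not compatible or not incompatible:
        raise ValueError("each block needs at least one retained trial")

    # SD over all retained trials from both blocks taken together.
    pooled_sd = statistics.stdev(compatible + incompatible)
    if pooled_sd == 0:
        raise ValueError("latencies have zero variance")
    return (statistics.mean(incompatible) - statistics.mean(compatible)) / pooled_sd


# Hypothetical latencies in milliseconds; the 150 ms trial is trimmed.
d = iat_d_score([600, 650, 700, 150], [800, 850, 900])
print(round(d, 3))
```

Because the difference is scaled by each participant's own latency variability, a generally slower respondent does not automatically obtain a larger score, which is the sense in which D-scoring controls for speed-related method effects.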
A Smart-Home System to Unobtrusively and Continuously Assess Loneliness in Older Adults.

PubMed

Austin, Johanna; Dodge, Hiroko H; Riley, Thomas; Jacobs, Peter G; Thielke, Stephen; Kaye, Jeffrey

2016-01-01

Loneliness is a common condition in older adults and is associated with increased morbidity and mortality, decreased sleep quality, and increased risk of cognitive decline. Assessing loneliness in older adults is challenging due to the negative desirability biases associated with being lonely. Thus, it is necessary to develop more objective techniques to assess loneliness in older adults. In this paper, we describe a system to measure loneliness by assessing in-home behavior using wireless motion and contact sensors, phone monitors, and computer software as well as algorithms developed to assess key behaviors of interest. We then present results showing the accuracy of the system in detecting loneliness in a longitudinal study of 16 older adults who agreed to have the sensor platform installed in their own homes for up to 8 months. We show that loneliness is significantly associated with both time out-of-home ([Formula: see text] and [Formula: see text]) and number of computer sessions ([Formula: see text] and [Formula: see text]). [Formula: see text] for the model was 0.35. We also show the model's ability to predict out-of-sample loneliness, demonstrating that the correlation between true loneliness and predicted out-of-sample loneliness is 0.48. When compared with the University of California at Los Angeles loneliness score, the normalized mean absolute error of the predicted loneliness scores was 0.81 and the normalized root mean squared error was 0.91. These results represent first steps toward an unobtrusive, objective method for the prediction of loneliness among older adults, and mark the first time that multiple objective behavioral measures have been related to this key health outcome.
Skill assessment of Korea operational oceanographic system (KOOS)

NASA Astrophysics Data System (ADS)

Kim, J.; Park, K.

2016-02-01

For the ocean forecast system in Korea, the Korea operational oceanographic system (KOOS) has been developed and run pre-operationally since 2009 by the Korea institute of ocean science and technology (KIOST), funded by the Korean government. KOOS provides real-time information and forecasts for marine environmental conditions in order to support all kinds of activities at sea. Furthermore, a more significant purpose of the KOOS information is to respond to and support the handling of maritime problems and accidents such as oil spills, red tides, shipwrecks, extraordinary waves, coastal inundation and so on. Accordingly, it is essential to evaluate prediction accuracy and to make efforts to improve it. The forecast accuracy should meet or exceed target benchmarks before its products are approved for release to the public. In this paper, we conduct error quantification of the forecasts using a skill assessment technique to judge the KOOS performance. Skill assessment statistics include measures of errors and correlations such as root-mean-square error (RMSE), mean bias (MB), correlation coefficient (R), and index of agreement (IOA), and the frequency with which errors lie within specified limits, termed the central frequency (CF). The KOOS provides 72-hour daily forecast data such as air pressure, wind, water elevation, currents, wave, water temperature, and salinity produced by the meteorological and hydrodynamic numerical models WRF, ROMS, MOM5, WAM, WW3, and MOHID. The skill assessment has been performed through comparison of model results with in-situ observation data (Figure 1) for the period from 1 July 2010 to 31 March 2015 (Table 1), and model errors have been quantified with skill scores and CF determined by acceptable criteria depending on the predicted variables (Table 2).
Moreover, we conducted a quantitative evaluation of the spatio-temporal pattern correlation between numerical models and observation data such as sea surface temperature (SST) and sea surface current, produced by ocean sensors on satellites and high frequency (HF) radar, respectively. These quantified errors allow an objective assessment of the KOOS performance and can reveal different aspects of model inefficiency. Based on these results, various model components are tested and developed in order to improve forecast accuracy.
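The skill statistics named above (RMSE, MB, R, IOA, CF) can be sketched as follows. The abstract does not give its formulas, so this is a minimal illustration under assumptions: IOA follows the commonly used Willmott form, CF is taken as the fraction of model-minus-observation errors whose magnitude is within a chosen limit, and the sample values and the 0.6 limit are hypothetical.

```python
import math


def rmse(model, obs):
    """Root-mean-square error of model against observations."""
    return math.sqrt(sum((m - o) ** 2 for m, o in zip(model, obs)) / len(obs))


def mean_bias(model, obs):
    """Mean of (model - observation); positive means over-prediction."""
    return sum(m - o for m, o in zip(model, obs)) / len(obs)


def correlation(model, obs):
    """Pearson correlation coefficient R."""
    n = len(obs)
    m_bar, o_bar = sum(model) / n, sum(obs) / n
    cov = sum((m - m_bar) * (o - o_bar) for m, o in zip(model, obs))
    var_m = sum((m - m_bar) ** 2 for m in model)
    var_o = sum((o - o_bar) ** 2 for o in obs)
    return cov / math.sqrt(var_m * var_o)


def index_of_agreement(model, obs):
    """Index of agreement in the Willmott form (assumed here)."""
    o_bar = sum(obs) / len(obs)
    num = sum((m - o) ** 2 for m, o in zip(model, obs))
    den = sum((abs(m - o_bar) + abs(o - o_bar)) ** 2 for m, o in zip(model, obs))
    return 1.0 - num / den


def central_frequency(model, obs, limit):
    """Fraction of errors with magnitude no larger than `limit`."""
    hits = sum(1 for m, o in zip(model, obs) if abs(m - o) <= limit)
    return hits / len(obs)


# Hypothetical water-temperature values: the model runs 0.5 units too warm.
obs = [1.0, 2.0, 3.0, 4.0]
model = [1.5, 2.5, 3.5, 4.5]
print(rmse(model, obs), mean_bias(model, obs), correlation(model, obs))
print(index_of_agreement(model, obs), central_frequency(model, obs, limit=0.6))
```

The example shows why several statistics are reported together: a constant offset leaves R at 1 while RMSE and MB expose the bias, and CF depends entirely on the acceptance limit chosen for the variable.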
The Influence of Item Calibration Error on Variable-Length Computerized Adaptive Testing

ERIC Educational Resources Information Center

Patton, Jeffrey M.; Cheng, Ying; Yuan, Ke-Hai; Diao, Qi

2013-01-01

Variable-length computerized adaptive testing (VL-CAT) allows both items and test length to be "tailored" to examinees, thereby achieving the measurement goal (e.g., scoring precision or classification) with as few items as possible. Several popular test termination rules depend on the standard error of the ability estimate, which in turn depends…

The Errors of Our Ways

ERIC Educational Resources Information Center

Kane, Michael

2011-01-01

Errors don't exist in our data, but they serve a vital function. Reality is complicated, but our models need to be simple in order to be manageable. We assume that attributes are invariant over some conditions of observation, and once we do that we need some way of accounting for the variability in observed scores over these conditions of…
Type I Error Inflation for Detecting DIF in the Presence of Impact

ERIC Educational Resources Information Center

DeMars, Christine E.

2010-01-01

In this brief explication, two challenges for using differential item functioning (DIF) measures when there are large group differences in true proficiency are illustrated. Each of these difficulties may lead to inflated Type I error rates, for very different reasons. One problem is that groups matched on observed score are not necessarily well…
The relationship between social capital and quality management systems in European hospitals: a quantitative study.

PubMed

Hammer, Antje; Arah, Onyebuchi A; Dersarkissian, Maral; Thompson, Caroline A; Mannion, Russell; Wagner, Cordula; Ommen, Oliver; Sunol, Rosa; Pfaff, Holger

2013-01-01

Strategic leadership is an important organizational capability and is essential for quality improvement in hospital settings. Furthermore, the quality of leadership depends crucially on a common set of shared values and mutual trust between hospital management board members. According to the concept of social capital, these are essential requirements for successful cooperation and coordination within groups. We assume that social capital within hospital management boards is an important factor in the development of effective organizational systems for overseeing health care quality. We hypothesized that the degree of social capital within the hospital management board is associated with the effectiveness and maturity of the quality management system in European hospitals. We used a mixed-method approach to data collection and measurement in 188 hospitals in 7 European countries. For this analysis, we used responses from hospital managers. To test our hypothesis, we conducted a multilevel linear regression analysis of the association between social capital and the quality management system score at the hospital level, controlling for hospital ownership, teaching status, number of beds, number of board members, organizational culture, and country clustering. The average social capital score within a hospital management board was 3.3 (standard deviation: 0.5; range: 1-4) and the average hospital score for the quality management index was 19.2 (standard deviation: 4.5; range: 0-27). Higher social capital was associated with higher quality management system scores (regression coefficient: 1.41; standard error: 0.64, p=0.029). The results suggest that a higher degree of social capital exists in hospitals that exhibit higher maturity in their quality management systems. 
Although uncontrolled confounding and reverse causation cannot be completely ruled out, our new findings, along with the results of previous research, could have important implications for the work of hospital managers and the design and evaluation of hospital quality management systems.
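The reported coefficient and standard error can be related to the p-value with a simple Wald calculation. The sketch below uses a normal approximation, which is an assumption: the authors' multilevel model may have used a different reference distribution, so the result should only land near the reported p=0.029, not reproduce it exactly. The function name is hypothetical.

```python
import math


def wald_summary(coefficient, standard_error, z_crit=1.96):
    """Wald z statistic, two-sided normal p-value and approximate 95% CI."""
    z = coefficient / standard_error
    # Two-sided tail probability of a standard normal variable.
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    ci = (coefficient - z_crit * standard_error,
          coefficient + z_crit * standard_error)
    return z, p_two_sided, ci


# Values taken from the abstract: coefficient 1.41, standard error 0.64.
z, p, ci = wald_summary(1.41, 0.64)
print(f"z = {z:.3f}, p = {p:.3f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

Under this approximation the interval excludes zero, in line with the abstract's statement that higher social capital was associated with higher quality management system scores.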
The Relationship between Social Capital and Quality Management Systems in European Hospitals: A Quantitative Study

PubMed Central

Hammer, Antje; Arah, Onyebuchi A.; DerSarkissian, Maral; Thompson, Caroline A.; Mannion, Russell; Wagner, Cordula; Ommen, Oliver; Sunol, Rosa; Pfaff, Holger

2013-01-01

Background: Strategic leadership is an important organizational capability and is essential for quality improvement in hospital settings. Furthermore, the quality of leadership depends crucially on a common set of shared values and mutual trust between hospital management board members. According to the concept of social capital, these are essential requirements for successful cooperation and coordination within groups. Objectives: We assume that social capital within hospital management boards is an important factor in the development of effective organizational systems for overseeing health care quality. We hypothesized that the degree of social capital within the hospital management board is associated with the effectiveness and maturity of the quality management system in European hospitals. Methods: We used a mixed-method approach to data collection and measurement in 188 hospitals in 7 European countries. For this analysis, we used responses from hospital managers. To test our hypothesis, we conducted a multilevel linear regression analysis of the association between social capital and the quality management system score at the hospital level, controlling for hospital ownership, teaching status, number of beds, number of board members, organizational culture, and country clustering. Results: The average social capital score within a hospital management board was 3.3 (standard deviation: 0.5; range: 1-4) and the average hospital score for the quality management index was 19.2 (standard deviation: 4.5; range: 0-27). Higher social capital was associated with higher quality management system scores (regression coefficient: 1.41; standard error: 0.64, p=0.029). Conclusion: The results suggest that a higher degree of social capital exists in hospitals that exhibit higher maturity in their quality management systems.
Although uncontrolled confounding and reverse causation cannot be completely ruled out, our new findings, along with the results of previous research, could have important implications for the work of hospital managers and the design and evaluation of hospital quality management systems. PMID:24392027
Accuracy of emotion labeling in children of parents diagnosed with bipolar disorder.

PubMed

Hanford, Lindsay C; Sassi, Roberto B; Hall, Geoffrey B

2016-04-01

Emotion labeling deficits have been posited as an endophenotype for bipolar disorder (BD) as they have been observed in both patients and their first-degree relatives. It remains unclear whether these deficits exist secondary to the development of psychiatric symptoms or whether they can be attributed to risk for psychopathology. To explore this, we investigated emotion processing in symptomatic and asymptomatic high-risk bipolar offspring (HRO) and healthy children of healthy parents (HCO). Symptomatic (n:18, age: 13.8 ± 2.6 years, 44% female) and asymptomatic (n:12, age: 12.8 ± 3.0 years, 42% female) HRO and age- and sex-matched HCO (n:20, age: 13.3 ± 2.5 years, 45% female) performed an emotion-labeling task. Total number of errors, emotion category and intensity of emotion error scores were compared. Correlations between total error scores and symptom severity were also investigated. Compared to HCO, both HRO groups made more errors on the adult face task (pcor=0.014). The HRO group were 2.3 times [90%CI:0.9-6.3] more likely and 4.3 times [90%CI:1.3-14.3] more likely to make errors on sad and angry faces, respectively. With the exception of sad face type errors, we observed no significant differences in error patterns between symptomatic and asymptomatic HRO, and no correlations between symptom severity and total number of errors. This study was cross-sectional in design, limiting our ability to infer trajectories or heritability of these deficits. This study provides further support for emotion labeling deficits as a candidate endophenotype for BD. Our study also suggests these deficits are not attributable to the presence of psychiatric symptoms. Copyright © 2016 Elsevier B.V. All rights reserved.
Improving Analysis: Dealing with Information Processing Errors

DTIC Science & Technology

2006-11-01

obviating this issue, psychological test data provides information that is normed and scored in a common standardized metric (e.g., a z score. A z score is a...to take these into account when interpreting psychological test information. Clinicians are not alone in their relative inability to outperform...1980); M. Snyder and B. Campbell, "Testing hypotheses about other people: The role of the hypothesis," Personality and Social Psychology Bulletin, No. 6
The association between EMS workplace safety culture and safety outcomes.

PubMed

Weaver, Matthew D; Wang, Henry E; Fairbanks, Rollin J; Patterson, Daniel

2012-01-01

Prior studies have highlighted wide variation in emergency medical services (EMS) workplace safety culture across agencies. To determine the association between EMS workplace safety culture scores and patient or provider safety outcomes. We administered a cross-sectional survey to EMS workers affiliated with a convenience sample of agencies. We recruited these agencies from a national EMS management organization. We used the EMS Safety Attitudes Questionnaire (EMS-SAQ) to measure workplace safety culture and the EMS Safety Inventory (EMS-SI), a tool developed to capture self-reported safety outcomes from EMS workers. The EMS-SAQ provides reliable and valid measures of six domains: safety climate, teamwork climate, perceptions of management, working conditions, stress recognition, and job satisfaction. A panel of medical directors, emergency medical technicians and paramedics, and occupational epidemiologists developed the EMS-SI to measure self-reported injury, medical errors and adverse events, and safety-compromising behaviors. We used hierarchical linear models to evaluate the association between EMS-SAQ scores and EMS-SI safety outcome measures. Sixteen percent of all respondents reported experiencing an injury in the past three months, four of every 10 respondents reported an error or adverse event (AE), and 89% reported safety-compromising behaviors. Respondents reporting injury scored lower on five of the six domains of safety culture. Respondents reporting an error or AE scored lower for four of the six domains, while respondents reporting safety-compromising behavior had lower safety culture scores for five of the six domains. Individual EMS worker perceptions of workplace safety culture are associated with composite measures of patient and provider safety outcomes. This study is preliminary evidence of the association between safety culture and patient or provider safety outcomes.
Preventability of early vs. late readmissions in an academic medical center

PubMed Central

Graham, Kelly L.; Dike, Ogechi; Doctoroff, Lauren; Jupiter, Marisa; Vanka, Anita

2017-01-01

Background: It is unclear if the 30-day unplanned hospital readmission rate is a plausible accountability metric. Objective: Compare preventability of hospital readmissions between an early period [0–7 days post-discharge] and a late period [8–30 days post-discharge]. Compare causes of readmission, and frequency of markers of clinical instability 24h prior to discharge, between early and late readmissions. Design, setting, patients: 120 patient readmissions in an academic medical center between 1/1/2009-12/31/2010. Measures: Sum-score based on a standard algorithm that assesses preventability of each readmission based on blinded hospitalist review; average causation score for seven types of adverse events; rates of markers of clinical instability within 24h prior to discharge. Results: Readmissions were significantly more preventable in the early compared to the late period [median preventability sum score 8.5 vs. 8.0, p = 0.03]. There were significantly more management errors as causative events for the readmission in the early compared to the late period [mean causation score [scale 1–6, 6 most causal] 2.0 vs. 1.5, p = 0.04], and these errors were significantly more preventable in the early compared to the late period [mean preventability score 1.9 vs 1.5, p = 0.03]. Patients readmitted in the early period were significantly more likely to have mental status changes documented 24h prior to hospital discharge than patients readmitted in the late period [12% vs. 0%, p = 0.01]. Conclusions: Readmissions occurring in the early period were significantly more preventable. Early readmissions were associated with more management errors, and mental status changes 24h prior to discharge. Seven-day readmissions may be a better accountability measure. PMID:28622384
The association between EMS workplace safety culture and safety outcomes

PubMed Central

Weaver, Matthew D.; Wang, Henry E.; Fairbanks, Rollin J.; Patterson, Daniel

2012-01-01

Objective: Prior studies have highlighted wide variation in EMS workplace safety culture across agencies. We sought to determine the association between EMS workplace safety culture scores and patient or provider safety outcomes. Methods: We administered a cross-sectional survey to EMS workers affiliated with a convenience sample of agencies. We recruited these agencies from a national EMS management organization. We used the EMS Safety Attitudes Questionnaire (EMS-SAQ) to measure workplace safety culture and the EMS Safety Inventory (EMS-SI), a tool developed to capture self-reported safety outcomes from EMS workers. The EMS-SAQ provides reliable and valid measures of six domains: safety climate, teamwork climate, perceptions of management, perceptions of working conditions, stress recognition, and job satisfaction. A panel of medical directors, paramedics, and occupational epidemiologists developed the EMS-SI to measure self-reported injury, medical errors and adverse events, and safety-compromising behaviors. We used hierarchical linear models to evaluate the association between EMS-SAQ scores and EMS-SI safety outcome measures. Results: Sixteen percent of all respondents reported experiencing an injury in the past 3 months, four of every 10 respondents reported an error or adverse event (AE), and 90% reported safety-compromising behaviors. Respondents reporting injury scored lower on 5 of the 6 domains of safety culture. Respondents reporting an error or AE scored lower for 4 of the 6 domains, while respondents reporting safety-compromising behavior had lower safety culture scores for 5 of 6 domains. Conclusions: Individual EMS worker perceptions of workplace safety culture are associated with composite measures of patient and provider safety outcomes. This study is preliminary evidence of the association between safety culture and patient or provider safety outcomes. PMID:21950463
The effect of methylphenidate on Internet video game play in children with attention-deficit/hyperactivity disorder.

PubMed

Han, Doug Hyun; Lee, Young Sik; Na, Churl; Ahn, Jee Young; Chung, Un Sun; Daniels, Melissa A; Haws, Charlotte A; Renshaw, Perry F

2009-01-01

A number of studies about attention-deficit/hyperactivity disorder (ADHD) and Internet video game play have examined the prefrontal cortex and dopaminergic system. Stimulants such as methylphenidate (MPH), given to treat ADHD, and video game play have been found to increase synaptic dopamine. We hypothesized that MPH treatment would reduce Internet use in subjects with co-occurring ADHD and Internet video game addictions. Sixty-two children (52 males and 10 females), drug-naive, diagnosed with ADHD, and Internet video game players, participated in this study. At the beginning of the study and after 8 weeks of treatment with Concerta (OROS methylphenidate HCl, Seoul, Korea), participants were assessed with Young's Internet Addiction Scale, Korean version (YIAS-K), Korean DuPaul's ADHD Rating Scale, and the Visual Continuous Performance Test. Their Internet usage time was also recorded. After 8 weeks of treatment, the YIAS-K scores and Internet usage times were significantly reduced. The changes in the YIAS-K scores between the baseline and 8-week assessments were positively correlated with the changes in total and inattention scores from the Korean DuPaul's ADHD Rating Scale, as well as omission errors from the Visual Continuous Performance Test. There was also a significant difference in the number of omission errors among non-Internet-addicted, mildly Internet addicted, and severely Internet addicted participants. We suggest that Internet video game playing might be a means of self-medication for children with ADHD. In addition, we cautiously suggest that MPH might be evaluated as a potential treatment of Internet addiction.
[Proposal for the systematization of the elastographic study of mammary lesions through ultrasound scan].

PubMed

Fleury, Eduardo de Faria Castro; Fleury, Jose Carlos Vendramini; Oliveira, Vilmar Marques de; Rinaldi, Jose Francisco; Piato, Sebastiao; Roveda Junior, Decio

2009-01-01

Proposal of systematization for the elastographic study in the ultrasound routine. Evaluation was made of 308 patients referred to the breast intervention service in the CTC-Genesis from May 1, 2007 to March 1, 2008 for percutaneous breast biopsy. Prior to the percutaneous biopsy, an ultrasound study and an elastography were performed. Lesions were primarily analyzed and classified according to the Bi-Rads lexicon criteria by the conventional ultrasound scan (B mode). The elastography was then performed and analyzed in accordance with the systematization proposed by the authors, using images obtained during compression and after decompression of the area of interest. Lesions were classified following the system developed by the authors using a four-point scale, where scores (1) and (2) were considered benign, score (3) probably benign and score (4) suspicion of malignancy. Results obtained by the two methods were compared with the histological results using the areas under the receiver operating characteristic (ROC) curves. The area under the curve was 0.952 for elastography, with a confidence interval between 0.910 and 0.966 and an error of 0.023, and 0.867 for the ultrasound, with a confidence interval between 0.823 and 0.903 and an error of 0.0333. When the areas were compared, a difference between the curves of 0.026 was observed, which was statistically significant. This work shows the systematization of the elastographic study using information obtained during compression and after decompression of the ultrasound scan sample, thus showing that elastography might enhance the assessment of risk of malignancy for lesions characterized by the ultrasound.
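For an ordinal scale such as the four-point elastography score, the area under the ROC curve can be read as the probability that a randomly chosen malignant lesion receives a higher score than a randomly chosen benign one, with ties counted as one half. The sketch below computes it that way; it is an illustration of the general technique, not the authors' software, and the scores are hypothetical.

```python
def auc_from_scores(benign_scores, malignant_scores):
    """AUC as the share of (malignant, benign) pairs ranked correctly.

    A tie between the two scores contributes half a correct pair.
    """
    wins = 0.0
    for m in malignant_scores:
        for b in benign_scores:
            if m > b:
                wins += 1.0
            elif m == b:
                wins += 0.5
    return wins / (len(malignant_scores) * len(benign_scores))


# Hypothetical four-point scores, with histology as the reference standard.
benign = [1, 1, 2, 2, 3]
malignant = [3, 4, 4, 4]
auc = auc_from_scores(benign, malignant)
print(auc)
```

With only four possible scores, ties are frequent, so the half-credit rule has a visible effect on the estimate; a coarse scale limits how finely the curve can separate the two groups.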
Prophylactic Bracing Has No Effect on Lower Extremity Alignment or Functional Performance.

PubMed

Hueber, Garrett A; Hall, Emily A; Sage, Brad W; Docherty, Carrie L

2017-07-01

Prophylactic ankle bracing is commonly used during physical activity. Understanding how bracing affects body mechanics is critically important when discussing both injury prevention and sport performance. The purpose of this study was to determine whether ankle bracing affects lower extremity mechanics during the Landing Error Scoring System (LESS) test and the Sage Sway Index (SSI). Thirty physically active participants volunteered for this study. Participants completed the LESS and SSI in both braced and unsupported conditions. Total errors were recorded for the LESS. Total errors and time (seconds) were recorded for the SSI. The Wilcoxon signed-rank test was used to evaluate any differences between the brace conditions for each dependent variable. The a priori alpha level was set at 0.05. The Wilcoxon signed-rank test yielded no significant difference between the braced and unsupported conditions for the LESS (Z=-0.35, p=0.72), SSI time (Z=-0.36, p=0.72), or SSI errors (Z=-0.37, p=0.71). Ankle braces had no effect on subjective clinical assessments of lower extremity alignment or postural stability. Utilization of a prophylactic support at the ankle did not substantially alter the proximal components of the lower kinetic chain. © Georg Thieme Verlag KG Stuttgart · New York.
Verification of Meteorological and Oceanographic Ensemble Forecasts in the U.S. Navy

NASA Astrophysics Data System (ADS)

Klotz, S.; Hansen, J.; Pauley, P.; Sestak, M.; Wittmann, P.; Skupniewicz, C.; Nelson, G.

2013-12-01

The Navy Ensemble Forecast Verification System (NEFVS) has been promoted recently to operational status at the U.S. Navy's Fleet Numerical Meteorology and Oceanography Center (FNMOC). NEFVS processes FNMOC and National Centers for Environmental Prediction (NCEP) meteorological and ocean wave ensemble forecasts, gridded forecast analyses, and innovation (observational) data output by FNMOC's data assimilation system. The NEFVS framework consists of statistical analysis routines, a variety of pre- and post-processing scripts to manage data and plot verification metrics, and a master script to control application workflow. NEFVS computes metrics that include forecast bias, mean-squared error, conditional error, conditional rank probability score, and Brier score. The system also generates reliability and Receiver Operating Characteristic diagrams. In this presentation we describe the operational framework of NEFVS and show examples of verification products computed from ensemble forecasts, meteorological observations, and forecast analyses. The construction and deployment of NEFVS addresses important operational and scientific requirements within Navy Meteorology and Oceanography. These include computational capabilities for assessing the reliability and accuracy of meteorological and ocean wave forecasts in an operational environment, for quantifying effects of changes and potential improvements to the Navy's forecast models, and for comparing the skill of forecasts from different forecast systems. NEFVS also supports the Navy's collaboration with the U.S. Air Force, NCEP, and Environment Canada in the North American Ensemble Forecast System (NAEFS) project and with the Air Force and the National Oceanic and Atmospheric Administration (NOAA) in the National Unified Operational Prediction Capability (NUOPC) program. 
This program is tasked with eliminating unnecessary duplication within the three agencies, accelerating the transition of new technology, such as multi-model ensemble forecasting, to U.S. Department of Defense use, and creating a superior U.S. global meteorological and oceanographic prediction capability. Forecast verification is an important component of NAEFS and NUOPC. Distribution Statement A: Approved for Public Release; distribution is unlimited
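As a minimal illustration of three of the verification metrics named in the abstract above (forecast bias, mean-squared error, and Brier score), the following Python sketch uses the textbook definitions. It is not NEFVS code: the function names, the helper that turns ensemble members into an event probability, and the toy values are all hypothetical.

```python
from statistics import mean


def forecast_bias(forecasts, observations):
    """Mean error: average of (forecast - observation)."""
    return mean(f - o for f, o in zip(forecasts, observations))


def mean_squared_error(forecasts, observations):
    """Average squared difference between forecast and observation."""
    return mean((f - o) ** 2 for f, o in zip(forecasts, observations))


def ensemble_probability(members, threshold):
    """Fraction of ensemble members that exceed the event threshold."""
    return sum(m > threshold for m in members) / len(members)


def brier_score(probabilities, outcomes):
    """Mean squared difference between forecast probability and the
    binary outcome (1 if the event occurred, 0 otherwise)."""
    return mean((p - y) ** 2 for p, y in zip(probabilities, outcomes))


# Hypothetical wave-height ensemble (metres) and a 2 m event threshold.
p_event = ensemble_probability([1.0, 2.0, 3.0, 4.0], threshold=2.0)
print(p_event, brier_score([p_event], [1]))
```

A Brier score of 0 is a perfect probabilistic forecast; lower is better, unlike skill scores where higher is better.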
Verification of Meteorological and Oceanographic Ensemble Forecasts in the U.S. Navy

NASA Astrophysics Data System (ADS)

Klotz, S. P.; Hansen, J.; Pauley, P.; Sestak, M.; Wittmann, P.; Skupniewicz, C.; Nelson, G.

2012-12-01

The Navy Ensemble Forecast Verification System (NEFVS) has been promoted recently to operational status at the U.S. Navy's Fleet Numerical Meteorology and Oceanography Center (FNMOC). NEFVS processes FNMOC and National Centers for Environmental Prediction (NCEP) meteorological and ocean wave ensemble forecasts, gridded forecast analyses, and innovation (observational) data output by FNMOC's data assimilation system. The NEFVS framework consists of statistical analysis routines, a variety of pre- and post-processing scripts to manage data and plot verification metrics, and a master script to control application workflow. NEFVS computes metrics that include forecast bias, mean-squared error, conditional error, conditional rank probability score, and Brier score. The system also generates reliability and Receiver Operating Characteristic diagrams. In this presentation we describe the operational framework of NEFVS and show examples of verification products computed from ensemble forecasts, meteorological observations, and forecast analyses. The construction and deployment of NEFVS addresses important operational and scientific requirements within Navy Meteorology and Oceanography (METOC). These include computational capabilities for assessing the reliability and accuracy of meteorological and ocean wave forecasts in an operational environment, for quantifying effects of changes and potential improvements to the Navy's forecast models, and for comparing the skill of forecasts from different forecast systems. NEFVS also supports the Navy's collaboration with the U.S. Air Force, NCEP, and Environment Canada in the North American Ensemble Forecast System (NAEFS) project and with the Air Force and the National Oceanic and Atmospheric Administration (NOAA) in the National Unified Operational Prediction Capability (NUOPC) program. 
This program is tasked with eliminating unnecessary duplication within the three agencies, accelerating the transition of new technology, such as multi-model ensemble forecasting, to U.S. Department of Defense use, and creating a superior U.S. global meteorological and oceanographic prediction capability. Forecast verification is an important component of NAEFS and NUOPC.
Processes and Procedures for Estimating Score Reliability and Precision

ERIC Educational Resources Information Center

Bardhoshi, Gerta; Erford, Bradley T.

2017-01-01

Precision is a key facet of test development, with score reliability determined primarily according to the types of error one wants to approximate and demonstrate. This article identifies and discusses several primary forms of reliability estimation: internal consistency (i.e., split-half, KR-20, α), test-retest, alternate forms, interscorer, and…
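One of the internal-consistency estimates listed above, KR-20, can be sketched in a few lines for dichotomously scored items. This is an illustrative sketch and not taken from the article: the function name `kr20` is hypothetical, and the code assumes the population (divide-by-n) variance for total scores, so that it matches the p(1 - p) item variances; some texts use the sample variance instead.

```python
def kr20(responses):
    """Kuder-Richardson formula 20 for 0/1 item scores.

    responses: list of examinee rows, each a list of 0/1 item scores.
    KR-20 = k/(k-1) * (1 - sum(p_j * q_j) / variance of total scores)
    """
    n = len(responses)
    k = len(responses[0])
    if n < 2 or k < 2:
        raise ValueError("need at least two examinees and two items")
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n
    # Population variance of total scores (assumption, see lead-in).
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    if var_total == 0:
        raise ValueError("total scores have zero variance")
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n  # proportion correct
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)


# Hypothetical responses: 4 examinees, 3 items.
print(kr20([[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]))
```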
A Strategy for Replacing Sum Scoring

ERIC Educational Resources Information Center

Ramsay, James O.; Wiberg, Marie

2017-01-01

This article promotes the use of modern test theory in testing situations where sum scores for binary responses are now used. It directly compares the efficiencies and biases of classical and modern test analyses and finds an improvement in the root mean squared error of ability estimates of about 5% for two designed multiple-choice tests and…
Uncovering Multivariate Structure in Classroom Observations in the Presence of Rater Errors

ERIC Educational Resources Information Center

McCaffrey, Daniel F.; Yuan, Kun; Savitsky, Terrance D.; Lockwood, J. R.; Edelen, Maria O.

2015-01-01

We examine the factor structure of scores from the CLASS-S protocol obtained from observations of middle school classroom teaching. Factor analysis has been used to support both interpretations of scores from classroom observation protocols, like CLASS-S, and the theories about teaching that underlie them. However, classroom observations contain…
Isometric Force Regulation in Children.

ERIC Educational Resources Information Center

Lazarus, Jo-Anne C.; And Others

1995-01-01

Isometric pinch force regulation was investigated in children and adults using a visuo-motor tracking paradigm. Younger children aged 5-7 years performed significantly worse than older children aged 9-11 years and adults in terms of an overall error score as well as a correlation score, which is believed to reflect the ability to predict the…
Incorporating Quality Scores in Meta-Analysis

ERIC Educational Resources Information Center

Ahn, Soyeon; Becker, Betsy Jane

2011-01-01

This paper examines the impact of quality-score weights in meta-analysis. A simulation examines the roles of study characteristics such as population effect size (ES) and its variance on the bias and mean square errors (MSEs) of the estimators for several patterns of relationship between quality and ES, and for specific patterns of systematic…
Attenuation of the Squared Canonical Correlation Coefficient under Varying Estimates of Score Reliability

ERIC Educational Resources Information Center

Wilson, Celia M.

2010-01-01

Research pertaining to the distortion of the squared canonical correlation coefficient has traditionally been limited to the effects of sampling error and associated correction formulas. The purpose of this study was to compare the degree of attenuation of the squared canonical correlation coefficient under varying conditions of score reliability.…

Application of Component Scoring to a Complicated Cognitive Domain.

ERIC Educational Resources Information Center

Tatsuoka, Kikumi K.; Yamamoto, Kentaro

This study used the Montague-Riley Test to introduce a new scoring procedure that revealed errors in cognitive processes occurring at subcomponents of an electricity problem. The test, consisting of four parts with 36 open-ended problems each, was administered to 250 high school students. A computer program, ELTEST, was written applying a…
Landing Technique Improvements After an Aquatic-Based Neuromuscular Training Program in Physically Active Women.

PubMed

Scarneo, Samantha E; Root, Hayley J; Martinez, Jessica C; Denegar, Craig; Casa, Douglas J; Mazerolle, Stephanie M; Dann, Catie L; Aerni, Giselle A; DiStefano, Lindsay J

2017-01-01

Neuromuscular training programs (NTPs) improve landing technique and decrease vertical ground-reaction forces (VGRFs), resulting in injury-risk reduction. NTPs in an aquatic environment may elicit the same improvements as land-based programs with reduced joint stress. To examine the effects of an aquatic NTP on landing technique as measured by the Landing Error Scoring System (LESS) and VGRFs, immediately and 4 mo after the intervention. Repeated measures, pool and laboratory. Fifteen healthy, recreationally active women (age 21 ± 2 y, mass 62.02 ± 8.18 kg, height 164.74 ± 5.97 cm) who demonstrated poor landing technique (LESS-Real Time > 4). All participants completed an aquatic NTP 3 times/wk for 6 wk. Participants' landing technique was evaluated using a jump-landing task immediately before (PRE), immediately after (POST), and 4 mo after (RET) the intervention period. A single rater, blinded to time point, graded all videos using the LESS, which is a valid and reliable movement-screening tool. Peak VGRFs were measured during the stance phase of the jump-landing test. Repeated-measure analyses of variance with planned comparisons were performed to explore differences between time points. LESS scores were lower at POST (4.46 ± 1.69 errors) and at RET (4.2 ± 1.72 errors) than at PRE (6.30 ± 1.78 errors) (P < .01). No significant differences were observed between POST and RET (P > .05). Participants also landed with significantly lower peak VGRFs (P < .01) from PRE (2.69 ± .72 N) to POST (2.23 ± .66 N). The findings introduce evidence that an aquatic NTP improves landing technique and suggest that improvements are retained over time. These results show promise of using an aquatic NTP when there is a desire to reduce joint loading, such as early stages of rehabilitation, to improve biomechanics and reduce injury risk.
How Radiation Oncologists Would Disclose Errors: Results of a Survey of Radiation Oncologists and Trainees

DOE Office of Scientific and Technical Information (OSTI.GOV)

Evans, Suzanne B., E-mail: Suzannne.evans@yale.edu; Yu, James B.; Chagpar, Anees

2012-10-01

Purpose: To analyze error disclosure attitudes of radiation oncologists and to correlate error disclosure beliefs with survey-assessed disclosure behavior. Methods and Materials: With institutional review board exemption, an anonymous online survey was devised. An email invitation was sent to radiation oncologists (American Society for Radiation Oncology [ASTRO] gold medal winners, program directors and chair persons of academic institutions, and former ASTRO lecturers) and residents. A disclosure score was calculated based on the number of full, partial, or no disclosure responses chosen to the vignette-based questions, and correlation was attempted with attitudes toward error disclosure. Results: The survey received 176 responses: 94.8% of respondents considered themselves more likely to disclose in the setting of a serious medical error; 72.7% of respondents did not feel it mattered who was responsible for the error in deciding to disclose, and 3.9% felt more likely to disclose if someone else was responsible; 38.0% of respondents felt that disclosure increased the likelihood of a lawsuit, and 32.4% felt disclosure decreased the likelihood of lawsuit; 71.6% of respondents felt near misses should not be disclosed; 51.7% thought that minor errors should not be disclosed; 64.7% viewed disclosure as an opportunity for forgiveness from the patient; and 44.6% considered the patient's level of confidence in them to be a factor in disclosure. For a scenario that could be considered a non-harmful error, 78.9% of respondents would not contact the family. Respondents with high disclosure scores were more likely to feel that disclosure was an opportunity for forgiveness (P=.003) and to have never seen major medical errors (P=.004).
Conclusions: The surveyed radiation oncologists chose to respond with full disclosure at a high rate, although ideal disclosure practices were not uniformly adhered to beyond the initial decision to disclose the occurrence of the error.
Effect of thematic map misclassification on landscape multi-metric assessment.

PubMed

Kleindl, William J; Powell, Scott L; Hauer, F Richard

2015-06-01

Advancements in remote sensing and computational tools have increased our awareness of large-scale environmental problems, thereby creating a need for monitoring, assessment, and management at these scales. Over the last decade, several watershed and regional multi-metric indices have been developed to assist decision-makers with planning actions at these scales. However, these tools use remote-sensing products that are subject to land-cover misclassification, and these errors are rarely incorporated in the assessment results. Here, we examined the sensitivity of a landscape-scale multi-metric index (MMI) to error from thematic land-cover misclassification and the implications of this uncertainty for resource management decisions. Through a case study, we used a simplified floodplain MMI assessment tool, whose metrics were derived from Landsat thematic maps, to initially provide results that were naive to thematic misclassification error. Using a Monte Carlo simulation model, we then incorporated map misclassification error into our MMI, resulting in four important conclusions: (1) each metric had a different sensitivity to error; (2) within each metric, the bias between the error-naive metric scores and simulated scores that incorporate potential error varied in magnitude and direction depending on the underlying land cover at each assessment site; (3) collectively, when the metrics were combined into a multi-metric index, the effects were attenuated; and (4) the index bias indicated that our naive assessment model may overestimate floodplain condition of sites with limited human impacts and, to a lesser extent, either over- or underestimated floodplain condition of sites with mixed land use.
Evaluating the Ocean Component of the US Navy Earth System Model

NASA Astrophysics Data System (ADS)

Zamudio, L.

2017-12-01

Ocean currents, temperature, and salinity observations are used to evaluate the ocean component of the US Navy Earth System Model. The ocean and atmosphere components of the system are an eddy-resolving (1/12.5° equatorial resolution) version of the HYbrid Coordinate Ocean Model (HYCOM), and a T359L50 version of the NAVy Global Environmental Model (NAVGEM), respectively. The system was integrated in hindcast mode and the ocean results are compared against unassimilated observations, a stand-alone version of HYCOM, and the Generalized Digital Environment Model ocean climatology. The different observation types used in the system evaluation are: drifting buoys, temperature profiles, salinity profiles, and acoustical proxies (mixed layer depth, sonic layer depth, below layer gradient, and acoustical trapping). To evaluate the system's performance in each different metric, a scorecard is used to translate the system's errors into scores, which provide an indication of the system's skill in both space and time.
Validation of automatic joint space width measurements in hand radiographs in rheumatoid arthritis

PubMed Central

Schenk, Olga; Huo, Yinghe; Vincken, Koen L.; van de Laar, Mart A.; Kuper, Ina H. H.; Slump, Kees C. H.; Lafeber, Floris P. J. G.; Bernelot Moens, Hein J.

2016-01-01

Abstract. Computerized methods promise quick, objective, and sensitive tools to quantify progression of radiological damage in rheumatoid arthritis (RA). Measurement of joint space width (JSW) in finger and wrist joints with these systems performed comparably to the Sharp–van der Heijde score (SHS). A next step toward clinical use, validation of precision and accuracy in hand joints with minimal damage, is described with a close scrutiny of sources of error. A recently developed system to measure metacarpophalangeal (MCP) and proximal interphalangeal (PIP) joints was validated in consecutive hand images of RA patients. To assess the impact of image acquisition, measurements on radiographs from a multicenter trial and from a recent prospective cohort in a single hospital were compared. Precision of the system was tested by comparing the joint space in mm in pairs of subsequent images with a short interval without progression of SHS. In case of incorrect measurements, the source of error was analyzed with a review by human experts. Accuracy was assessed by comparison with reported measurements with other systems. In the two series of radiographs, the system could automatically locate and measure 1003/1088 (92.2%) and 1143/1200 (95.3%) individual joints, respectively. In joints with a normal SHS, the average (SD) size of MCP joints was 1.7±0.2 and 1.6±0.3 mm in the two series of radiographs, and of PIP joints 1.0±0.2 and 0.9±0.2 mm. The difference in JSW between two serial radiographs with an interval of 6 to 12 months and unchanged SHS was 0.0±0.1 mm, indicating very good precision. Errors occurred more often in radiographs from the multicenter cohort than in a more recent series from a single hospital.
Detailed analysis of the 55/1125 (4.9%) measurements that had a discrepant paired measurement revealed that variation in the process of image acquisition (exposure in 15% and repositioning in 57%) was a more frequent source of error than incorrect delineation by the software (25%). Various steps in the validation of an automated measurement system for JSW of MCP and PIP joints are described. The use of serial radiographs from different sources, with a short interval and limited damage, is helpful to detect sources of error. Image acquisition, in particular repositioning, is a dominant source of error. PMID:27921071
A system to measure the data quality of spectral remote-sensing reflectance of aquatic environments

NASA Astrophysics Data System (ADS)

Wei, Jianwei; Lee, Zhongping; Shang, Shaoling

2016-11-01

Spectral remote-sensing reflectance (Rrs, sr^-1) is the key for ocean color retrieval of water bio-optical properties. Since Rrs from in situ and satellite systems are subject to errors or artifacts, assessment of the quality of Rrs data is critical. From a large collection of high quality in situ hyperspectral Rrs data sets, we developed a novel quality assurance (QA) system that can be used to objectively evaluate the quality of an individual Rrs spectrum. This QA scheme consists of a unique Rrs spectral reference and a score metric. The reference system includes Rrs spectra of 23 optical water types ranging from purple blue to yellow waters, with an upper and a lower bound defined for each water type. The scoring system compares any target Rrs spectrum with the reference and assigns the target spectrum a score between 0 and 1, with 1 for a perfect Rrs spectrum and 0 for an unusable Rrs spectrum. The effectiveness of this QA system is evaluated with both synthetic and in situ Rrs spectra and it is found to be robust. Further testing is performed with the NOMAD data set as well as with satellite Rrs over coastal and oceanic waters, where questionable or likely erroneous Rrs spectra are shown to be well identifiable with this QA system. Our results suggest that applications of this QA system to in situ data sets can improve the development and validation of bio-optical algorithms and its application to ocean color satellite data can improve the short-term and long-term products by objectively excluding questionable Rrs data.
A Novel Scoring Metrics for Quality Assurance of Ocean Color Observations

NASA Astrophysics Data System (ADS)

Wei, J.; Lee, Z.

2016-02-01

Interpretation of the ocean bio-optical properties from ocean color observations depends on the quality of the ocean color data, specifically the spectrum of remote sensing reflectance (Rrs). The in situ and remotely measured Rrs spectra are inevitably subject to errors induced by instrument calibration, sea-surface correction and atmospheric correction, and other environmental factors. Great efforts have been devoted to the ocean color calibration and validation. Yet, there exist no objective and consensus criteria for assessment of the ocean color data quality. In this study, the gap is filled by developing a novel metric for such data quality assurance and quality control (QA/QC). This new QA metric is not intended to discard "suspicious" Rrs spectra from available datasets. Rather, it takes into account the Rrs spectral shapes and amplitudes as a whole and grades each Rrs spectrum. This scoring system is developed based on a large ensemble of in situ hyperspectral remote sensing reflectance data measured from various aquatic environments and processed with robust procedures. This system is further tested with the NASA bio-Optical Marine Algorithm Data set (NOMAD), with results indicating significant improvements in the estimation of bio-optical properties when Rrs spectra marked with higher quality assurance are used. This scoring system is further verified with simulated data and satellite ocean color data in various regions, and we envision higher quality ocean color products with the implementation of such a quality screening system.
Assessing the predictive value of the American Board of Family Practice In-training Examination.

PubMed

Replogle, William H; Johnson, William D

2004-03-01

The American Board of Family Practice In-training Examination (ABFP ITE) is a cognitive examination similar in content to the ABFP Certification Examination (CE). The ABFP ITE is widely used in family medicine residency programs. It was originally developed and intended to be used for assessment of groups of residents. Despite lack of empirical support, however, some residency programs are using ABFP ITE scores as individual resident performance indicators. This study's objective was to estimate the positive predictive value of the ABFP ITE for identifying residents at risk for poor performance on the ABFP CE or a subsequent ABFP ITE. We used a normal distribution model for correlated test scores and Monte Carlo simulation to investigate the effect of test reliability (measurement errors) on the positive predictive value of the ABFP ITE. The positive predictive value of the composite score was .72. The positive predictive value of the eight specialty subscales ranged from .26 to .57. Only the composite score of the ABFP ITE has acceptable positive predictive value to be used as part of a comprehensive resident evaluation system. The ABFP ITE specialty subscales do not have sufficient positive predictive value or reliability to warrant use as performance indicators.
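The general idea of estimating a positive predictive value by Monte Carlo simulation from correlated normal test scores can be sketched as follows. This is a simplified illustration under stated assumptions, not the study's model: both scores are standardized, the same cutoff defines "at risk" on both tests, and the function name `simulate_ppv`, the correlations, and the cutoffs are hypothetical.

```python
import math
import random


def simulate_ppv(correlation, cutoff_z=-1.0, n=100_000, seed=0):
    """Estimate P(second score < cutoff | first score < cutoff).

    Scores are drawn from a standard bivariate normal distribution
    with the given correlation (an assumption of this sketch).
    """
    rng = random.Random(seed)
    residual_sd = math.sqrt(1.0 - correlation ** 2)
    flagged = 0         # examinees below the cutoff on the first test
    true_positive = 0   # flagged examinees also below it on the second
    for _ in range(n):
        first = rng.gauss(0.0, 1.0)
        second = correlation * first + residual_sd * rng.gauss(0.0, 1.0)
        if first < cutoff_z:
            flagged += 1
            if second < cutoff_z:
                true_positive += 1
    if flagged == 0:
        raise ValueError("no simulated examinee fell below the cutoff")
    return true_positive / flagged


# Less reliable scores correlate less strongly, which lowers the PPV.
print(simulate_ppv(0.9), simulate_ppv(0.5))
```

Lower reliability attenuates the correlation between the two scores, which is the mechanism by which measurement error reduces positive predictive value in a model of this kind.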
SU-E-T-418: Explore the Sensitive of the Planar Quality Assurance to the MLC Error with Different Beam Complexity in Intensity-Modulate Radiation Therapy

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wang, J; Peng, J; Xie, J

2015-06-15

Purpose: The purpose of this study is to investigate the sensitivity of the planar quality assurance to MLC errors with different beam complexities in intensity-modulated radiation therapy. Methods: Sixteen patients' planar quality assurance (QA) plans in our institution were enrolled in this study, including 10 dynamic MLC (DMLC) IMRT plans measured by Portal Dosimetry and 6 static MLC (SMLC) IMRT plans measured by Mapcheck. The gamma pass rate was calculated using the vendor's software. The field numbers were 74 and 40 for DMLC and SMLC, respectively. A random error was generated and introduced to these fields. The modified gamma pass rate was calculated by comparing the original measured fluence and the modified fields' fluence. A decreasing gamma pass rate was obtained as the original gamma pass rate minus the modified gamma pass rate. Eight complexity scores were calculated in MATLAB based on the fluence and MLC sequence of these fields. The complexity scores include fractal dimension, monitor unit of field, modulation index, fluence map complexity, weighted average of field area, weighted average of field perimeter, and small aperture ratio (<5 cm² and <50 cm²). Spearman's rank correlation coefficient was used to analyze the correlation between these scores and the decreasing gamma pass rate. Results: The relation between the decreasing gamma pass rate and field complexity was insignificant for most complexity scores. The most significant complexity score was fluence map complexity for SMLC, which had ρ=0.4274 (p-value=0.0063). For DMLC, the most significant complexity score was fractal dimension, which had ρ=−0.3068 (p-value=0.0081). Conclusions: According to the preliminary results of this study, the sensitivity of the gamma pass rate was not strongly related to the field complexity.
Assessing Hourly Precipitation Forecast Skill with the Fractions Skill Score

NASA Astrophysics Data System (ADS)

Zhao, Bin; Zhang, Bo

2018-02-01

Statistical methods for category (yes/no) forecasts, such as the Threat Score, are typically used in the verification of precipitation forecasts. However, these standard methods are affected by the so-called "double-penalty" problem caused by slight displacements in either space or time with respect to the observations. Spatial techniques have recently been developed to help solve this problem. The fractions skill score (FSS), a neighborhood spatial verification method, directly compares the fractional coverage of events in windows surrounding the observations and forecasts. We applied the FSS to hourly precipitation verification by taking hourly forecast products from the GRAPES (Global/Regional Assimilation Prediction System) regional model and quantitative precipitation estimation products from the National Meteorological Information Center of China during July and August 2016, and investigated the difference between these results and those obtained with the traditional category score. We found that the model spin-up period affected the assessment of stability. Systematic errors had an insignificant role in the fraction Brier score and could be ignored. The dispersion of observations followed a diurnal cycle and the standard deviation of the forecast had a similar pattern to the reference maximum of the fraction Brier score. The coefficient of the forecasts and the observations is similar to the FSS; that is, the FSS may be a useful index that can be used to indicate correlation. Compared with the traditional skill score, the FSS has obvious advantages in distinguishing differences in precipitation time series, especially in the assessment of heavy rainfall.
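The fractions skill score described above can be sketched with the standard neighborhood definition: threshold both fields, compute the fraction of event points in a square window around each grid point, and compare the fraction fields. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, points outside the grid are assumed to be non-events (zero padding), and the toy grids are invented.

```python
def neighborhood_fractions(binary, half_width):
    """Fraction of event points in the (2*half_width+1)^2 window
    centred on each grid point; outside the grid counts as no event."""
    rows, cols = len(binary), len(binary[0])
    window_size = (2 * half_width + 1) ** 2
    fractions = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            count = 0
            for di in range(-half_width, half_width + 1):
                for dj in range(-half_width, half_width + 1):
                    r, c = i + di, j + dj
                    if 0 <= r < rows and 0 <= c < cols:
                        count += binary[r][c]
            fractions[i][j] = count / window_size
    return fractions


def fractions_skill_score(forecast, observed, threshold, half_width):
    """FSS = 1 - FBS / FBS_ref, where FBS is the mean squared difference
    of the fraction fields (the fraction Brier score) and FBS_ref is the
    mean of the summed squared fractions."""
    f_bin = [[1 if v >= threshold else 0 for v in row] for row in forecast]
    o_bin = [[1 if v >= threshold else 0 for v in row] for row in observed]
    pf = neighborhood_fractions(f_bin, half_width)
    po = neighborhood_fractions(o_bin, half_width)
    n = len(pf) * len(pf[0])
    fbs = sum((a - b) ** 2 for ra, rb in zip(pf, po)
              for a, b in zip(ra, rb)) / n
    fbs_ref = sum(a ** 2 + b ** 2 for ra, rb in zip(pf, po)
                  for a, b in zip(ra, rb)) / n
    if fbs_ref == 0:
        return float("nan")  # no events in either field at this threshold
    return 1.0 - fbs / fbs_ref


# Hypothetical 2x2 rain fields: the forecast event is displaced diagonally.
forecast = [[1.0, 0.0], [0.0, 0.0]]
observed = [[0.0, 0.0], [0.0, 1.0]]
print(fractions_skill_score(forecast, observed, threshold=1.0, half_width=0))
print(fractions_skill_score(forecast, observed, threshold=1.0, half_width=1))
```

In the toy case the point-by-point comparison (half-width 0) gives no credit to the displaced event, while the wider neighborhood does, which is the "double-penalty" relief the abstract refers to.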
Family matters: dyadic agreement in end-of-life medical decision making.

PubMed

Schmid, Bettina; Allen, Rebecca S; Haley, Philip P; Decoster, Jamie

2010-04-01

We examined race/ethnicity and cultural context within hypothetical end-of-life medical decision scenarios and its influence on patient-proxy agreement. Family dyads consisting of an older adult and 1 family member, typically an adult child, responded to questions regarding the older adult's preferences for cardiopulmonary resuscitation, artificial feeding and fluids, and palliative care in hypothetical illness scenarios. The responses of 34 Caucasian dyads and 30 African American dyads were compared to determine the extent to which family members could accurately predict the treatment preferences of their older relative. We found higher treatment preference agreement among African American dyads compared with Caucasian dyads when considering overall raw difference scores (i.e., overtreatment errors can compensate for undertreatment errors). Prior advance care planning moderated the effect such that lower levels of advance care planning predicted undertreatment errors among African American proxies and overtreatment errors among Caucasian proxies. In contrast, no racial/ethnic differences in treatment preference agreement were found within absolute difference scores (i.e., total error, regardless of the direction of error). This project is one of the first to examine the mediators and moderators of dyadic racial/cultural differences in treatment preference agreement for end-of-life care in hypothetical illness scenarios. Future studies should use mixed method approaches to explore underlying factors for racial differences in patient-proxy agreement as a basis for developing culturally sensitive interventions to reduce racial disparities in end-of-life care options.
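The distinction drawn above between raw and absolute difference scores can be made concrete with a small sketch. The rating scale, the use of means rather than sums, and the function name `difference_scores` are assumptions for illustration only and are not taken from the study.

```python
def difference_scores(proxy_ratings, patient_ratings):
    """Return (raw, absolute) mean difference scores.

    Raw: signed differences are averaged, so overtreatment errors
    (proxy > patient) can cancel undertreatment errors (proxy < patient).
    Absolute: magnitudes are averaged, giving total error regardless
    of direction.
    """
    diffs = [p - q for p, q in zip(proxy_ratings, patient_ratings)]
    raw = sum(diffs) / len(diffs)
    absolute = sum(abs(d) for d in diffs) / len(diffs)
    return raw, absolute


# Hypothetical ratings on a 1-5 scale for three scenarios: one
# overtreatment error and one undertreatment error of equal size.
print(difference_scores([5, 1, 3], [3, 3, 3]))
```

Here the raw score is zero even though the proxy was wrong in two of three scenarios, which is why the two score types can lead to different conclusions about agreement.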
Undergraduate medical students' perceptions and intentions regarding patient safety during clinical clerkship.

PubMed

Lee, Hoo-Yeon; Hahm, Myung-Il; Lee, Sang Gyu

2018-04-04

The purpose of this study was to examine undergraduate medical students' perceptions and intentions regarding patient safety during clinical clerkships. This was a cross-sectional study administered in face-to-face interviews using a modified version of the Medical Student Safety Attitudes and Professionalism Survey (MSSAPS) at three colleges of medicine in Korea. We assessed medical students' perceptions of the cultures ('safety', 'teamwork', and 'error disclosure'), 'behavioural intentions' concerning patient safety issues and 'overall patient safety'. Confirmatory factor analysis and Spearman's correlation analyses were performed. In total, 194 (91.9%) of the 211 third-year undergraduate students participated. 78% of medical students reported that the quality of care received by patients was impacted by teamwork during clinical rotations. Regarding error disclosure, positive scores ranged from 10% to 74%. Except for one question asking whether the disclosure of medical errors was an important component of patient safety (74%), the percentages of positive scores for all the other questions were below 20%. When they saw a medical error committed by another team member, 41.2% of medical students intended to disclose it. Many students had difficulty speaking up about medical errors. Error disclosure guidelines and educational efforts aimed at developing sophisticated communication skills are needed. This study may serve as a reference for other institutions planning patient safety education in their curricula. Assessing student perceptions of safety culture can provide clerkship directors and clinical service chiefs with information that enhances the educational environment and promotes patient safety.
Automated Classification of Selected Data Elements from Free-text Diagnostic Reports for Clinical Research.

PubMed

Löpprich, Martin; Krauss, Felix; Ganzinger, Matthias; Senghas, Karsten; Riezler, Stefan; Knaup, Petra

2016-08-05

In the Multiple Myeloma clinical registry at Heidelberg University Hospital, most data are extracted from discharge letters. Our aim was to analyze whether the manual documentation process can be made more efficient by using natural language processing (NLP) methods for multiclass classification of free-text diagnostic reports to automatically document the diagnosis and state of disease of myeloma patients. The first objective was to create a corpus consisting of free-text diagnosis paragraphs of patients with multiple myeloma from German diagnostic reports, with relevant data elements manually annotated by documentation specialists. The second objective was to construct and evaluate a framework using different NLP methods to enable automatic multiclass classification of relevant data elements from free-text diagnostic reports. The main diagnoses paragraph was extracted from the clinical reports of a randomly selected one third of the patients in the multiple myeloma research database of Heidelberg University Hospital (737 selected patients in total). An EDC system was set up, and two data entry specialists independently performed manual documentation of at least nine specific data elements for multiple myeloma characterization. Both data entries were compared and assessed by a third specialist, and an annotated text corpus was created. A framework was constructed, consisting of a self-developed package to split multiple diagnosis sequences into several subsequences, four different preprocessing steps to normalize the input data and two classifiers: a maximum entropy classifier (MEC) and a support vector machine (SVM). In total, 15 different pipelines were examined and assessed by ten-fold cross-validation, repeated 100 times. As quality indicators, the average error rate and the average F1-score were calculated. For significance testing, the approximate randomization test was used.
The created annotated corpus consists of 737 different diagnosis paragraphs with a total of 865 coded diagnoses. The dataset is publicly available in the supplementary online files for training and testing of further NLP methods. Both classifiers showed low average error rates (MEC: 1.05; SVM: 0.84) and high F1-scores (MEC: 0.89; SVM: 0.92). However, the results varied widely depending on the classified data element. Preprocessing methods increased this effect and had a significant impact on the classification, both positive and negative. The automatic diagnosis splitter increased the average error rate significantly, even if the F1-score decreased only slightly. The low average error rates and high average F1-scores of each pipeline demonstrate the suitability of the investigated NLP methods. However, it was also shown that there is no best practice for an automatic classification of data elements from free-text diagnostic reports.
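The two quality indicators named in this abstract can be illustrated with a minimal sketch. The abstract does not say how the F1-score was averaged across classes, so the macro average below is an assumption, and the class labels in the usage note are hypothetical.

```python
def error_rate(gold, pred):
    """Fraction of items whose predicted label differs from the gold label."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    wrong = sum(g != p for g, p in zip(gold, pred))
    return wrong / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 scores (macro averaging is an assumption)."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

For example, with hypothetical labels `gold = ["IgG", "IgG", "IgA", "IgA"]` and `pred = ["IgG", "IgA", "IgA", "IgA"]`, one of four items is misclassified, so `error_rate` returns 0.25, and `macro_f1` averages the per-class F1 values 2/3 and 4/5.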
Long-range forecast of all India summer monsoon rainfall using adaptive neuro-fuzzy inference system: skill comparison with CFSv2 model simulation and real-time forecast for the year 2015

NASA Astrophysics Data System (ADS)

Chaudhuri, S.; Das, D.; Goswami, S.; Das, S. K.

2016-11-01

All India summer monsoon rainfall (AISMR) characteristics play a vital role in the policy planning and national economy of the country. In view of the significant impact of the monsoon system on regional as well as global climate systems, accurate prediction of summer monsoon rainfall has become a challenge. The objective of this study is to develop an adaptive neuro-fuzzy inference system (ANFIS) for long-range forecast of AISMR. The NCEP/NCAR reanalysis data of temperature and zonal and meridional wind at different pressure levels have been taken to construct the input matrix of ANFIS. The membership of the input parameters for AISMR as high, medium or low is estimated with a trapezoidal membership function. The fuzzified standardized input parameters and the de-fuzzified target output are trained with artificial neural network models. The forecast of AISMR with ANFIS is compared with non-hybrid multi-layer perceptron (MLP), radial basis function network (RBFN) and multiple linear regression (MLR) models. The forecast error analyses of the models reveal that ANFIS provides the best forecast of AISMR with a minimum prediction error of 0.076, whereas the errors with the MLP, RBFN and MLR models are 0.22, 0.18 and 0.73, respectively. During validation with observations, ANFIS outperforms the comparative models. Performance of the ANFIS model is verified through different statistical skill scores, which also confirm the skill of ANFIS in forecasting AISMR. The forecast skill of ANFIS is also observed to be better than that of the Climate Forecast System version 2. The real-time forecast with ANFIS indicates a possible deficit (65-75 cm) in AISMR for the year 2015.
Glaucoma and Driving: On-Road Driving Characteristics

PubMed Central

Wood, Joanne M.; Black, Alex A.; Mallon, Kerry; Thomas, Ravi; Owsley, Cynthia

2016-01-01

Purpose: To comprehensively investigate the types of driving errors and locations that are most problematic for older drivers with glaucoma compared to those without glaucoma using a standardized on-road assessment. Methods: Participants included 75 drivers with glaucoma (mean = 73.2 ± 6.0 years) with mild to moderate field loss (better-eye MD = -1.21 dB; worse-eye MD = -7.75 dB) and 70 age-matched controls without glaucoma (mean = 72.6 ± 5.0 years). On-road driving performance was assessed in a dual-brake vehicle by an occupational therapist using a standardized scoring system which assessed the types of driving errors and the locations where they were made and the number of critical errors that required an instructor intervention. Driving safety was rated on a 10-point scale. Self-reported driving ability and difficulties were recorded using the Driving Habits Questionnaire. Results: Drivers with glaucoma were rated as significantly less safe, made more driving errors, and had almost double the rate of critical errors than those without glaucoma. Driving errors involved lane positioning and planning/approach, and were significantly more likely to occur at traffic lights and yield/give-way intersections. There were few between-group differences in self-reported driving ability. Conclusions: Older drivers with glaucoma with even mild to moderate field loss exhibit impairments in driving ability, particularly during complex driving situations that involve tactical problems with lane position, planning ahead and observation. These results, together with the fact that these drivers self-report their driving to be relatively good, reinforce the need for evidence-based on-road assessments for evaluating driving fitness. PMID:27472221
The Influence of Guided Error-Based Learning on Motor Skills Self-Efficacy and Achievement.

PubMed

Chien, Kuei-Pin; Chen, Sufen

2018-01-01

The authors investigated the role of errors in motor skills teaching, specifically the influence of errors on skills self-efficacy and achievement. The participants were 75 undergraduate students enrolled in pétanque courses. The experimental group (guided error-based learning, n = 37) received a 6-week period of instruction based on the students' errors, whereas the control group (correct motion instruction, n = 38) received a 6-week period of instruction emphasizing correct motor skills. The experimental group had significantly higher scores in motor skills self-efficacy and outcomes than did the control group. Novices' errors reflect their schema in motor skills learning, which provides a basis for instructors to implement student-centered instruction and to facilitate the learning process. Guided error-based learning can effectively enhance beginners' skills self-efficacy and achievement in precision sports such as pétanque.
Validating the Rett Syndrome Gross Motor Scale.

PubMed

Downs, Jenny; Stahlhut, Michelle; Wong, Kingsley; Syhler, Birgit; Bisgaard, Anne-Marie; Jacoby, Peter; Leonard, Helen

2016-01-01

Rett syndrome is a pervasive neurodevelopmental disorder associated with a pathogenic mutation on the MECP2 gene. Impaired movement is a fundamental component and the Rett Syndrome Gross Motor Scale was developed to measure gross motor abilities in this population. The current study investigated the validity and reliability of the Rett Syndrome Gross Motor Scale. Video data showing gross motor abilities, supplemented with parent report data, were collected for 255 girls and women registered with the Australian Rett Syndrome Database, and the factor structure and relationships between motor scores, age and genotype were investigated. Clinical assessment scores for 38 girls and women with Rett syndrome who attended the Danish Center for Rett Syndrome were used to assess consistency of measurement. Principal components analysis enabled the calculation of three factor scores: Sitting, Standing and Walking, and Challenge. Motor scores were poorer with increasing age and those with the p.Arg133Cys, p.Arg294* or p.Arg306Cys mutation achieved higher scores than those with a large deletion. The repeatability of clinical assessment was excellent (intraclass correlation coefficient for total score 0.99, 95% CI 0.93-0.98). The standard error of measurement for the total score was 2 points and we would be 95% confident that a change of 4 points in the 45-point scale would be greater than within-subject measurement error. The Rett Syndrome Gross Motor Scale could be an appropriate measure of gross motor skills in clinical practice and clinical trials.
Impact of lossy compression on diagnostic accuracy of radiographs for periapical lesions

NASA Technical Reports Server (NTRS)

Eraso, Francisco E.; Analoui, Mostafa; Watson, Andrew B.; Rebeschini, Regina

2002-01-01

OBJECTIVES: The purpose of this study was to evaluate lossy Joint Photographic Experts Group compression for endodontic pretreatment digital radiographs. STUDY DESIGN: Fifty clinical charge-coupled device-based digital radiographs depicting periapical areas were selected. Each image was compressed at compression ratios of 2, 4, 8, 16, 32, 48, and 64. One root per image was marked for examination. Images were randomized and viewed by four clinical observers under standardized viewing conditions. Each observer read the image set three times, with at least two weeks between each reading. Three pre-selected sites per image (mesial, distal, apical) were scored on a five-point confidence scale. A panel of three examiners scored the uncompressed images, with a consensus score for each site. The consensus score was used as the baseline for assessing the impact of lossy compression on the diagnostic values of images. The mean absolute error between consensus and observer scores was computed for each observer, site, and reading session. RESULTS: Balanced one-way analysis of variance for all observers indicated that for compression ratios 48 and 64, there was a significant difference between the mean absolute error of uncompressed and compressed images (P < .05). After converting the five-point score to two-level diagnostic values, the diagnostic accuracy was strongly correlated (R² = 0.91) with the compression ratio. CONCLUSION: The results of this study suggest that high compression ratios can have a severe impact on the diagnostic quality of digital radiographs for detection of periapical lesions.
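The mean absolute error used in this abstract as the agreement measure between consensus and observer scores can be sketched as follows. The function name and the scores in the usage note are illustrative assumptions, not data from the study.

```python
def mean_absolute_error(consensus, observer):
    """Average absolute difference between paired consensus and observer scores."""
    if len(consensus) != len(observer) or not consensus:
        raise ValueError("score lists must be non-empty and of equal length")
    return sum(abs(c - o) for c, o in zip(consensus, observer)) / len(consensus)
```

With hypothetical consensus scores `[1, 3, 5]` and observer scores `[2, 3, 3]` for the three sites of one image, the absolute differences are 1, 0 and 2, giving a mean absolute error of 1.0.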

Assessing pediatrics residents' mathematical skills for prescribing medication: a need for improved training.

PubMed

Glover, Mark L; Sussmane, Jeffrey B

2002-10-01

To evaluate residents' skills in performing basic mathematical calculations used for prescribing medications to pediatric patients. In 2001, a test of ten questions on basic calculations was given to first-, second-, and third-year residents at Miami Children's Hospital in Florida. Four additional questions were included to obtain the residents' levels of training, specific pediatrics intensive care unit (PICU) experience, and whether or not they routinely double-checked doses and adjusted them for each patient's weight. The test was anonymous and calculators were permitted. The overall score and the score for each resident class were calculated. Twenty-one residents participated. The overall average test score and the mean test score of each resident class were less than 70%. Second-year residents had the highest mean test scores, although there was no significant difference between the classes of residents (p = .745) or relationship between the residents' PICU experiences and their exam scores (p = .766). There was no significant difference between residents' levels of training and whether they double-checked their calculations (p = .633) or considered each patient's weight relative to the dose prescribed (p = .869). Seven residents committed tenfold dosing errors, and one resident committed a 1,000-fold dosing error. Pediatrics residents need to receive additional education in performing the calculations needed to prescribe medications. In addition, residents should be required to demonstrate these necessary mathematical skills before they are allowed to prescribe medications.
Lod scores for gene mapping in the presence of marker map uncertainty.

PubMed

Stringham, H M; Boehnke, M

2001-07-01

Multipoint lod scores are typically calculated for a grid of locus positions, moving the putative disease locus across a fixed map of genetic markers. Changing the order of a set of markers and/or the distances between the markers can make a substantial difference in the resulting lod score curve and the location and height of its maximum. The typical approach of using the best maximum likelihood marker map is not easily justified if other marker orders are nearly as likely and give substantially different lod score curves. To deal with this problem, we propose three weighted multipoint lod score statistics that make use of information from all plausible marker orders. In each of these statistics, the information conditional on a particular marker order is included in a weighted sum, with weight equal to the posterior probability of that order. We evaluate the type 1 error rate and power of these three statistics on the basis of results from simulated data, and compare these results to those obtained using the best maximum likelihood map and the map with the true marker order. We find that the lod score based on a weighted sum of maximum likelihoods improves on using only the best maximum likelihood map, having a type 1 error rate and power closest to that of using the true marker order in the simulation scenarios we considered. Copyright 2001 Wiley-Liss, Inc.
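The weighting idea in this abstract, a sum over plausible marker orders with each order weighted by its posterior probability, can be sketched roughly as below. This is only one illustrative form and not necessarily any of the three statistics the authors propose. The uniform prior over orders and both function names are assumptions.

```python
def posterior_weights(log10_likelihoods, priors=None):
    """Posterior probability of each marker order from its log10 likelihood.

    Assumes a uniform prior over orders unless priors are supplied.
    """
    n = len(log10_likelihoods)
    if priors is None:
        priors = [1.0 / n] * n
    top = max(log10_likelihoods)  # subtract the maximum to avoid underflow
    unnorm = [p * 10 ** (ll - top) for ll, p in zip(log10_likelihoods, priors)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def weighted_lod_curve(lod_curves, weights):
    """Position-wise weighted sum of per-order lod score curves."""
    n_pos = len(lod_curves[0])
    return [
        sum(w * curve[i] for w, curve in zip(weights, lod_curves))
        for i in range(n_pos)
    ]
```

If two orders have log10 likelihoods of -10 and -11, the first is ten times as likely under a uniform prior, so the weights are 10/11 and 1/11, and the lod curve computed under the first order dominates the weighted curve.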
The Effect of Piano Playing on Preservice Teachers' Ability to Detect Errors in a Choral Score

ERIC Educational Resources Information Center

Napoles, Jessica; Babb, Sandra L.; Bowers, Judy; Hankle, Steven; Zrust, Adam

2017-01-01

The purpose of this study was to examine and empirically test the pedagogical claim that playing the piano while listening to choral singers impedes error detection ability. In a within-subjects design, participants (N = 55 preservice teachers) either listened to four excerpts of choral hymns or played a single part (soprano/bass) on the piano…
Usability Evaluation of Laboratory Information Systems.

PubMed

Mathews, Althea; Marc, David

2017-01-01

Numerous studies have revealed widespread clinician frustration with the usability of electronic health records (EHRs) that is counterproductive to adoption of EHR systems to meet the aims of health-care reform. With poor system usability comes increased risk of negative unintended consequences. Usability issues could lead to user error and workarounds that have the potential to compromise patient safety and negatively impact the quality of care.[1] While there is ample research on EHR usability, there is little information on the usability of laboratory information systems (LISs). Yet, LISs facilitate the timely provision of a great deal of the information needed by physicians to make patient care decisions.[2] Medical and technical advances in genomics that require processing of an increased volume of complex laboratory data further underscore the importance of developing user-friendly LISs. This study aims to add to the body of knowledge on LIS usability. A survey was distributed among LIS users at hospitals across the United States. The survey consisted of the ten-item System Usability Scale (SUS). In addition, participants were asked to rate the ease of performing 24 common tasks with a LIS. Finally, respondents provided comments on what they liked and disliked about using the LIS to provide diagnostic insight into LIS perceived usability. The overall mean SUS score of 59.7 for the LIS evaluated is significantly lower than the benchmark of 68 (P < 0.001). All LISs evaluated received mean SUS scores below 68 except for Orchard Harvest (78.7). While the years of experience using the LIS was found to be a statistically significant influence on mean SUS scores, the combined effect of years of experience and LIS used did not account for the statistically significant difference in the mean SUS score between Orchard Harvest and each of the other LISs evaluated. The results of this study indicate that overall usability of LISs is poor. Usability lags that of systems evaluated across 446 usability surveys.
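For readers unfamiliar with how a SUS score such as 59.7 or 78.7 arises, the standard SUS scoring rule can be sketched as follows. The abstract does not describe the scoring procedure, so the assumption here is that the survey used the conventional ten-item form with 1-5 responses. The function name is illustrative.

```python
def sus_score(responses):
    """Score one respondent's ten SUS answers (integers 1-5, item 1 first).

    Odd-numbered items contribute (response - 1), even-numbered items
    contribute (5 - response); the sum is scaled by 2.5 to a 0-100 range.
    """
    if len(responses) != 10 or any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("expected ten responses, each an integer from 1 to 5")
    total = 0
    for item, r in enumerate(responses, start=1):
        total += (r - 1) if item % 2 == 1 else (5 - r)
    return total * 2.5
```

A respondent who answers 3 to every item scores 50.0, and the most favourable pattern (5 on odd items, 1 on even items) scores 100.0. A mean SUS score for a system is then the average of these per-respondent scores.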
Setting and validating the pass/fail score for the NBDHE.

PubMed

Tsai, Tsung-Hsun; Dixon, Barbara Leatherman

2013-04-01

This report describes the overall process used for setting the pass/fail score for the National Board Dental Hygiene Examination (NBDHE). The Objective Standard Setting (OSS) method was used for setting the pass/fail score for the NBDHE. The OSS method requires a panel of experts to determine the criterion items and proportion of these items that minimally competent candidates would answer correctly, the percentage of mastery and the confidence level of the error band. A panel of 11 experts was selected by the Joint Commission on National Dental Examinations (Joint Commission). Panel members represented geographic distribution across the U.S. and had the following characteristics: full-time dental hygiene practitioners with experience in areas of preventive, periodontal, geriatric and special needs care, and full-time dental hygiene educators with experience in areas of scientific basis for dental hygiene practice, provision of clinical dental hygiene services and community health/research principles. Utilizing the expert panel's judgments, the pass/fail score was set and then the score scale was established using the Rasch measurement model. Statistical and psychometric analysis shows the actual failure rate and the OSS failure rate are reasonably consistent (2.4% vs. 2.8%). The analysis also showed that the lowest error of measurement, an index of precision, and the highest reliability (0.97) are achieved at the pass/fail score point. The pass/fail score is a valid guide for making decisions about candidates for dental hygiene licensure. This new standard was reviewed and approved by the Joint Commission and was implemented beginning in 2011.
Prediction of true test scores from observed item scores and ancillary data.

PubMed

Haberman, Shelby J; Yao, Lili; Sinharay, Sandip

2015-05-01

In many educational tests which involve constructed responses, a traditional test score is obtained by adding together item scores obtained through holistic scoring by trained human raters. For example, this practice was used until 2008 in the case of GRE® General Analytical Writing and until 2009 in the case of TOEFL® iBT Writing. With use of natural language processing, it is possible to obtain additional information concerning item responses from computer programs such as e-rater®. In addition, available information relevant to examinee performance may include scores on related tests. We suggest application of standard results from classical test theory to the available data to obtain best linear predictors of true traditional test scores. In performing such analysis, we require estimation of variances and covariances of measurement errors, a task which can be quite difficult in the case of tests with limited numbers of items and with multiple measurements per item. As a consequence, a new estimation method is suggested based on samples of examinees who have taken an assessment more than once. Such samples are typically not random samples of the general population of examinees, so that we apply statistical adjustment methods to obtain the needed estimated variances and covariances of measurement errors. To examine practical implications of the suggested methods of analysis, applications are made to GRE General Analytical Writing and TOEFL iBT Writing. Results obtained indicate that substantial improvements are possible both in terms of reliability of scoring and in terms of assessment reliability. © 2015 The British Psychological Society.
Selection of neural network structure for system error correction of electro-optical tracker system with horizontal gimbal

NASA Astrophysics Data System (ADS)

Liu, Xing-fa; Cen, Ming

2007-12-01

The neural network system error correction method is more precise than the least squares and spherical harmonics function system error correction methods. Its accuracy depends mainly on the structure of the neural network. Analysis and simulation show that both the BP and the RBF neural network system error correction methods achieve high correction accuracy; considering training rate and network scale, the RBF network method is preferable to the BP network method when few training samples are available.
Developing a weighted measure of speech sound accuracy.

PubMed

Preston, Jonathan L; Ramsdell, Heather L; Oller, D Kimbrough; Edwards, Mary Louise; Tobin, Stephen J

2011-02-01

To develop a system for numerically quantifying a speaker's phonetic accuracy through transcription-based measures. With a focus on normal and disordered speech in children, the authors describe a system for differentially weighting speech sound errors on the basis of various levels of phonetic accuracy using a Weighted Speech Sound Accuracy (WSSA) score. The authors then evaluate the reliability and validity of this measure. Phonetic transcriptions were analyzed from several samples of child speech, including preschoolers and young adolescents with and without speech sound disorders and typically developing toddlers. The new measure of phonetic accuracy was validated against existing measures, was used to discriminate typical and disordered speech production, and was evaluated to examine sensitivity to changes in phonetic accuracy over time. Reliability between transcribers and consistency of scores among different word sets and testing points are compared. Initial psychometric data indicate that WSSA scores correlate with other measures of phonetic accuracy as well as listeners' judgments of the severity of a child's speech disorder. The measure separates children with and without speech sound disorders and captures growth in phonetic accuracy in toddlers' speech over time. The measure correlates highly across transcribers, word lists, and testing points. Results provide preliminary support for the WSSA as a valid and reliable measure of phonetic accuracy in children's speech.
Performance of red-green color deficient subjects on the Holmes-Wright lantern (Type A) in photopic viewing.

PubMed

Birch, J

1999-09-01

The Holmes-Wright lantern (Type A) is an approved occupational color vision test for airline pilots in the European Economic Community and for specific occupations in the British Armed Forces. The colors shown are red, green and white signal lights. The Holmes-Wright lantern is a sensitive screening test for red-green color deficiency in photopic viewing and the pass/fail level is similar to that of the Farnsworth Lantern (Falant) if the same scoring method is applied. A total of 138 color deficient subjects, identified with the Ishihara plates and diagnosed with the Nagel anomaloscope, completed a color vision test battery which included three runs of the nine color pairs of the Holmes-Wright lantern at high brightness in normal room illumination. Screening sensitivity on a single error was found to be 97% compared with the Ishihara plates. Using the Falant scoring method, 20 subjects passed. These were 1 deuteranope, 2 protanomalous trichromats and 17 deuteranomalous trichromats (22% of 88 anomalous trichromats). The mean error score was greater for protans than for deutans but the mean number of qualitative error categories was smaller. Green/white confusions were the most frequent errors. It was not possible to predict who would pass the lantern test from other test results, but all subjects with a Nagel anomaloscope matching range > 15 scale units who failed the Farnsworth D15 test or were graded as moderate/severe with the American Optical Company (Hardy, Rand and Rittler) plates failed. The Holmes-Wright lantern is a sensitive screening test for red-green color deficiency. Although a similar percentage of anomalous trichromats fail the Holmes-Wright lantern as fail the Falant, if the same scoring method is used, the superior correlation between the Holmes-Wright result and other color vision tests designed to grade the severity of color deficiency suggests that the two lantern results are not equivalent.
Performance of a light fluorescence device for the detection of microbial plaque and gingival inflammation.

PubMed

Rechmann, Peter; Liou, Shasan W; Rechmann, Beate M T; Featherstone, John D B

2016-01-01

The hypothesis to be tested was that using the SOPROCARE system in fluorescence perio-mode allows scoring of microbial plaque that is comparable to the Turesky modification of the Quigley Hein plaque index (T-QH) and scoring of gingival inflammation comparable to the Silness and Löe gingival inflammation index (GI). Fifty-five subjects with various amounts of microbial plaque were recruited. The T-QH and GI index were recorded. SOPROCARE pictures were recorded in fluorescence perio-mode and in daylight mode. Finally, conventional digital photographs were taken. All pictures were assessed using the same criteria as described for the clinical indices. The average T-QH was 1.1 ± 1.2 (mean ± SD). Scoring with SOPROCARE perio-mode led to a slightly higher average than the T-QH scores. SOPROCARE daylight mode and digital photography showed the highest plaque scores. The average GI index was 0.7 ± 0.9. SOPROCARE in perio-mode scored slightly lower. Linear regression fits between the different clinical indices and SOPROCARE scores were significantly different from zero demonstrating high goodness of fit. The study demonstrated that the SOPROCARE fluorescence assessment tool in perio-mode allows reliable judgment of microbial plaque and gingival inflammation levels similar to the established Turesky-modified Quigley Hein index and the Silness and Löe gingival inflammation index. Training on plaque-free teeth will actually reduce scoring errors. The SOPROCARE fluorescence tool in perio-mode provides reliable evaluation of microbial plaque and gingival inflammation for the dental clinician.
The multiple hop test: a discriminative or evaluative instrument for chronic ankle instability?

PubMed

Eechaute, Christophe; Bautmans, Ivan; De Hertogh, Willem; Vaes, Peter

2012-05-01

To determine whether the multiple hop test should be used as an evaluative or a discriminative instrument for chronic ankle instability (CAI). Blinded case-control study. University research laboratory. Twenty-nine healthy subjects (21 men, 8 women, mean age 21.8 years) and 29 patients with CAI (17 men, 12 women, mean age 24.9 years) were selected. Subjects performed a multiple hop test and hopped on 10 different tape markers while trying to avoid any postural correction. Minimal detectable changes (MDC) of the number of balance errors, the time value, and the visual analog scale (VAS) score (perceived difficulty) were calculated as evaluative measures. For the discriminative properties, a receiver operating characteristic curve was determined and the area under the curve (AUC), the sensitivity, specificity, diagnostic accuracy (DA), and likelihood ratios (LR) were calculated for the cases in which 1, 2, or 3 outcomes were positive. Based on their MDC, outcomes should, respectively, change by more than 7 errors (41%), 6 seconds (15%), and 27 mm (55%, VAS score) before the change is considered real. Areas under the curve were, respectively, 79% (errors), 77% (time value), and 65% (VAS score). The optimal cutoff points were, respectively, 13.5 errors, 35 seconds, and 32.5 mm. When 2 of 3 outcomes were positive, the sensitivity was 86%, the specificity was 79%, the DA was 83%, the positive LR was 4.2, and the negative LR was 0.17. The multiple hop test seems to be more a discriminative instrument for CAI, and its responsiveness needs to be demonstrated.
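The likelihood ratios reported in this abstract follow from sensitivity and specificity by the standard definitions, sketched below. The function name is illustrative, and the inputs in the usage note are the rounded percentages from the abstract.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (positive LR, negative LR) from sensitivity and specificity.

    LR+ = sensitivity / (1 - specificity)
    LR- = (1 - sensitivity) / specificity
    """
    if not (0 < specificity < 1) or not (0 <= sensitivity <= 1):
        raise ValueError("sensitivity must be in [0, 1] and specificity in (0, 1)")
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg
```

Calling `likelihood_ratios(0.86, 0.79)` gives roughly 4.1 and 0.18. These are close to the reported 4.2 and 0.17; the small gap presumably reflects that the authors worked from unrounded sensitivity and specificity.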
Psychometric Evaluation of the Brachial Assessment Tool Part 1: Reproducibility.

PubMed

Hill, Bridget; Williams, Gavin; Olver, John; Ferris, Scott; Bialocerkowski, Andrea

2018-04-01

To evaluate reproducibility (reliability and agreement) of the Brachial Assessment Tool (BrAT), a new patient-reported outcome measure for adults with traumatic brachial plexus injury (BPI). Prospective repeated-measure design. Outpatient clinics. Adults with confirmed traumatic BPI (N=43; age range, 19-82y). People with BPI completed the 31-item 4-response BrAT twice, 2 weeks apart. Results for the 3 subscales and summed score were compared at time 1 and time 2 to determine reliability, including systematic differences using paired t tests, test-retest reliability using intraclass correlation coefficient model 1,1 (ICC(1,1)), and internal consistency using Cronbach α. Agreement parameters included standard error of measurement, minimal detectable change, and limits of agreement. BrAT. Test-retest reliability was excellent (ICC(1,1) = .90-.97). Internal consistency was high (Cronbach α = .90-.98). Measurement error was relatively low (standard error of measurement range, 3.1-8.8). A change of >4 for subscale 1, >6 for subscale 2, >4 for subscale 3, and >10 for the summed score is indicative of change over and above measurement error. Limits of agreement ranged from ±4.4 (subscale 3) to ±11.61 (summed score). These findings support the use of the BrAT as a reproducible patient-reported outcome measure for adults with traumatic BPI with evidence of appropriate reliability and agreement for both individual and group comparisons. Further psychometric testing is required to establish the construct validity and responsiveness of the BrAT. Copyright © 2017 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
A cross "ethnical" comparison of the Driver Behaviour Questionnaire (DBQ) in an economically fast developing country.

PubMed

Bener, Abdulbari; Verjee, Mohamud; Dafeeah, Elnour E; Yousafzai, Mohammad T; Mari, Sundus; Hassib, Ahmed; Al-Khatib, Hamza; Choi, Min Kyung; Nema, Noor; Ozkan, Türker; Lajunen, Timo

2013-05-12

The aim of this study was to compare the driving behaviours of four ethnic groups and to investigate the relationship between violations, errors and lapses of DBQ and accident involvement in Qatar. The Driver Behaviour Questionnaire (DBQ) was used to measure the aberrant driving behaviours leading to accidents. Of 2400 drivers approached, 1824 drivers agreed to participate (76%) and completed the driver behaviour questionnaire and background information. The study revealed that the majority of the Qatari (35.9%) and Jordanian drivers (37.5%) were below 30 years of age, whereas Filipino (42.3%) and Indian subcontinent (34.1%) drivers were in the age group of 30-39 years. Qatari drivers (52%) were involved in most accidents, followed by Jordanians (48.3%). The most common type of collision was a head-on collision, which was similar in all four ethnic groups. The Qatari drivers scored higher on almost all items of violations, errors and lapses compared to other ethnic groups, while Filipino drivers were lower on all the items. The most common violation was the same in all four ethnic groups: "Disregard the speed limits on a motorway". The most common error item observed was "Queuing to turn right/left on to a main road". "Forget where you left your car" and "Hit something when reversing" were the two lapses identified in factor analysis. The present study identified that Qatari drivers scored higher on most of the items of violations, errors and lapses of DBQ compared to other countries, whereas Filipino drivers scored lower in DBQ items.
The culture of patient safety in an Iranian intensive care unit.

PubMed

Abdi, Zhaleh; Delgoshaei, Bahram; Ravaghi, Hamid; Abbasi, Mohsen; Heyrani, Ali

2015-04-01

To explore nurses' and physicians' attitudes and perceptions relevant to safety culture and to elicit strategies to promote safety culture in an intensive care unit. A strong safety culture is essential to ensure patient safety in the intensive care unit. This case study adopted a mixed method design. The Safety Attitude Questionnaire (SAQ-ICU version), assessing the safety climate through six domains, was completed by nurses and physicians (n = 42) in an academic intensive care unit. Twenty semi-structured interviews and document analyses were conducted as well. Interviews were analysed using a framework analysis method. Mean scores across the six domains ranged from 52.3 to 72.4 on a 100-point scale. Further analysis indicated that there were statistically significant differences between physicians' and nurses' attitudes toward teamwork (mean scores: 64.5/100 vs. 52.6/100, d = 1.15, t = 3.69, P < 0.001) and job satisfaction (mean scores: 78.2/100 vs. 57.7/100, d = 1.5, t = 4.8, P < 0.001). Interviews revealed several safety challenges including underreporting, failure to learn from errors, lack of speaking up, low job satisfaction among nurses and ineffective nurse-physician communication. The results indicate that all the domains need improvements. However, further attention should be devoted to error reporting and analysis, communication and teamwork among professional groups, and nurses' job satisfaction. Nurse managers can contribute to promoting a safety culture by encouraging staff to report errors, fostering learning from errors and addressing inter-professional communication problems. © 2013 John Wiley & Sons Ltd.
Pragmatics abilities in narrative production: a cross-disorder comparison.

PubMed

Norbury, Courtenay Frazier; Gemmell, Tracey; Paul, Rhea

2014-05-01

We aimed to disentangle contributions of socio-pragmatic and structural language deficits to narrative competence by comparing the narratives of children with autism spectrum disorder (ASD; n = 25), non-autistic children with language impairments (LI; n = 23), and children with typical development (TD; n = 27). Groups were matched for age (6½ to 15 years; mean: 10;6) and non-verbal ability; ASD and TD groups were matched on standardized language scores. Despite distinct clinical presentation, children with ASD and LI produced similarly simple narratives that lacked semantic richness and omitted important story elements, when compared to TD peers. Pragmatic errors were common across groups. Within the LI group, pragmatic errors were negatively correlated with story macrostructure scores and with an index of semantic-pragmatic relevance. For the group with ASD, pragmatic errors consisted of comments that, though extraneous, did not detract from the gist of the narrative. These findings underline the importance of both language and socio-pragmatic skill for producing coherent, appropriate narratives.
Reaction time, impulsivity, and attention in hyperactive children and controls: a video game technique.

PubMed

Mitchell, W G; Chavez, J M; Baker, S A; Guzman, B L; Azen, S P

1990-07-01

Maturation of sustained attention was studied in a group of 52 hyperactive elementary school children and 152 controls using a microcomputer-based test formatted to resemble a video game. In nonhyperactive children, both simple and complex reaction time decreased with age, as did variability of response time. Omission errors were extremely infrequent on simple reaction time and decreased with age on the more complex tasks. Commission errors had an inconsistent relationship with age. Hyperactive children were slower, more variable, and made more errors on all segments of the game than did controls. Both motor speed and calculated mental speed were slower in hyperactive children, with greater discrepancy for responses directed to the nondominant hand, suggesting that a selective right hemisphere deficit may be present in hyperactives. A summary score (number of individual game scores above the 95th percentile) of 4 or more detected 60% of hyperactive subjects with a false positive rate of 5%. Agreement with the Matching Familiar Figures Test was 75% in the hyperactive group.
Estimating Teacher Effectiveness from Two-Year Changes in Students' Test Scores

ERIC Educational Resources Information Center

Leigh, Andrew

2010-01-01

Using a dataset covering over 10,000 Australian school teachers and over 90,000 pupils, I estimate how effective teachers are in raising students' test scores. Since the exams are biennial, it is necessary to take account of the teacher's work in the intervening year. Even adjusting for measurement error, the teacher fixed effects are widely…
Multilevel Multidimensional Item Response Model with a Multilevel Latent Covariate

ERIC Educational Resources Information Center

Cho, Sun-Joo; Bottge, Brian A.

2015-01-01

In a pretest-posttest cluster-randomized trial, one of the methods commonly used to detect an intervention effect involves controlling pre-test scores and other related covariates while estimating an intervention effect at post-test. In many applications in education, the total post-test and pre-test scores that ignores measurement error in the…
Using the Graded Response Model to Control Spurious Interactions in Moderated Multiple Regression

ERIC Educational Resources Information Center

Morse, Brendan J.; Johanson, George A.; Griffeth, Rodger W.

2012-01-01

Recent simulation research has demonstrated that using simple raw score to operationalize a latent construct can result in inflated Type I error rates for the interaction term of a moderated statistical model when the interaction (or lack thereof) is proposed at the latent variable level. Rescaling the scores using an appropriate item response…
Beyond Correctness: Development and Validation of Concept-Based Categorical Scoring Rubrics for Diagnostic Purposes

ERIC Educational Resources Information Center

Arieli-Attali, Meirav; Liu, Ying

2016-01-01

Diagnostic assessment approaches intend to provide fine-grained reports of what students know and can do, focusing on their areas of strengths and weaknesses. However, current application of such diagnostic approaches is limited by the scoring method for item responses; important diagnostic information, such as type of errors and strategy use is…

Characterizing Sources of Uncertainty in Item Response Theory Scale Scores

ERIC Educational Resources Information Center

Yang, Ji Seung; Hansen, Mark; Cai, Li

2012-01-01

Traditional estimators of item response theory scale scores ignore uncertainty carried over from the item calibration process, which can lead to incorrect estimates of the standard errors of measurement (SEMs). Here, the authors review a variety of approaches that have been applied to this problem and compare them on the basis of their statistical…
Impact of Accumulated Error on Item Response Theory Pre-Equating with Mixed Format Tests

ERIC Educational Resources Information Center

Keller, Lisa A.; Keller, Robert; Cook, Robert J.; Colvin, Kimberly F.

2016-01-01

The equating of tests is an essential process in high-stakes, large-scale testing conducted over multiple forms or administrations. By adjusting for differences in difficulty and placing scores from different administrations of a test on a common scale, equating allows scores from these different forms and administrations to be directly compared…
Manual for the USES General Aptitude Test Battery. Section IV: Norms, Specific Occupations.

ERIC Educational Resources Information Center

Manpower Administration (DOL), Washington, DC.

Adult norms are shown as cutting scores for each of the aptitudes judged significant for a given occupation. Tables for converting adult scores to their ninth and tenth grade equivalents are included. The standard error of measurement is reported for each of the nine aptitudes of the General Aptitude Test Battery (GATB): intelligence, verbal…
Preservice Teachers' Ability To Determine Miscues and Comprehension Response Errors of Elementary Students.

ERIC Educational Resources Information Center

Traynelis-Yurek, Elaine; Strong, Mary W.

2000-01-01

Examines the results of instruction in administering the Informal Reading Inventory (IRI) in three teacher training programs. Focuses on the examination of the scoring of the IRI in simulation exercises by preservice teachers after instruction in the administration and scoring of the IRI. Concludes that the preservice teachers did not accurately…
Measurement Error Correction Formula for Cluster-Level Group Differences in Cluster Randomized and Observational Studies

ERIC Educational Resources Information Center

Cho, Sun-Joo; Preacher, Kristopher J.

2016-01-01

Multilevel modeling (MLM) is frequently used to detect cluster-level group differences in cluster randomized trial and observational studies. Group differences on the outcomes (posttest scores) are detected by controlling for the covariate (pretest scores) as a proxy variable for unobserved factors that predict future attributes. The pretest and…
Relating physician's workload with errors during radiation therapy planning.

PubMed

Mazur, Lukasz M; Mosaly, Prithima R; Hoyle, Lesley M; Jones, Ellen L; Chera, Bhishamjit S; Marks, Lawrence B

2014-01-01

To relate subjective workload (WL) levels to errors for routine clinical tasks. Nine physicians (4 faculty and 5 residents) each performed 3 radiation therapy planning cases. The WL levels were subjectively assessed using National Aeronautics and Space Administration Task Load Index (NASA-TLX). Individual performance was assessed objectively based on the severity grade of errors. The relationship between the WL and performance was assessed via ordinal logistic regression. There was an increased rate of severity grade of errors with increasing WL (P value = .02). As the majority of the higher NASA-TLX scores, and the majority of the performance errors were in the residents, our findings are likely most pertinent to radiation oncology centers with training programs. WL levels may be an important factor contributing to errors during radiation therapy planning tasks. Published by Elsevier Inc.
Validation of Clinical Scoring Systems ART and ABCR after Transarterial Chemoembolization of Hepatocellular Carcinoma.

PubMed

Kloeckner, Roman; Pitton, Michael B; Dueber, Christoph; Schmidtmann, Irene; Galle, Peter R; Koch, Sandra; Wörns, Marcus A; Weinmann, Arndt

2017-01-01

To perform an external validation of the Assessment for Retreatment with Transarterial Chemoembolization (ART) and α-fetoprotein (AFP), Barcelona Clinic Liver Cancer (BCLC), Child-Pugh, and response (ABCR) scores and to compare them in terms of prognostic power. From 2000 to 2015, 871 patients with hepatocellular carcinoma underwent transarterial chemoembolization at a tertiary referral hospital, and 176 met all inclusion and exclusion criteria for both scores and were analyzed. Nineteen percent (n = 34) had BCLC stage A disease and 81% had stage B disease. Thirty-nine patients (22%) presented with elevated AFP levels. Overall survival was calculated. Scores were validated and compared with a Harrell C-index, integrated Brier score (IBS), and prediction error curves. Before the second chemoembolization procedure, 22 patients (12%) showed an increase of 1 point in Child-Pugh score and 51 patients (22%) had an increase of ≥ 2 points. Thirty-one patients (23%) showed a > 25% increase in aspartate aminotransferase level, and 114 (65%) showed a response to treatment. Consequently, 127 patients (72%) had a low ART score and 49 (28%) had a high ART score. One hundred fifty-eight patients (90%) had a low ABCR score, whereas 18 (10%) had a high ABCR score. Low and high ART score groups had median survival durations of 20.8 and 15.3 mo, respectively. Harrell C-indexes were 0.572 and 0.608, and IBSs were 0.135 and 0.128, for ART and ABCR, respectively. For both scores, an increase in Child-Pugh score ≥ 2 points and a radiologic response were significantly associated with survival. Both scores were of limited predictive value, and neither was sufficient to support clear-cut clinical decisions. Further effort is necessary to determine criteria for making valid clinical predictions. Copyright © 2017 SIR. Published by Elsevier Inc. All rights reserved.
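The Harrell C-index used above to compare the two scores measures how often the patient who dies sooner also has the higher risk score. The following is a minimal sketch under simplifying assumptions: it is not the authors' code, the function name is hypothetical, the data are invented, and pairs with tied survival times are simply skipped.

```python
def harrell_c_index(times, events, risk_scores):
    """Concordance between risk scores and survival times (simplified sketch).

    times: follow-up time per patient
    events: 1 if death was observed, 0 if censored
    risk_scores: higher value means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Invented toy data: survival in months, event flags, and a binary score
# (1 = high score, 0 = low score).
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
scores = [1, 1, 0, 0]
c_index = harrell_c_index(times, events, scores)
print(f"C-index: {c_index:.2f}")
```

A value of 0.5 corresponds to chance-level ordering and 1.0 to perfect ordering, which is why the values of 0.572 and 0.608 reported in the abstract indicate limited discrimination.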
Detecting Intervention Effects in a Cluster-Randomized Design Using Multilevel Structural Equation Modeling for Binary Responses

PubMed Central

Cho, Sun-Joo; Preacher, Kristopher J.; Bottge, Brian A.

2015-01-01

Multilevel modeling (MLM) is frequently used to detect group differences, such as an intervention effect in a pre-test–post-test cluster-randomized design. Group differences on the post-test scores are detected by controlling for pre-test scores as a proxy variable for unobserved factors that predict future attributes. The pre-test and post-test scores that are most often used in MLM are summed item responses (or total scores). In prior research, there have been concerns regarding measurement error in the use of total scores in using MLM. To correct for measurement error in the covariate and outcome, a theoretical justification for the use of multilevel structural equation modeling (MSEM) has been established. However, MSEM for binary responses has not been widely applied to detect intervention effects (group differences) in intervention studies. In this article, the use of MSEM for intervention studies is demonstrated and the performance of MSEM is evaluated via a simulation study. Furthermore, the consequences of using MLM instead of MSEM are shown in detecting group differences. Results of the simulation study showed that MSEM performed adequately as the number of clusters, cluster size, and intraclass correlation increased and outperformed MLM for the detection of group differences. PMID:29881032
Knee osteoarthritis image registration: data from the Osteoarthritis Initiative

NASA Astrophysics Data System (ADS)

Galván-Tejada, Jorge I.; Celaya-Padilla, José M.; Treviño, Victor; Tamez-Peña, José G.

2015-03-01

Knee osteoarthritis (OA) is a very common disease. In its early stages, changes appear in the joint structures; some of the most common symptoms are the formation of osteophytes, cartilage degradation, and joint space reduction, among others. The Kellgren-Lawrence grading scale, which is based on a joint space reduction measurement, is a very widely used tool to assess radiological OA knee x-ray images. Based on information obtained from these assessments, the objective of this work is to correlate the Kellgren-Lawrence score with the bilateral asymmetry between knees. Using public data from the Osteoarthritis Initiative (OAI), a set of images with different Kellgren-Lawrence scores was used to determine the relationship between the Kellgren-Lawrence score and the bilateral asymmetry. To measure the asymmetry between the knees, the right knee was registered to match the left knee; then a series of similarity metrics (mutual information, correlation, and mean squared error) were computed to correlate the deformation (mismatch) of the knees with the Kellgren-Lawrence score. Radiological information was evaluated and scored by OAI radiologist groups. The results of the study suggest an association between the radiological Kellgren-Lawrence score and image registration metrics: mutual information and correlation are higher in the early stages, and mean squared error is higher in advanced stages. This association can be helpful in developing a computer-aided grading tool.
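To make the three similarity metrics named in this abstract concrete, here is a minimal pure-Python sketch that compares two already-registered images given as flat lists of 8-bit intensities. The function names, the bin count, and the toy pixel values are assumptions for illustration; the registration step itself is not shown, and the abstract does not describe the authors' implementation.

```python
import math
from collections import Counter


def mean_squared_error(a, b):
    """Average squared intensity difference between two images."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def pearson_correlation(a, b):
    """Pearson correlation of pixel intensities (undefined for constant images)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def mutual_information(a, b, bins=8, max_value=255):
    """Mutual information (in nats) estimated from a joint intensity histogram."""
    width = (max_value + 1) / bins
    qa = [min(int(x / width), bins - 1) for x in a]
    qb = [min(int(y / width), bins - 1) for y in b]
    n = len(a)
    joint, pa, pb = Counter(zip(qa, qb)), Counter(qa), Counter(qb)
    mi = 0.0
    for (i, j), count in joint.items():
        p_ij = count / n
        mi += p_ij * math.log(p_ij / ((pa[i] / n) * (pb[j] / n)))
    return mi


# Toy "left knee" and "registered right knee" pixel values (invented).
left = [0, 64, 128, 255, 32, 200]
right = [5, 60, 120, 250, 40, 190]
print(mean_squared_error(left, right))
print(pearson_correlation(left, right))
print(mutual_information(left, right))
```

Under this reading, a well-matched pair of knees would show low mean squared error together with high correlation and mutual information, which is the direction of association the abstract reports for early versus advanced stages.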
Detecting Intervention Effects in a Cluster-Randomized Design Using Multilevel Structural Equation Modeling for Binary Responses.

PubMed

Cho, Sun-Joo; Preacher, Kristopher J; Bottge, Brian A

2015-11-01

Multilevel modeling (MLM) is frequently used to detect group differences, such as an intervention effect in a pre-test-post-test cluster-randomized design. Group differences on the post-test scores are detected by controlling for pre-test scores as a proxy variable for unobserved factors that predict future attributes. The pre-test and post-test scores that are most often used in MLM are summed item responses (or total scores). In prior research, there have been concerns regarding measurement error in the use of total scores in using MLM. To correct for measurement error in the covariate and outcome, a theoretical justification for the use of multilevel structural equation modeling (MSEM) has been established. However, MSEM for binary responses has not been widely applied to detect intervention effects (group differences) in intervention studies. In this article, the use of MSEM for intervention studies is demonstrated and the performance of MSEM is evaluated via a simulation study. Furthermore, the consequences of using MLM instead of MSEM are shown in detecting group differences. Results of the simulation study showed that MSEM performed adequately as the number of clusters, cluster size, and intraclass correlation increased and outperformed MLM for the detection of group differences.
Undesired variance due to examiner stringency/leniency effect in communication skill scores assessed in OSCEs.

PubMed

Harasym, Peter H; Woloschuk, Wayne; Cunning, Leslie

2008-12-01

Physician-patient communication is a clinical skill that can be learned and has a positive impact on patient satisfaction and health outcomes. A concerted effort at all medical schools is now directed at teaching and evaluating this core skill. Student communication skills are often assessed by an Objective Structured Clinical Examination (OSCE). However, it is unknown what sources of error variance are introduced into examinee communication scores by various OSCE components. This study primarily examined the effect different examiners had on the evaluation of students' communication skills assessed at the end of a family medicine clerkship rotation. The communication performance of clinical clerks from Classes 2005 and 2006 was assessed using six OSCE stations. Performance was rated at each station using the 28-item Calgary-Cambridge guide. Item Response Theory analysis using a Multifaceted Rasch model was used to partition the various sources of error variance and generate a "true" communication score where the effects of examiner, case, and items are removed. Variance and reliability of scores were as follows: communication scores (.20 and .87), examiner stringency/leniency (.86 and .91), case (.03 and .96), and item (.86 and .99), respectively. All facet scores were reliable (.87-.99). Examiner variance (.86) was more than four times the examinee variance (.20). About 11% of the clerks' outcome status shifted using "true" rather than observed/raw scores. There was large variability in examinee scores due to variation in examiner stringency/leniency behaviors that may impact pass-fail decisions. Exploring the benefits of examiner training and employing "true" scores generated using Item Response Theory analyses prior to making pass/fail decisions are recommended.
Training improves laparoscopic tasks performance and decreases operator workload.

PubMed

Hu, Jesse S L; Lu, Jirong; Tan, Wee Boon; Lomanto, Davide

2016-05-01

It has been postulated that increased operator workload during task performance may increase fatigue and surgical errors. The National Aeronautics and Space Administration-Task Load Index (NASA-TLX) is a validated tool for self-assessment for workload. Our study aims to assess the relationship of workload and performance of novices in simulated laparoscopic tasks of different complexity levels before and after training. Forty-seven novices without prior laparoscopic experience were recruited in a trial to investigate whether training improves task performance as well as mental workload. The participants were tested on three standard tasks (ring transfer, precision cutting and intracorporeal suturing) in increasing complexity based on the Fundamentals of Laparoscopic Surgery (FLS) curriculum. Following a period of training and rest, participants were tested again. Test scores were computed from time taken and time penalties for precision errors. Test scores and NASA-TLX scores were recorded pre- and post-training and analysed using paired t tests. One-way repeated measures ANOVA was used to analyse differences in NASA-TLX scores between the three tasks. NASA-TLX score was lowest with ring transfer and highest with intracorporeal suturing. This was statistically significant in both pre-training (p < 0.001) and post-training (p < 0.001). NASA-TLX scores mirror the changes in test scores for the three tasks. Workload scores decreased significantly after training for all three tasks (ring transfer = 2.93, p < 0.001, precision cutting = 3.74, p < 0.001, intracorporeal suturing = 2.98, p < 0.001). NASA-TLX score is an accurate reflection of the complexity of simulated laparoscopic tasks in the FLS curriculum. This also correlates with the relationship of test scores between the three tasks. Simulation training improves both performance score and workload score across the tasks.
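The pre- versus post-training comparison described above relies on paired t tests. As a minimal sketch, the statistic can be computed from per-participant differences as shown below; the function name and the scores are invented for illustration and are not data from the study.

```python
import math
import statistics


def paired_t_statistic(before, after):
    """Return (t, degrees of freedom) for a paired t test on before - after."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Sample standard deviation of the differences (n - 1 denominator).
    sd_diff = statistics.stdev(diffs)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical workload scores for four participants.
pre_training = [60, 55, 70, 65]
post_training = [50, 50, 60, 55]
t_value, dof = paired_t_statistic(pre_training, post_training)
print(f"t = {t_value:.2f}, df = {dof}")
```

Converting the statistic to a p value requires the t distribution with n - 1 degrees of freedom, which a statistics package would normally supply; that step is left out here to keep the sketch dependency-free.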
Source memory errors in schizophrenia, hallucinations and negative symptoms: a synthesis of research findings.

PubMed

Brébion, G; Ohlsen, R I; Bressan, R A; David, A S

2012-12-01

Previous research has shown associations between source memory errors and hallucinations in patients with schizophrenia. We bring together here findings from a broad memory investigation to specify better the type of source memory failure that is associated with auditory and visual hallucinations. Forty-one patients with schizophrenia and 43 healthy participants underwent a memory task involving recall and recognition of lists of words, recognition of pictures, memory for temporal and spatial context of presentation of the stimuli, and remembering whether target items were presented as words or pictures. False recognition of words and pictures was associated with hallucination scores. The extra-list intrusions in free recall were associated with verbal hallucinations whereas the intra-list intrusions were associated with a global hallucination score. Errors in discriminating the temporal context of word presentation and the spatial context of picture presentation were associated with auditory hallucinations. The tendency to remember verbal labels of items as pictures of these items was associated with visual hallucinations. Several memory errors were also inversely associated with affective flattening and anhedonia. Verbal and visual hallucinations are associated with confusion between internal verbal thoughts or internal visual images and perception. In addition, auditory hallucinations are associated with failure to process or remember the context of presentation of the events. Certain negative symptoms have an opposite effect on memory errors.
The frontal-anatomic specificity of design fluency repetitions and their diagnostic relevance for behavioral variant frontotemporal dementia.

PubMed

Possin, Katherine L; Chester, Serana K; Laluz, Victor; Bostrom, Alan; Rosen, Howard J; Miller, Bruce L; Kramer, Joel H

2012-09-01

On tests of design fluency, an examinee draws as many different designs as possible in a specified time limit while avoiding repetition. The neuroanatomical substrates and diagnostic group differences of design fluency repetition errors and total correct scores were examined in 110 individuals diagnosed with dementia, 53 with mild cognitive impairment (MCI), and 37 neurologically healthy controls. The errors correlated significantly with volumes in the right and left orbitofrontal cortex (OFC), the right and left superior frontal gyrus, the right inferior frontal gyrus, and the right striatum, but did not correlate with volumes in any parietal or temporal lobe regions. Regression analyses indicated that the lateral OFC may be particularly crucial for preventing these errors, even after excluding patients with behavioral variant frontotemporal dementia (bvFTD) from the analysis. Total correct correlated more diffusely with volumes in the right and left frontal and parietal cortex, the right temporal cortex, and the right striatum and thalamus. Patients diagnosed with bvFTD made significantly more repetition errors than patients diagnosed with MCI, Alzheimer's disease, semantic dementia, progressive supranuclear palsy, or corticobasal syndrome. In contrast, total correct design scores did not differentiate the dementia patients. These results highlight the frontal-anatomic specificity of design fluency repetitions. In addition, the results indicate that the propensity to make these errors supports the diagnosis of bvFTD. (JINS, 2012, 18, 1-11).
"Alarm-corrected" ergonomic armrest use could improve learning curves of novices on robotic simulator.

PubMed

Yang, Kun; Perez, Manuela; Hossu, Gabriela; Hubert, Nicolas; Perrenot, Cyril; Hubert, Jacques

2017-01-01

In robotic surgery, the professional ergonomic habit of using an armrest reduces operator fatigue and increases the precision of motion. We designed and validated a pressure surveillance system (PSS) based on force sensors to investigate armrest use. The objective was to evaluate whether adding an alarm to the PSS system could shorten ergonomic training and improve performance. Twenty robot and simulator-naïve participants were recruited and randomized in two groups (A and B). The PSS was installed on a robotic simulator, the dV-Trainer, to detect contact with the armrest. The Group A members completed three tasks on the dV-Trainer without the alarm, making 15 attempts at each task. The Group B members practiced the first two tasks with the alarm and then completed the final tasks without the alarm. The simulator provided an overall score reflecting the trainees' performance. We used the new concept of an "armrest load" score to describe the ergonomic habit of using the armrest. Group B had a significantly higher performance score (p < 0.001) and armrest load score (p < 0.001) than Group A from the fifth attempt of the first task to the end of the experiment. Based on the conditioned reflex effect, the alarm associated with the PSS rectified ergonomic errors and accelerated professional ergonomic habit acquisition. The combination of the PSS and alarm is effective in significantly shortening the learning curve in the robotic training process.
Evaluation of Video Image Analysis (VIA) technology to predict meat yield of sheep carcasses on-line under UK abattoir conditions.

PubMed

Rius-Vilarrasa, E; Bünger, L; Maltin, C; Matthews, K R; Roehe, R

2009-05-01

The Meat and Livestock Commission's (MLC) EUROP classification based scheme and Video Image Analysis (VIA) system were compared in their ability to predict weights of primal carcass joints. A total of 443 commercial lamb carcasses under 12 months of age and mixed gender were selected by their cold carcass weight (CCW), conformation and fat scores. Lamb carcasses were classified for conformation and fatness, scanned by the VIA system and dissected into primal joints of leg, chump, loin, breast and shoulder. After adjustment for CCW, the estimation of primal joints using MLC EUROP scores showed high coefficients of determination (R²) in the range of 0.82-0.99. The use of VIA always resulted in equal or higher R². The precision measured as root mean square error (RMSE) was 27% (leg), 13% (chump), 1% (loin), 11% (breast), 5% (shoulders) and 13% (total primals) higher using VIA than MLC carcass information. Adjustment for slaughter day and gender effects indicated that estimations of primal joints using MLC EUROP scores were more sensitive to these factors than using VIA. This was consistent with an increase in stability of the prediction model of 28%, 11%, 2%, 12%, 6% and 14% for leg, chump, loin, breast and shoulder and total primals, respectively, using VIA compared to MLC EUROP scores. Consequently, VIA was capable of improving the prediction of primal meat yields compared to the current MLC EUROP carcass classification scheme used in the UK abattoirs.
Dehydration and performance on clinical concussion measures in collegiate wrestlers.

PubMed

Weber, Amanda Friedline; Mihalik, Jason P; Register-Mihalik, Johna K; Mays, Sally; Prentice, William E; Guskiewicz, Kevin M

2013-01-01

The effects of dehydration induced by wrestling-related weight-cutting tactics on clinical concussion outcomes, such as neurocognitive function, balance performance, and symptoms, have not been adequately studied. To evaluate the effects of dehydration on the outcome of clinical concussion measures in National Collegiate Athletic Association Division I collegiate wrestlers. Repeated-measures design. Clinical research laboratory. Thirty-two Division I healthy collegiate male wrestlers (age = 20.0 ± 1.4 years; height = 175.0 ± 7.5 cm; baseline mass = 79.2 ± 12.6 kg). Participants completed preseason concussion baseline testing in early September. Weight and urine samples were also collected at this time. All participants reported to prewrestling practice and postwrestling practice for the same test battery and protocol in mid-October. They had begun practicing weight-cutting tactics a day before prepractice and postpractice testing. Differences between these measures permitted us to evaluate how dehydration and weight-cutting tactics affected concussion measures. Sport Concussion Assessment Tool 2 (SCAT2), Balance Error Scoring System (BESS), Graded Symptom Checklist (GSC), and Simple Reaction Time scores. The Simple Reaction Time was measured using the Automated Neuropsychological Assessment Metrics. The SCAT2 measurements were lower at prepractice (P = .002) and postpractice (P < .001) when compared with baseline. The BESS error scores were higher at postpractice when compared with baseline (P = .015). The GSC severity scores were higher at prepractice (P = .011) and postpractice (P < .001) than at baseline, and higher at postpractice than at prepractice (P = .003). The number of Graded Symptom Checklist symptoms reported was also higher at prepractice (P = .036) and postpractice (P < .001) when compared with baseline, and at postpractice when compared with prepractice (P = .003). Our results suggest that it is important for wrestlers to be evaluated in a euhydrated state to ensure that dehydration is not influencing the outcome of the clinical measures.
Dehydration and Performance on Clinical Concussion Measures in Collegiate Wrestlers

PubMed Central

Weber, Amanda Friedline; Mihalik, Jason P.; Register-Mihalik, Johna K.; Mays, Sally; Prentice, William E.; Guskiewicz, Kevin M.

2013-01-01

Context: The effects of dehydration induced by wrestling-related weight-cutting tactics on clinical concussion outcomes, such as neurocognitive function, balance performance, and symptoms, have not been adequately studied. Objective: To evaluate the effects of dehydration on the outcome of clinical concussion measures in National Collegiate Athletic Association Division I collegiate wrestlers. Design: Repeated-measures design. Setting: Clinical research laboratory. Patients or Other Participants: Thirty-two Division I healthy collegiate male wrestlers (age = 20.0 ± 1.4 years; height = 175.0 ± 7.5 cm; baseline mass = 79.2 ± 12.6 kg). Intervention(s): Participants completed preseason concussion baseline testing in early September. Weight and urine samples were also collected at this time. All participants reported to prewrestling practice and postwrestling practice for the same test battery and protocol in mid-October. They had begun practicing weight-cutting tactics a day before prepractice and postpractice testing. Differences between these measures permitted us to evaluate how dehydration and weight-cutting tactics affected concussion measures. Main Outcome Measures: Sport Concussion Assessment Tool 2 (SCAT2), Balance Error Scoring System (BESS), Graded Symptom Checklist (GSC), and Simple Reaction Time scores. The Simple Reaction Time was measured using the Automated Neuropsychological Assessment Metrics. Results: The SCAT2 measurements were lower at prepractice (P = .002) and postpractice (P < .001) when compared with baseline. The BESS error scores were higher at postpractice when compared with baseline (P = .015). The GSC severity scores were higher at prepractice (P = .011) and postpractice (P < .001) than at baseline, and at postpractice than at prepractice (P = .003). The number of Graded Symptom Checklist symptoms reported was also higher at prepractice (P = .036) and postpractice (P < .001) when compared with baseline, and at postpractice when compared with prepractice (P = .003). Conclusions: Our results suggest that it is important for wrestlers to be evaluated in a euhydrated state to ensure that dehydration is not influencing the outcome of the clinical measures. PMID:23672379
Effect of teaching with or without mirror on balance in young female ballet students

PubMed Central

2014-01-01

Background: In the literature there is a general consensus that the use of the mirror improves proprioception. During rehabilitation the mirror is an important instrument to improve stability. In some sports, such as dancing, mirrors are widely used during training. The purpose of this study is to evaluate the effectiveness of the use of a mirror on balance in young dancers. Sixty-four young dancers (aged 9–10 years) were included in this study. Thirty-two dancers attending lessons with a mirror (mirror group) were compared with 32 young dancers who attended the same lessons without a mirror (non-mirror group). Balance was evaluated with the BESS (Balance Error Scoring System), which consists of three stances (double limb, single limb, and tandem) on two surfaces (firm and foam). The errors were assessed at each stance and summed to create the two subtotal scores (firm and foam surface) and the final total score (BESS). The BESS was performed at recruitment (T0) and after 6 months of dance lessons (T1). Results: The repeated measures ANOVA showed that for the BESS total score there is a difference due to time (F = 3.86; p < 0.05). No other differences due to the group or to the time of measurement were found (p > 0.05). The multiple regression model showed the influence of the values at T0 for every BESS item and of limb dominance for stability on an unstable surface standing on one or two legs. Conclusions: These preliminary results suggest that the use of a mirror in a ballet classroom does not improve balance acquisition of the dancer. On the other hand, the improvement found after 6 months confirms that, at the age of the dancers studied, motor skills and balance can easily be trained and improved. PMID:24996519
Effect of teaching with or without mirror on balance in young female ballet students.

PubMed

Notarnicola, Angela; Maccagnano, Giuseppe; Pesce, Vito; Di Pierro, Silvia; Tafuri, Silvio; Moretti, Biagio

2014-07-04

In the literature there is a general consensus that the use of the mirror improves proprioception. During rehabilitation the mirror is an important instrument to improve stability. In some sports, such as dancing, mirrors are widely used during training. The purpose of this study is to evaluate the effectiveness of the use of a mirror on balance in young dancers. Sixty-four young dancers (aged 9-10 years) were included in this study. Thirty-two dancers attending lessons with a mirror (mirror group) were compared with 32 young dancers who attended the same lessons without a mirror (non-mirror group). Balance was evaluated with the BESS (Balance Error Scoring System), which consists of three stances (double limb, single limb, and tandem) on two surfaces (firm and foam). The errors were assessed at each stance and summed to create the two subtotal scores (firm and foam surface) and the final total score (BESS). The BESS was performed at recruitment (T0) and after 6 months of dance lessons (T1). The repeated measures ANOVA showed that for the BESS total score there is a difference due to time (F = 3.86; p < 0.05). No other differences due to the group or to the time of measurement were found (p > 0.05). The multiple regression model showed the influence of the values at T0 for every BESS item and of limb dominance for stability on an unstable surface standing on one or two legs. These preliminary results suggest that the use of a mirror in a ballet classroom does not improve balance acquisition of the dancer. On the other hand, the improvement found after 6 months confirms that, at the age of the dancers studied, motor skills and balance can easily be trained and improved.
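
The scoring arithmetic described in the abstract above (errors counted at each of three stances on two surfaces, summed into a firm subtotal, a foam subtotal, and a BESS total) can be sketched as follows. This is only an illustrative sketch, not the authors' procedure: the function name `bess_scores`, the dictionary layout, and the error counts in `example_errors` are hypothetical and are not data from the study.

```python
from typing import Dict, Tuple

# Stances and surfaces as named in the abstract.
STANCES = ("double limb", "single limb", "tandem")
SURFACES = ("firm", "foam")


def bess_scores(errors: Dict[Tuple[str, str], int]) -> Dict[str, int]:
    """Sum per-stance error counts into firm, foam, and total scores.

    `errors` maps (surface, stance) to the number of errors counted
    for that trial. Every surface/stance combination must be present.
    """
    scores: Dict[str, int] = {}
    for surface in SURFACES:
        # Subtotal for one surface: sum over its three stances.
        scores[surface] = sum(errors[(surface, stance)] for stance in STANCES)
    # Final BESS score: sum of the two surface subtotals.
    scores["total"] = scores["firm"] + scores["foam"]
    return scores


# Hypothetical error counts for one participant (made up for illustration).
example_errors = {
    ("firm", "double limb"): 0,
    ("firm", "single limb"): 2,
    ("firm", "tandem"): 1,
    ("foam", "double limb"): 1,
    ("foam", "single limb"): 5,
    ("foam", "tandem"): 3,
}

example_scores = bess_scores(example_errors)
```

A higher total indicates more balance errors, which is why the studies listed here report increased BESS scores as worse balance performance.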
