Identifying predictors of physics item difficulty: A linear regression approach
NASA Astrophysics Data System (ADS)
Mesic, Vanes; Muratovic, Hasnija
2011-06-01
Large-scale assessments of student achievement in physics are often approached with an intention to discriminate between students based on the attained level of their physics competencies. Therefore, for purposes of test design, it is important that items display acceptable discriminatory behavior. To that end, it is recommended to avoid extraordinarily difficult and very easy items. Knowing the factors that influence physics item difficulty makes it possible to model item difficulty even before the first pilot study is conducted. Thus, by identifying predictors of physics item difficulty, we can improve the test-design process. Furthermore, we get additional qualitative feedback regarding the basic aspects of student cognitive achievement in physics that are directly responsible for the obtained quantitative test results. In this study, we conducted a secondary analysis of data that came from two large-scale assessments of student physics achievement at the end of compulsory education in Bosnia and Herzegovina. First, we explored the concept of “physics competence” and performed a content analysis of 123 physics items that were included in the above-mentioned assessments. Thereafter, an item database was created. Items were described by variables that reflect some basic cognitive aspects of physics competence. For each of the assessments, Rasch item difficulties were calculated in separate analyses. In order to make the item difficulties from different assessments comparable, a virtual test equating procedure had to be implemented. Finally, a regression model of physics item difficulty was created.
It has been shown that 61.2% of item difficulty variance can be explained by factors which reflect the automaticity, complexity, and modality of the knowledge structure that is relevant for generating the most probable correct solution, as well as by the divergence of required thinking and interference effects between intuitive and formal physics knowledge structures. Identified predictors point out the fundamental cognitive dimensions of student physics achievement at the end of compulsory education in Bosnia and Herzegovina, whose level of development influenced the test results within the conducted assessments.
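A minimal sketch of the final modelling step, assuming item difficulties are already on a common (equated) Rasch logit scale and each item has been coded on a few numeric predictors. The predictor names and values below are hypothetical, chosen only for illustration; the authors' actual coding scheme and software are not given here. The function fits ordinary least squares with an intercept and reports R², the share of difficulty variance explained (the quantity reported as 61.2% above).

```python
from typing import List, Sequence, Tuple


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_difficulty_model(features: Sequence[Sequence[float]],
                         difficulties: Sequence[float]) -> Tuple[List[float], float]:
    """OLS fit of item difficulty on item features; returns (coefficients, R^2).

    coefficients[0] is the intercept; the rest follow the feature order.
    """
    X = [[1.0, *map(float, row)] for row in features]  # prepend intercept column
    k = len(X[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(X, difficulties)) for i in range(k)]
    beta = _solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in X]
    mean_y = sum(difficulties) / len(difficulties)
    ss_res = sum((y - f) ** 2 for y, f in zip(difficulties, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in difficulties)
    return beta, 1.0 - ss_res / ss_tot


# Hypothetical coding: [knowledge complexity (1-5), intuitive/formal conflict (0/1)]
items = [[1, 0], [2, 0], [3, 1], [4, 1], [2, 1], [5, 0]]
logit_difficulty = [-0.5, 0.0, 1.3, 1.8, 0.8, 1.5]
beta, r2 = fit_difficulty_model(items, logit_difficulty)
print([round(b, 3) for b in beta], round(r2, 3))
```

The toy difficulties were generated exactly as -1.0 + 0.5 × complexity + 0.8 × conflict, so the fit recovers those coefficients with R² = 1; real item data would leave residual variance. For larger item pools a numerical library's least-squares routine would be preferable to hand-solved normal equations.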
Comparison of university students' understanding of graphs in different contexts
NASA Astrophysics Data System (ADS)
Planinic, Maja; Ivanjek, Lana; Susac, Ana; Milin-Sipus, Zeljka
2013-12-01
This study investigates university students’ understanding of graphs in three different domains: mathematics, physics (kinematics), and contexts other than physics. Eight sets of parallel mathematics, physics, and other-context questions about graphs were developed. A test consisting of these eight sets of questions (24 questions in all) was administered to 385 first-year students at the University of Zagreb who were either prospective physics or mathematics teachers or prospective physicists or mathematicians. Rasch analysis of the data was conducted and linear measures for item difficulties were obtained. Average difficulties of items in the three domains (mathematics, physics, and other contexts) and over two concepts (graph slope, area under the graph) were computed and compared. The analysis suggests that the variation of average difficulty among the three domains is much smaller for the concept of graph slope than for the concept of area under the graph. Most of the slope items are very close in difficulty, suggesting that students who have developed sufficient understanding of graph slope in mathematics are generally able to transfer it almost equally successfully to other contexts. A large difference was found between the difficulty of the concept of area under the graph in physics and other contexts on one side and mathematics on the other. Comparison of the average difficulty of the three domains suggests that mathematics without context is the easiest domain for students. Adding either a physics or another context to mathematical items generally seems to increase item difficulty. No significant difference was found between the average item difficulty in physics and in contexts other than physics, suggesting that physics (kinematics) remains a difficult context for most students despite the instruction on kinematics they received in high school.
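To make "linear measures for item difficulties" concrete, here is a small sketch of the Rasch model's response probability and of the centred log-odds difficulty that is commonly used as a starting value before full Rasch estimation. It is not the estimation procedure used in the study (which is not described here), and the response matrix and domain labels are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def rasch_probability(theta: float, b: float) -> float:
    """Rasch model: P(correct) for a person of ability theta on an item of difficulty b (logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def logit_difficulties(responses: Sequence[Sequence[int]]) -> List[float]:
    """Crude centred log-odds difficulty per item from a persons x items 0/1 matrix.

    This is only a starting value for Rasch estimation, not a full
    (e.g. joint or conditional maximum likelihood) estimate.
    """
    n = len(responses)
    raw = []
    for j in range(len(responses[0])):
        p = sum(row[j] for row in responses) / n   # proportion correct
        p = min(max(p, 0.5 / n), 1 - 0.5 / n)      # keep logits finite at 0% or 100%
        raw.append(math.log((1 - p) / p))          # higher value = harder item
    centre = sum(raw) / len(raw)
    return [d - centre for d in raw]               # mean item difficulty fixed at 0


def domain_means(difficulties: Sequence[float], domains: Sequence[str]) -> Dict[str, float]:
    """Average item difficulty per domain label (e.g. 'mathematics', 'physics', 'other')."""
    sums: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for d, label in zip(difficulties, domains):
        sums[label] = sums.get(label, 0.0) + d
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


# Hypothetical data: 4 students x 3 items, one item per domain
answers = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 1, 0]]
d = logit_difficulties(answers)
print(domain_means(d, ["mathematics", "physics", "other"]))
```

Because difficulties and abilities share one logit scale, a student whose ability equals an item's difficulty has a 50% chance of answering it correctly, which is what makes averaging and comparing item difficulties across domains meaningful.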
Modelling Question Difficulty in an A Level Physics Examination
ERIC Educational Resources Information Center
Crisp, Victoria; Grayson, Rebecca
2013-01-01
"Item difficulty modelling" is a technique used for a number of purposes such as to support future item development, to explore validity in relation to the constructs that influence difficulty and to predict the difficulty of items. This research attempted to explore the factors influencing question difficulty in a general qualification…
ERIC Educational Resources Information Center
Marie, S. Maria Josephine Arokia; Edannur, Sreekala
2015-01-01
This paper focused on the analysis of test items constructed for the paper Teaching Physical Science in the B.Ed. class. It involved the analysis of the difficulty level and discrimination power of each test item. Item analysis allows selecting or omitting items from the test, but, more importantly, item analysis is a tool to help the item writer improve…
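The two classical indices named here can be sketched as follows. This assumes the common textbook definitions: the difficulty index P is the proportion of students answering correctly (so a higher P means an easier item), and the discrimination index D is the proportion correct in the top-scoring group minus that in the bottom-scoring group, with 27% of students per group as a widespread convention. The source does not say which variant the authors used, and the `item_analysis` helper is hypothetical.

```python
from typing import List, Sequence, Tuple


def item_analysis(responses: Sequence[Sequence[int]],
                  group_fraction: float = 0.27) -> List[Tuple[float, float]]:
    """Return (difficulty index P, discrimination index D) for each item.

    responses: persons x items matrix of 0/1 scores.
    group_fraction: share of students in each of the upper and lower groups.
    """
    n = len(responses)
    totals = [sum(row) for row in responses]
    # Rank students by total score; ties are split by original order.
    order = sorted(range(n), key=lambda i: totals[i])
    g = max(1, round(group_fraction * n))
    lower, upper = order[:g], order[-g:]
    results = []
    for j in range(len(responses[0])):
        p = sum(row[j] for row in responses) / n              # proportion correct
        p_upper = sum(responses[i][j] for i in upper) / g
        p_lower = sum(responses[i][j] for i in lower) / g
        results.append((p, p_upper - p_lower))                # D ranges from -1 to 1
    return results


# Hypothetical scores for 4 students on 3 items, using halves as the groups
scores = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
for item, (p, d) in enumerate(item_analysis(scores, group_fraction=0.5)):
    print(f"item {item}: P = {p:.2f}, D = {d:.2f}")
```

An item with a negative D is answered correctly more often by weaker students than by stronger ones, which is usually a signal to revise or drop it.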
ERIC Educational Resources Information Center
Alberta Dept. of Education, Edmonton.
This document outlines the use of machine-scorable open-ended questions for the evaluation of Physics 30 in Alberta. Contents include: (1) an introduction to the questions; (2) sample instruction sheet; (3) fifteen sample items; (4) item information including the key, difficulty, and source of each item; (5) solutions to items having multiple…
Mokken scaling of the Myocardial Infarction Dimensional Assessment Scale (MIDAS).
Thompson, David R; Watson, Roger
2011-02-01
The purpose of this study was to examine the hierarchical and cumulative nature of the 35 items of the Myocardial Infarction Dimensional Assessment Scale (MIDAS), a disease-specific health-related quality of life measure. Data from 668 participants who completed the MIDAS were analysed using the Mokken Scaling Procedure, which is a computer program that searches polychotomous data for hierarchical and cumulative scales on the basis of a range of diagnostic criteria. Fourteen MIDAS items were retained in a Mokken scale and these items included physical activity, insecurity, emotional reaction and dependency items but excluded items related to diet, medication or side-effects. Item difficulty, in item response theory terms, ran from physical activity items (low difficulty) to insecurity, suggesting that the most severe quality of life effect of myocardial infarction is loneliness and isolation. Items from the MIDAS form a strong and reliable Mokken scale, which provides new insight into the relationship between items in the MIDAS and the measurement of quality of life after myocardial infarction. © 2010 Blackwell Publishing Ltd.
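Mokken scales are judged mainly by Loevinger's scalability coefficient H, which compares the number of Guttman errors (passing a harder item while failing an easier one) with the number expected if items were unrelated. The sketch below covers only the dichotomous (0/1) case as a simplified illustration; the MIDAS items are polytomous, and the Mokken Scaling Procedure mentioned above handles that general case. The `loevinger_h` name and the data are hypothetical. By Mokken's conventions, H of at least 0.5 is usually labelled a strong scale.

```python
from itertools import combinations
from typing import Sequence


def loevinger_h(responses: Sequence[Sequence[int]]) -> float:
    """Scale-level Loevinger H for dichotomous items: 1 - observed / expected Guttman errors."""
    n = len(responses)
    k = len(responses[0])
    p = [sum(row[j] for row in responses) / n for j in range(k)]  # item popularity
    observed = expected = 0.0
    for i, j in combinations(range(k), 2):
        easy, hard = (i, j) if p[i] >= p[j] else (j, i)
        # Guttman error: failing the easier item while passing the harder one
        observed += sum(1 for row in responses if row[easy] == 0 and row[hard] == 1)
        # Expected count of that pattern if the two items were independent
        expected += n * (1 - p[easy]) * p[hard]
    return 1.0 - observed / expected


# Hypothetical cumulative pattern with one Guttman error in the last respondent
data = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1], [0, 1, 0]]
print(round(loevinger_h(data), 3))
```

A perfect cumulative (Guttman) pattern gives H = 1, and each inconsistent response pattern pulls H towards 0.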
Probing University Students' Pre-Knowledge in Quantum Physics with QPCS Survey
ERIC Educational Resources Information Center
Asikainen, Mervi A.
2017-01-01
The study investigated the use of the Quantum Physics Conceptual Survey (QPCS) in probing student understanding of quantum physics. Altogether, 103 Finnish university students responded to the QPCS. The mean scores of the student responses were calculated and the test was evaluated using five common indices: item difficulty index, item discrimination…
Identifying Predictors of Physics Item Difficulty: A Linear Regression Approach
ERIC Educational Resources Information Center
Mesic, Vanes; Muratovic, Hasnija
2011-01-01
Large-scale assessments of student achievement in physics are often approached with an intention to discriminate students based on the attained level of their physics competencies. Therefore, for purposes of test design, it is important that items display an acceptable discriminatory behavior. To that end, it is recommended to avoid extraordinary…
ERIC Educational Resources Information Center
Planinic, Maja; Boone, William J.; Krsnik, Rudolf; Beilfuss, Meredith L.
2006-01-01
Croatian 1st-year and 3rd-year high-school students (N = 170) completed a conceptual physics test. Students were evaluated with regard to two physics topics: Newtonian dynamics and simple DC circuits. Students answered test items and also indicated their confidence in each answer. Rasch analysis facilitated the calculation of three linear…
Measuring Student Learning with Item Response Theory
ERIC Educational Resources Information Center
Lee, Young-Jin; Palazzo, David J.; Warnakulasooriya, Rasil; Pritchard, David E.
2008-01-01
We investigate short-term learning from hints and feedback in a Web-based physics tutoring system. Both the skill of students and the difficulty and discrimination of items were determined by applying item response theory (IRT) to the first answers of students who are working on for-credit homework items in an introductory Newtonian physics…
Choi, Bongsam
2018-01-01
[Purpose] This study aimed to cross-culturally adapt and validate the Korean version of a physical activity measure (K-PAM) for community-dwelling older adults. [Subjects and Methods] One hundred thirty-eight community-dwelling older adults (32 men and 106 women) participated in the study. All participants were asked to fill out a 51-item questionnaire measuring perceived difficulty in the activities of daily living (ADL) for older adults. The one-parameter model of item response theory (Rasch analysis) was applied to determine the construct validity and to inspect the item-level psychometric properties of the 51 ADL items of the K-PAM. [Results] Person separation reliability (analogous to Cronbach's alpha) for internal consistency ranged from 0.93 to 0.94. A total of 16 items misfit the Rasch model. After deletion of the misfitting items, the 35 remaining ADL items of the K-PAM were placed in an empirically meaningful hierarchy from easy to hard. The item-person map analysis showed that item difficulty was well matched to older adults with moderate and low ability, except at the high (ceiling) end. [Conclusion] The cross-culturally adapted K-PAM was shown to be sufficient for establishing construct validity, with stable psychometric properties confirmed by person separation reliability and fit statistics.
A measure of early physical functioning (EPF) post-stroke.
Finch, Lois E; Higgins, Johanne; Wood-Dauphinee, Sharon; Mayo, Nancy E
2008-07-01
To develop a comprehensive measure of Early Physical Functioning (EPF) post-stroke quantified through Rasch analysis and conceptualized using the International Classification of Functioning, Disability and Health (ICF). An observational cohort study. A cohort of 262 subjects (mean age 71.6 years, standard deviation 12.5) hospitalized post-acute stroke. Functional assessments were made within 3 days of stroke with items from valid and reliable indices commonly utilized to evaluate stroke survivors. Information on important variables was also collected. Principal component and Rasch analysis confirmed the factor structure and dimensionality of the measure. Rasch analysis combined items across ICF components to develop the measure. Items were deleted iteratively; those retained fit the model and were related to the construct, and reliability and validity were assessed. A 38-item unidimensional measure of the EPF met all Rasch model requirements. The item difficulty matched the person ability (mean person measure: -0.31; standard error 0.37 logits), and the reliability of the person-item hierarchy was excellent at 0.97. Initial validity was adequate. The 38-item EPF measure was developed. It expands the range of assessment post-acute stroke; it covers a broad spectrum of difficulty with good initial psychometric properties that, once revalidated, can assist in planning and evaluating early interventions.
Development of Thermodynamic Conceptual Evaluation
NASA Astrophysics Data System (ADS)
Talaeb, P.; Wattanakasiwich, P.
2010-07-01
This research aims to develop a test for assessing student understanding of fundamental principles in thermodynamics. Misconceptions found in previous physics education research were used to develop the test. Its topics include heat and temperature, the zeroth and first laws of thermodynamics, and thermodynamic processes. The content validity was analyzed by three physics experts. Then the test was administered to freshmen, sophomores and juniors majoring in physics in order to determine the item difficulty and item discrimination of the test. A few items were eliminated from the test. Finally, the test will be administered to students taking the Physics I course in order to evaluate the effectiveness of Interactive Lecture Demonstrations, which will be used for the first time at Chiang Mai University.
Students’ understanding of forces: Force diagrams on horizontal and inclined plane
NASA Astrophysics Data System (ADS)
Sirait, J.; Hamdani; Mursyid, S.
2018-03-01
This study aims to analyse students’ difficulties in understanding force diagrams on horizontal surfaces and inclined planes. Physics education students (pre-service physics teachers) of Tanjungpura University who had completed a Basic Physics course took a force concept test with six questions covering three concepts (an object at rest, an object moving at constant speed, and an object moving with constant acceleration), each on both a horizontal surface and an inclined plane. The test is in a multiple-choice format. It examines the ability of students to select appropriate force diagrams depending on the context. The results show that 44% of students have difficulties in solving the test (these students could solve only one or two of the six items). About 50% of students had difficulty finding the correct diagram for an object moving at constant speed or with constant acceleration in both contexts. In general, students correctly identified only 48% of the force diagrams on the test. The most difficult task for the students was identifying the force diagram representing the forces exerted on an object on an inclined plane.
Development and assessment of floor and ceiling items for the PROMIS physical function item bank
2013-01-01
Introduction: Disability and Physical Function (PF) outcome assessment has had limited ability to measure functional status at the floor (very poor functional abilities) or the ceiling (very high functional abilities). We sought to identify, develop and evaluate new floor and ceiling items to enable broader and more precise assessment of PF outcomes for the NIH Patient-Reported Outcomes Measurement Information System (PROMIS). Methods: We conducted two cross-sectional studies using NIH PROMIS item improvement protocols with expert review, participant survey and focus group methods. In Study 1, respondents with low PF abilities evaluated new floor items, and those with high PF abilities evaluated new ceiling items, for clarity, importance and relevance. In Study 2, we compared difficulty ratings of new floor items by low-functioning respondents and of new ceiling items by high-functioning respondents with ratings of reference PROMIS PF-10 items. We used frequencies, percentages, means and standard deviations to analyze the data. Results: In Study 1, low (n = 84) and high (n = 90) functioning respondents were mostly White and female, 70 years old, with some college education, and had disability scores of 0.62 and 0.30. More than 90% of the 31 new floor and 31 new ceiling items were rated as clear, important and relevant, leaving 26 ceiling and 30 floor items for Study 2. Low (n = 246) and high (n = 637) functioning Study 2 respondents were mostly White and female, 70 years old, with some college education, and had Health Assessment Questionnaire (HAQ) scores of 1.62 and 0.003. Compared with difficulty ratings of reference items, ceiling items were rated as 10% to more than 40% more difficult to do, and floor items were rated as about 12% to nearly 90% less difficult to do. Conclusions: These new floor and ceiling items considerably extend the measurable range of physical function at either extreme.
They will help improve instrument performance in populations with broad functional ranges and in those concentrated at one or the other extreme of functioning. Optimal use of these new items will be assisted by computerized adaptive testing (CAT), reducing questionnaire burden and ensuring item administration to appropriate individuals. PMID:24286166
O'Brien, Kelly K; Bayoumi, Ahmed M; Stratford, Paul; Solomon, Patricia
2015-01-01
To assess the dimensions of disability measured by the HIV Disability Questionnaire (HDQ), a newly developed 72-item self-administered questionnaire that describes the presence, severity and episodic nature of disability experienced by people living with HIV. We recruited adults living with HIV from hospital clinics, AIDS service organizations and a specialty hospital and administered the HDQ followed by a demographic questionnaire. We conducted an exploratory factor analysis using disability severity scores to determine the domains of disability in the HDQ. We used the following steps: (a) ensured correlations between items were >0.30 and <0.80; (b) conducted a principal components analysis to extract factors; (c) used the Scree Test and an eigenvalue threshold >1.5 to determine the number of factors to retain; and (d) used oblique rotation to simplify the factor loading matrix. We assigned items to factors based on factor loadings of >0.30. Of the 361 participants, 80% were men and 77% reported living with at least two concurrent health conditions in addition to HIV. The exploratory factor analysis suggested retaining six factors. Items related to symptoms and impairments loaded on three factors (physical [20 items], cognitive [3 items], and mental and emotional health [11 items]) and items related to worrying about the future, daily activities, and personal relationships loaded on three additional factors (uncertainty [14 items], difficulties with day-to-day activities [9 items], social inclusion [12 items]). The HDQ has six domains: physical symptoms and impairments; cognitive symptoms and impairments; mental and emotional health symptoms and impairments; uncertainty; difficulties with day-to-day activities; and challenges to social inclusion. These domains establish the scoring structure for the dimensions of disability measured by the HDQ.
Implications for Rehabilitation: As individuals live longer and age with HIV, they may be living with the health-related consequences of HIV and concurrent health conditions, a concept that may be termed disability. Measuring disability is important to understand the impact of HIV and its comorbidities. The HIV Disability Questionnaire (HDQ) is a self-administered questionnaire developed to describe the presence, severity and episodic nature of disability experienced by people living with HIV. The HDQ comprises six domains of disability: physical symptoms and impairments (20 items); cognitive symptoms and impairments (3 items); mental and emotional health symptoms and impairments (11 items); uncertainty (14 items); difficulties with day-to-day activities (9 items); and challenges to social inclusion (12 items). These domains represent the dimensions of disability measured by the HDQ. The HDQ is the first known HIV-specific disability measure for adults living with HIV. The HDQ may be used by clinicians and researchers to assess disability experienced by adults living with HIV.
Coons, Stephen Joel; Chongpison, Yuda; Wendel, Christopher S; Grant, Marcia; Krouse, Robert S
2007-09-01
To explore whether there was a significant relationship between difficulty paying for ostomy supplies and overall quality of life among a sample of ostomates receiving care from the Veterans Health Administration (VHA). The data were collected as part of the Veterans Affairs (VA) Ostomy Health-Related Quality of Life Study, in which 511 respondents (239 cases, 272 controls) completed a survey instrument that included the modified City of Hope Quality of Life (mCOH-QOL) Ostomy questionnaire, SF-36V, and sociodemographic items. Responses from the 239 cases (ie, patients with intestinal stomas) were used in this analysis. The modified City of Hope Quality of Life Ostomy questionnaire item, "How good is your overall quality of life?," was the dependent variable for this analysis. The primary independent variable was the response (yes/no) to the item, "If you pay for any of the (ostomy) costs, is it difficult for you?" A hierarchical regression model was used to examine whether difficulty paying was significantly related to overall quality of life after adjusting for age, income, race/ethnicity, and physical health. After accounting for the proportion of variance explained by age, income, race/ethnicity, and physical health, the additional proportion of variance explained by difficulty paying was statistically significant. Individuals reporting difficulty paying had a roughly 1 point lower (ie, beta-coefficient = -1.052; SE = 0.481) overall quality of life score on the 11-point scale. We found a significant association between difficulty paying for ostomy supplies and overall quality of life. Although the cross-sectional study design does not allow causal inference, the results suggest a relationship that merits further examination.
Item Writer Judgments of Item Difficulty versus Actual Item Difficulty: A Case Study
ERIC Educational Resources Information Center
Sydorenko, Tetyana
2011-01-01
This study investigates how accurate one item writer can be on item difficulty estimates and whether factors affecting item writer judgments correspond to predictors of actual item difficulty. The items were based on conversational dialogs (presented as videos online) that focus on pragmatic functions. Thirty-five 2nd-, 3rd-, and 4th-year learners…
Aritake, Sayaka; Asaoka, Shoichi; Kagimura, Tatsuo; Shimura, Akiyoshi; Futenma, Kunihiro; Komada, Yoko; Inoue, Yuichi
2015-04-01
This study was conducted to determine what symptom components or conditions of insomnia are related to subjective feelings of insomnia, low health-related quality of life (HRQOL), or depression. Data from 7,027 Japanese adults, obtained using an Internet-based questionnaire survey, were analyzed to examine associations between demographic variables and each sleep difficulty symptom item on the Pittsburgh Sleep Quality Index (PSQI) with the presence/absence of subjective insomnia and scores on the Short Form-8 (SF-8) and Center for Epidemiologic Studies Depression Scale (CES-D). Prevalence of subjective insomnia was 12.2% (n = 860). Discriminant function analysis revealed that item scores for sleep quality, sleep latency, and sleep medication use on the PSQI and CES-D showed relatively high discriminant function coefficients for identifying positivity for the subjective feeling of insomnia. Among respondents with subjective insomnia, a low SF-8 physical component summary score was associated with higher age, depressive state, and PSQI items for sleep difficulty and daytime dysfunction, whereas a low SF-8 mental component summary score was associated with depressive state, PSQI sleep latency, sleeping medication use, and daytime dysfunction. Depressive state was significantly associated with sleep latency, sleeping medication use, and daytime dysfunction. Among insomnia symptom components, disturbed sleep quality and sleep onset insomnia may be specifically associated with subjective feelings of the disorder. The existence of a depressive state could be significantly associated with not only subjective insomnia but also mental and physical QOL. Our results also suggest that different components of sleep difficulty, as measured by the PSQI, might be associated with mental and physical QOL and depressive status.
Tulsky, David S; Kisala, Pamela A; Tate, Denise G; Spungen, Ann M; Kirshblum, Steven C
2015-05-01
To describe the development and psychometric properties of the Spinal Cord Injury--Quality of Life (SCI-QOL) Bladder Management Difficulties and Bowel Management Difficulties item banks and Bladder Complications scale. Using a mixed-methods design, a pool of items assessing bladder- and bowel-related concerns was developed using focus groups with individuals with spinal cord injury (SCI) and SCI clinicians, cognitive interviews, and item response theory (IRT) analytic approaches, including tests of model fit and differential item functioning. Thirty-eight bladder items and 52 bowel items were tested at the University of Michigan, Kessler Foundation Research Center, the Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital, and the James J. Peters VA Medical Center, Bronx, NY. Participants were 757 adults with traumatic SCI. The final item banks demonstrated unidimensionality (Bladder Management Difficulties CFI=0.965; RMSEA=0.093; Bowel Management Difficulties CFI=0.955; RMSEA=0.078) and acceptable fit to a graded response IRT model. The final calibrated Bladder Management Difficulties bank includes 15 items, and the final Bowel Management Difficulties item bank consists of 26 items. Additionally, 5 items related to urinary tract infections (UTI) did not fit with the larger Bladder Management Difficulties item bank but performed relatively well independently (CFI=0.992, RMSEA=0.050) and were thus retained as a separate scale. The SCI-QOL Bladder Management Difficulties and Bowel Management Difficulties item banks are psychometrically robust and are available as computer adaptive tests or short forms. The SCI-QOL Bladder Complications scale is a brief, fixed-length outcomes instrument for individuals with a UTI.
Is the Factor Observed in Investigations on the Item-Position Effect Actually the Difficulty Factor?
Schweizer, Karl; Troche, Stefan
2018-02-01
In confirmatory factor analysis, quite similar models of measurement serve the detection of the difficulty factor and of the factor due to the item-position effect. The item-position effect refers to the increasing dependency among the responses to successively presented items of a test, whereas the difficulty factor is ascribed to the wide range of item difficulties. The similarity of the models of measurement hampers the dissociation of these factors. Since the item-position effect should theoretically be independent of the item difficulties, statistical ex post manipulation of the difficulties should enable the discrimination of the two types of factors. This method was investigated in two studies. In the first study, Advanced Progressive Matrices (APM) data of 300 participants were investigated. As expected, the factor thought to be due to the item-position effect was observed. In the second study, using data simulated to show the major characteristics of the APM data, the range of item difficulties was set to zero to reduce the likelihood of detecting the difficulty factor. Despite this reduction, however, the factor now identified as the item-position factor was observed in virtually all simulated datasets.
Physical performance testing in mucopolysaccharidosis I: a pilot study.
Dumas, Helene M; Fragala, Maria A; Haley, Stephen M; Skrinar, Alison M; Wraith, James E; Cox, Gerald F
2004-01-01
To develop and field-test a physical performance measure (MPS-PPM) for individuals with Mucopolysaccharidosis I (MPS I), a rare genetic disorder. Motor performance and endurance items were developed based on literature review, clinician feedback, feasibility, and equipment and training needs. A standardized testing protocol and scoring rules were created. The MPS-PPM includes: Arm Function (7 items), Leg Function (5 items), and Endurance (2 items). Pilot data were collected for 10 subjects (ages 5-29 years). We calculated Spearman's rho correlations between age, severity and summary z-scores on the MPS-PPM. Subjects had variable presentations, as correlations among the three sub-test scores were not significant. Increasing age was related to greater severity in physical performance (r = 0.72, p<0.05) and lower scores on the Leg Function (r = -0.67, p<0.05) and Endurance (r = -0.65, p<0.05) sub-tests. The MPS-PPM was sensitive to detecting physical performance deficits, as six subjects could not complete the full battery of Arm Function items and eight subjects were unable to complete all Leg Function items. Subjects walked more slowly and expended more energy than typically developing peers. Individuals with MPS I have difficulty with arm and leg function and reduced endurance. The MPS-PPM is a clinically feasible measure that detects limitations in physical performance and may have potential to quantify changes in function following intervention. Copyright 2004 Taylor and Francis Ltd.
ERIC Educational Resources Information Center
Matlock, Ki Lynn; Turner, Ronna
2016-01-01
When constructing multiple test forms, the number of items and the total test difficulty are often equivalent. Not all test developers match the number of items and/or average item difficulty within subcontent areas. In this simulation study, six test forms were constructed having an equal number of items and average item difficulty overall.…
ERIC Educational Resources Information Center
Kostin, Irene
2004-01-01
The purpose of this study is to explore the relationship between a set of item characteristics and the difficulty of TOEFL[R] dialogue items. Identifying characteristics that are related to item difficulty has the potential to improve the efficiency of the item-writing process. The study employed 365 TOEFL dialogue items, which were coded on 49…
Statistical Approaches to the Study of Item Difficulty.
ERIC Educational Resources Information Center
Olson, John F.; And Others
Traditionally, item difficulty has been defined in terms of the performance of examinees. For test development purposes, a more useful concept would be some kind of intrinsic item difficulty, defined in terms of the item's content, context, or characteristics and the task demands set by the item. In this investigation, the measurement literature…
Benaïm, C; Perennou, D-A; Pelissier, J-Y; Daures, J-P
2010-02-01
Many clinical scales contain items that are scored separately prior to being compiled into a single score. However, if the items have different degrees of importance, they should be weighted differently before being compiled. The principal aims of this study were to show how the "analytic hierarchy process" (AHP), which has never been used for this purpose, can be applied to weighting the six items of the "London handicap scale", and to compare the AHP to "conjoint analysis" (CA), which was previously implemented by Harwood et al. (1994) [1]. In order to assess the relative importance of the six items, we submitted AHP and CA to a group of 10 physiatrists. We compared the methods in terms of item ranking according to importance, assessment of fictitious patients based on weights determined by each method, and difficulty perceived by the physiatrists. For both techniques, "Physical independence" (PHY) was the most heavily weighted item, but other ranks varied depending on the technique. AHP was better than CA in terms of accuracy (global assessment of the clinical status) and perceived difficulty. AHP may be used to reveal the importance that experts assign to the items of a multidimensional scale, and to calculate the appropriate weights for specific items. For this purpose, AHP seems to be more accurate than CA.
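The core AHP computation can be sketched as follows, assuming Saaty's standard approach: an expert fills a reciprocal pairwise comparison matrix (entry [i][j] states how much more important item i is than item j), the item weights are the normalised principal eigenvector, and the consistency index CI = (λmax − n) / (n − 1) flags contradictory judgments. The comparison values below are hypothetical, not the physiatrists' ratings, and the `ahp_weights` helper is illustrative.

```python
from typing import List, Sequence, Tuple


def ahp_weights(matrix: Sequence[Sequence[float]],
                iterations: int = 100) -> Tuple[List[float], float]:
    """Principal-eigenvector weights of a reciprocal pairwise comparison matrix,
    plus Saaty's consistency index CI = (lambda_max - n) / (n - 1)."""
    n = len(matrix)
    if n < 2:
        raise ValueError("need at least two items to compare")
    w = [1.0 / n] * n
    for _ in range(iterations):  # power iteration, normalising weights to sum to 1
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]
    # Estimate lambda_max from A w = lambda_max w
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    return w, (lambda_max - n) / (n - 1)


# Hypothetical judgments for three items A, B, C:
# A is 2x as important as B and 4x as important as C; B is 2x as important as C.
comparisons = [[1, 2, 4],
               [1 / 2, 1, 2],
               [1 / 4, 1 / 2, 1]]
weights, ci = ahp_weights(comparisons)
print([round(x, 3) for x in weights], round(ci, 3))
```

This example matrix is perfectly consistent, so the weights come out as 4/7, 2/7 and 1/7 with CI = 0; real expert matrices are rarely so tidy, and a large CI suggests the comparisons should be revisited.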
Predicting Item Difficulty of Science National Curriculum Tests: The Case of Key Stage 2 Assessments
ERIC Educational Resources Information Center
El Masri, Yasmine H.; Ferrara, Steve; Foltz, Peter W.; Baird, Jo-Anne
2017-01-01
Predicting item difficulty is highly important in education for both teachers and item writers. Despite identifying a large number of explanatory variables, predicting item difficulty remains a challenge in educational assessment with empirical attempts rarely exceeding 25% of variance explained. This paper analyses 216 science items of key stage…
Lallukka, Tea; Ferrie, Jane E; Rahkonen, Ossi; Shipley, Martin J; Pietiläinen, Olli; Kivimäki, Mika; Marmot, Michael G; Lahelma, Eero
2013-09-01
The main aims of this longitudinal study were to (i) examine associations between changes in economic difficulties and health functioning among middle-aged employees and (ii) assess whether the associations remained after considering conventional domains of socioeconomic position. The associations were tested in two European welfare state occupational cohorts to strengthen the evidence base and improve generalizability. Data came from two cohorts: the Finnish Helsinki Health Study (baseline 2000-2002, follow-up 2007, N = 6328) and the British Whitehall II Study (baseline 1997-1999, follow-up 2003-2004, N = 4350). Responses to the survey item "finding it hard to afford adequate food and clothes and pay bills" repeated at baseline and follow-up were used to examine persistent, increasing, and decreasing economic difficulties. Poor physical and mental health functioning were denoted as being in the lowest quartile of the Short Form 36 physical and mental component summary. Logistic regression analyses were adjusted for sex, age, childhood economic difficulties, household income at baseline and follow-up, employment status at follow-up, and baseline health functioning. We observed strong sex- and age-adjusted associations between increasing [odds ratio (OR) range 1.69-2.96] and persistent (OR range 2.54-3.21) economic difficulties and poorer physical and mental health functioning in both British and Finnish occupational cohorts. These associations remained after full adjustments. Those reporting decreasing difficulties over follow-up also had poorer functioning (OR range 1.30-1.61) compared to those who did not have difficulties at baseline, possibly reflecting residual effects of economic difficulties at baseline. Changes in economic difficulties are associated with poorer physical and mental health functioning independent of income, employment status, and baseline health functioning.
Torgén, M; Winkel, J; Alfredsson, L; Kilbom, A
1999-06-01
The principal aim of the present study was to evaluate questionnaire-based information on past physical work loads (6-year recall). Effects of memory difficulties on reproducibility were evaluated for 82 subjects by comparing previously reported results on current work loads (test-retest procedure) with the same items recalled 6 years later. Validity was assessed by comparing self-reports in 1995, regarding work loads in 1989, with worksite measurements performed in 1989. Six-year reproducibility, calculated as weighted kappa coefficients (k(w)), varied between 0.36 and 0.86, with the highest values for proportion of the workday spent sitting and for perceived general exertion and the lowest values for trunk and neck flexion. The six-year reproducibility results were similar to previously reported test-retest results for these items; this finding indicates that memory difficulties were a minor problem. The validity of the questionnaire responses, expressed as rank correlations (r(s)) between the questionnaire responses and workplace measurements, varied between -0.16 and 0.78. The highest values were obtained for the items sitting and repetitive work, and the lowest and "unacceptable" values were for head rotation and neck flexion. Misclassification of exposure did not appear to be differential with regard to musculoskeletal symptom status, as judged by the calculated risk estimates. The validity of some of these self-administered questionnaire items appears sufficient for a crude assessment of physical work loads in the past in epidemiologic studies of the general population with predominantly low levels of exposure.
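The reproducibility figures above are weighted kappa coefficients. For ordinal questionnaire categories, Cohen's weighted kappa is kappa_w = 1 - sum(w_ij * o_ij) / sum(w_ij * e_ij), where o_ij and e_ij are the observed and chance-expected cell proportions and w_ij is a disagreement weight. The abstract does not say which weights were used, so the sketch below assumes either linear or quadratic weights; the function name and the toy ratings are illustrative only.

```python
def weighted_kappa(ratings_a, ratings_b, categories, weighting="linear"):
    """Cohen's weighted kappa for two sets of ordinal ratings of the same subjects.

    categories lists the ordinal levels from lowest to highest; the disagreement
    weight is |i - j| / (k - 1) (linear) or its square (quadratic).
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    if weighting not in ("linear", "quadratic"):
        raise ValueError("weighting must be 'linear' or 'quadratic'")
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two categories")
    index = {c: i for i, c in enumerate(categories)}
    n = len(ratings_a)
    # Observed joint proportions o_ij.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[index[a]][index[b]] += 1.0 / n
    # Marginal proportions for each occasion (e.g. original report vs 6-year recall).
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            d = abs(i - j) / (k - 1)
            w = d if weighting == "linear" else d * d
            num += w * observed[i][j]   # weighted observed disagreement
            den += w * row[i] * col[j]  # weighted chance-expected disagreement
    if den == 0:
        raise ValueError("kappa undefined: no expected disagreement")
    return 1.0 - num / den


# Hypothetical test-retest answers on a 4-point frequency scale.
first = [1, 2, 2, 3, 4, 4, 3, 1]
second = [1, 2, 3, 3, 4, 3, 3, 2]
print(round(weighted_kappa(first, second, [1, 2, 3, 4]), 3))
```

Weighting gives partial credit for near-misses (answering "often" instead of "always"), which suits ordinal recall items better than unweighted kappa.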
An Improved Internal Consistency Reliability Estimate.
ERIC Educational Resources Information Center
Cliff, Norman
1984-01-01
The proposed coefficient is derived by assuming that the average Goodman-Kruskal gamma between items of identical difficulty would be the same for items of different difficulty. An estimate of covariance between items of identical difficulty leads to an estimate of the correlation between two tests with identical distributions of difficulty.…
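Cliff's coefficient is built on the Goodman-Kruskal gamma between items. The sketch below computes only that building block, gamma = (C - D)/(C + D), where C and D count concordant and discordant pairs of examinees and tied pairs are ignored; it is not Cliff's proposed reliability coefficient, whose full derivation is not given in this excerpt. The function name and item score vectors are hypothetical.

```python
from itertools import combinations


def goodman_kruskal_gamma(x, y):
    """Goodman-Kruskal gamma between two score vectors (e.g. two items' 0/1 scores)."""
    if len(x) != len(y):
        raise ValueError("score vectors must have equal length")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1  # both scores order the pair the same way
        elif s < 0:
            discordant += 1  # the scores order the pair in opposite ways
        # s == 0: the pair is tied on at least one item and is skipped
    if concordant + discordant == 0:
        raise ValueError("gamma undefined: every pair is tied")
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical dichotomous scores of six examinees on an easy and a hard item.
easy_item = [1, 1, 1, 1, 0, 0]
hard_item = [1, 1, 0, 0, 0, 0]
print(goodman_kruskal_gamma(easy_item, hard_item))
```

Because ties are dropped, gamma for dichotomous items is less tied to the items' difficulties than the phi coefficient, which is the property the proposed coefficient relies on.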
ERIC Educational Resources Information Center
Matlock, Ki Lynn
2013-01-01
When test forms that have equal total test difficulty and number of items vary in difficulty and length within sub-content areas, an examinee's estimated score may vary across equivalent forms, depending on how well his or her true ability in each sub-content area aligns with the difficulty of items and number of items within these areas.…
Rasch model based analysis of the Force Concept Inventory
NASA Astrophysics Data System (ADS)
Planinic, Maja; Ivanjek, Lana; Susac, Ana
2010-06-01
The Force Concept Inventory (FCI) is an important diagnostic instrument that is widely used in physics education research. It is therefore very important to evaluate and monitor its functioning using different tools for statistical analysis. One such tool is the stochastic Rasch model, which enables the construction of linear measures for persons and items from raw test scores and can provide important insight into the structure and functioning of the test (how item difficulties are distributed within the test, how well the items fit the model, and how well the items work together to define the underlying construct). The data for the Rasch analysis come from large-scale research conducted in 2006-07, which investigated Croatian high school students' conceptual understanding of mechanics in a representative sample of 1676 students (aged 17-18 years). The instrument used in the research was the FCI. The average FCI score for the whole sample was (27.7 ± 0.4)%, indicating that most of the students were still non-Newtonians at the end of high school, even though physics is a compulsory subject in Croatian schools. The large data set was analyzed with the Rasch measurement software WINSTEPS 3.66. Since the FCI is routinely used as a pretest and post-test on two very different types of population (non-Newtonian and predominantly Newtonian), an additional predominantly Newtonian sample (N = 141, average FCI score of 64.5%) of first-year students enrolled in an introductory physics course at the University of Zagreb was also analyzed. The Rasch-model-based analysis suggests that the FCI succeeds in defining a sufficiently unidimensional construct for each population. The analysis of fit of the data to the model found no grossly misfitting items that would degrade measurement. Some items with larger misfit, and items with significantly different difficulties in the two samples of students, do require further examination.
The analysis revealed some problems with item distribution in the FCI and suggested that the FCI may function differently in non-Newtonian and predominantly Newtonian populations. Some possible improvements of the test are suggested.
Development and Psychometric Evaluation of the Gay Male Sexual Difficulties Scale.
McDonagh, Lorraine K; Stewart, Ian; Morrison, Melanie A; Morrison, Todd G
2016-08-01
Sexual difficulties (i.e., disturbances in normal sexual responding) have the potential to significantly and negatively affect men's social and psychological well-being. However, a review of published measurement tools indicates that most have limited applicability to gay men, and none offer a nuanced understanding of sexual difficulties, as experienced by members of this population. To address this omission, the Gay Male Sexual Difficulties Scale (GMSDS) was developed using a sequential mixed-methods approach. The 25-item GMSDS uses a 6-point frequency Likert-type response format and examines: difficulties with receptive and insertive anal intercourse (5 items each); erectile difficulties (4 items); foreskin difficulties (4 items); body embarrassment (4 items); and seminal fluid concerns (3 items). The measure's scale score dimensionality, assessed using both exploratory and confirmatory factor analyses, as well as scale score reliability and validity (e.g., known-groups and convergent) was tested and deemed to be satisfactory. Limitations of the current series of studies and directions for future research are discussed.
Comparison of Alternate and Original Items on the Montreal Cognitive Assessment.
Lebedeva, Elena; Huang, Mei; Koski, Lisa
2016-03-01
The Montreal Cognitive Assessment (MoCA) is a screening tool for mild cognitive impairment (MCI) in elderly individuals. We hypothesized that measurement error when using the new alternate MoCA versions to monitor change over time could be related to the use of items that are not of comparable difficulty to their corresponding originals of similar content. The objective of this study was to compare the difficulty of the alternate MoCA items to the original ones. Five selected items from alternate versions of the MoCA were included with items from the original MoCA administered adaptively to geriatric outpatients (N = 78). Rasch analysis was used to estimate the difficulty level of the items. None of the five items from the alternate versions matched the difficulty level of their corresponding original items. This study demonstrates the potential benefits of a Rasch analysis-based approach for selecting items during the development of parallel forms. The results suggest that matching the items of different MoCA forms more closely by difficulty would result in higher sensitivity to changes in cognitive function over time.
Gimeno-Santos, Elena; Raste, Yogini; Demeyer, Heleen; Louvaris, Zafeiris; de Jong, Corina; Rabinovich, Roberto A; Hopkinson, Nicholas S; Polkey, Michael I; Vogiatzis, Ioannis; Tabberer, Maggie; Dobbels, Fabienne; Ivanoff, Nathalie; de Boer, Willem I; van der Molen, Thys; Kulich, Karoly; Serra, Ignasi; Basagaña, Xavier; Troosters, Thierry; Puhan, Milo A; Karlsson, Niklas; Garcia-Aymerich, Judith
2015-10-01
No current patient-centred instrument captures all dimensions of physical activity in chronic obstructive pulmonary disease (COPD). Our objective was item reduction and initial validation of two instruments to measure physical activity in COPD. Physical activity was assessed in a 6-week, randomised, two-way cross-over, multicentre study using PROactive draft questionnaires (daily and clinical visit versions) and two activity monitors. Item reduction followed an iterative process including classical and Rasch model analyses, and input from patients and clinical experts. 236 COPD patients from five European centres were included. Results indicated the concept of physical activity in COPD had two domains, labelled "amount" and "difficulty". After item reduction, the daily PROactive instrument comprised nine items and the clinical visit contained 14. Both demonstrated good model fit (person separation index >0.7). Confirmatory factor analysis supported the bidimensional structure. Both instruments had good internal consistency (Cronbach's α>0.8), test-retest reliability (intraclass correlation coefficient ≥0.9) and exhibited moderate-to-high correlations (r>0.6) with related constructs and very low correlations (r<0.3) with unrelated constructs, providing evidence for construct validity. Daily and clinical visit "PROactive physical activity in COPD" instruments are hybrid tools combining a short patient-reported outcome questionnaire and two activity monitor variables which provide simple, valid and reliable measures of physical activity in COPD patients. Copyright ©ERS 2015.
ERIC Educational Resources Information Center
Nissan, Susan; And Others
One of the item types in the Listening Comprehension section of the Test of English as a Foreign Language (TOEFL) test is the dialogue. Because the dialogue item pool needs to have an appropriate balance of items at a range of difficulty levels, test developers have examined items at various difficulty levels in an attempt to identify their…
Sources of difficulty in assessment: example of PISA science items
NASA Astrophysics Data System (ADS)
Le Hebel, Florence; Montpied, Pascale; Tiberghien, Andrée; Fontanieu, Valérie
2017-03-01
The understanding of what makes a question difficult is a crucial concern in assessment. To study the difficulty of test questions, we focus on the case of PISA, which assesses the degree to which 15-year-old students have acquired the knowledge and skills essential for full participation in society. Our research question is to identify PISA science item characteristics that could influence an item's proficiency level. The study is based on an a priori item analysis and a statistical analysis. Results show that, of the item characteristics identified in our a priori analysis, only cognitive complexity and format have explanatory power for an item's proficiency level. The proficiency level cannot be explained by the dependence or independence of the information provided in the unit and/or item introduction, or by the competence. We conclude that in PISA it appears possible to anticipate a high proficiency level, that is, low student scores, for items displaying high cognitive complexity. For items of middle or low cognitive complexity, the cognitive complexity level is not sufficient to predict item difficulty; other characteristics play a crucial role. We discuss anticipating difficulties in assessment from a broader perspective.
Is the Factor Observed in Investigations on the Item-Position Effect Actually the Difficulty Factor?
ERIC Educational Resources Information Center
Schweizer, Karl; Troche, Stefan
2018-01-01
In confirmatory factor analysis quite similar models of measurement serve the detection of the difficulty factor and the factor due to the item-position effect. The item-position effect refers to the increasing dependency among the responses to successively presented items of a test whereas the difficulty factor is ascribed to the wide range of…
Berg-Poppe, Patti; MacCabe, Angela; Karges, Joy
2018-05-14
Ethical decision making is situated within dynamic contexts specific to practice standards and environments, contemporary policy, and responsive educational systems. Reflecting on an evolving profession lends insight into the role of ethical codes of conduct. The purpose of this study was to gather information about the frequency and perceived difficulty physical therapists (PTs) experience with common ethical situations within contemporary clinical practice. PTs within the United States were invited to participate in an online questionnaire replicating a 1980 study. Subjects were 336 PTs from a variety of practice environments. Just 1 item was reported as moderately/extremely difficult by contemporary respondents, compared to 16 items in the 1980 study. The number of items meeting moderate/high encounter frequency was similar between the two groups (2015 = 16/19; 1980 = 15/19). While today's PTs report that they encounter ethical situations at a frequency similar to PTs in 1980, these same PTs report these ethical challenges as minimally difficult when compared to PTs responding to the 1980 survey. It is proposed that a move toward autonomous practice, the elevation of the entry level professional degree, and changing health care policy and environments have been influential in shaping these changes over time.
Comparison of Alternate and Original Items on the Montreal Cognitive Assessment
Lebedeva, Elena; Huang, Mei; Koski, Lisa
2016-01-01
Background The Montreal Cognitive Assessment (MoCA) is a screening tool for mild cognitive impairment (MCI) in elderly individuals. We hypothesized that measurement error when using the new alternate MoCA versions to monitor change over time could be related to the use of items that are not of comparable difficulty to their corresponding originals of similar content. The objective of this study was to compare the difficulty of the alternate MoCA items to the original ones. Methods Five selected items from alternate versions of the MoCA were included with items from the original MoCA administered adaptively to geriatric outpatients (N = 78). Rasch analysis was used to estimate the difficulty level of the items. Results None of the five items from the alternate versions matched the difficulty level of their corresponding original items. Conclusions This study demonstrates the potential benefits of a Rasch analysis-based approach for selecting items during the development of parallel forms. The results suggest that matching the items of different MoCA forms more closely by difficulty would result in higher sensitivity to changes in cognitive function over time. PMID:27076861
Item analysis of examinations in the Faculty of Medicine of Tunis.
Hermi, Amene; Achour, Wafa
2016-04-01
Introduction Item analysis is the process of collecting, summarizing and using information from students' responses to assess the quality of test items. This study used this approach to evaluate the quality of items and examinations given in the Faculty of Medicine of Tunis (FMT). Methods This study concerned the examinations of 2012-2013 (principal session). It analyzed 3138 items from 66 examinations, of which 46 were multidisciplinary (187 disciplines). A total of 2515 students took the examinations. The "AnItem.xls" file was used for the analysis, which focused on difficulty, discrimination and internal consistency. Results Mean difficulty across all examinations was optimal (mean difficulty index: 0.59). The majority of items (89.17%) were either easy or of acceptable difficulty. Mean discrimination across all examinations was moderate (mean item discrimination coefficient: 0.28), with poor discrimination in 23.62% of items. Maximal discrimination occurred in disciplines with a difficulty index between 0.4 and 0.6. "Ideal" items represented 27.02%. Mean internal consistency across all examinations was acceptable (Cronbach's alpha: 0.79). Disciplines with unacceptable internal consistency (68.45%) each contained at most 33 items, and their alpha correlated positively with their number of questions. Distributions were mostly platykurtic (72.73%) and negatively skewed (89.39%). The first year of studies had the best parameters. Conclusion Our examinations had acceptable internal consistency and good levels of difficulty and discrimination. They tended to be easy and discriminated mainly among students of medium level. Item analysis is useful as a guide for item writers to improve the overall quality of questions in the future.
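The three statistics named in this abstract can be sketched from a 0/1 score matrix (rows are students, columns are items). The difficulty index is the proportion of correct answers and Cronbach's alpha is k/(k - 1) * (1 - sum of item variances / total-score variance). The abstract does not say how the "AnItem.xls" file computes its discrimination coefficient, so the sketch assumes the corrected item-total (point-biserial) correlation; function names and data are hypothetical.

```python
from statistics import mean, pvariance


def difficulty_index(scores, item):
    """Classical difficulty index p: proportion of students answering the item correctly."""
    return mean([row[item] for row in scores])


def corrected_item_total(scores, item):
    """Assumed discrimination measure: Pearson (point-biserial) correlation between
    the item score and the total score on the remaining items."""
    x = [row[item] for row in scores]
    rest = [sum(row) - row[item] for row in scores]
    mx, mr = mean(x), mean(rest)
    cov = sum((a - mx) * (b - mr) for a, b in zip(x, rest))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sr = sum((b - mr) ** 2 for b in rest) ** 0.5
    if sx == 0 or sr == 0:
        raise ValueError("correlation undefined for a constant score vector")
    return cov / (sx * sr)


def cronbach_alpha(scores):
    """Cronbach's alpha from item variances and total-score variance."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("alpha undefined when all total scores are equal")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses of five students to four items (1 = correct).
scores = [[1, 1, 1, 1],
          [1, 1, 1, 0],
          [1, 1, 0, 0],
          [1, 0, 1, 0],
          [0, 0, 0, 0]]
for j in range(4):
    print(j, difficulty_index(scores, j), round(corrected_item_total(scores, j), 2))
print(round(cronbach_alpha(scores), 2))
```

Excluding the item from its own total avoids inflating the discrimination estimate, which matters most for short disciplines like those with at most 33 items mentioned above.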
ERIC Educational Resources Information Center
Çokluk, Ömay; Gül, Emrah; Dogan-Gül, Çilem
2016-01-01
The study aims to examine whether differential item functioning is displayed in three different test forms whose item orders are random or sequential (easy-to-hard and hard-to-easy), based on Classical Test Theory (CTT) and Item Response Theory (IRT) methods and taking item difficulty levels into account. In the correlational research, the…
ERIC Educational Resources Information Center
Scheuneman, Janice Dowd; Gerritz, Kalle
1990-01-01
Differential item functioning (DIF) methodology for revealing sources of item difficulty and performance characteristics of different groups was explored. A total of 150 Scholastic Aptitude Test items and 132 Graduate Record Examination general test items were analyzed. DIF was evaluated for males and females and Blacks and Whites. (SLD)
Item Structural Properties as Predictors of Item Difficulty and Item Association.
ERIC Educational Resources Information Center
Solano-Flores, Guillermo
1993-01-01
Studied the ability of logical test design (LTD) to predict student performance in reading Roman numerals for 211 sixth graders in Mexico City tested on Roman numeral items varying on LTD-related and non-LTD-related variables. The LTD-related variable item iterativity was found to be the best predictor of item difficulty. (SLD)
Predicting Item Difficulty in a Reading Comprehension Test with an Artificial Neural Network.
ERIC Educational Resources Information Center
Perkins, Kyle; And Others
1995-01-01
This article reports the results of using a three-layer back propagation artificial neural network to predict item difficulty in a reading comprehension test. Three classes of variables were examined: text structure, propositional analysis, and cognitive demand. Results demonstrate that the networks can consistently predict item difficulty. (JL)
Extending item response theory to online homework
NASA Astrophysics Data System (ADS)
Kortemeyer, Gerd
2014-06-01
Item response theory (IRT) is becoming an increasingly important tool for analyzing "big data" gathered from online educational venues. However, the method was originally developed for traditional exam settings, and several of its assumptions are violated when it is deployed in the online realm. For a large-enrollment physics course for scientists and engineers, the study compares outcomes from IRT analyses of exam and homework data and then investigates the effects of each confounding factor introduced in the online realm. IRT is found to yield the correct trends for learner ability and meaningful item parameters, yet overall agreement with exam data is moderate. Learner ability and item discrimination are also found to be robust over a wide range with respect to model assumptions and introduced noise. Item difficulty is also robust, but over a narrower range.
Multiple choice questions can be designed or revised to challenge learners' critical thinking.
Tractenberg, Rochelle E; Gushta, Matthew M; Mulroney, Susan E; Weissinger, Peggy A
2013-12-01
Multiple choice (MC) questions from a graduate physiology course were evaluated by cognitive-psychology (but not physiology) experts, and analyzed statistically, in order to test the independence of content expertise and cognitive complexity ratings of MC items. Integration of higher order thinking into MC exams is important, but widely known to be challenging, perhaps especially when content experts must think like novices. Expertise in the domain (content) may actually impede the creation of higher-complexity items. Three cognitive psychology experts independently rated cognitive complexity for 252 multiple-choice physiology items using a six-level cognitive complexity matrix that was synthesized from the literature. Rasch modeling estimated item difficulties. The complexity ratings and difficulty estimates were then analyzed together to determine the relative contributions (and independence) of complexity and difficulty to the likelihood of correct answers on each item. Cognitive complexity was found to be statistically independent of difficulty estimates for 88% of items. Using the complexity matrix, modifications were identified to increase some item complexities by one level, without affecting the item's difficulty. Cognitive complexity can effectively be rated by non-content experts. The six-level complexity matrix, if applied by faculty peer groups trained in cognitive complexity and without domain-specific expertise, could lead to improvements in the complexity targeted with item writing and revision. Targeting higher order thinking with MC questions can be achieved without changing item difficulties or other test characteristics, but this may be less likely if the content expert is left to assess items within their domain of expertise.
An objective measure of physical function of elderly outpatients. The Physical Performance Test.
Reuben, D B; Siu, A L
1990-10-01
Direct observation of physical function has the advantage of providing an objective, quantifiable measure of functional capabilities. We have developed the Physical Performance Test (PPT), which assesses multiple domains of physical function using observed performance of tasks that simulate activities of daily living of various degrees of difficulty. Two versions are presented: a nine-item scale that includes writing a sentence, simulated eating, turning 360 degrees, putting on and removing a jacket, lifting a book and putting it on a shelf, picking up a penny from the floor, a 50-foot walk test, and climbing stairs (scored as two items); and a seven-item scale that does not include stairs. The PPT can be completed in less than 10 minutes and requires only a few simple props. We then tested the validity of PPT using 183 subjects (mean age, 79 years) in six settings including four clinical practices (one of Parkinson's disease patients), a board-and-care home, and a senior citizens' apartment. The PPT was reliable (Cronbach's alpha = 0.87 and 0.79, interrater reliability = 0.99 and 0.93 for the nine-item and seven-item tests, respectively) and demonstrated concurrent validity with self-reported measures of physical function. Scores on the PPT for both scales were highly correlated (.50 to .80) with modified Rosow-Breslau, Instrumental and Basic Activities of Daily Living scales, and Tinetti gait score. Scores on the PPT were more moderately correlated with self-reported health status, cognitive status, and mental health (.24 to .47), and negatively with age (-.24 and -.18). Thus, the PPT also demonstrated construct validity. The PPT is a promising objective measurement of physical function, but its clinical and research value for screening, monitoring, and prediction will have to be determined.
The Effects of Judgment-Based Stratum Classifications on the Efficiency of Stratum Scored CATs.
ERIC Educational Resources Information Center
Finney, Sara J.; Smith, Russell W.; Wise, Steven L.
Two operational item pools were used to investigate the performance of stratum computerized adaptive tests (CATs) when items were assigned to strata based on empirical estimates of item difficulty or human judgments of item difficulty. Items from the first data set consisted of 54 5-option multiple choice items from a form of the ACT mathematics…
Vaughn, Kalif E; Rawson, Katherine A; Pyc, Mary A
2013-12-01
A wealth of previous research has established that retrieval practice promotes memory, particularly when retrieval is successful. Although successful retrieval promotes memory, it remains unclear whether successful retrieval promotes memory equally well for items of varying difficulty. Will easy items still outperform difficult items on a final test if all items have been correctly recalled equal numbers of times during practice? In two experiments, normatively difficult and easy Lithuanian-English word pairs were learned via test-restudy practice until each item had been correctly recalled a preassigned number of times (from 1 to 11 correct recalls). Despite equating the numbers of successful recalls during practice, performance on a delayed final cued-recall test was lower for difficult than for easy items. Experiment 2 was designed to diagnose whether the disadvantage for difficult items was due to deficits in cue memory, target memory, and/or associative memory. The results revealed a disadvantage for the difficult versus the easy items only on the associative recognition test, with no differences on cue recognition, and even an advantage on target recognition. Although successful retrieval enhanced memory for both difficult and easy items, equating retrieval success during practice did not eliminate normative item difficulty differences.
The Effect of the Position of an Item within a Test on the Item Difficulty Value.
ERIC Educational Resources Information Center
Rubin, Lois S.; Mott, David E. W.
An investigation was made of the effect of an item's position within a test on its difficulty value. Using a 60-item operational test composed of 5 subtests, 60 items were placed as experimental items on a number of spiralled test forms in three different positions (first, middle, last) within the subtest composed of like items.…
Item Response Theory Modeling of the Philadelphia Naming Test.
Fergadiotis, Gerasimos; Kellough, Stacey; Hula, William D
2015-06-01
In this study, we investigated the fit of the Philadelphia Naming Test (PNT; Roach, Schwartz, Martin, Grewal, & Brecher, 1996) to an item-response-theory measurement model, estimated the precision of the resulting scores and item parameters, and provided a theoretical rationale for the interpretation of PNT overall scores by relating explanatory variables to item difficulty. This article describes the statistical model underlying the computer adaptive PNT presented in a companion article (Hula, Kellough, & Fergadiotis, 2015). Using archival data, we evaluated the fit of the PNT to 1- and 2-parameter logistic models and examined the precision of the resulting parameter estimates. We regressed the item difficulty estimates on three predictor variables: word length, age of acquisition, and contextual diversity. The 2-parameter logistic model demonstrated marginally better fit, but the fit of the 1-parameter logistic model was adequate. Precision was excellent for both person ability and item difficulty estimates. Word length, age of acquisition, and contextual diversity all independently contributed to variance in item difficulty. Item-response-theory methods can be productively used to analyze and quantify anomia severity in aphasia. Regression of item difficulty on lexical variables supported the validity of the PNT and interpretation of anomia severity scores in the context of current word-finding models.
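The regression step described here (item difficulty estimates regressed on item-level predictors) can be sketched as ordinary least squares on already-estimated difficulties, with R² giving the share of item-difficulty variance explained. This is a minimal pure-Python sketch assuming OLS via the normal equations; the function names are hypothetical, the usage data are invented rather than PNT values, and in practice a statistics package would be used.

```python
def ols(X, y):
    """Fit y = b0 + b1*x1 + ... by ordinary least squares via the normal equations.
    X is a list of predictor rows (without an intercept column); returns [b0, b1, ...]."""
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    # Solve X'X b = X'y by Gauss-Jordan elimination with partial pivoting.
    aug = [xtx[i] + [xty[i]] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix (collinear predictors?)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(p):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]


def r_squared(X, y, coef):
    """Proportion of variance in y explained by the fitted linear model."""
    pred = [coef[0] + sum(c * x for c, x in zip(coef[1:], r)) for r in X]
    my = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, pred))
    ss_tot = sum((a - my) ** 2 for a in y)
    if ss_tot == 0:
        raise ValueError("R^2 undefined when all difficulties are equal")
    return 1 - ss_res / ss_tot


# Invented example: difficulties (logits) of six items and two item predictors.
X = [[3, 2.1], [5, 4.0], [4, 3.2], [7, 5.5], [6, 4.8], [2, 1.9]]
y = [-1.2, 0.4, -0.3, 1.5, 0.9, -1.0]
coef = ols(X, y)
print([round(c, 3) for c in coef], round(r_squared(X, y, coef), 3))
```

Independent contributions of each predictor, as reported in the abstract, would additionally require standard errors or nested-model comparisons, which this sketch omits.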
Using the Nudge and Shove Methods to Adjust Item Difficulty Values.
Royal, Kenneth D
2015-01-01
In any examination, it is important that a sufficient mix of items with varying degrees of difficulty be present to produce desirable psychometric properties and increase instructors' ability to make appropriate and accurate inferences about what a student knows and/or can do. The purpose of this "teaching tip" is to demonstrate how examination items can be affected by the quality of distractors, and to present a simple method for adjusting items to meet difficulty specifications.
Applicability of the Newtonian gravity concept inventory to introductory college physics classes
NASA Astrophysics Data System (ADS)
Williamson, Kathryn; Prather, Edward E.; Willoughby, Shannon
2016-06-01
The study described here extends the applicability of the Newtonian Gravity Concept Inventory (NGCI) to college algebra-based physics classes, beyond the general education astronomy courses for which it was originally developed. The four conceptual domains probed by the NGCI (Directionality, Force Law, Independence of Other Forces, and Threshold) are well suited for investigating students' reasoning about gravity in both populations, making the NGCI a highly versatile instrument. Classical test theory analysis of physics student responses pre-instruction (N = 1,392) and post-instruction (N = 929) from eight colleges and universities across the United States indicates that the NGCI is composed of items with appropriate difficulty and discrimination and is reliable for this population. Also, expert review and student interviews support the NGCI's validity for the physics population. Emergent similarities and differences in how physics students reason about gravity compared to astronomy students are discussed, as well as future directions for analyzing the instrument's item parameters across both populations.
Component Identification and Item Difficulty of Raven's Matrices Items.
ERIC Educational Resources Information Center
Green, Kathy E.; Kluever, Raymond C.
Item components that might contribute to the difficulty of items on the Raven Colored Progressive Matrices (CPM) and the Standard Progressive Matrices (SPM) were studied. Subjects providing responses to CPM items were 269 children aged 2 years 9 months to 11 years 8 months, most of whom were referred for testing as potentially gifted. A second…
Rasch Measurement and Item Banking: Theory and Practice.
ERIC Educational Resources Information Center
Nakamura, Yuji
The Rasch model is a one-parameter item response theory model which states that the probability of a correct response to a test item is a function of the difficulty of the item and the ability of the candidate. Item banking is useful for language testing. The Rasch Model provides estimates of item difficulties that are meaningful,…
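In the usual logistic form of the Rasch model, the probability of a correct answer depends only on the difference between person ability θ and item difficulty b, both in logits. A minimal sketch of that response function (the function name is hypothetical):

```python
import math

def rasch_probability(theta: float, b: float) -> float:
    """Probability of a correct response under the Rasch (one-parameter logistic) model.

    theta: person ability in logits; b: item difficulty in logits.
    """
    # Logistic function of the ability-minus-difficulty difference
    return 1.0 / (1.0 + math.exp(-(theta - b)))

print(rasch_probability(0.0, 0.0))            # 0.5: ability equals difficulty
print(round(rasch_probability(1.0, 0.0), 3))  # 0.731: one logit above the item
```

Because only θ − b enters the formula, item difficulties estimated this way sit on the same logit scale as person abilities, which is what makes them usable in an item bank.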
Hong, Ickpyo; Lee, Mi Jung; Kim, Moon Young; Park, Hae Yean
2017-10-01
The aim of this study is to investigate the psychometrics of the 12 items of an instrument assessing activities of daily living (ADL) using an item response theory model. Data on a total of 648 adults with physical disabilities and difficulties in ADLs were retrieved from the 2014 Korean National Survey on People with Disabilities. The psychometric testing included factor analysis, internal consistency, precision, and differential item functioning (DIF) across categories including sex, older age, marital status, and physical impairment area. The sample had a mean age of 69.7 years (SD = 13.7). The majority of the sample had lower extremity impairments (62.0%) and had at least 2.1 chronic conditions. The instrument demonstrated a unidimensional construct and good internal consistency (Cronbach's alpha = 0.95). The instrument precisely estimated person measures within a wide range of theta values (-2.22 logits < θ < 0.27 logits) with a reliability of 0.9. Only the changing position item demonstrated misfit (χ² = 36.6, df = 17, p = 0.0038), and the dressing item demonstrated DIF on the impairment type (upper extremity/others, McFadden's pseudo R² > 5.0%). Our findings indicate that the dressing item would need to be modified to improve its psychometrics. Overall, the ADL instrument demonstrates good psychometrics, and thus it may be used as a standardized instrument for measuring disability in rehabilitation contexts. However, the findings are limited to adults with physical disabilities. Future studies should replicate the psychometric testing with survey respondents who have other disorders and with children.
ERIC Educational Resources Information Center
Forbey, Johnathan D.; Ben-Porath, Yossef S.; Arbisi, Paul A.
2012-01-01
The ability to screen quickly and thoroughly for psychological difficulties in existing and returning combat veterans who are seeking treatment for physical ailments would be of significant benefit. In the current study, item and time savings, as well as extratest correlations, associated with an audio-augmented version of the computerized…
When Listening Is Better Than Reading: Performance Gains on Cardiac Auscultation Test Questions.
Short, Kathleen; Bucak, S Deniz; Rosenthal, Francine; Raymond, Mark R
2018-05-01
In 2007, the United States Medical Licensing Examination embedded multimedia simulations of heart sounds into multiple-choice questions. This study investigated changes in item difficulty as determined by examinee performance over time. The data reflect outcomes obtained following initial use of multimedia items from 2007 through 2012, after which an interface change occurred. A total of 233,157 examinees responded to 1,306 cardiology test items over the six-year period; 138 items included multimedia simulations of heart sounds, while 1,168 text-based items without multimedia served as controls. The authors compared changes in difficulty of multimedia items over time with changes in difficulty of text-based cardiology items over time. Further, they compared changes in item difficulty for both groups of items between graduates of Liaison Committee on Medical Education (LCME)-accredited and non-LCME-accredited (i.e., international) medical schools. Examinee performance on cardiology test items with multimedia heart sounds improved by 12.4% over the six-year period, while performance on text-based cardiology items improved by approximately 1.4%. These results were similar for graduates of LCME-accredited and non-LCME-accredited medical schools. Examinees' ability to interpret auscultation findings in test items that include multimedia presentations increased from 2007 to 2012.
ERIC Educational Resources Information Center
Ali, Usama S.; Walker, Michael E.
2014-01-01
Two methods are currently in use at Educational Testing Service (ETS) for equating observed item difficulty statistics. The first method involves the linear equating of item statistics in an observed sample to reference statistics on the same items. The second method, or the item response curve (IRC) method, involves the summation of conditional…
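The abstract is truncated before it describes the ETS linear procedure, so the following is only a generic illustration of linear equating of item statistics: a mean-sigma transformation that rescales observed statistics so their mean and standard deviation match reference statistics on the same items. The function name and the choice of the mean-sigma variant are assumptions, not the ETS method.

```python
from statistics import mean, pstdev

def linear_equate(observed: list[float], reference: list[float]) -> list[float]:
    """Mean-sigma linear equating of item statistics (hypothetical helper).

    observed: item statistics from the new sample.
    reference: reference statistics for the same items, in the same order.
    """
    if len(observed) != len(reference):
        raise ValueError("both lists must describe the same items")
    # Slope matches the spread; intercept matches the centre of the reference scale
    slope = pstdev(reference) / pstdev(observed)
    intercept = mean(reference) - slope * mean(observed)
    return [slope * x + intercept for x in observed]

observed = [8.0, 10.0, 12.0, 14.0]
reference = [9.0, 12.0, 15.0, 18.0]
print(linear_equate(observed, reference))  # approximately [9, 12, 15, 18]
```

Once the slope and intercept are fixed on the common items, the same transformation can be applied to statistics for other items from the observed sample.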
ERIC Educational Resources Information Center
Vorstenbosch, Marc A. T. M.; Klaassen, Tim P. F. M.; Kooloos, Jan G. M.; Bolhuis, Sanneke M.; Laan, Roland F. J. M.
2013-01-01
Anatomists often use images in assessments and examinations. This study aims to investigate the influence of different types of images on item difficulty and item discrimination in written assessments. A total of 210 of 460 students volunteered for an extra assessment in a gross anatomy course. This assessment contained 39 test items grouped in…
Analyzing force concept inventory with item response theory
NASA Astrophysics Data System (ADS)
Wang, Jing; Bao, Lei
2010-10-01
Item response theory is a popular assessment method used in education. It rests on the assumption of a probability framework that relates students' innate ability and their performance on test questions. Item response theory transforms students' raw test scores into a scaled proficiency score, which can be used to compare results obtained with different test questions. The scaled score also addresses the issues of ceiling effects and guessing, which commonly exist in quantitative assessment. We used item response theory to analyze the force concept inventory (FCI). Our results show that item response theory can be useful for analyzing physics concept surveys such as the FCI and produces results about the individual questions and student performance that are beyond the capability of classical statistics. The theory yields detailed measurement parameters regarding the difficulty, discrimination features, and probability of correct guess for each of the FCI questions.
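The three parameters the abstract lists for each FCI question (difficulty, discrimination, probability of a correct guess) correspond to the b, a, and c parameters of the three-parameter logistic (3PL) model. A minimal sketch of its response function, assuming the version without the 1.7 scaling constant that some texts include; the function name is hypothetical:

```python
import math

def three_pl_probability(theta: float, a: float, b: float, c: float) -> float:
    """Probability of a correct response under the 3PL model.

    a: discrimination (slope), b: difficulty (location),
    c: pseudo-guessing lower asymptote, theta: examinee ability.
    """
    # Guessing sets a floor of c; the remaining (1 - c) follows a logistic curve
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# With c = 0.25 (e.g. four options), an examinee at theta = b succeeds 62.5% of the time
print(three_pl_probability(0.0, a=1.0, b=0.0, c=0.25))  # 0.625
```

The lower asymptote c is what lets the model account for guessing, one of the issues the abstract says raw scores handle poorly.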
Outcome-based self-assessment on a team-teaching subject in the medical school
Cho, Sa Sun
2014-01-01
We investigated why students obtained lower grades in gross anatomy and how the teaching method could be improved, given the gaps between teaching and learning under the recently changed integrated curriculum. General characteristics of students and exploratory factors for testing validity were compared between 2011 and 2012. Students were asked to complete a short survey using a Likert scale. The results were as follows: although the percentage of acceptable items was similar between professors, professor C preferred questions with adequate item discrimination and inappropriate item difficulty, whereas professor Y preferred adequate item discrimination and appropriate item difficulty, with statistical significance (P<0.01). The survey revealed that 26.5% of all students gave up on professor Y's gross anatomy exam, irrespective of year. These results suggest that students' academic achievement was affected by the corrected item difficulty rather than by item discrimination. Therefore, professors in a team-taught subject should reach a consensus on item difficulty, along with appropriate teaching methods. PMID:25548724
A Comparison of Three Test Formats to Assess Word Difficulty
ERIC Educational Resources Information Center
Culligan, Brent
2015-01-01
This study compared three common vocabulary test formats, the Yes/No test, the Vocabulary Knowledge Scale (VKS), and the Vocabulary Levels Test (VLT), as measures of vocabulary difficulty. Vocabulary difficulty was defined as the item difficulty estimated through Item Response Theory (IRT) analysis. Three tests were given to 165 Japanese students,…
Sim, Si-Mui; Rasiah, Raja Isaiah
2006-02-01
This paper reports the relationship between the difficulty level and the discrimination power of true/false-type multiple-choice questions (MCQs) in a multidisciplinary paper for the para-clinical year of an undergraduate medical programme. MCQ items in papers taken from Year II Parts A, B and C examinations for Sessions 2001/02, and Part B examinations for 2002/03 and 2003/04, were analysed to obtain their difficulty indices and discrimination indices. Each paper consisted of 250 true/false items (50 questions of 5 items each) on topics drawn from different disciplines. The questions were first constructed and vetted by the individual departments before being submitted to a central committee, where the final selection of the MCQs was made, based purely on the academic judgement of the committee. There was a wide distribution of item difficulty indices in all the MCQ papers analysed. Furthermore, the relationship between the difficulty index (P) and discrimination index (D) of the MCQ items in a paper was not linear, but more dome-shaped. Maximal discrimination (D = 51% to 71%) occurred with moderately easy/difficult items (P = 40% to 74%). On average, about 38% of the MCQ items in each paper were "very easy" (P ≥ 75%), while about 9% were "very difficult" (P < 25%). About two-thirds of these very easy/difficult items had "very poor" or even negative discrimination (D ≤ 20%). MCQ items that demonstrate good discriminating potential tend to be moderately difficult items, and the moderately-to-very difficult items are more likely to show negative discrimination. There is a need to evaluate the effectiveness of our MCQ items.
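The difficulty index P and discrimination index D used above can be computed directly from scored responses. A minimal sketch, assuming the common upper/lower-group definition of D with 27% of examinees in each group; the abstract does not state which group size was used, and the function name is hypothetical:

```python
def difficulty_and_discrimination(item_scores, total_scores, group_fraction=0.27):
    """Return (P, D) for one dichotomously scored item.

    item_scores: 1 (correct) or 0 (incorrect) per examinee.
    total_scores: each examinee's total test score, in the same order.
    group_fraction: share of examinees in the upper and lower groups (assumed 27%).
    """
    n = len(item_scores)
    # Difficulty index P: proportion of examinees answering correctly
    p = sum(item_scores) / n
    # Rank examinees by total score and take equal-sized lower and upper groups
    k = max(1, round(group_fraction * n))
    ranked = sorted(range(n), key=lambda i: total_scores[i])
    lower, upper = ranked[:k], ranked[-k:]
    # Discrimination index D: proportion correct in upper group minus lower group
    d = (sum(item_scores[i] for i in upper) - sum(item_scores[i] for i in lower)) / k
    return p, d

item = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
totals = [50, 48, 45, 40, 38, 35, 30, 25, 20, 10]
print(difficulty_and_discrimination(item, totals))  # (0.5, 1.0)
```

Multiplying both values by 100 gives the percentage form used in the abstract (e.g. P ≥ 75% for a "very easy" item). A negative D means the lower-scoring group answered the item correctly more often than the upper group.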
Mamikonian-Zarpas, Ani; Laganá, Luciana
2016-01-01
Functional status is often defined by cumulative scores across indices of independence in performing basic and instrumental activities of daily living (ADL/IADL), but little is known about the unique relationship of each daily activity item with the fall outcome. The purpose of this retrospective study was to examine the level of relative risk for a future fall associated with difficulty performing various tasks of normal daily functioning among older adults who had fallen at least once in the past 12 months. The sample comprised community-dwelling individuals 70 years and older from the 1984–1990 Longitudinal Study of Aging by Kovar, Fitti, and Chyba (1992). Risk analysis was performed on individual items quantifying 6 ADLs and 7 IADLs, as well as 10 items related to mobility limitations. Within a subsample of 1,675 older adults with a history of at least one fall within the past year, the responses of individuals who reported multiple falls were compared to the responses of participants who had a single fall and reported 1) difficulty with walking and/or balance (FRAIL group, n = 413) vs. 2) no difficulty with walking or dizziness (NDW+ND group, n = 415). The items that had the strongest relationships and highest risk ratios for the FRAIL group (which had the highest probabilities for a future fall) included difficulty with: eating (73%); managing money (70%); biting or chewing food (66%); walking a quarter of a mile (65%); using fingers to grasp (65%); and dressing without help (65%). For the NDW+ND group, the most noteworthy items included difficulty with: bathing or showering (79%); managing money (77%); shopping for personal items (75%); walking up 10 steps without rest (72%); walking a quarter of a mile (72%); and stooping/crouching/kneeling (70%). 
These findings suggest that individual items quantifying specific ADLs and IADLs have substantive relationships with the fall outcome among older adults who have difficulty with walking and balance, as well as among older individuals without dizziness or difficulty with walking. Furthermore, the examination of the relationships between items that are related to more challenging activities and the fall outcome revealed that higher functioning older adults who reported difficulty with the 6 items that yielded the highest risk ratios may also be at elevated risk for a fall. PMID:27200366
Jo, Min-Woo; Lee, Hyeon-Jeong; Kim, Soo Young; Kim, Seon-Ha; Chang, Hyejung; Ahn, Jeonghoon; Ock, Minsu
2017-01-01
Few attempts have been made to develop a generic health-related quality of life (HRQoL) instrument and to examine its validity and reliability in Korea. We aimed to do this in our present study. After a literature review of existing generic HRQoL instruments, a focus group discussion, in-depth interviews, and expert consultations, we selected 30 tentative items for a new HRQoL measure. These items were evaluated by assessing their ceiling effects, difficulty, and redundancy in the first survey. To validate the HRQoL instrument that was developed, known-groups validity and convergent/discriminant validity were evaluated and its test-retest reliability was examined in the second survey. Of the 30 items originally assessed for the HRQoL instrument, four were excluded due to high ceiling effects and six were removed due to redundancy. We ultimately developed an HRQoL instrument with a reduced set of 20 items, known as the Health-related Quality of Life Instrument with 20 items (HINT-20), incorporating physical, mental, social, and positive health dimensions. In the known-groups validity analysis, HINT-20 results were poorer in women, the elderly, and those with a low income. For convergent/discriminant validity, the correlation coefficients of items (except vitality) in the physical health dimension with the physical component summary of the Short Form 36 version 2 (SF-36v2) were generally higher than the correlations of those items with the mental component summary of the SF-36v2, and vice versa. Regarding test-retest reliability, the intraclass correlation coefficient of the total HINT-20 score was 0.813 (p<0.001). A novel generic HRQoL instrument, the HINT-20, was developed for the Korean general population and showed acceptable validity and reliability.
ERIC Educational Resources Information Center
Maries, Alexandru; Singh, Chandralekha
2016-01-01
The Force Concept Inventory (FCI) has been widely used to assess student understanding of introductory mechanics concepts by a variety of educators and physics education researchers. One reason for this extensive use is that many of the items on the FCI have strong distractor choices which correspond to students' alternate conceptions in…
Webster, Joseph B
2009-03-01
Objective: To determine the performance and change over time when questions in the core competency domains of practice-based learning and improvement (PBLI), systems-based practice (SBP), and professionalism (PROF) are incorporated into the national PM&R Self-Assessment Examination for Residents (SAER). Design: Prospective, longitudinal analysis. Setting: The national Self-Assessment Examination for Residents (SAER) in Physical Medicine and Rehabilitation, which is administered annually. Participants: Approximately 1100 PM&R residents who take the examination annually. Interventions: Inclusion of progressively more challenging questions in the core competency domains of PBLI, SBP, and PROF. Main outcome measures: Individual test item level of difficulty (P value) and discrimination (point biserial index). Results: Compared with the overall test, questions in the subtopic areas of PBLI, SBP, and PROF were relatively easier and less discriminating (correlation of resident performance on these domains compared with that on the total test). These differences became smaller during the 3-year period. The difficulty level of the questions in each of the subtopic domains was raised during the 3-year period to a level close to that of the overall exam. Discrimination of the test items improved or remained stable. Conclusions: This study demonstrates that, with careful item writing and review, multiple-choice items in the PBLI, SBP, and PROF domains can be successfully incorporated into an annual, national self-assessment examination for residents. The addition of these questions had value in assessing competency while not compromising the overall validity and reliability of the exam. It is yet to be determined whether resident performance on these questions corresponds to performance on other measures of competency in the areas of PBLI, SBP, and PROF.
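The point biserial index named as the discrimination measure above is the correlation between a 0/1 item score and the total test score. A minimal sketch using the textbook formula r = (M₁ − M₀)/s · √(pq), assuming the uncorrected version in which the item is included in the total (the abstract does not say whether a corrected item-total was used); the function name is hypothetical:

```python
import math
from statistics import mean, pstdev

def point_biserial(item_scores, total_scores):
    """Point-biserial correlation between a 0/1 item and the total test score."""
    p = mean(item_scores)  # proportion answering the item correctly
    if p in (0, 1):
        raise ValueError("item has no variance")
    mean_correct = mean(t for x, t in zip(item_scores, total_scores) if x == 1)
    mean_incorrect = mean(t for x, t in zip(item_scores, total_scores) if x == 0)
    # Population SD of totals, consistent with the sqrt(p * q) term
    sd_total = pstdev(total_scores)
    return (mean_correct - mean_incorrect) / sd_total * math.sqrt(p * (1 - p))

print(round(point_biserial([1, 1, 0, 0], [4, 3, 2, 1]), 4))  # 0.8944
```

With population standard deviations on both sides, this value equals the ordinary Pearson correlation between the item column and the total column.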
Koh, Bongyeun; Hong, Sunggi; Kim, Soon-Sim; Hyun, Jin-Sook; Baek, Milye; Moon, Jundong; Kwon, Hayran; Kim, Gyoungyong; Min, Seonggi; Kang, Gu-Hyun
2016-01-01
The goal of this study was to characterize the difficulty index of the items in the skills test components of the class I and II Korean emergency medical technician licensing examination (KEMTLE), which requires examinees to select items randomly. The results of 1,309 class I KEMTLE examinations and 1,801 class II KEMTLE examinations in 2013 were subjected to analysis. Items from the basic and advanced skills test sections of the KEMTLE were compared to determine whether some were significantly more difficult than others. In the class I KEMTLE, all 4 of the items on the basic skills test showed significant variation in difficulty index (P<0.01), as well as 4 of the 5 items on the advanced skills test (P<0.05). In the class II KEMTLE, 4 of the 5 items on the basic skills test showed significantly different difficulty index (P<0.01), as well as all 3 of the advanced skills test items (P<0.01). In the skills test components of the class I and II KEMTLE, the procedure in which examinees randomly select questions should be revised to require examinees to respond to a set of fixed items in order to improve the reliability of the national licensing examination.
Factors Affecting Item Difficulty in English Listening Comprehension Tests
ERIC Educational Resources Information Center
Sung, Pei-Ju; Lin, Su-Wei; Hung, Pi-Hsia
2015-01-01
Task difficulty is a critical issue affecting test developers. Controlling or balancing the item difficulty of an assessment improves its validity and discrimination. Test developers construct tests from the cognitive perspective, by making the test constructing process more scientific and efficient; thus, the scores obtained more precisely…
An Investigation of the Impact of Guessing on Coefficient α and Reliability
2014-01-01
Guessing is known to influence the test reliability of multiple-choice tests. Although there are many studies that have examined the impact of guessing, they used rather restrictive assumptions (e.g., parallel test assumptions, homogeneous inter-item correlations, homogeneous item difficulty, and homogeneous guessing levels across items) to evaluate the relation between guessing and test reliability. Based on the item response theory (IRT) framework, this study investigated the extent of the impact of guessing on reliability under more realistic conditions where item difficulty, item discrimination, and guessing levels actually vary across items with three different test lengths (TL). By accommodating multiple item characteristics simultaneously, this study also focused on examining interaction effects between guessing and other variables entered in the simulation to be more realistic. The simulation of the more realistic conditions and calculations of reliability and classical test theory (CTT) item statistics were facilitated by expressing CTT item statistics, coefficient α, and reliability in terms of IRT model parameters. In addition to the general negative impact of guessing on reliability, results showed interaction effects between TL and guessing and between guessing and test difficulty.
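Coefficient α, whose sensitivity to guessing the study examines, is computed from item and total-score variances: α = k/(k − 1) · (1 − Σσᵢ²/σ²_total). A minimal sketch, assuming a complete respondents-by-items score matrix stored as nested lists; the function name is hypothetical:

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Coefficient alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across respondents
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    # Variance of respondents' total scores
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

print(round(cronbach_alpha([[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]]), 6))  # 1.0
print(round(cronbach_alpha([[1, 1], [0, 1], [1, 0], [0, 0]]), 6))              # 0.0
```

The first matrix has perfectly consistent items, so α reaches 1; in the second the two items are uncorrelated and α drops to 0. Random guessing adds item-level noise of the second kind, which is one intuition for the negative impact the study reports.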
A Study of Inference in Standardized Reading Test Items and Its Relationship to Difficulty.
ERIC Educational Resources Information Center
Marzano, Robert J.
To study the relationship between inferences made on standardized reading tests and item difficulty, 50 items on the reading comprehension section of the Metropolitan Achievement Test were analyzed independently in this study by two raters using four general categories of inferences: (1) reference inferences, (2) between proposition inferences,…
The Definition of Difficulty and Discrimination for Multidimensional Item Response Theory Models.
ERIC Educational Resources Information Center
Reckase, Mark D.; McKinley, Robert L.
A study was undertaken to develop guidelines for the interpretation of the parameters of three multidimensional item response theory models and to determine the relationship between the parameters and traditional concepts of item difficulty and discrimination. The three models considered were multidimensional extensions of the one-, two-, and…
Iwashita, Yukio; Hibi, Taizo; Ohyama, Tetsuji; Honda, Goro; Yoshida, Masahiro; Miura, Fumihiko; Takada, Tadahiro; Han, Ho-Seong; Hwang, Tsann-Long; Shinya, Satoshi; Suzuki, Kenji; Umezawa, Akiko; Yoon, Yoo-Seok; Choi, In-Seok; Huang, Wayne Shih-Wei; Chen, Kuo-Hsin; Watanabe, Manabu; Abe, Yuta; Misawa, Takeyuki; Nagakawa, Yuichi; Yoon, Dong-Sup; Jang, Jin-Young; Yu, Hee Chul; Ahn, Keun Soo; Kim, Song Cheol; Song, In Sang; Kim, Ji Hoon; Yun, Sung Su; Choi, Seong Ho; Jan, Yi-Yin; Shan, Yan-Shen; Ker, Chen-Guo; Chan, De-Chuan; Wu, Cheng-Chung; Lee, King-Teh; Toyota, Naoyuki; Higuchi, Ryota; Nakamura, Yoshiharu; Mizuguchi, Yoshiaki; Takeda, Yutaka; Ito, Masahiro; Norimizu, Shinji; Yamada, Shigetoshi; Matsumura, Naoki; Shindoh, Junichi; Sunagawa, Hiroki; Gocho, Takeshi; Hasegawa, Hiroshi; Rikiyama, Toshiki; Sata, Naohiro; Kano, Nobuyasu; Kitano, Seigo; Tokumura, Hiromi; Yamashita, Yuichi; Watanabe, Goro; Nakagawa, Kunitoshi; Kimura, Taizo; Yamakawa, Tatsuo; Wakabayashi, Go; Mori, Rintaro; Endo, Itaru; Miyazaki, Masaru; Yamamoto, Masakazu
2017-04-01
We previously identified 25 intraoperative findings during laparoscopic cholecystectomy (LC) as potential indicators of surgical difficulty using the nominal group technique. This study aimed to build a consensus among expert LC surgeons on the impact of each item on surgical difficulty. Surgeons from Japan, Korea, and Taiwan (n = 554) participated in a Delphi process and graded the 25 items on a seven-stage scale (range, 0-6). Consensus was defined as (1) the interquartile range (IQR) of overall responses ≤2 and (2) ≥66% of the responses concentrated within a median ± 1 after stratification by workplace and LC experience level. Response rates for the first- and second-round Delphi surveys were 92.6% and 90.3%, respectively. Final consensus was reached for all 25 items. 'Diffuse scarring in the Calot's triangle area' in the 'Factors related to inflammation of the gallbladder' category had the strongest impact on surgical difficulty (median, 5; IQR, 1). Surgeons agreed that the surgical difficulty increases as more fibrotic change and scarring develop. The median point for each item was set as the difficulty score. A Delphi consensus was reached among expert LC surgeons on the impact of intraoperative findings on surgical difficulty. © 2017 Japanese Society of Hepato-Biliary-Pancreatic Surgery.
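The two consensus criteria can be checked mechanically for each item. A minimal sketch, assuming ratings on the 0-6 scale are plain lists, that quartiles follow Python's "inclusive" method (the abstract does not say which quartile definition was used), and that criterion (2) must hold in every stratum; all function names are hypothetical:

```python
from statistics import median, quantiles

def iqr(ratings):
    """Interquartile range using the 'inclusive' quartile method (an assumption)."""
    q1, _, q3 = quantiles(ratings, n=4, method="inclusive")
    return q3 - q1

def share_within_median(ratings, width=1):
    """Fraction of ratings lying within median +/- width."""
    m = median(ratings)
    return sum(abs(r - m) <= width for r in ratings) / len(ratings)

def consensus(overall, strata):
    """overall: all ratings for one item; strata: subgroup name -> that subgroup's ratings."""
    # Criterion 1: IQR of the overall responses is at most 2
    if iqr(overall) > 2:
        return False
    # Criterion 2: at least 66% of responses within median +/- 1 in every stratum
    return all(share_within_median(group) >= 0.66 for group in strata.values())

ratings = [5, 5, 6, 4, 5, 5, 6, 4]
print(consensus(ratings, {"group A": ratings[:4], "group B": ratings[4:]}))  # True
```

Strata such as workplace or LC experience level are represented here simply as dictionary keys; the study's actual stratification scheme is not reproduced.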
Kim, Stella H; Strutt, Adriana M; Olabarrieta-Landa, Laiene; Lequerica, Anthony H; Rivera, Diego; De Los Reyes Aragon, Carlos Jose; Utria, Oscar; Arango-Lasprilla, Juan Carlos
2018-02-23
The Boston Naming Test (BNT) is a widely used measure of confrontation naming ability that has been criticized for its questionable construct validity for non-English speakers. This study investigated item difficulty and construct validity of the Spanish version of the BNT to assess cultural and linguistic impact on performance. Subjects were 1298 healthy Spanish-speaking adults from Colombia. They were administered the 60- and 15-item Spanish versions of the BNT. A Rasch analysis was computed to assess dimensionality, item hierarchy, targeting, reliability, and item fit. Both versions of the BNT satisfied requirements for unidimensionality. Although internal consistency was excellent for the 60-item BNT, order of difficulty did not increase consistently with item number and there were a number of items that did not fit the Rasch model. For the 15-item BNT, a total of 5 items changed position on the item hierarchy, and 7 items fit poorly. Internal consistency was acceptable. Construct validity of the BNT remains a concern when it is administered to non-English-speaking populations. Similar to previous findings, the order of item presentation did not correspond with increasing item difficulty, and both versions were inadequate at assessing high naming ability.
2016-01-01
Purpose: The goal of this study was to characterize the difficulty index of the items in the skills test components of the class I and II Korean emergency medical technician licensing examination (KEMTLE), which requires examinees to select items randomly. Methods: The results of 1,309 class I KEMTLE examinations and 1,801 class II KEMTLE examinations in 2013 were subjected to analysis. Items from the basic and advanced skills test sections of the KEMTLE were compared to determine whether some were significantly more difficult than others. Results: In the class I KEMTLE, all 4 of the items on the basic skills test showed significant variation in difficulty index (P<0.01), as well as 4 of the 5 items on the advanced skills test (P<0.05). In the class II KEMTLE, 4 of the 5 items on the basic skills test showed significantly different difficulty index (P<0.01), as well as all 3 of the advanced skills test items (P<0.01). Conclusion: In the skills test components of the class I and II KEMTLE, the procedure in which examinees randomly select questions should be revised to require examinees to respond to a set of fixed items in order to improve the reliability of the national licensing examination. PMID:26883810
What Does a Verbal Test Measure? A New Approach to Understanding Sources of Item Difficulty.
ERIC Educational Resources Information Center
Berk, Eric J. Vanden; Lohman, David F.; Cassata, Jennifer Coyne
Assessing the construct relevance of mental test results continues to present many challenges, and it has proven to be particularly difficult to assess the construct relevance of verbal items. This study was conducted to gain a better understanding of the conceptual sources of verbal item difficulty using a unique approach that integrates…
On Maximizing Item Information and Matching Difficulty with Ability.
ERIC Educational Resources Information Center
Bickel, Peter; Buyske, Steven; Chang, Huahua; Ying, Zhiliang
2001-01-01
Examined the assumption that matching difficulty levels of test items with an examinee's ability makes a test more efficient and challenged this assumption through a class of one-parameter item response theory models. Found the validity of the fundamental assumption to be closely related to the van Zwet tail ordering of symmetric distributions (W.…
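For the familiar logistic Rasch model, item information is I(θ) = P(θ)(1 − P(θ)), which peaks at θ = b; this is the "match difficulty to ability" assumption that the authors challenge for a broader class of one-parameter models. The sketch below illustrates only the standard logistic case and does not reproduce the paper's analysis; the function name and grid are hypothetical.

```python
import math

def rasch_information(theta: float, b: float) -> float:
    """Fisher information of a logistic Rasch item at ability theta."""
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return p * (1.0 - p)

b = 0.5
grid = [x / 10 for x in range(-30, 31)]  # abilities from -3.0 to 3.0 logits
best = max(grid, key=lambda t: rasch_information(t, b))
print(best)  # 0.5: information peaks where ability equals difficulty
```

Under a non-logistic one-parameter response curve, the peak need not sit at θ = b, which is why the abstract ties the assumption's validity to properties of the underlying distribution.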
Detecting a Gender-Related Differential Item Functioning Using Transformed Item Difficulty
ERIC Educational Resources Information Center
Abedalaziz, Nabeel; Leng, Chin Hai; Alahmadi, Ahlam
2014-01-01
The purpose of the study was to examine gender differences in performance on a multiple-choice mathematical ability test, administered as part of a high school graduation test designed to match the eleventh-grade curriculum. The transformed item difficulty (TID) method was used to detect gender-related DIF. A random sample of 1400 eleventh…
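Transformed item difficulty is commonly implemented as Angoff's delta-plot method: each group's proportion correct p is converted to a delta value, Δ = 13 + 4z with z = Φ⁻¹(1 − p), the two groups' deltas are plotted against each other, and items far from the fitted major axis are flagged. The abstract is truncated, so this is a sketch of that general technique under the assumption that it matches the study's TID procedure, not the authors' exact analysis; function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, pvariance

def delta(p: float) -> float:
    """Angoff delta: 13 + 4z, where z cuts off proportion p in the upper tail (0 < p < 1)."""
    return 13.0 + 4.0 * NormalDist().inv_cdf(1.0 - p)

def delta_plot_distances(p_group1, p_group2):
    """Signed perpendicular distance of each item from the major axis of the delta plot."""
    x = [delta(p) for p in p_group1]
    y = [delta(p) for p in p_group2]
    mx, my = mean(x), mean(y)
    sxx, syy = pvariance(x), pvariance(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / len(x)
    # Major (principal) axis of the bivariate scatter of deltas
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    intercept = my - slope * mx
    return [(slope * xi + intercept - yi) / math.sqrt(slope ** 2 + 1) for xi, yi in zip(x, y)]

print(delta(0.5))  # 13.0: an item answered correctly by half the group
```

Easier items (higher p) receive lower deltas. Items whose distance exceeds a chosen cutoff would be flagged as potential gender-related DIF; the cutoff is not given in the abstract and is left to the user.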
ERIC Educational Resources Information Center
Solano-Flores, Guillermo; Wang, Chao; Shade, Chelsey
2016-01-01
We examined multimodality (the representation of information in multiple semiotic modes) in the context of international test comparisons. Using Program of International Student Assessment (PISA)-2009 data, we examined the correlation of the difficulty of science items and the complexity of their illustrations. We observed statistically…
Kashiwagi, Mitsuru; Suzuki, Shuhei
2009-09-01
Many children with developmental disorders are known to have motor impairments such as clumsiness and poor physical ability; however, the objective evaluation of such difficulties is not easy in routine clinical practice. In this study, we aimed to establish a simple method for evaluating motor difficulty in childhood. This method employs a scored interview and an examination for detecting soft neurological signs (SNSs). After a preliminary survey with 22 normal children, we set the items and the cutoffs for the interview and SNSs. The interview consisted of questions pertaining to 12 items related to a child's motor skills in his/her past and current life, such as skipping, jumping rope, ball sports, origami, and using chopsticks. The SNS evaluation included 5 tests, namely, standing on one leg with eyes closed, diadochokinesia, associated movements during diadochokinesia, finger opposition test, and laterally fixed gaze. We applied this method to 43 children, including 25 cases of developmental disorders. Children showing significantly high scores in both the interview and SNS were assigned to the "with motor difficulty" group, while those with low scores in both tests were assigned to the "without motor difficulty" group. The remaining children were assigned to the "with suspicious motor difficulty" group. More than 90% of the children in the "with motor difficulty" group had high impairment scores in the Movement Assessment Battery for Children (M-ABC), a standardized motor test, whereas 82% of the children in the "without motor difficulty" group showed no motor impairment. Thus, we conclude that our simple method and criteria would be useful for evaluating motor difficulty in childhood. We also discuss the diagnostic process for developmental coordination disorder using our evaluation method.
ERIC Educational Resources Information Center
Kramer, Gene A.; Smith, Richard M.
2001-01-01
Examined the role that gender differences play in the determination of the components influencing the difficulty of spatial ability items. Results for 2,245 examinees taking a spatial ability test that is part of the Dental School Admission Battery show that component difficulties show little variation across gender. (SLD)
NASA Astrophysics Data System (ADS)
Aubrecht, Gordon J.; Aubrecht, Judith D.
1983-07-01
True-false or multiple-choice tests can be useful instruments for evaluating student progress. We examine strategies for planning objective tests that cover the material taught in science (physics) courses. We also examine strategies for writing test questions within a test blueprint. The statistical basis for judging the quality of test items is discussed. Reliability, difficulty, and discrimination indices are defined and examples presented. Our recommendations are rather easily put into practice.
Watanabe, Yusuke; Madani, Amin; Ito, Yoichi M; Bilgic, Elif; McKendy, Katherine M; Feldman, Liane S; Fried, Gerald M; Vassiliou, Melina C
2017-02-01
The extent to which each item assessed using the Global Operative Assessment of Laparoscopic Skills (GOALS) contributes to the total score remains unknown. The purpose of this study was to evaluate the level of difficulty and discriminative ability of each of the 5 GOALS items using item response theory (IRT). A total of 396 GOALS assessments for a variety of laparoscopic procedures over a 12-year time period were included. Threshold parameters of item difficulty and discrimination power were estimated for each item using IRT. The higher slope parameters seen with "bimanual dexterity" and "efficiency" are indicative of greater discriminative ability than "depth perception", "tissue handling", and "autonomy". IRT psychometric analysis indicates that the 5 GOALS items do not demonstrate uniform difficulty and discriminative power, suggesting that they should not be scored equally. "Bimanual dexterity" and "efficiency" seem to have stronger discrimination. Weighted scores based on these findings could improve the accuracy of assessing individual laparoscopic skills. Copyright © 2016 Elsevier Inc. All rights reserved.
2017-01-01
Objectives: Few attempts have been made to develop a generic health-related quality of life (HRQoL) instrument and to examine its validity and reliability in Korea. We aimed to do this in our present study. Methods: After a literature review of existing generic HRQoL instruments, a focus group discussion, in-depth interviews, and expert consultations, we selected 30 tentative items for a new HRQoL measure. These items were evaluated by assessing their ceiling effects, difficulty, and redundancy in the first survey. To validate the HRQoL instrument that was developed, known-groups validity and convergent/discriminant validity were evaluated and its test-retest reliability was examined in the second survey. Results: Of the 30 items originally assessed for the HRQoL instrument, four were excluded due to high ceiling effects and six were removed due to redundancy. We ultimately developed an HRQoL instrument with a reduced set of 20 items, known as the Health-related Quality of Life Instrument with 20 items (HINT-20), incorporating physical, mental, social, and positive health dimensions. In the known-groups validity analysis, HINT-20 results were poorer in women, the elderly, and those with a low income. For convergent/discriminant validity, the correlation coefficients of items (except vitality) in the physical health dimension with the physical component summary of the Short Form 36 version 2 (SF-36v2) were generally higher than the correlations of those items with the mental component summary of the SF-36v2, and vice versa. Regarding test-retest reliability, the intraclass correlation coefficient of the total HINT-20 score was 0.813 (p<0.001). Conclusions: A novel generic HRQoL instrument, the HINT-20, was developed for the Korean general population and showed acceptable validity and reliability. PMID:28173686
ERIC Educational Resources Information Center
Retnawati, Heri; Kartowagiran, Badrun; Arlinwibowo, Janu; Sulistyaningsih, Eny
2017-01-01
The quality of national examination items plays an enormous role in identifying students' competencies mastery and their difficulties. This study aims to identify the difficult items in the Junior High School Mathematics National Examination, to find the factors that cause students' difficulty and to reveal the strategies that the teachers and the…
ERIC Educational Resources Information Center
Dodonova, Yulia A.; Dodonov, Yury S.
2013-01-01
Using more complex items than those commonly employed within the information-processing approach, but still easier than those used in intelligence tests, this study analyzed how the association between processing speed and accuracy level changes as the difficulty of the items increases. The study involved measuring cognitive ability using Raven's…
Predicting Item Difficulty in a Reading Comprehension Test with an Artificial Neural Network.
ERIC Educational Resources Information Center
Perkins, Kyle; And Others
This paper reports the results of using a three-layer backpropagation artificial neural network to predict item difficulty in a reading comprehension test. Two network structures were developed, one with and one without a sigmoid function in the output processing unit. The data set, which consisted of a table of coded test items and corresponding…
NASA Astrophysics Data System (ADS)
Istiyono, Edi
2017-08-01
The purpose of this research is to describe the results of measuring higher order thinking skills in physics (PhysHOTS), including: (1) the percentage of students at each PhysHOTS level and (2) the percentage of dominant responses, by student category, in each of the analyzing, evaluating, and creating skills. The respondents were 404 10th grade students in Bantul District. The measurement instrument, PhysReMChoTHOTS, was divided into two sets consisting of 44 items, including 8 anchor items. It was judged valid by a physicist, a physics education expert, and a physics education measurement expert. The instrument fit the partial credit model (PCM). The reliability coefficient of the test is 0.71, and the item difficulty indices range from -0.61 to 0.51. The results of the measurement show the following. (1) The percentages of 10th grade students in Bantul District in the very low, low, medium, high, and very high PhysHOTS categories are 4.75%, 40.30%, 33.45%, 19.50%, and 2.00%, respectively. (2) Within analyzing skills, the order from weakest to strongest is attributing, differentiating, and organizing. Within evaluating skills, it is critiquing and then checking. Within creating skills, it is producing, planning, and generating.
Detecting unexpected variables in the MMPI-2 Social Introversion scale.
Chang, C H; Wright, B D
2001-01-01
The standard scoring structure of the revised Minnesota Multiphasic Personality Inventory (MMPI-2) Social Introversion (Si) scale was reexamined with Rasch Measurement. The 69-item Si scale split into two distinct dimensions when the standardized residuals of its items were factor analyzed. Items keyed "true" to Si defined one dimension and items keyed "false" defined another. Relationships between Lexile values (an index of reading difficulty and comprehension) and item difficulties were also explored. The article shows how to use Rasch Measurement to understand and improve personality assessment.
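The analysis above starts from Rasch standardized residuals. These are the differences between observed responses and model-expected probabilities, divided by their standard deviation. Below is a minimal sketch of that first step for dichotomous (true/false) items. It assumes person abilities and item difficulties have already been estimated; the example values are hypothetical. The subsequent factor analysis of the residual matrix is not shown.

```python
import math
from typing import List


def rasch_prob(theta: float, b: float) -> float:
    """Dichotomous Rasch model: P(X = 1) = exp(theta - b) / (1 + exp(theta - b))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def standardized_residuals(
    responses: List[List[int]], thetas: List[float], bs: List[float]
) -> List[List[float]]:
    """Return z = (x - P) / sqrt(P * (1 - P)) for every person-item cell."""
    residuals = []
    for row, theta in zip(responses, thetas):
        z_row = []
        for x, b in zip(row, bs):
            p = rasch_prob(theta, b)
            z_row.append((x - p) / math.sqrt(p * (1.0 - p)))
        residuals.append(z_row)
    return residuals


# Hypothetical estimates: two persons, three items (logit scale).
z = standardized_residuals([[1, 0, 1], [0, 0, 1]], thetas=[0.5, -0.5], bs=[-1.0, 0.0, 1.0])
for row in z:
    print([round(v, 2) for v in row])
```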
Assessing student understanding of measurement and uncertainty
NASA Astrophysics Data System (ADS)
Abbott, David Scot
A test to assess student understanding of measurement and uncertainty has been developed and administered to more than 500 students at two large research universities. The aim is twofold: (1) to assess what students learn in the first semester of introductory physics labs and (2) to uncover patterns in student reasoning and practice. The forty-minute, eleven-item test focuses on direct measurement and student attitudes toward multiple measurements. After one revision cycle using think-aloud interviews, the test was administered to three groups: students enrolled in traditional lab sections of first-semester physics at North Carolina State University (NCSU), students in an experimental (SCALE-UP) section of first-semester physics at NCSU, and students in first-semester physics at the University of North Carolina at Chapel Hill (UNC). The results were analyzed using a mixture of qualitative and quantitative methods. In the traditional NCSU labs, where students receive no instruction in uncertainty and measurement, students show no improvement on any of the areas examined by the test. In SCALE-UP and at UNC, students show statistically significant gains in most areas of the test. Gains on specific test items in SCALE-UP and at UNC correspond to areas of instructional emphasis. Test items were grouped into four main aspects of performance: "point/set" reasoning, meaning of spread, ruler reading, and "stacking." Student performance on the pretest was examined to identify links between these aspects. Items within each aspect are correlated with one another, sometimes quite strongly, but items from different aspects rarely show statistically significant correlation. Taken together, these results suggest that student difficulties may not be linked to a single underlying cause.
The study shows that current instructional techniques improve student understanding, but that many students exit the introductory physics lab course without an appreciation or a coherent understanding of the concept of measurement uncertainty.
An Alternate Definition of the ETS Delta Scale of Item Difficulty. Program Statistics Research.
ERIC Educational Resources Information Center
Holland, Paul W.; Thayer, Dorothy T.
An alternative definition has been developed of the delta scale of item difficulty used at Educational Testing Service. The traditional delta scale uses an inverse normal transformation based on normal ogive models developed years ago. However, no use is made of this fact in typical uses of item deltas. It is simply one way to make the probability…
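For reference, the traditional delta is usually written as Δ = 13 + 4z. Here z is the standard-normal deviate below which the proportion of examinees answering incorrectly falls. The scale therefore has mean 13 and standard deviation 4, and harder items receive larger deltas. The minimal sketch below implements that traditional inverse-normal transformation. It is not the alternative definition proposed in the paper, which the excerpt does not give. The function name `ets_delta` is illustrative.

```python
from statistics import NormalDist


def ets_delta(p_correct: float) -> float:
    """Traditional ETS delta: 13 + 4 * z, where z = Phi^{-1}(1 - p_correct).

    Harder items (smaller p_correct) get larger deltas; p_correct = 0.5 maps to 13.
    """
    if not 0.0 < p_correct < 1.0:
        raise ValueError("p_correct must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf(1.0 - p_correct)
    return 13.0 + 4.0 * z


for p in (0.9, 0.5, 0.1):
    print(f"p = {p:.1f} -> delta = {ets_delta(p):.2f}")
```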
A Methodological Study of Order Effects in Reporting Relational Aggression Experiences.
Serico, Jennifer M; NeMoyer, Amanda; Goldstein, Naomi E S; Houck, Mark; Leff, Stephen S
2018-03-01
Unlike the overt nature of physical aggression, which lends itself to simpler and more direct methods of investigation, the often-masked nature of relational aggression has led to difficulties and debate regarding the most effective tools of study. Given concerns with the accuracy of third-party relational aggression reports, especially as individuals age, self-report measures may be particularly useful when assessing experiences with relational aggression. However, it is important to recognize the validity concerns associated with self-reports of relational aggression perpetration and victimization, in particular the potential effects of the order in which items are presented. To investigate this issue, surveys were administered to 179 young adults randomly assigned to one of four survey conditions reflecting manipulation of item order. Survey conditions included presentation of (a) perpetration items only, (b) victimization items only, (c) perpetration items followed by victimization items, and (d) victimization items followed by perpetration items. Results revealed that participants reported perpetrating relational aggression significantly more often when asked only about perpetration or when asked about perpetration before victimization, compared with participants who were asked about victimization before perpetration. Item order manipulation did not result in significant differences in self-reported victimization experiences. Results of this study indicate a need for greater consideration of item order when conducting research using self-report data. They also point to the importance of further investigation into which form of item presentation elicits the most accurate self-report information.
The role of difficulty and gender in numbers, algebra, geometry and mathematics achievement
NASA Astrophysics Data System (ADS)
Rabab'h, Belal Sadiq Hamed; Veloo, Arsaythamby; Perumal, Selvan
2015-05-01
This study aims to identify the role of difficulty and gender in numbers, algebra, geometry and mathematics achievement among secondary school students in Jordan. The respondents were 337 students from eight public secondary schools in the Alkoura district, selected by stratified random sampling. The sample comprised 179 (53%) male and 158 (47%) female students. The mathematics test comprises 30 items: eight items for numbers, 14 items for algebra and eight items for geometry. In terms of item difficulty, the findings showed that in numbers, item 4 (fractions, 0.34) was the most difficult for male students and item 6 (square roots, 0.39) for female students. In algebra, item 11 (inequality, 0.23) was the most difficult for male students and item 6 (algebraic expressions, 0.35) for female students. In geometry, item 3 (reflection, 0.34) was the most difficult for male students and item 8 (volume, 0.33) for female students. In terms of gender differences, female students showed higher achievement in numbers and algebra than male students. In contrast, there was no difference between male and female students' achievement on the geometry test. This study suggests that teachers need to pay more attention to numbers and algebra when teaching mathematics.
Keller, Johannes
2007-06-01
Stereotype threat research revealed that negative stereotypes can disrupt the performance of persons targeted by such stereotypes. This paper contributes to stereotype threat research by providing evidence that domain identification and the difficulty level of test items moderate stereotype threat effects on female students' maths performance. The study was designed to test theoretical ideas derived from stereotype threat theory and assumptions outlined in the Yerkes-Dodson law proposing a nonlinear relationship between arousal, task difficulty and performance. Participants were 108 high school students attending secondary schools. Participants worked on a test comprising maths problems of different difficulty levels. Half of the participants learned that the test had been shown to produce gender differences (stereotype threat). The other half learned that the test had been shown not to produce gender differences (no threat). The degree to which participants identify with the domain of maths was included as a quasi-experimental factor. Maths-identified female students showed performance decrements under conditions of stereotype threat. Moreover, the stereotype threat manipulation had different effects on low and high domain identifiers' performance depending on test item difficulty. On difficult items, low identifiers showed higher performance under threat (vs. no threat) whereas the reverse was true in high identifiers. This interaction effect did not emerge on easy items. Domain identification and test item difficulty are two important factors that need to be considered in the attempt to understand the impact of stereotype threat on performance.
Ali, Amira Mohammed; Ahmed, Anwar; Sharaf, Amira; Kawakami, Norito; Abdeldayem, Samia M; Green, Joseph
2017-12-01
This study aimed to examine the validity of the Arabic version of the Depression Anxiety Stress Scale-21 (DASS-21) in 149 illicit drug users. We calculated the α coefficient, inter-item and item-total correlations, coefficients of reproducibility and scalability (CR and CS), and item difficulty and discrimination indices. The DASS-21 had an acceptable reliability, but values of the CR and the CS were less than acceptable. Items varied in difficulty and discrimination, and some items are candidates for elimination. The DASS-21 is a probabilistic and not a deterministic measure of distress; it has problematic items and needs further investigation. Copyright © 2017 Elsevier B.V. All rights reserved.
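The coefficient of reproducibility (CR) measures how closely responses follow a deterministic Guttman pattern. This is why a low CR is consistent with the abstract's description of the DASS-21 as probabilistic rather than deterministic. Error-counting conventions vary, and the abstract does not say which one was used. The sketch below assumes the Goodenough-Edwards style: each person's pattern is compared with the ideal pattern implied by their total score. Values of about 0.90 or higher are conventionally taken to indicate a scalable set of items. The data are hypothetical.

```python
from typing import List


def reproducibility(responses: List[List[int]]) -> float:
    """Guttman coefficient of reproducibility, CR = 1 - errors / (persons * items).

    Errors are counted against the ideal Guttman pattern for each person's total
    score: endorsing exactly the `score` most frequently endorsed items.
    """
    n_persons = len(responses)
    n_items = len(responses[0])
    # Order items from most to least endorsed (easiest to hardest); ties keep column order.
    popularity = [sum(row[j] for row in responses) for j in range(n_items)]
    order = sorted(range(n_items), key=lambda j: -popularity[j])
    errors = 0
    for row in responses:
        score = sum(row)
        ideal = [0] * n_items
        for j in order[:score]:
            ideal[j] = 1
        errors += sum(o != i for o, i in zip(row, ideal))
    return 1.0 - errors / (n_persons * n_items)


# Hypothetical 0/1 responses: the second person breaks the Guttman pattern.
data = [[1, 1, 0], [0, 1, 1], [1, 0, 0], [0, 0, 0]]
print(f"CR = {reproducibility(data):.3f}")
```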
Do item-writing flaws reduce examinations' psychometric quality?
Pais, João; Silva, Artur; Guimarães, Bruno; Povo, Ana; Coelho, Elisabete; Silva-Pereira, Fernanda; Lourinho, Isabel; Ferreira, Maria Amélia; Severo, Milton
2016-08-11
The psychometric characteristics of multiple-choice questions (MCQ) change when their anatomical sites and the presence of item-writing flaws (IWF) are taken into account. The aim is to understand the impact of anatomical site and the presence of IWF on the psychometric qualities of the MCQ. A total of 800 Clinical Anatomy MCQ from eight examinations were classified as standard or flawed items and assigned to one of eight anatomical sites. An item was classified as flawed if it violated at least one of the principles of item writing. The difficulty and discrimination indices of each item were obtained. Of the MCQ, 55.8% were flawed items. The anatomical site of the items explained 6.2% and 3.2% of the variance in the difficulty and discrimination parameters, respectively, and IWF explained 2.8% and 0.8%. The impact of IWF was heterogeneous: the Writing the Stem and Writing the Choices categories had a negative impact (higher difficulty and lower discrimination), while the other categories had no impact. The effect of anatomical site on the psychometric characteristics of the examination was larger than the effect of IWF. When constructing MCQ, the focus should be first on the topic/area of the items and only then on the presence of IWF.
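The abstract does not give its formulas for the difficulty and discrimination indices. Two common classical choices are the following:

* the difficulty index p, the proportion of examinees answering correctly (so a higher p means an easier item);
* the extreme-group discrimination index D, the proportion correct in the top-scoring group minus that in the bottom-scoring group, often with 27% groups.

The minimal sketch below assumes those definitions and uses hypothetical data. The function name is illustrative.

```python
from typing import List, Tuple


def difficulty_and_discrimination(
    responses: List[List[int]], item: int, frac: float = 0.27
) -> Tuple[float, float]:
    """Return (p, D) for one 0/1-scored item.

    p is the proportion correct; D is the proportion correct in the top `frac`
    of examinees (ranked by total score) minus that in the bottom `frac`.
    """
    n = len(responses)
    p = sum(row[item] for row in responses) / n
    ranked = sorted(responses, key=sum, reverse=True)  # highest total scores first
    g = max(1, round(frac * n))  # size of each extreme group
    upper, lower = ranked[:g], ranked[-g:]
    d = (sum(r[item] for r in upper) - sum(r[item] for r in lower)) / g
    return p, d


# Hypothetical responses: rows are examinees, columns are items.
data = [[1, 1, 1], [1, 1, 0], [0, 1, 0], [0, 0, 0]]
for j in range(3):
    print(j, difficulty_and_discrimination(data, j, frac=0.5))
```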
Selecting Items for Criterion-Referenced Tests.
ERIC Educational Resources Information Center
Mellenbergh, Gideon J.; van der Linden, Wim J.
1982-01-01
Three item selection methods for criterion-referenced tests are examined: the classical theory of item difficulty and item-test correlation; the latent trait theory of item characteristic curves; and a decision-theoretic approach for optimal item selection. Item contribution to the standardized expected utility of mastery testing is discussed. (CM)
Item Difficulty Modeling of Paragraph Comprehension Items
ERIC Educational Resources Information Center
Gorin, Joanna S.; Embretson, Susan E.
2006-01-01
Recent assessment research joining cognitive psychology and psychometric theory has introduced a new technology, item generation. In algorithmic item generation, items are systematically created based on specific combinations of features that underlie the processing required to correctly solve a problem. Reading comprehension items have been more…
Eton, David T.; Yost, Kathleen J.; Lai, Jin-shei; Ridgeway, Jennifer L.; Egginton, Jason S.; Rosedahl, Jordan K.; Linzer, Mark; Boehm, Deborah H.; Thakur, Azra; Poplau, Sara; Odell, Laura; Montori, Victor M.; May, Carl R.; Anderson, Roger T.
2017-01-01
Purpose The purpose of this study was to develop and validate a new comprehensive patient-reported measure of treatment burden: the Patient Experience with Treatment and Self-Management (PETS). Methods A conceptual framework was used to derive the PETS with items reviewed and cognitively tested with patients. A survey battery, including a pilot version of the PETS, was mailed to 838 multi-morbid patients from two healthcare institutions for validation. Results A total of 332 multi-morbid patients returned completed surveys. Diagnostics supported deletion and consolidation of some items and domains. Confirmatory factor analysis supported a domain model for scaling comprising 9 factors: medical information, medications, medical appointments, monitoring health, interpersonal challenges, medical/healthcare expenses, difficulty with healthcare services, role/social activity limitations, and physical/mental exhaustion. Scales showed good internal consistency (alpha range: 0.79 – 0.95). Higher PETS scores, indicative of greater treatment burden, were correlated with more distress, less satisfaction with medications, lower self-efficacy, worse physical and mental health, and lower convenience of healthcare (Ps<.001). Patients with lower health literacy, less adherence to medications, and more financial difficulties reported higher PETS scores (Ps<.01). Conclusion A comprehensive patient-reported measure of treatment burden can help to better characterize the impact of treatment and self-management burden on patient well-being and guide care toward minimally disruptive medicine. PMID:27566732
Eton, David T; Yost, Kathleen J; Lai, Jin-Shei; Ridgeway, Jennifer L; Egginton, Jason S; Rosedahl, Jordan K; Linzer, Mark; Boehm, Deborah H; Thakur, Azra; Poplau, Sara; Odell, Laura; Montori, Victor M; May, Carl R; Anderson, Roger T
2017-02-01
The purpose of this study was to develop and validate a new comprehensive patient-reported measure of treatment burden: the Patient Experience with Treatment and Self-management (PETS). A conceptual framework was used to derive the PETS with items reviewed and cognitively tested with patients. A survey battery, including a pilot version of the PETS, was mailed to 838 multi-morbid patients from two healthcare institutions for validation. A total of 332 multi-morbid patients returned completed surveys. Diagnostics supported deletion and consolidation of some items and domains. Confirmatory factor analysis supported a domain model for scaling comprising 9 factors: medical information, medications, medical appointments, monitoring health, interpersonal challenges, medical/healthcare expenses, difficulty with healthcare services, role/social activity limitations, and physical/mental exhaustion. Scales showed good internal consistency (α range 0.79-0.95). Higher PETS scores, indicative of greater treatment burden, were correlated with more distress, less satisfaction with medications, lower self-efficacy, worse physical and mental health, and lower convenience of healthcare (Ps < 0.001). Patients with lower health literacy, less adherence to medications, and more financial difficulties reported higher PETS scores (Ps < 0.01). A comprehensive patient-reported measure of treatment burden can help to better characterize the impact of treatment and self-management burden on patient well-being and guide care toward minimally disruptive medicine.
Haverman, Lotte; Grootenhuis, Martha A; Raat, Hein; van Rossum, Marion A J; van Dulmen-den Broeder, Eline; Hoppenbrouwers, Karel; Correia, Helena; Cella, David; Roorda, Leo D; Terwee, Caroline B
2016-03-01
The Patient-Reported Outcomes Measurement Information System (PROMIS®) is a new, state-of-the-art assessment system for measuring patient-reported health and well-being of adults and children. It has the potential to be more valid, reliable, and responsive than existing PROMs. The item banks are designed to be self-reported and completed by children aged 8-18 years. The PROMIS items can be administered in short forms or through computerized adaptive testing. This paper describes the translation and cultural adaptation of nine PROMIS item banks (151 items) for children into Dutch-Flemish. The translation was performed by FACITtrans using standardized PROMIS methodology and approved by the PROMIS Statistical Center. The translation included four forward translations, two back-translations, three independent reviews (at least two Dutch, one Flemish), and pretesting in 24 children from the Netherlands and Flanders. For some items, it was necessary to have separate translations for Dutch and Flemish: physical function-mobility (three items), anger (one item), pain interference (two items), and asthma impact (one item). Challenges faced in the translation process included scarcity or overabundance of possible translations, unclear item descriptions, constructs that were broader or narrower in the target language, difficulties in rank-ordering items, differences in units of measurement, irrelevant items, and differences in how activities are performed. By addressing these challenges, acceptable translations were obtained for all items. The Dutch-Flemish PROMIS items are linguistically equivalent to the original USA version. Short forms are now available for use, and entire item banks are ready for cross-cultural validation in the Netherlands and Flanders.
Item difficulty and item validity for the Children's Group Embedded Figures Test.
Rusch, R R; Trigg, C L; Brogan, R; Petriquin, S
1994-02-01
The validity and reliability of the Children's Group Embedded Figures Test was reported for students in Grade 2 by Cromack and Stone in 1980; however, a search of the literature indicates no evidence for internal consistency or item analysis. Hence the purpose of this study was to examine the item difficulty and item validity of the test with children in Grades 1 and 2. Confusion in the literature over development and use of this test was seemingly resolved through analysis of these descriptions and through an interview with the test developer. One early-appearing item was unreasonably difficult. Two or three other items were quite difficult and made little contribution to the total score. Caution is recommended, however, in any reordering or elimination of items based on these findings, given the limited number of subjects (n = 84).
North American Veterinary Licensing Examination pacing study.
Subhiyah, Raja G; Boyce, John R
2010-01-01
The National Board of Veterinary Medical Examiners was interested in the possible effects of word count on the outcomes of the North American Veterinary Licensing Examination. In this study, the authors investigated the effects of increasing word count on the pacing of examinees during each section of the examination and on the performance of examinees on the items. Specifically, the authors analyzed the effect of item word count on the average time spent on each item within a section of the examination, the average number of items omitted at the end of a section, and the average difficulty of items as a function of presentation order. The average word count per item increased from 2001 to 2008. As expected, there was a relationship between word count and time spent on the item. No significant relationship was found between word count and item difficulty, and an analysis of omitted items and pacing patterns showed no indication of overall pacing problems.
Park, Jong Cook; Kim, Kwang Sig
2012-03-01
The reliability of a test is determined by the characteristics of its items. Item analysis can be carried out with classical test theory or with item response theory. The purpose of the study was to compare classical discrimination indices with item response theory using the Rasch model. Thirty-one 4th-year medical students took the clinical course written examination, which included 22 A-type items and 3 R-type items. The point-biserial correlation coefficient (C(pbs)) was compared with four other indices:

* the method of extreme groups (D);
* the biserial correlation coefficient (C(bs));
* the item-total correlation coefficient (C(it));
* the corrected item-total correlation coefficient (C(cit)).

The Rasch model was applied, using joint maximum likelihood, to estimate item difficulty and examinee ability and to calculate item fit statistics. The explanatory power (r²) of C(pbs) with respect to the other indices decreased in the following order: C(cit) (1.00), C(it) (0.99), C(bs) (0.94), and D (0.45). Item difficulty ranged from -0.82 to 0.80 logits (standard error 0.37 to 0.76), and examinee ability ranged from -3.69 to 3.19 logits (standard error 0.45 to 1.03). Items 9 and 23 had outfit ≥ 1.3, and students 1, 5, 7, 18, 26, 30, and 32 had fit ≥ 1.3. C(pbs), C(cit), and C(it) are good discrimination parameters. The Rasch model can estimate item difficulty and examinee ability parameters with standard errors, and its fit statistics can identify bad items and unpredictable examinee responses.
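For a 0/1-scored item, the point-biserial correlation with the total score is simply the Pearson correlation between the item scores and the totals. The corrected item-total correlation removes the item from the total first, so the item is not correlated with itself. This helps explain why the two indices track each other closely but are not identical. A minimal sketch with hypothetical data follows. It does not claim to reproduce the exact computations used in the study.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; with a 0/1 variable x this is the point-biserial correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable has zero variance")
    return sxy / math.sqrt(sxx * syy)


def item_total(responses: List[List[int]], item: int) -> float:
    """Item score vs. total score that still includes the item."""
    return pearson([r[item] for r in responses], [sum(r) for r in responses])


def corrected_item_total(responses: List[List[int]], item: int) -> float:
    """Item score vs. total score with the item removed (the rest score)."""
    return pearson([r[item] for r in responses], [sum(r) - r[item] for r in responses])


# Hypothetical 0/1 responses: rows are examinees, columns are items.
data = [[1, 1, 1], [1, 1, 0], [0, 1, 0], [0, 0, 0]]
print(f"{item_total(data, 0):.3f} {corrected_item_total(data, 0):.3f}")
```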
Difficulty and Discriminability of Introductory Psychology Test Items.
ERIC Educational Resources Information Center
Scialfa, Charles; Legare, Connie; Wenger, Larry; Dingley, Louis
2001-01-01
Analyzes multiple-choice questions provided in test banks for introductory psychology textbooks. Study 1 offered a consistent picture of the objective difficulty of multiple-choice tests for introductory psychology students, while both studies 1 and 2 indicated that test items taken from commercial test banks have poor psychometric properties.…
Working memory capacity and fluid abilities: the more difficult the item, the more more is better.
Little, Daniel R; Lewandowsky, Stephan; Craig, Stewart
2014-01-01
The relationship between fluid intelligence and working memory is of fundamental importance to understanding how capacity-limited structures such as working memory interact with inference abilities to determine intelligent behavior. Recent evidence has suggested that the relationship between a fluid abilities test, Raven's Progressive Matrices, and working memory capacity (WMC) may be invariant across difficulty levels of the Raven's items. We show that this invariance can only be observed if the overall correlation between Raven's and WMC is low. Simulations of Raven's performance revealed that as the overall correlation between Raven's and WMC increases, the item-wise point-biserial correlations involving WMC are no longer constant but increase considerably with item difficulty. The simulation results were confirmed by two studies that used a composite measure of WMC, which yielded a higher correlation between WMC and Raven's than reported in previous studies. As expected, with the higher overall correlation, there was a significant positive relationship between Raven's item difficulty and the extent of the item-wise correlation with WMC.
Item selection via Bayesian IRT models.
Arima, Serena
2015-02-10
With reference to a questionnaire that aimed to assess the quality of life for dysarthric speakers, we investigate the usefulness of a model-based procedure for reducing the number of items. We propose a mixed cumulative logit model, which is known in the psychometrics literature as the graded response model: responses to different items are modelled as a function of individual latent traits and as a function of item characteristics, such as their difficulty and their discrimination power. We jointly model the discrimination and the difficulty parameters by using a k-component mixture of normal distributions. Mixture components correspond to disjoint groups of items. Items that belong to the same groups can be considered equivalent in terms of both difficulty and discrimination power. According to decision criteria, we select a subset of items such that the reduced questionnaire is able to provide the same information that the complete questionnaire provides. The model is estimated by using a Bayesian approach, and the choice of the number of mixture components is justified according to information criteria. We illustrate the proposed approach on the basis of data that are collected for 104 dysarthric patients by local health authorities in Lecce and in Milan. Copyright © 2014 John Wiley & Sons, Ltd.
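In one common parameterization of the graded response model, each item has a slope a and ordered thresholds b_1 < ... < b_m. The cumulative probability P(X ≥ k | θ) is a logistic function of a(θ − b_k), and each category probability is the difference between adjacent cumulative probabilities. The minimal sketch below computes those category probabilities for one item with hypothetical parameters. It does not show the paper's mixture prior on the item parameters or its Bayesian estimation.

```python
import math
from typing import List


def grm_category_probs(theta: float, a: float, thresholds: List[float]) -> List[float]:
    """Category probabilities P(X = k), k = 0..m, under a graded response model.

    P(X >= k) = 1 / (1 + exp(-a * (theta - b_k))) for k = 1..m,
    with P(X >= 0) = 1 and P(X >= m + 1) = 0.
    """
    if any(b1 >= b2 for b1, b2 in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    cumulative = (
        [1.0]
        + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
        + [0.0]
    )
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


# Hypothetical 4-category item with slope 1.0 and thresholds -1, 0, 1.
probs = grm_category_probs(theta=0.0, a=1.0, thresholds=[-1.0, 0.0, 1.0])
print([round(p, 3) for p in probs])
```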
Yost, Kathleen J; Webster, Kimberly; Baker, David W; Choi, Seung W; Bode, Rita K; Hahn, Elizabeth A
2009-06-01
Current health literacy measures are too long, imprecise, or have questionable equivalence of English and Spanish versions. The purpose of this paper is to describe the development and pilot testing of a new bilingual computer-based health literacy assessment tool. We analyzed literacy data from three large studies. Using a working definition of health literacy, we developed new prose, document and quantitative items in English and Spanish. Items were pilot tested on 97 English- and 134 Spanish-speaking participants to assess item difficulty. Items covered topics relevant to primary care patients and providers. English- and Spanish-speaking participants understood the tasks involved in answering each type of question. The English Talking Touchscreen was easy to use and the English and Spanish items provided good coverage of the difficulty continuum. Qualitative and quantitative results provided useful information on computer acceptability and initial item difficulty. After the items have been administered on the Talking Touchscreen (la Pantalla Parlanchina) to 600 English-speaking (and 600 Spanish-speaking) primary care patients, we will develop a computer adaptive test. This health literacy tool will enable clinicians and researchers to more precisely determine the level at which low health literacy adversely affects health and healthcare utilization.
Burke, Adam; Peper, Erik
2002-01-01
Cumulative trauma disorder is a major health problem for adults. Despite a growing understanding of adult cumulative trauma disorder, however, little is known about the risks for younger populations. This investigation examined issues related to child/adolescent computer product use and upper body physical discomfort. A convenience sample of 212 students, grades 1-12, was interviewed at their homes by a college-age sibling or relative. One of the child's parents was also interviewed. A 22-item questionnaire was used for data-gathering. Questionnaire items included frequency and duration of use, type of computer products/games and input devices used, presence of physical discomfort, and parental concerns related to the child's computer use. Many students experienced physical discomfort attributed to computer use, such as wrist pain (30%) and back pain (15%). In multiple logistic regression, specific computer activities, such as using a joystick or playing noneducational games, were significant predictors of physical discomfort. Many parents reported difficulty getting their children off the computer (46%) and that their children spent less time outdoors (35%). Computer product use within this cohort was associated with self-reported physical discomfort. Results suggest a need for more extensive study, including multiyear longitudinal surveys.
Interpretation of the Rasch Ability and Difficulty Scales for Educational Purposes.
ERIC Educational Resources Information Center
Woodcock, Richard W.
Though many test developers have utilized item response theory in their work, few have taken advantage of the potential of item response theory for providing new interpretation procedures that accentuate the educational implications to be drawn from test scores. This paper describes several features, based upon the Rasch difficulty and ability…
The Effect of Anchor Test Construction on Scale Drift
ERIC Educational Resources Information Center
Antal, Judit; Proctor, Thomas P.; Melican, Gerald J.
2014-01-01
In common-item equating the anchor block is generally built to represent a miniature form of the total test in terms of content and statistical specifications. The statistical properties frequently reflect equal mean and spread of item difficulty. Sinharay and Holland (2007) suggested that the requirement for equal spread of difficulty may be too…
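The spread of anchor-item difficulty matters directly in linking methods such as mean-sigma. In that method, the slope of the linear transformation between forms is a ratio of anchor-item standard deviations. The minimal sketch below shows mean-sigma linking as one of several common-item linking methods. The paper's own procedure is not described in the excerpt, and the anchor values are hypothetical. Under the Rasch model the slope is often fixed at 1, which leaves only the mean shift.

```python
import statistics
from typing import List, Tuple


def mean_sigma_link(anchor_new: List[float], anchor_ref: List[float]) -> Tuple[float, float]:
    """Return slope A and intercept B so that A * b_new + B approximates b_ref.

    A is the ratio of anchor-item difficulty standard deviations (reference / new),
    and B aligns the anchor-item means.
    """
    if len(anchor_new) != len(anchor_ref) or len(anchor_new) < 2:
        raise ValueError("need the same anchor items (at least two) on both forms")
    slope = statistics.stdev(anchor_ref) / statistics.stdev(anchor_new)
    intercept = statistics.mean(anchor_ref) - slope * statistics.mean(anchor_new)
    return slope, intercept


def to_reference_scale(b_new: List[float], slope: float, intercept: float) -> List[float]:
    """Place new-form difficulties on the reference form's scale."""
    return [slope * b + intercept for b in b_new]


# Hypothetical anchor-item difficulties estimated separately on each form.
new_form = [-1.2, -0.3, 0.4, 1.1]
ref_form = [-0.9, 0.0, 0.6, 1.5]
A, B = mean_sigma_link(new_form, ref_form)
print(f"A = {A:.3f}, B = {B:.3f}")
```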
NASA Astrophysics Data System (ADS)
Slater, Stephanie
2009-05-01
The Test Of Astronomy STandards (TOAST) assessment instrument is a multiple-choice survey tightly aligned to the consensus learning goals stated by the American Astronomical Society Chair's Conference on ASTRO 101, the American Association for the Advancement of Science's Project 2061 Benchmarks, and the National Research Council's National Science Education Standards. Researchers from the Cognition in Astronomy, Physics and Earth sciences Research (CAPER) Team at the University of Wyoming's Science and Math Teaching Center (UWYO SMTC) have been conducting a question-by-question distractor analysis to determine the sensitivity and effectiveness of each item. In brief, the frequency of each possible answer choice (known as a foil or distractor on a multiple-choice test) is determined and compared with the existing literature on the teaching and learning of astronomy. In addition to having suitable statistical difficulty and discrimination values, a well-functioning assessment item will show students selecting distractors in the relative proportions expected from known misconceptions and reasoning difficulties. Our distractor analysis suggests that all items are functioning as expected. These results add weight to the validity of the TOAST instrument. It is designed to help instructors and researchers measure the impact of course-length instructional strategies for undergraduate science survey courses whose learning goals are tightly aligned to the consensus goals of the astronomy education community.
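The tabulation step of a distractor analysis amounts to computing, for each answer option, the proportion of students who chose it. A common refinement splits this by high- and low-scoring groups, since a working distractor usually attracts more low scorers than high scorers. The minimal sketch below assumes options A-D and uses hypothetical responses. The comparison against the misconceptions literature that the abstract describes is a qualitative judgment and is not shown.

```python
from collections import Counter
from typing import Dict, List


def option_proportions(choices: List[str], options: str = "ABCD") -> Dict[str, float]:
    """Proportion of responses selecting each option (unselected options get 0.0)."""
    counts = Counter(choices)
    n = len(choices)
    return {opt: counts.get(opt, 0) / n for opt in options}


def distractor_table(
    choices: List[str], totals: List[float], options: str = "ABCD", frac: float = 0.27
) -> Dict[str, Dict[str, float]]:
    """Option proportions overall and within the top/bottom `frac` of total scores."""
    ranked = sorted(zip(totals, choices), key=lambda t: t[0], reverse=True)
    g = max(1, round(frac * len(ranked)))
    upper = [c for _, c in ranked[:g]]
    lower = [c for _, c in ranked[-g:]]
    return {
        "all": option_proportions(choices, options),
        "upper": option_proportions(upper, options),
        "lower": option_proportions(lower, options),
    }


# Hypothetical item: A is keyed correct; each student's total test score is given.
choices = ["A", "A", "B", "C", "B", "D", "A", "B"]
totals = [8, 7, 6, 5, 4, 3, 2, 1]
for group, props in distractor_table(choices, totals, frac=0.25).items():
    print(group, props)
```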
Simple mental addition in children with and without mild mental retardation.
Janssen, R; De Boeck, P; Viaene, M; Vallaeys, L
1999-11-01
The speeded performance on simple mental addition problems of 6- and 7-year-old children with and without mild mental retardation is modeled from a person perspective and an item perspective. On the person side, it was found that a single cognitive dimension spanned the performance differences between the two ability groups. However, a discontinuity, or "jump," was observed in the performance of the normal ability group on the easier items. On the item side, the addition problems were almost perfectly ordered in difficulty according to their problem size. Differences in difficulty were explained by factors related to the difficulty of executing nonretrieval strategies. All findings were interpreted within the framework of Siegler's (e.g., R. S. Siegler & C. Shipley, 1995) model of children's strategy choices in arithmetic. Models from item response theory were used to test the hypotheses. Copyright 1999 Academic Press.
ERIC Educational Resources Information Center
Hewitt, Margaret A.; Homan, Susan P.
2004-01-01
Test validity issues considered by test developers and school districts rarely include individual item readability levels. In this study, items from a major standardized test were examined for individual item readability level and item difficulty. The Homan-Hewitt Readability Formula was applied to items across three grade levels. Results of…
Bond, Kathy S; Chalmers, Kathryn J; Jorm, Anthony F; Kitchener, Betty A; Reavley, Nicola J
2015-06-03
There is a strong association between mental health problems and financial difficulties. Therefore, people who work with those who have financial difficulties (financial counsellors and financial institution staff) need to have knowledge and helping skills relevant to mental health problems. Conversely, people who support those with mental health problems (mental health professionals and carers) may need to have knowledge and helping skills relevant to financial difficulties. The Delphi expert consensus method was used to develop guidelines for people who work with or support those with mental health problems and financial difficulties. A systematic review of websites, books and journal articles was conducted to develop a questionnaire containing items about the knowledge, skills and actions relevant to working with or supporting someone with mental health problems and financial difficulties. These items were rated over three rounds by five Australian expert panels comprising financial counsellors (n = 33), financial institution staff (n = 54), mental health professionals (n = 31), consumers (n = 20) and carers (n = 24). A total of 897 items were rated, with 462 items endorsed by at least 80% of members of each of the expert panels. These endorsed statements were used to develop a set of guidelines for financial counsellors, financial institution staff, mental health professionals and carers about how to assist someone with mental health problems and financial difficulties. A diverse group of expert panel members were able to reach substantial consensus on the knowledge, skills and actions needed to work with and support people with mental health problems and financial difficulties. These guidelines can be used to inform policy and practice in the financial and mental health sectors.
Terwee, C B; Roorda, L D; de Vet, H C W; Dekker, J; Westhovens, R; van Leeuwen, J; Cella, D; Correia, H; Arnold, B; Perez, B; Boers, M
2014-08-01
The Patient-Reported Outcomes Measurement Information System (PROMIS®) is a new, state-of-the-art assessment system for measuring the patient-reported health and well-being of adults and children that has the potential to be more valid, reliable and responsive than existing PROMs. The PROMIS items can be administered in short forms or, more efficiently, through computerized adaptive testing. This paper describes the translation of 563 items from 17 PROMIS item banks (domains) for adults from the English source into Dutch-Flemish. The translation was performed by FACITtrans using standardized methodology and approved by the PROMIS Statistical Center. The translation included four forward translations, two back-translations, three to five independent reviews (at least two Dutch, one Flemish) and pre-testing in 70 adults (age range 20-77) from the Netherlands and Flanders. A small number of items required separate translations for Dutch and Flemish: physical function (five items), pain behaviour (two items), pain interference (one item), social isolation (one item) and global health (one item). Challenges faced in the translation process included: scarcity or overabundance of possible translations, unclear item descriptions, constructs that are broader or narrower in the target language, difficulties in rank-ordering items, differences in units of measurement, irrelevant items, and differences in how activities are performed. By addressing these challenges, acceptable translations were obtained for all items. The methodology used and experience gained in this study can serve as an example for researchers in other countries interested in translating PROMIS. The Dutch-Flemish PROMIS items are linguistically equivalent. Short forms will soon be available for use, and entire item banks are ready for cross-cultural validation in the Netherlands and Flanders.
[Difficulties at work and work motivation of ulcerative colitis sufferers].
Nasu, Ayami; Yamada, Kazuko; Morioka, Ikuharu
2015-01-01
Because ulcerative colitis (UC) alternates between remission and relapse, support for UC sufferers in the workplace must take into account their condition at the time of relapse. The aim of this survey was to clarify the difficulties at work and the work motivation that UC sufferers experience at present and at the time of worsening, as well as the factors that help maintain work motivation. We carried out an anonymous questionnaire survey of patients with present or past work experience. Difficulties at work (17 items) and work motivation (4 items) in the past week and at the time when symptoms were most intense during work were investigated using a newly designed questionnaire. We regarded the past week as the present, and the time when symptoms were most intense during work as the worsening time. There were 70 respondents (response rate 32.0%). Their mean age was 43.8 years, and their mean age at onset was 33.8 years. All subjects except 2 who had undergone surgery took medication. Fifty-three subjects (75.7%) were in remission at present, and most subjects (91.4%) managed their physical condition well. The difficulties at work that many subjects worried about at present related to work conditions, such as "Others at workplace do not understand having an intractable and relapsing disease" (41.4%) or "Feel delayed or lack of chance of promotion or career advancement due to the disease" (38.6%). At the worsening time, management of physical condition broke down and the frequency of hospital visits increased, but few subjects consulted superiors or colleagues at the workplace. The difficulties at work that many subjects experienced at the worsening time related to symptoms, such as "Feel physically tired" (80.0%) or "Decline foods or alcoholic beverages offered at business parties" (72.9%).
Those who maintained work motivation even at the worsening time received no work-related consideration and had an adviser in the workplace to talk to about the disease. These results suggest that, to support UC sufferers at the workplace, it is important to create a working atmosphere in which they can easily inform superiors and colleagues of their disease or consult a doctor regularly, and for superiors and colleagues to become advisers with whom they can talk about disease and work.
A New Functional Health Literacy Scale for Japanese Young Adults Based on Item Response Theory.
Tsubakita, Takashi; Kawazoe, Nobuo; Kasano, Eri
2017-03-01
Health literacy predicts health outcomes. Despite concerns surrounding the health of Japanese young adults, to date there has been no objective assessment of health literacy in this population. This study aimed to develop a Functional Health Literacy Scale for Young Adults (funHLS-YA) based on item response theory. Each item in the scale requires participants to choose the most relevant term from 3 choices in relation to a target item, thus assessing objective rather than perceived health literacy. The 20-item scale was administered to 1816 university students and 1751 responded. Cronbach's α coefficient was .73. Difficulty and discrimination parameters of each item were estimated, resulting in the exclusion of 1 item. Some items showed different difficulty parameters for male and female participants, reflecting that some aspects of health literacy may differ by gender. The current 19-item version of funHLS-YA can reliably assess the objective health literacy of Japanese young adults.
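The reported α = .73 is a Cronbach coefficient. A minimal Python sketch of the standard formula, assuming complete data with persons as rows and items as columns, is shown below; the toy data are invented and unrelated to the funHLS-YA.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a persons-by-items score matrix (list of rows).

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores),
    using population variances throughout.
    """
    k = len(scores[0])
    items = list(zip(*scores))  # transpose: one tuple of scores per item
    item_var_sum = sum(pvariance(item) for item in items)
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical 0/1 responses of four persons to three items.
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(round(cronbach_alpha(scores), 3))
```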
An evidence based approach to undergraduate physical assessment practicum course development.
Anderson, Brenda; Nix, Elizabeth; Norman, Bilinda; McPike, H Dawn
2014-05-01
Physical assessment is an important component of professional nursing practice. New nurse graduates experience difficulty transitioning the traditional head-to-toe physical assessment into real-world nursing practice. This study was conducted to provide current data concerning the physical assessment competencies used consistently by registered nurses. This quantitative study used a 126-item survey mailed to 900 Registered Nurses. Participants used a Likert-type scale to report how frequently they used each physical assessment competency. Thirty-seven competencies were determined to be essential components of the physical assessment, 18 were determined to be supplemental, and 71 were determined to be non-essential. Transition of the new graduate nurse into professional practice can be enhanced by focusing content in physical assessment practicum courses on the essential competencies of physical assessment. University faculty have analyzed data from this study to support evidence-based changes to the undergraduate nursing program's physical assessment practicum course. Copyright © 2013 Elsevier Ltd. All rights reserved.
ERIC Educational Resources Information Center
Shulruf, Boaz; Jones, Phil; Turner, Rolf
2015-01-01
The determination of Pass/Fail decisions for Borderline grades (i.e., grades that do not clearly distinguish between competent and incompetent examinees) has been an ongoing challenge for academic institutions. This study utilises the Objective Borderline Method (OBM) to determine examinee ability and item difficulty, and from that…
ERIC Educational Resources Information Center
Wu, Pei-Chen; Chang, Lily
2008-01-01
The authors investigated the Chinese version of the Beck Depression Inventory-II (BDI-II-C; Chinese Behavioral Science Corporation, 2000) within the Rasch framework in terms of dimensionality, item difficulty, and category functioning. Two underlying scale dimensions, relatively high item difficulties, and a need for collapsing 2 response…
A Comparison of Three Types of Test Development Procedures Using Classical and Latent Trait Methods.
ERIC Educational Resources Information Center
Benson, Jeri; Wilson, Michael
Three methods of item selection were used to select sets of 38 items from a 50-item verbal analogies test and the resulting item sets were compared for internal consistency, standard errors of measurement, item difficulty, biserial item-test correlations, and relative efficiency. Three groups of 1,500 cases each were used for item selection. First…
ERIC Educational Resources Information Center
Wang, Wen-Chung
2004-01-01
Scale indeterminacy in analysis of differential item functioning (DIF) within the framework of item response theory can be resolved with three anchor-item methods: the equal-mean-difficulty method, the all-other anchor item method, and the constant anchor item method. In this article, applicability and limitations of these 3 methods are…
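A minimal sketch of how the equal-mean-difficulty idea can be applied, assuming item difficulties have already been estimated separately for a reference and a focal group. The function name and data are hypothetical, and this is not necessarily how Wang (2004) implements the method.

```python
from statistics import mean


def dif_equal_mean_difficulty(ref_difficulties, focal_difficulties):
    """DIF estimates under an equal-mean-difficulty constraint (illustrative sketch).

    Each group's item difficulties are assumed to have been calibrated separately,
    so each set sits on its own arbitrary origin. Centering both sets at a mean of
    zero imposes equal mean difficulty across groups; the remaining focal-minus-
    reference differences are read as DIF.
    """
    ref_mean = mean(ref_difficulties)
    focal_mean = mean(focal_difficulties)
    return [
        (f - focal_mean) - (r - ref_mean)
        for r, f in zip(ref_difficulties, focal_difficulties)
    ]


# Hypothetical calibrations: the focal scale is shifted by +0.5 and item 3 is
# an extra 0.1 logits harder for the focal group.
ref = [-1.0, 0.0, 1.0]
focal = [-0.5, 0.5, 1.6]
print([round(d, 3) for d in dif_equal_mean_difficulty(ref, focal)])
```

Because the DIF item is included when the means are equated, part of its 0.1-logit difference shows up as small negative DIF on the other two items; this kind of contamination is one reason the choice among anchor methods matters.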
Windsor, Timothy D; Rodgers, Bryan; Butterworth, Peter; Anstey, Kaarin J; Jorm, Anthony F
2006-09-01
The effects of using different approaches to scoring the SF-12 summary scales of physical and mental health were examined with a view to informing the design and interpretation of community-based survey research. Data from a population-based study of 7485 participants in three cohorts aged 20-24, 40-44 and 60-64 years were used to examine relationships among measures of physical and mental health calculated from the same items using the SF-12 and RAND-12 approaches to scoring, and other measures of chronic physical conditions and psychological distress. A measure of physical health constructed using the RAND-12 scoring showed a monotonic negative association with psychological distress as measured by the Goldberg depression and anxiety scales. However, a non-monotonic association was evident in the relationship between SF-12 physical health scores and distress, with very high SF-12 physical health scores corresponding with high levels of distress. These relationships highlight difficulties in interpretation that can arise when using the SF-12 summary scales in some analytical contexts. It is recommended that community surveys that measure physical and mental functioning using the SF-12 items generate summary scores using the RAND-12 protocol in addition to the SF-12 approach. In general, researchers should be wary of using factor scores based on orthogonal rotation, which assumes that measures are uncorrelated, to represent constructs that have an actual association.
Classical Item Analysis Using Latent Variable Modeling: A Note on a Direct Evaluation Procedure
ERIC Educational Resources Information Center
Raykov, Tenko; Marcoulides, George A.
2011-01-01
A directly applicable latent variable modeling procedure for classical item analysis is outlined. The method allows one to point and interval estimate item difficulty, item correlations, and item-total correlations for composites consisting of categorical items. The approach is readily employed in empirical research and as a by-product permits…
Item Estimates under Low-Stakes Conditions: How Should Omits Be Treated?
ERIC Educational Resources Information Center
DeMars, Christine
Using data from a pilot test of science and math from students in 30 high schools, item difficulties were estimated with a one-parameter model (partial-credit model for the multi-point items). Some items were multiple-choice items, and others were constructed-response items (open-ended). Four sets of estimates were obtained: estimates for males…
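For readers unfamiliar with the models named here, the sketch below writes out the response probabilities of the one-parameter (Rasch) model and of the partial credit model. It only evaluates the model equations for given parameters; estimating difficulties from data, and the different treatments of omitted responses the study compares, are not shown. All parameter values are hypothetical.

```python
import math


def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch (one-parameter logistic) model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def pcm_probs(theta, steps):
    """Category probabilities 0..m under the partial credit model.

    `steps` holds the step difficulties delta_1..delta_m of one item. The
    numerator for category x is exp(sum_{k<=x} (theta - delta_k)); category 0
    has an empty sum, hence numerator exp(0) = 1.
    """
    numerators = [1.0]
    cumulative = 0.0
    for delta in steps:
        cumulative += theta - delta
        numerators.append(math.exp(cumulative))
    total = sum(numerators)
    return [num / total for num in numerators]


# A person located exactly at an item's difficulty has a 50% chance of success.
print(rasch_prob(theta=0.5, b=0.5))
# A 2-point item with hypothetical step difficulties -0.5 and 1.0, for theta = 0.
print([round(p, 3) for p in pcm_probs(theta=0.0, steps=[-0.5, 1.0])])
```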
Fraundorf, Scott H; Benjamin, Aaron S
2016-09-01
Information about others' success in remembering is frequently available. For example, students taking an exam may assess its difficulty by monitoring when others turn in their exams. In two experiments, we investigated how rememberers use this information to guide recall. Participants studied paired associates, some semantically related (and thus easier to retrieve) and some unrelated (and thus harder). During a subsequent cued recall test, participants viewed fictive information about an opponent's accuracy on each item. In Experiment 1, participants responded to each cue once before seeing the opponent's performance and once afterwards. Participants reconsidered their responses least often when the opponent's accuracy matched the item difficulty (easy items the opponent recalled, hard items the opponent forgot) and most often when the opponent's accuracy and the item difficulty mismatched. When participants responded only after seeing the opponent's performance (Experiment 2), the same mismatch conditions that led to reconsideration even produced superior recall. These results suggest that rememberers monitor whether others' knowledge states accord or conflict with their own experience, and that this information shifts how they interrogate their memory and what they recall.
Intervention for children with word-finding difficulties: a parallel group randomised control trial.
Best, Wendy; Hughes, Lucy Mari; Masterson, Jackie; Thomas, Michael; Fedor, Anna; Roncoli, Silvia; Fern-Pollak, Liory; Shepherd, Donna-Lynn; Howard, David; Shobbrook, Kate; Kapikian, Anna
2017-07-31
The study investigated the outcome of a word-web intervention for children diagnosed with word-finding difficulties (WFDs). Twenty children aged 6-8 years with WFDs, confirmed by a discrepancy between comprehension and production on the Test of Word Finding-2, were randomly assigned to intervention (n = 11) and waiting control (n = 9) groups. The intervention group had six sessions of intervention which used word-webs and targeted children's meta-cognitive awareness and word-retrieval. On the treated experimental set (n = 25 items) the intervention group gained on average four times as many items as the waiting control group (d = 2.30). There were also gains on personally chosen items for the intervention group. There was little change on untreated items for either group. The study is the first randomised control trial to demonstrate an effect of word-finding therapy with children with language difficulties in mainstream school. The improvement in word-finding for treated items was obtained following a clinically realistic intervention in terms of approach, intensity and duration.
Fayyaz Khan, Humaira; Farooq Danish, Khalid; Saeed Awan, Azra; Anwar, Masood
2013-05-01
The purpose of the study was to identify technical item flaws in the multiple-choice questions (MCQs) submitted for the final exams in 2009, 2010 and 2011. This descriptive analytical study was carried out at the Islamic International Medical College (IIMC). Data were collected from the MCQs submitted by the faculty for the final exams of 2009, 2010 and 2011, and were compiled and evaluated by a three-member assessment committee. Frequencies and percentages were computed, and the categorical data were analyzed with the chi-square test. The overall percentage of flawed items was 67% in 2009, including 21% for testwiseness and 40% for irrelevant difficulty. In 2010, total item flaws were 36%, with 11% for testwiseness and 22% for irrelevant difficulty. The 2011 data showed a decrease in overall flaws to 21%, with 7% for testwiseness and 11% for irrelevant difficulty. Technical item flaws are frequently encountered during MCQ construction, and identifying them leads to improved quality of single-best-answer MCQs.
A Review of Classical Methods of Item Analysis.
ERIC Educational Resources Information Center
French, Christine L.
Item analysis is a very important consideration in the test development process. It is a statistical procedure to analyze test items that combines methods used to evaluate the important characteristics of test items, such as difficulty, discrimination, and distractibility of the items in a test. This paper reviews some of the classical methods for…
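Two of the classical statistics mentioned above, item difficulty and discrimination, can be computed directly from scored responses. The sketch below uses the proportion-correct difficulty index and the upper-lower discrimination index; the 27% group size is a common convention rather than something this review prescribes, and the function names and data are hypothetical.

```python
def item_difficulty(item_scores):
    """Classical difficulty index p: proportion of examinees answering correctly."""
    return sum(item_scores) / len(item_scores)


def discrimination_index(item_scores, total_scores, fraction=0.27):
    """Upper-lower discrimination index D = p(upper group) - p(lower group).

    Examinees are ranked by total score; the top and bottom `fraction` of the
    group (27% by a common convention) form the upper and lower groups.
    """
    n = len(item_scores)
    g = max(1, round(fraction * n))
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower, upper = order[:g], order[-g:]
    p_upper = sum(item_scores[i] for i in upper) / g
    p_lower = sum(item_scores[i] for i in lower) / g
    return p_upper - p_lower


# Hypothetical data for one item: 0/1 item scores and total test scores of ten examinees.
item = [1, 1, 1, 1, 0, 1, 0, 0, 1, 0]
totals = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
print(item_difficulty(item), round(discrimination_index(item, totals), 3))
```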
Detecting a Gender-Related DIF Using Logistic Regression and Transformed Item Difficulty
ERIC Educational Resources Information Center
Abedlaziz, Nabeel; Ismail, Wail; Hussin, Zaharah
2011-01-01
Test items are designed to provide information about the examinees. Difficult items are designed to be more demanding and easy items are less so. However, sometimes, test items carry with their demands other than those intended by the test developer (Scheuneman & Gerritz, 1990). When personal attributes such as gender systematically affect…
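The "transformed item difficulty" (delta-plot) approach can be sketched as follows, assuming a proportion correct p is available for each item in two groups. Each p is converted to Angoff's delta scale, and items are judged by their signed perpendicular distance from the major axis of the scatter of reference versus focal deltas. The function names, the example p-values, and any flagging cutoff are illustrative assumptions; this is not the specific procedure of Abedlaziz et al., who combined it with logistic regression.

```python
import math
from statistics import NormalDist


def angoff_delta(p):
    """Transformed item difficulty (Angoff delta): 13 + 4 * z, z = Phi^-1(1 - p); needs 0 < p < 1."""
    return 13 + 4 * NormalDist().inv_cdf(1 - p)


def delta_plot_distances(p_ref, p_focal):
    """Signed perpendicular distances of items from the major axis of the delta plot."""
    x = [angoff_delta(p) for p in p_ref]
    y = [angoff_delta(p) for p in p_focal]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x) / n
    syy = sum((yi - my) ** 2 for yi in y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n
    # Major (principal) axis y = a + b * x of the bivariate scatter; requires sxy != 0.
    b = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    a = my - b * mx
    return [(b * xi - yi + a) / math.sqrt(b ** 2 + 1) for xi, yi in zip(x, y)]


# Hypothetical p-values; item 5 is much harder for the focal group.
p_ref = [0.9, 0.7, 0.5, 0.3, 0.6]
p_focal = [0.88, 0.66, 0.47, 0.27, 0.35]
print([round(d, 2) for d in delta_plot_distances(p_ref, p_focal)])
```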
Adaptable Learning Assistant for Item Bank Management
ERIC Educational Resources Information Center
Nuntiyagul, Atorn; Naruedomkul, Kanlaya; Cercone, Nick; Wongsawang, Damras
2008-01-01
We present PKIP, an adaptable learning assistant tool for managing question items in item banks. PKIP is not only able to automatically assist educational users to categorize the question items into predefined categories by their contents but also to correctly retrieve the items by specifying the category and/or the difficulty level. PKIP adapts…
Two-item same/different discrimination in rhesus monkeys (Macaca mulatta).
Basile, Benjamin M; Moylan, Emily J; Charles, David P; Murray, Elisabeth A
2015-11-01
Almost all nonhuman animals can recognize when one item is the same as another item. It is less clear whether nonhuman animals possess abstract concepts of "same" and "different" that can be divorced from perceptual similarity. Pigeons and monkeys show inconsistent performance, and often surprising difficulty, in laboratory tests of same/different learning that involve only two items. Previous results from tests using multi-item arrays suggest that nonhumans compute sameness along a continuous scale of perceptual variability, which would explain the difficulty of making two-item same/different judgments. Here, we provide evidence that rhesus monkeys can learn a two-item same/different discrimination similar to those on which monkeys and pigeons have previously failed. Monkeys' performance transferred to novel stimuli and was not affected by perceptual variations in stimulus size, rotation, view, or luminance. Success without the use of multi-item arrays, and the lack of effect of perceptual variability, suggests a computation of sameness that is more categorical, and perhaps more abstract, than previously thought.
The use of focus groups in the development of the PROMIS Pediatrics Item Bank
Walsh, Tasanee R.; Irwin, Debra E.; Meier, Andrea; Varni, James W.; DeWalt, Darren A.
2008-01-01
Objective: To understand differences in perceptions of patient-reported outcome domains between children with asthma and children from the general population. We used this information in the development of patient-reported outcome items for the Patient Reported Outcomes Measurement Information System Pediatrics project. Methods: We conducted focus groups composed of ethnically, racially, and geographically diverse youth (8-12 and 13-17 years) from the general population and youth with asthma. We performed content analysis to identify important themes. Results: We identified five distinct challenges that may confront youth with asthma as compared with general population youth: 1) they experience more difficulties when participating in physical activities; 2) they may experience anxiety about having an asthma attack at any time and anywhere; 3) they may experience sleep disturbances and fatigue secondary to their asthma symptoms; 4) their health condition has a greater effect on their emotional well-being and interpersonal relationships; and 5) youth with asthma report that asthma often leaves them with insufficient energy to complete their school activities, especially physical activities. Conclusions: The results confirm unique experiences for children with asthma across a broad range of health domains and enhance the breadth of all domains when creating an item bank. PMID:18427951
Odukoya, Jonathan A; Adekeye, Olajide; Igbinoba, Angie O; Afolabi, A
2018-01-01
Teachers and students worldwide often dance to the tune of tests and examinations. Assessments are powerful tools for catalyzing the achievement of educational goals, especially if done rightly. One of the tools for 'doing it rightly' is item analysis. The core objective of this study, therefore, was to ascertain the item difficulty and distractor indices of the university-wide courses. The number of participating undergraduate students ranged from 112 to 1956. Using secondary data, the study adopted an ex post facto design. In virtually all cases, the majority of the items (between 65% and 97% of the 70 items fielded in each course) did not meet psychometric standards in terms of difficulty and distractor indices and consequently needed to be moderated or deleted. Considering the importance of these courses, the need to apply item analysis when developing these tests was emphasized.
NASA Astrophysics Data System (ADS)
Karim, Nafis I.; Maries, Alexandru; Singh, Chandralekha
2018-06-01
The Conceptual Survey of Electricity and Magnetism (CSEM) has been used to assess student understanding of introductory concepts of electricity and magnetism because many of the items on the CSEM have strong distractor choices which correspond to students' alternate conceptions. Instruction is unlikely to be effective if instructors do not know the common alternate conceptions of introductory physics students and explicitly take into account common student difficulties in their instructional design. Here, we discuss research involving the CSEM to evaluate one aspect of the pedagogical content knowledge of teaching assistants (TAs): knowledge of introductory students' alternate conceptions in electricity and magnetism as revealed by the CSEM. For each item on the CSEM, the TAs were asked to identify the most common incorrect answer choice selected by introductory physics students if they did not know the correct answer after traditional instruction. Then, we used introductory student CSEM post-test data to assess the extent to which TAs were able to identify the most common alternate conception of introductory students in each question on the CSEM. We find that the TAs were thoughtful when attempting to identify common student difficulties and they enjoyed learning about student difficulties this way. However, they struggled to identify many common difficulties of introductory students that persist after traditional instruction. We discuss specific alternate conceptions that persist after traditional instruction, the extent to which TAs were able to identify them, and results from think-aloud interviews with TAs, which provided valuable information about why the TAs sometimes selected as most common certain alternate conceptions that were in fact very rare among introductory students.
We also discuss how tasks such as the one used in this study can be used in professional development programs to engender productive discussions about the importance of being knowledgeable about student alternate conceptions in order to help students learn. Interviews with TAs engaged in this task as well as our experience with such tasks in our professional development programs suggest that they are beneficial.
Park, In Sook; Suh, Yeon Ok; Park, Hae Sook; Kang, So Young; Kim, Kwang Sung; Kim, Gyung Hee; Choi, Yeon-Hee; Kim, Hyun-Ju
2017-01-01
The purpose of this study was to improve the quality of items on the Korean Nursing Licensing Examination by developing and evaluating case-based items that reflect integrated nursing knowledge. We conducted a cross-sectional observational study to develop new case-based items. The methods for developing test items included expert workshops, brainstorming, and verification of content validity. After a mock examination of undergraduate nursing students using the newly developed case-based items, we evaluated the appropriateness of the items through classical test theory and item response theory. A total of 50 case-based items were developed for the mock examination, and content validity was evaluated. The question items integrated 34 discrete elements of integrated nursing knowledge. The mock examination was taken by 741 baccalaureate students in their fourth year of study at 13 universities. Their average score on the mock examination was 57.4, and the examination showed a reliability of 0.40. According to classical test theory, the average item difficulty was 57.4% (80%-100% for 12 items; 60%-80% for 13 items; and less than 60% for 25 items). The mean discrimination index was 0.19; it was above 0.30 for 11 items and 0.20 to 0.29 for 15 items. According to item response theory, the item discrimination parameter (in the logistic model) was none for 10 items (0.00), very low for 20 items (0.01 to 0.34), low for 12 items (0.35 to 0.64), moderate for 6 items (0.65 to 1.34), high for 1 item (1.35 to 1.69), and very high for 1 item (above 1.70). The item difficulty was very easy for 24 items (below -2.0), easy for 8 items (-2.0 to -0.5), medium for 6 items (-0.5 to 0.5), hard for 3 items (0.5 to 2.0), and very hard for 9 items (2.0 or above). A goodness-of-fit test of the 2-parameter item response model, within the range of 2.0 to 0.5, revealed that 12 items had an ideal correct answer rate.
We surmised that the low reliability of the mock examination was influenced by the timing of the test for the examinees and the inappropriate difficulty of the items. Our study suggested a methodology for the development of future case-based items for the Korean Nursing Licensing Examination.
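A small sketch of the item response function behind these figures, with the difficulty bands reported above encoded as a lookup. Whether the study's 2-parameter model used the 1.7 scaling constant is not stated, so the sketch omits it; treating each lower band boundary as inclusive is an assumption read from "below -2.0" and "2.0 or above", and the example parameters are hypothetical.

```python
import math


def prob_2pl(theta, a, b):
    """Two-parameter logistic IRT model (no 1.7 scaling constant assumed)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def difficulty_band(b):
    """Label an IRT difficulty parameter using the bands reported in the abstract.

    Lower bounds are treated as inclusive (an assumption).
    """
    if b < -2.0:
        return "very easy"
    if b < -0.5:
        return "easy"
    if b < 0.5:
        return "medium"
    if b < 2.0:
        return "hard"
    return "very hard"


# Hypothetical item with discrimination a = 0.8 and difficulty b = -1.2.
print(difficulty_band(-1.2), round(prob_2pl(theta=0.0, a=0.8, b=-1.2), 3))
```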
Fitting the Rasch Model to Account for Variation in Item Discrimination
ERIC Educational Resources Information Center
Weitzman, R. A.
2009-01-01
Building on the Kelley and Gulliksen versions of classical test theory, this article shows that a logistic model having only a single item parameter can account for varying item discrimination, as well as difficulty, by using item-test correlations to adjust incorrect-correct (0-1) item responses prior to an initial model fit. The fit occurs…
Effects of Item Exposure for Conventional Examinations in a Continuous Testing Environment.
ERIC Educational Resources Information Center
Hertz, Norman R.; Chinn, Roberta N.
This study explored the effect of item exposure on two conventional examinations administered as computer-based tests. A principal hypothesis was that item exposure would have little or no effect on average difficulty of the items over the course of an administrative cycle. This hypothesis was tested by exploring conventional item statistics and…
Karibe, Hiroyuki; Goddard, Greg; Shimazu, Kisaki; Kato, Yuichi; Warita-Naoi, Sachie; Kawakami, Tomomi
2014-12-11
Subjective symptoms of temporomandibular disorders (TMDs) have rarely been studied by age group. We aimed to compare self-reported pain intensity, sleeping difficulty, and treatment outcomes of patients with myofascial TMDs among three age groups. The study population included 179 consecutive patients (151 women and 28 men) who underwent comprehensive clinical examinations at a university-based orofacial pain center. They were classified into myofascial pain subgroups based on the Research Diagnostic Criteria for Temporomandibular Disorders. They were stratified by age group: M1, under 20 years; M2, 20-39 years; and M3, 40 years and older. The patients scored their pretreatment symptoms (first visit) and post-treatment symptoms (last visit) on a form composed of three items that assessed pain intensity and one item that assessed sleeping difficulty. Their treatment options (i.e., pharmacotherapy, physical therapy, and orthopedic appliances) and duration were recorded. All variables were compared between sexes in each group and between the age groups by using the Kruskal-Wallis test, the Mann-Whitney U test, the chi-square test, and analysis of variance (p < 0.05). No significant sex differences were found in any age group. Before treatment, only sleeping difficulty differed significantly among the age groups (p = 0.009). No significant differences were observed in the treatment options or treatment duration. After treatment, jaw/face pain intensity, headache intensity, and sleeping difficulty were significantly reduced in groups M2 and M3, but only jaw/face pain intensity was significantly decreased in group M1. The changes in the scores of pain intensity and sleeping difficulty did not differ between the groups. Pain intensity does not differ by age group, but older patients with myofascial TMDs had greater sleeping difficulties. However, there were no differences between the age groups in the treatment outcomes.
Clinicians should carefully consider the age-related characteristics of patients with myofascial TMDs when developing appropriate management strategies.
Efforts Toward the Development of Unbiased Selection and Assessment Instruments.
ERIC Educational Resources Information Center
Rudner, Lawrence M.
Investigations into item bias provide an empirical basis for the identification and elimination of test items which appear to measure different traits across populations or cultural groups. The psychometric rationales for six approaches to the identification of biased test items are reviewed: (1) Transformed item difficulties: within-group…
Automatic Item Generation of Probability Word Problems
ERIC Educational Resources Information Center
Holling, Heinz; Bertling, Jonas P.; Zeuch, Nina
2009-01-01
Mathematical word problems represent a common item format for assessing student competencies. Automatic item generation (AIG) is an effective way of constructing many items with predictable difficulties, based on a set of predefined task parameters. The current study presents a framework for the automatic generation of probability word problems…
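One common way to make item difficulty predictable from task parameters is a linear logistic test model (LLTM)-style decomposition, in which an item's Rasch difficulty is modeled as an intercept plus a weighted sum of its task features. The truncated abstract does not say which model the framework uses, so this is an illustrative assumption; the feature names, weights, and intercept below are hypothetical placeholders, not estimates from the study.

```python
import math

# Hypothetical task features for generated probability word problems and
# hypothetical weights in logits; real weights would be estimated from data.
FEATURE_WEIGHTS = {
    "n_events": 0.45,         # number of events mentioned in the problem
    "conditional": 0.80,      # 1 if a conditional probability is required
    "complement_rule": 0.35,  # 1 if the complement rule is required
}
INTERCEPT = -1.2  # hypothetical difficulty of an item with all features at zero


def predicted_difficulty(features):
    """LLTM-style prediction: b_i = c + sum_k q_ik * eta_k (features not given count as 0)."""
    return INTERCEPT + sum(FEATURE_WEIGHTS[name] * value
                           for name, value in features.items())


def rasch_p_correct(theta, b):
    """Rasch probability that a person with ability theta solves an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# A generated item with two events that requires a conditional probability
b_new = predicted_difficulty({"n_events": 2, "conditional": 1, "complement_rule": 0})
print(round(b_new, 2), round(rasch_p_correct(0.0, b_new), 3))
```

Because difficulty is a fixed function of the features, an item generator can target a difficulty range before any pilot data exist. This is the same motivation the regression approach to physics item difficulty relies on.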
ERIC Educational Resources Information Center
Chauvin, Bruno; Leonova, Tamara
2016-01-01
Key concerns about the psychometric properties of the 25-item version of the Strengths and Difficulties Questionnaire (SDQ) have consistently been raised in the literature. The present study aimed at examining the meaningfulness of an alternative model to the SDQ in which 7 problematic items are excluded. French-speaking parents of 262 boys and…
Smolen, Tomasz; Chuderski, Adam
2015-01-01
Fluid intelligence (Gf) is a crucial cognitive ability that involves abstract reasoning in order to solve novel problems. Recent research demonstrated that Gf strongly depends on the individual effectiveness of working memory (WM). We investigated a popular claim that if storage capacity underlay the WM-Gf correlation, then such a correlation should increase with an increasing number of items or rules (load) in a Gf test. Because such a link is often not observed, the storage-capacity account is rejected on that basis, and alternative accounts of Gf (e.g., related to executive control or processing speed) are proposed. Using both analytical inference and numerical simulations, we demonstrated that the load-dependent change in correlation is primarily a function of the amount of floor/ceiling effect for particular items. Thus, the item-wise WM correlation of a Gf test depends on its overall difficulty and on the difficulty distribution across its items. When the early test items show a strong ceiling effect but the late items do not approach the floor, that correlation will increase throughout the test. If the early items lie between ceiling and floor but the late items approach the floor, the correlation will decrease. For a hallmark Gf test, the Raven test, whose items span from ceiling to floor, a quadratic relationship is expected, and this was shown empirically using a large sample and two types of working memory capacity (WMC) tasks. Consequently, neither changes in correlation due to varying WM/Gf load nor their absence can provide an argument for or against any theory of WM/Gf. Moreover, because the mathematical properties of the correlation formula make it relatively immune to ceiling/floor effects for overall moderate correlations, only minor changes (if any) in the WM-Gf correlation should be expected for many psychological tests.
Fraundorf, Scott H.; Benjamin, Aaron S.
2015-01-01
Information about others’ success in remembering is frequently available. For example, students taking an exam may assess its difficulty by monitoring when others turn in their exams. In two experiments, we investigated how rememberers use this information to guide recall. Participants studied paired associates, some semantically related (and thus easier to retrieve) and some unrelated (and thus harder). During a subsequent cued recall test, participants viewed fictive information about an opponent’s accuracy on each item. In Experiment 1, participants responded to each cue once before seeing the opponent’s performance and once afterwards. Participants reconsidered their responses least often when the opponent’s accuracy matched the item difficulty (easy items the opponent recalled, hard items the opponent forgot) and most often when the opponent’s accuracy and the item difficulty mismatched. When participants responded only after seeing the opponent’s performance (Experiment 2), the same mismatch conditions that led to reconsideration even produced superior recall. These results suggest that rememberers monitor whether others’ knowledge states accord or conflict with their own experience, and that this information shifts how they interrogate their memory and what they recall. PMID:26247369
ERIC Educational Resources Information Center
Jones, Andrew T.
2011-01-01
Practitioners often depend on item analysis to select items for exam forms and have a variety of options available to them. These include the point-biserial correlation, the agreement statistic, the B index, and the phi coefficient. Although research has demonstrated that these statistics can be useful for item selection, no research as of yet has…
Rasch Analysis of the Power as Knowing Participation in Change Tool--the Brazilian version.
Guedes, Erika de Souza; Orozco-Vargas, Luiz Carlos; Turrini, Ruth Natália Teresa; de Sousa, Regina Márcia Cardoso; dos Santos, Mariana Alvina; da Cruz, Diná de Almeida Lopes Monteiro
2013-01-01
The objective of this study was to evaluate the items contained in the Brazilian version of the Power as Knowing Participation in Change Tool (PKPCT) by investigating the psychometric properties of the questionnaire through Rasch analysis. Data from 952 nursing assistants and 627 baccalaureate nurses were analyzed (mean age 44.1 years, SD = 9.5; 13.0% men). The subscales Choices, Awareness, Freedom, and Involvement were tested separately and showed unidimensionality; the response categories of the items were collapsed from 7 to 3 levels, and the items fit the model well, except for the following/leading item, whose infit and outfit values were above 1.4; this item also showed Differential Item Functioning (DIF) according to the participant's role. Item reliability was 0.99, and person reliability ranged from 0.80 to 0.84 across the subscales. No items with extremely high levels of difficulty were identified. The PKPCT should not be viewed as unidimensional, items with extremely high levels of difficulty need to be created for the scale, and the differential functioning of some items requires further investigation.
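The infit and outfit values cited above are mean-square residual fit statistics. Below is a minimal sketch for a single dichotomous Rasch item, assuming person abilities and the item difficulty have already been estimated. The PKPCT analysis used polytomous (3-category) items, so the dichotomous form is a simplification, and the function names are hypothetical.

```python
import math


def rasch_prob(theta, b):
    """Probability of a 1 response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def rasch_fit(responses, thetas, b):
    """Return (infit, outfit) mean-squares for one item.

    responses: list of 0/1 scores, one per person
    thetas:    person abilities in logits, same order as responses
    b:         item difficulty in logits
    """
    sq_resid = []   # squared raw residuals (x - P)^2
    variances = []  # model variances P(1 - P)
    for x, theta in zip(responses, thetas):
        p = rasch_prob(theta, b)
        sq_resid.append((x - p) ** 2)
        variances.append(p * (1.0 - p))
    # Outfit: unweighted mean of squared standardized residuals (outlier-sensitive)
    outfit = sum(r / w for r, w in zip(sq_resid, variances)) / len(responses)
    # Infit: information-weighted mean-square (sensitive to on-target misfit)
    infit = sum(sq_resid) / sum(variances)
    return infit, outfit


infit, outfit = rasch_fit([1, 0, 0, 1], [-1.0, 0.0, 1.0, 2.0], 0.5)
print(round(infit, 2), round(outfit, 2))
```

Values near 1.0 mean the responses are about as noisy as the model predicts. An item like the one flagged above, with both values over 1.4, attracts more unexpected responses than the model allows for.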
Cohn, Amy M.; Hagman, Brett T.; Graff, Fiona S.; Noel, Nora E.
2011-01-01
Objective: The present study examined the latent continuum of alcohol-related negative consequences among first-year college women using methods from item response theory and classical test theory. Method: Participants (N = 315) were college women in their freshman year who reported consuming any alcohol in the past 90 days and who completed assessments of alcohol consumption and alcohol-related negative consequences using the Rutgers Alcohol Problem Index. Results: Item response theory analyses showed poor model fit for five items identified in the Rutgers Alcohol Problem Index. Two-parameter item response theory logistic models were applied to the remaining 18 items to examine estimates of item difficulty (i.e., severity) and discrimination parameters. The item difficulty parameters ranged from 0.591 to 2.031, and the discrimination parameters ranged from 0.321 to 2.371. Classical test theory analyses indicated that the omission of the five misfit items did not significantly alter the psychometric properties of the construct. Conclusions: Findings suggest that those consequences that had greater severity and discrimination parameters may be used as screening items to identify female problem drinkers at risk for an alcohol use disorder. PMID:22051212
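In the two-parameter logistic (2PL) model mentioned above, the difficulty b places an item on the severity continuum, and the discrimination a sets how sharply the endorsement probability rises around b. Below is a minimal sketch in the logit metric. The scaling constant D = 1.7 that some software includes is omitted, which is an assumption. The two parameter pairs are illustrative combinations of the reported range endpoints, not actual Rutgers Alcohol Problem Index items.

```python
import math


def p_2pl(theta, a, b):
    """Probability of endorsing an item under the 2PL model (logit metric)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def info_2pl(theta, a, b):
    """Fisher information of a 2PL item at latent level theta: a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


# Illustrative items built from the ends of the reported parameter ranges
mild = {"a": 0.321, "b": 0.591}    # low severity, weak discrimination
severe = {"a": 2.371, "b": 2.031}  # high severity, strong discrimination

for theta in (0.0, 1.0, 2.0, 3.0):
    print(theta, round(p_2pl(theta, **mild), 3), round(p_2pl(theta, **severe), 3))
```

Item information peaks at theta = b with value a^2/4, so an item with a = 2.371 carries roughly 55 times the peak information of one with a = 0.321. This is one way to read the suggestion that high-severity, high-discrimination items suit screening.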
Item response theory and the measurement of motor behavior.
Safrit, M J; Cohen, A S; Costa, M G
1989-12-01
Item response theory (IRT) has been the focus of intense research and development activity in educational and psychological measurement during the past decade. Because this theory can provide more precise information about test items than other theories usually used in measuring motor behavior, the application of IRT in physical education and exercise science merits investigation. In IRT, the difficulty level of each item (e.g., trial or task) can be estimated and placed on the same scale as the ability of the examinee. Using this information, the test developer can determine the ability levels at which the test functions best. Equating the scores of individuals on two or more items or tests can be handled efficiently by applying IRT. The precision of the identification of performance standards in a mastery test context can be enhanced, as can adaptive testing procedures. In this tutorial, several potential benefits of applying IRT to the measurement of motor behavior were described. An example is provided using bowling data and applying the graded-response form of the Rasch IRT model. The data were calibrated and the goodness of fit was examined. This analysis is described in a step-by-step approach. Limitations to using an IRT model with a test consisting of repeated measures were noted.
Do Reading Experts Agree with MCAT Verbal Reasoning Item Classifications?
ERIC Educational Resources Information Center
Jackson, Evelyn W.; And Others
1994-01-01
Examined whether expert raters (n=5) could agree about classification of Medical College Admission Test (MCAT) items and whether they agreed with MCAT student manual in labeling skill being measured by each test item. Results revealed difficulties in replicating authors' labeling of skills for reading items on practice test provided with 1991 MCAT…
Combining the Best of Two Standard Setting Methods: The Ordered Item Booklet Angoff
ERIC Educational Resources Information Center
Smith, Russell W.; Davis-Becker, Susan L.; O'Leary, Lisa S.
2014-01-01
This article describes a hybrid standard setting method that combines characteristics of the Angoff (1971) and Bookmark (Mitzel, Lewis, Patz & Green, 2001) methods. The proposed approach utilizes strengths of each method while addressing weaknesses. An ordered item booklet, with items sorted based on item difficulty, is used in combination…
Comparative Racial Analysis of Enlisted Advancement Exams: Item-Difficulty.
1975-07-01
Keywords: Item analysis; Promotion; Racial comparison; Equal opportunity. Abstract: ...improving equal opportunity in career growth for minority groups. The study of exam item-difficulty levels is the first of a series of technical reports...under Exploratory Development Task Area PF55.521.032 (Contemporary Social Issues). J. J. CLARKIN, Commanding Officer. SUMMARY. Purpose: A number of
Levac, Danielle; Nawrotek, Joanna; Deschenes, Emilie; Giguere, Tia; Serafin, Julie; Bilodeau, Martin; Sveistrup, Heidi
2016-06-01
Virtual reality active video games are increasingly popular physical therapy interventions for children with cerebral palsy. However, physical therapists require educational resources to support decision making about game selection to match individual patient goals. Quantifying the movements elicited during virtual reality active video game play can inform individualized game selection in pediatric rehabilitation. The objectives of this study were to develop and evaluate the feasibility and reliability of the Movement Rating Instrument for Virtual Reality Game Play (MRI-VRGP). Item generation occurred through an iterative process of literature review and sample videotape viewing. The MRI-VRGP includes 25 items quantifying upper extremity, lower extremity, and total body movements. A total of 176 videotaped 90-second game play sessions involving 7 typically developing children and 4 children with cerebral palsy were rated by 3 raters trained in MRI-VRGP use. Children played 8 games on 2 virtual reality and active video game systems. Intraclass correlation coefficients (ICCs) determined intrarater and interrater reliability. Excellent intrarater reliability was evidenced by ICCs of >0.75 for 17 of the 25 items across the 3 raters. Interrater reliability estimates were less precise. Excellent interrater reliability was achieved for far-reach upper extremity movements (ICC=0.92 for right and ICC=0.90 for left) and for squat (ICC=0.80) and jump items (ICC=0.99), with 9 items achieving ICCs of >0.70, 12 items achieving ICCs between 0.40 and 0.70, and 4 items achieving poor reliability (close-reach upper extremity: ICC=0.14 for right and ICC=0.07 for left; single-leg stance: ICC=0.55 for right and ICC=0.27 for left). Poor video quality, differing item interpretations between raters, and difficulty quantifying the high-speed movements involved in game play affected reliability. 
With item definition clarification and further psychometric property evaluation, the MRI-VRGP could inform the content of educational resources for therapists by ranking games according to frequency and type of elicited body movements.
Nawrotek, Joanna; Deschenes, Emilie; Giguere, Tia; Serafin, Julie; Bilodeau, Martin; Sveistrup, Heidi
2016-01-01
Background Virtual reality active video games are increasingly popular physical therapy interventions for children with cerebral palsy. However, physical therapists require educational resources to support decision making about game selection to match individual patient goals. Quantifying the movements elicited during virtual reality active video game play can inform individualized game selection in pediatric rehabilitation. Objective The objectives of this study were to develop and evaluate the feasibility and reliability of the Movement Rating Instrument for Virtual Reality Game Play (MRI-VRGP). Methods Item generation occurred through an iterative process of literature review and sample videotape viewing. The MRI-VRGP includes 25 items quantifying upper extremity, lower extremity, and total body movements. A total of 176 videotaped 90-second game play sessions involving 7 typically developing children and 4 children with cerebral palsy were rated by 3 raters trained in MRI-VRGP use. Children played 8 games on 2 virtual reality and active video game systems. Intraclass correlation coefficients (ICCs) determined intrarater and interrater reliability. Results Excellent intrarater reliability was evidenced by ICCs of >0.75 for 17 of the 25 items across the 3 raters. Interrater reliability estimates were less precise. Excellent interrater reliability was achieved for far-reach upper extremity movements (ICC=0.92 for right and ICC=0.90 for left) and for squat (ICC=0.80) and jump items (ICC=0.99), with 9 items achieving ICCs of >0.70, 12 items achieving ICCs between 0.40 and 0.70, and 4 items achieving poor reliability (close-reach upper extremity: ICC=0.14 for right and ICC=0.07 for left; single-leg stance: ICC=0.55 for right and ICC=0.27 for left). Conclusions Poor video quality, differing item interpretations between raters, and difficulty quantifying the high-speed movements involved in game play affected reliability. 
With item definition clarification and further psychometric property evaluation, the MRI-VRGP could inform the content of educational resources for therapists by ranking games according to frequency and type of elicited body movements. PMID:27251029
MacDermid, Joy C; Tang, Kenneth; Sinden, Kathryn E; D'Amico, Robert
2018-05-25
Purpose Performance-based and disease indicators have been widely studied in firefighters; self-reported work role limitations have not. The aim of this study was to describe the distributions and correlations of a generic self-reported Work Limitations Questionnaire (WLQ-26) and firefighting-specific, task-based performance tests. Methods Active firefighters from the City of Hamilton Fire Services (n = 293) were recruited. Participants completed the WLQ-26 to quantify on-the-job difficulties over five work domains: work scheduling (4 items), output demands (7 items), physical demands (8 items), mental demands (4 items), and social demands (3 items). A subset of participants (n = 149) were also assessed on two performance-based tests: a hose drag and a stair climb with a high-rise pack. Descriptive statistics and correlations were used to compare item and subscale performance and to describe the inter-relationships between tests. Results The mean WLQ-26 item scores (out of 5) ranged from 4.1 to 4.4 (median = 5 for all items); on every item, most firefighters (54.5-80.5%) selected the "difficult none of the time" response option. A substantial ceiling effect was observed across all five WLQ-26 subscales, as 44.0-55.6% of respondents were in the highest category. Subscale means ranged from 61.8 (social demands) to 78.7 (output demands and physical demands). Internal consistency exceeded 0.90 on all subscales. For the hose drag task, the mean time-to-completion was 48.0 s (SD = 14.5; range 20.4-95.0). For the stair climb task, the mean time-to-completion was 76.7 s (SD = 37.2; range 21.0-218.0). There were no significant correlations between self-reported work limitations and performance of firefighting tasks. Conclusions The WLQ-26 measured five domains but had ceiling effects in firefighters. Performance-based testing showed a wider score range, lacked ceiling effects, and did not correlate with the WLQ-26. 
A firefighter-specific, self-report role functioning scale may be needed to identify compromised work role capabilities in firefighters.
Understanding Orgasmic Difficulty in Women.
Rowland, David L; Kolba, Tiffany N
2016-08-01
Women's primary issue with the orgasmic phase is usually difficulty reaching orgasm. The aims of this study were to identify predictors of orgasmic difficulty in women within the context of a partnered sexual experience; to assess the relation between orgasmic difficulty and self-reported levels of sexual desire or interest and arousal in women; and to assess the interrelations among three dimensions of orgasmic response during partnered sex: self-reported time to reach orgasm, general difficulty or ease of reaching orgasm, and level of distress or concern. Drawing on a community-based sample recruited via the Internet, 866 women were queried with a 26-item survey regarding their difficulty reaching orgasm during partnered sex. Four hundred sixteen women who indicated difficulty also responded to items assessing arousal and desire difficulties, level of distress about their condition, and their estimated time to reach orgasm. The main outcome measures were the answers to this 26-item survey. Age, arousal difficulty, and lubrication difficulty predicted difficulty reaching orgasm in the overall sample. In the subsample of women reporting difficulty, approximately half reported issues with arousal. Women with arousal problems reported greater difficulty reaching orgasm but did not differ from those without arousal problems on measurements of orgasm latency or levels of distress. Slightly more than half the women experiencing difficulty reaching orgasm were distressed by their condition; distressed women reported greater difficulty reaching orgasm and longer latencies to orgasm than non-distressed counterparts. They also reported lower satisfaction with their sexual relationship. This study indicates the importance of assessing multiple parameters when investigating orgasmic problems in women, including arousal issues, levels of distress, and latency to orgasm. 
Results also clarify that women with arousal problems do not differ substantially from those without arousal problems; in contrast, women distressed by their condition differ from non-distressed women along some critical dimensions. Although orgasmic problems decreased with age, the overall relation of this variable to distress, arousal, and latency to orgasm was essentially unchanged across age groups. Copyright © 2016 International Society for Sexual Medicine. Published by Elsevier Inc. All rights reserved.
Psychometric Evaluation of a Cultural Competency Assessment Instrument for Health Professionals
Haywood, Sonja H.; Goode, Tawara; Gao, Yong; Smith, Kristyn; Bronheim, Suzanne; Flocke, Susan A; Zyzanski, Steve
2012-01-01
Background Few valid and reliable measures exist for health care professionals interested in determining their levels of cultural and linguistic competence. Objective To evaluate the measurement properties of the Cultural Competence Health Practitioner Assessment (CCHPA-129). Methods The CCHPA-129 is a 129-item web-based instrument developed by the National Center for Cultural Competence (NCCC). Responses on the CCHPA-129 were examined using factor analysis, Rasch modeling, and Differential Item Functioning (DIF) analysis across race, ethnicity, gender, and profession. Subjects 2504 practitioners, including 1864 nurses (RN/LPN/BSN), 341 clinicians (PA/NP), and 299 physicians (MD/DO), who completed the CCHPA-129 online between 2005 and 2008. Results Three factors representing domains of knowledge, adapting practice, and promoting health for culturally and linguistically diverse populations accounted for 46% of the variance. Among Knowledge factor items, 53% (23/43) fit the Rasch model; item difficulties ranged from −1.01 logits (least difficult) to +1.11 logits (most difficult), separation index (SI) 13.82, and Cronbach’s α 0.92. Forty-seven percent (21/44) of Adapting Practice factor items fit the model (item difficulties −0.07 to +1.11 logits, SI 11.59, Cronbach’s α 0.88), and 58% (23/39) of Promoting Health factor items fit the model (item difficulties −1.01 to +1.38 logits, SI 22.64, Cronbach’s α 0.92). Early evidence of validity was established by known groups having statistically different scores. Conclusion The 67-item CCHPA-67 is psychometrically sound. This shortened instrument can be used to establish associations between practitioners’ cultural and linguistic competence and health outcomes, as well as to evaluate interventions to increase practitioners’ cultural and linguistic competence. PMID:22437625
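Cronbach's α, reported for each factor above, compares the sum of the item variances with the variance of the total score. Below is a minimal standard-library sketch. It assumes a complete persons-by-items score matrix with no missing responses. It uses population variances; sample variances scale numerator and denominator alike, so α is unchanged.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a persons x items matrix (list of rows).

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Toy data: 3 respondents x 2 items (made-up ratings)
print(round(cronbach_alpha([[2, 3], [3, 3], [4, 5]]), 3))
```

Alpha rises as items covary more strongly relative to their individual variances. It therefore summarizes internal consistency, not model fit, which is why the abstract reports it alongside Rasch fit.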
ERIC Educational Resources Information Center
Quaigrain, Kennedy; Arhin, Ato Kwamina
2017-01-01
Item analysis is essential in improving items which will be used again in later tests; it can also be used to eliminate misleading items in a test. The study focused on item and test quality and explored the relationship between difficulty index (p-value) and discrimination index (DI) with distractor efficiency (DE). The study was conducted among…
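The difficulty index (p-value), discrimination index (DI), and distractor efficiency (DE) named above are often computed with the upper/lower-group method. The sketch below follows widely used conventions: equal-sized upper and lower scoring groups, and a distractor counted as functional if at least 5% of examinees choose it. The truncated abstract does not confirm that the study used exactly these definitions, and the function names and example counts are hypothetical.

```python
def difficulty_index(correct_upper, correct_lower, group_size):
    """p = (H + L) / (2n): proportion correct across equal-sized upper and lower groups."""
    return (correct_upper + correct_lower) / (2 * group_size)


def discrimination_index(correct_upper, correct_lower, group_size):
    """DI = (H - L) / n: proportion correct in the upper group minus the lower group."""
    return (correct_upper - correct_lower) / group_size


def distractor_efficiency(option_counts, key, cutoff=0.05):
    """Share of distractors chosen by at least `cutoff` of all examinees.

    option_counts: dict mapping option label -> number of examinees choosing it
    key: label of the correct option
    """
    total = sum(option_counts.values())
    distractors = [opt for opt in option_counts if opt != key]
    functional = [opt for opt in distractors if option_counts[opt] / total >= cutoff]
    return len(functional) / len(distractors)


# Hypothetical item: 27 examinees per group; 100 examinees overall chose options A-D
print(round(difficulty_index(22, 9, 27), 3),
      round(discrimination_index(22, 9, 27), 3),
      round(distractor_efficiency({"A": 55, "B": 30, "C": 12, "D": 3}, "A"), 3))
```

Here option D, chosen by only 3% of examinees, counts as non-functional, so DE is 2/3. Replacing such distractors is a typical use of this kind of analysis.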
Estimating the Number of Examinees Who Did Not Reach the Last Item of a Section.
ERIC Educational Resources Information Center
Wainer, Howard
It is important to estimate the number of examinees who reached a test item, because item difficulty is defined by the number who answered correctly divided by the number who reached the item. A new method is presented and compared to the previously used definition of three categories of response to an item: (1) answered; (2) omitted--a…
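The definition above (the number who answered correctly divided by the number who reached the item) can be sketched as follows. The rule for deciding who "reached" an item is a common convention assumed for illustration, not the new method the abstract proposes. Under this rule, only an unbroken run of blanks at the end of a section counts as not reached, and any earlier blank is an omission.

```python
def item_difficulties(response_matrix):
    """Proportion correct among examinees who reached each item.

    response_matrix: one row per examinee; entries are 1 (correct),
    0 (incorrect), or None (no response). Blanks after an examinee's last
    answered item are treated as not reached; earlier blanks are omissions,
    which count as reached but not correct.
    """
    n_items = len(response_matrix[0])
    correct = [0] * n_items
    reached = [0] * n_items
    for row in response_matrix:
        # Index of the last answered item (-1 if the examinee answered nothing)
        last = max((j for j, x in enumerate(row) if x is not None), default=-1)
        for j in range(last + 1):
            reached[j] += 1
            if row[j] == 1:
                correct[j] += 1
    return [c / r if r else float("nan") for c, r in zip(correct, reached)]


rows = [
    [1, 1, None],     # did not reach item 3
    [1, None, 0],     # omitted item 2 but reached item 3
    [0, None, None],  # reached only item 1
]
print(item_difficulties(rows))
```

Dividing by all examinees instead of those who reached the item would make the last items of a speeded section look harder than they are. This is why the count of examinees reaching each item matters.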
ERIC Educational Resources Information Center
Masters, James S.
2010-01-01
With the need for larger and larger banks of items to support adaptive testing and to meet security concerns, large-scale item generation is a requirement for many certification and licensure programs. As part of the mass production of items, it is critical that the difficulty and the discrimination of the items be known without the need for…
Self-reported walking ability predicts functional mobility performance in frail older adults.
Alexander, N B; Guire, K E; Thelen, D G; Ashton-Miller, J A; Schultz, A B; Grunawalt, J C; Giordani, B
2000-11-01
The objective was to determine how self-reported physical function relates to performance in each of three mobility domains: walking, stance maintenance, and rising from chairs. This cross-sectional analysis of older adults was conducted in a university-based laboratory and in community-based congregate housing facilities. Participants were 221 older adults (mean age, 79.9 years; range, 60-102 years) without clinical evidence of dementia (mean Folstein Mini-Mental State score, 28; range, 24-30). We compared the responses of these older adults on a questionnaire battery used by the Established Populations for the Epidemiologic Study of the Elderly (EPESE) project with their performance on mobility tasks of graded difficulty. Responses to the EPESE battery included: (1) whether assistance was required to perform seven Katz activities of daily living (ADL) items, specifically with walking and transferring; (2) three Rosow-Breslau items, including the ability to walk up stairs and walk a half mile; and (3) five Nagi items, including difficulty stooping, reaching, and lifting objects. The performance measures included the ability to perform, and time taken to perform, tasks in three summary score domains: (1) walking ("Walking," seven tasks, including walking with an assistive device, turning, stair climbing, tandem walking); (2) stance maintenance ("Stance," six tasks, including unipedal, bipedal, tandem, and maximum lean); and (3) chair rise ("Chair Rise," six tasks, including rising from a variety of seat heights with and without the use of hands for assistance). A total score combines the scores of the Walking, Stance, and Chair Rise domains. We also analyzed how cognitive/behavioral factors such as depression and self-efficacy related to the residuals from the self-report and performance-based ANOVA models. Rosow-Breslau items have the strongest relationship with the three performance domains, Walking, Stance, and Chair Rise (eta-squared ranging from 0.21 to 0.44). 
These three performance domains are as strongly related to one Katz ADL item, walking (eta-squared ranging from 0.15 to 0.33) as all of the Katz ADL items combined (eta-squared ranging from 0.21 to 0.35). Tests of problem solving and psychomotor speed, the Trails A and Trails B tests, are significantly correlated with the residuals from the self-report and performance-based ANOVA models. Compared with the rest of the EPESE self-report items, self-report items related to walking (such as Katz walking and Rosow-Breslau items) are better predictors of functional mobility performance on tasks involving walking, stance maintenance, and rising from chairs. Compared with other self-report items, self-reported walking ability may be the best predictor of overall functional mobility.
A Comparison of Alternate-Choice and True-False Item Forms Used in Classroom Examinations.
ERIC Educational Resources Information Center
Maihoff, N. A.; Mehrens, Wm. A.
A comparison is presented of alternate-choice and true-false item forms used in an undergraduate natural science course. The alternate-choice item is a modified two-choice multiple-choice item in which the two responses are included within the question stem. This study (1) compared the difficulty level, discrimination level, reliability, and…
Measuring the Instructional Sensitivity of ESL Reading Comprehension Items.
ERIC Educational Resources Information Center
Brutten, Sheila R.; And Others
A study attempted to estimate the instructional sensitivity of items in three reading comprehension tests in English as a second language (ESL). Instructional sensitivity is a test-item construct defined as the tendency for a test item to vary in difficulty as a function of instruction. Similar tasks were given to readers at different proficiency…
ERIC Educational Resources Information Center
Freund, Philipp Alexander; Hofer, Stefan; Holling, Heinz
2008-01-01
Figural matrix items are a popular task type for assessing general intelligence (Spearman's g). Items of this kind can be constructed rationally, allowing the implementation of computerized generation algorithms. In this study, the influence of different task parameters on the degree of difficulty in matrix items was investigated. A sample of N =…
Item Difficulty in the Evaluation of Computer-Based Instruction: An Example from Neuroanatomy
ERIC Educational Resources Information Center
Chariker, Julia H.; Naaz, Farah; Pani, John R.
2012-01-01
This article reports large item effects in a study of computer-based learning of neuroanatomy. Outcome measures of the efficiency of learning, transfer of learning, and generalization of knowledge diverged by a wide margin across test items, with certain sets of items emerging as particularly difficult to master. In addition, the outcomes of…
Estimation of Item Response Theory Parameters in the Presence of Missing Data
ERIC Educational Resources Information Center
Finch, Holmes
2008-01-01
Missing data are a common problem in a variety of measurement settings, including responses to items on both cognitive and affective assessments. Researchers have shown that such missing data may create problems in the estimation of item difficulty parameters in the Item Response Theory (IRT) context, particularly if they are ignored. At the same…
Francis, Wendy S; Tokowicz, Natasha; Kroll, Judith F
2014-01-01
Repetition priming was used to assess how proficiency and the ease or difficulty of lexical access influence bilingual translation. Two experiments, conducted at different universities with different Spanish-English bilingual populations and materials, showed repetition priming in word translation for same-direction and different-direction repetitions. Experiment 1, conducted in an English-dominant environment, revealed an effect of translation direction but not of direction match, whereas Experiment 2, conducted in a more balanced bilingual environment, showed an effect of direction match but not of translation direction. A combined analysis on the items common to both studies revealed that bilingual proficiency was negatively associated with response time (RT), priming, and the degree of translation asymmetry in RTs and priming. An item analysis showed that item difficulty was positively associated with RTs, priming, and the benefit of same-direction over different-direction repetition. Thus, although both participant accuracy and item accuracy are indices of learning, they have distinct effects on translation RTs and on the learning that is captured by the repetition-priming paradigm.
The second version of the L. V. Prasad-functional vision questionnaire.
Gothwal, Vijaya K; Sumalini, Rebecca; Bharani, Seelam; Reddy, Shailaja P; Bagga, Deepak K
2012-11-01
The L. V. Prasad-Functional Vision Questionnaire (LVP-FVQ) was developed using Rasch analysis to assess self-reported difficulties in performing daily tasks in school children with visual impairment (VI) in India. However, the LVP-FVQ has psychometric problems of inadequate measurement precision and lack of detailed assessment of dimensionality. Furthermore, items pertaining to use of technology are lacking. The aim of this study was to present the development and validation of the second version of the LVP-FVQ (LVP-FVQ II). Development of the LVP-FVQ II involved extracting items from other similar questionnaires (albeit developed for Western populations) and holding focus group discussions with children with VI and their parents, which resulted in a 32-item pilot questionnaire. Overall, six items from the LVP-FVQ were retained. The questionnaire was pilot tested in 25 such children, after which a 27-item LVP-FVQ II emerged; this was administered to 150 children with VI. Response to each item was rated on a three-category scale. Rasch analysis was used to validate the LVP-FVQ II. Participants used the rating scale as intended. Four mobility-related items required deletion because they did not contribute toward measurement of a single construct, indicating a secondary dimension. Deletion of the four items resulted in the 23-item unidimensional LVP-FVQ II, with good measurement precision, effective targeting of item difficulty to participant ability, and lack of notable differential item functioning. The LVP-FVQ II has high reliability, indicating that it can effectively discriminate between levels of visual disability among school children in India, and it is valid across age, gender, duration of VI, and location of residence. Given its superior measurement properties and interval-level scores, the LVP-FVQ II appears to offer advantages over the LVP-FVQ in assessing difficulties in performing daily tasks in this population. 
It can be adapted for use in other developing countries.
Development of the EORTC QLQ-CAX24, A Questionnaire for Cancer Patients With Cachexia.
Wheelwright, Sally J; Hopkinson, Jane B; Darlington, Anne-Sophie; Fitzsimmons, Deborah F; Fayers, Peter; Balstad, Trude R; Bredart, Anne; Hammerlid, Eva; Kaasa, Stein; Nicolatou-Galitis, Ourania; Pinto, Monica; Schmidt, Heike; Solheim, Tora S; Strasser, Florian; Tomaszewska, Iwona M; Johnson, Colin D
2017-02-01
Cachexia is common in cancer patients and has profound consequences, yet only one questionnaire examines the patient's perspective. The aim was to report a rigorously developed module for assessing the patient-reported impact of cancer cachexia. Module development followed published guidelines. Patients from across the cancer cachexia trajectory were included. In Phase 1, health-related quality of life (HRQOL) issues were generated from a literature review and interviews with patients in four countries. The issues were revised based on patient and health care professional (HCP) input. In Phase 2, questionnaire items were formulated and translated into the languages required for Phase 3, the pilot phase, in which patients from eight countries scored the relevance and importance of each item and provided qualitative feedback. A total of 39 patients and 12 HCPs took part in Phase 1. The literature review produced 68 HRQOL issues, with 22 new issues arising from the patient interviews. After patient and HCP input, 44 issues were formulated into questionnaire items in Phase 2. One hundred ten patients took part in Phase 3. One item was reworded, and 20 items were deleted as a consequence of patient feedback. The QLQ-CAX24 is a cancer cachexia-specific questionnaire, comprising 24 items, for HRQOL assessment in clinical trials and practice. It contains five multi-item scales (food aversion, eating and weight-loss worry, eating difficulties, loss of control, and physical decline) and four single items. Copyright © 2016 American Academy of Hospice and Palliative Medicine. Published by Elsevier Inc. All rights reserved.
Item Difficulty in the Evaluation of Computer-Based Instruction: An Example from Neuroanatomy
Chariker, Julia H.; Naaz, Farah; Pani, John R.
2012-01-01
This article reports large item effects in a study of computer-based learning of neuroanatomy. Outcome measures of the efficiency of learning, transfer of learning, and generalization of knowledge diverged by a wide margin across test items, with certain sets of items emerging as particularly difficult to master. In addition, the outcomes of comparisons between instructional methods changed with the difficulty of the items to be learned. More challenging items better differentiated between instructional methods. This set of results is important for two reasons. First, it suggests that instruction may be more efficient if sets of consistently difficult items are the targets of instructional methods particularly suited to them. Second, there is wide variation in the published literature regarding the outcomes of empirical evaluations of computer-based instruction. As a consequence, many questions arise as to the factors that may affect such evaluations. The present paper demonstrates that the level of challenge in the material that is presented to learners is an important factor to consider in the evaluation of a computer-based instructional system. Copyright © 2011 American Association of Anatomists. PMID:22231801
Momsen, Jennifer; Offerdahl, Erika; Kryjevskaia, Mila; Montplaisir, Lisa; Anderson, Elizabeth; Grosz, Nate
2013-06-01
Assessments and student expectations can drive learning: students selectively study and learn the content and skills they believe critical to passing an exam in a given subject. Evaluating the nature of assessments in undergraduate science education can, therefore, provide substantial insight into student learning. We characterized and compared the cognitive skills routinely assessed by introductory biology and calculus-based physics sequences, using the cognitive domain of Bloom's taxonomy of educational objectives. Our results indicate that both introductory sequences overwhelmingly assess lower-order cognitive skills (e.g., knowledge recall, algorithmic problem solving), but the distribution of items across cognitive skill levels differs between introductory biology and physics, which reflects and may even reinforce student perceptions typical of those courses: biology is memorization, and physics is solving problems. We also probed the relationship between the level of difficulty of exam questions, as measured by student performance, and cognitive skill level, as measured by Bloom's taxonomy. Our analyses of both disciplines do not indicate the presence of a strong relationship. Thus, regardless of discipline, more cognitively demanding tasks do not necessarily equate to increased difficulty. We recognize the limitations associated with this approach; however, we believe this research underscores the utility of evaluating the nature of our assessments.
Using Assessments to Investigate and Compare the Nature of Learning in Undergraduate Science Courses
Momsen, Jennifer; Offerdahl, Erika; Kryjevskaia, Mila; Montplaisir, Lisa; Anderson, Elizabeth; Grosz, Nate
2013-01-01
Assessments and student expectations can drive learning: students selectively study and learn the content and skills they believe critical to passing an exam in a given subject. Evaluating the nature of assessments in undergraduate science education can, therefore, provide substantial insight into student learning. We characterized and compared the cognitive skills routinely assessed by introductory biology and calculus-based physics sequences, using the cognitive domain of Bloom's taxonomy of educational objectives. Our results indicate that both introductory sequences overwhelmingly assess lower-order cognitive skills (e.g., knowledge recall, algorithmic problem solving), but the distribution of items across cognitive skill levels differs between introductory biology and physics, which reflects and may even reinforce student perceptions typical of those courses: biology is memorization, and physics is solving problems. We also probed the relationship between the level of difficulty of exam questions, as measured by student performance, and cognitive skill level, as measured by Bloom's taxonomy. Our analyses of both disciplines do not indicate the presence of a strong relationship. Thus, regardless of discipline, more cognitively demanding tasks do not necessarily equate to increased difficulty. We recognize the limitations associated with this approach; however, we believe this research underscores the utility of evaluating the nature of our assessments. PMID:23737631
2011-01-01
Background: The quality of data in national health information systems has been questionable in most developing countries, but the mechanisms of errors in the case identification process are not fully understood. This study aimed to investigate these mechanisms in the existing routine health information system (RHIS) of the Philippines by measuring the risk of committing errors for the health program indicators used in the Field Health Services Information System (FHSIS 1996) and characterizing those indicators accordingly. Methods: A structured questionnaire on the definitions of 12 selected FHSIS indicators was administered to 132 health workers in 14 selected municipalities in the province of Palawan. From the 113 valid responses, the proportion of correct answers for each item (difficulty index) and the difference in that proportion between the higher- and lower-scoring groups (discrimination index) were calculated, and the patterns of wrong answers for each of the 12 items were abstracted. Results: None of the 12 items reached a difficulty index of 1.00. The average difficulty index of the 12 items was 0.266, and discrimination indices of 0.216 and above indicated a significant difference. Relative to these two cut-offs, six items showed no discrimination together with lower difficulty indices of 0.035 (4/113) to 0.195 (22/113); two items showed positive discrimination with lower difficulty indices of 0.142 (16/113) and 0.248 (28/113); and four items showed positive discrimination with higher difficulty indices of 0.469 (53/113) to 0.673 (76/113). 
Conclusions: The results suggest that error-prone indicator definitions fall into three types: those that are (1) unsupported by current conditions in the health system, i.e., (a) the data are required from a facility that cannot directly generate them, and (b) the definition of the indicator is not consistent with its corresponding program; (2) incomplete or ambiguous, allowing several interpretations; and (3) complete yet easily misunderstood by health workers. Taking systemic factors into account, the case identification step needs to be reviewed and redesigned to generate the intended data in health information systems. PMID:21995369
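The two classical indices used in this study can be sketched as follows. This is a minimal illustration that assumes a 0/1 response matrix, with the upper and lower groups taken as a fixed fraction of respondents ranked by total score. The function names, the toy matrix, and the default 27% split are hypothetical conventions and were not reported by the study.

```python
from typing import List


def difficulty_index(correct: int, n: int) -> float:
    """Proportion of respondents answering the item correctly."""
    return correct / n


def discrimination_index(responses: List[List[int]], item: int,
                         fraction: float = 0.27) -> float:
    """Upper-group minus lower-group proportion correct for one item.

    responses: rows are respondents, columns are items, entries are 0/1.
    fraction: share of respondents placed in each of the upper and lower
    groups (a hypothetical default; studies use different splits).
    """
    ranked = sorted(responses, key=sum, reverse=True)  # highest totals first
    k = max(1, round(fraction * len(ranked)))
    upper, lower = ranked[:k], ranked[-k:]
    p_upper = sum(row[item] for row in upper) / k
    p_lower = sum(row[item] for row in lower) / k
    return p_upper - p_lower


# Reproduce the difficulty indices quoted in the abstract (113 valid responses).
for correct in (4, 22, 16, 28, 53, 76):
    print(correct, round(difficulty_index(correct, 113), 3))

# Toy 0/1 matrix: 4 respondents x 3 items (hypothetical data).
toy = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print([discrimination_index(toy, j, fraction=0.5) for j in range(3)])  # [0.5, 1.0, 0.5]
```

The loop reproduces the rounded values 0.035, 0.195, 0.142, 0.248, 0.469 and 0.673 given above. Under this classical definition, a higher difficulty index means an easier item, because it is simply the proportion correct.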
Redintegration, task difficulty, and immediate serial recall tasks.
Ritchie, Gabrielle; Tolan, Georgina Anne; Tehan, Gerald
2015-03-01
While current theoretical models remain somewhat inconclusive in their explanation of short-term memory (STM), many theories suggest at least a contribution of long-term memory (LTM) to the short-term system. A number of researchers refer to this process as redintegration (e.g., Schweickert, 1993). Under short-term recall conditions, the current study investigated the effects of redintegration and task difficulty in order to extend research conducted by Neale and Tehan (2007). Thirty participants in Experiment 1 and 26 participants in Experiment 2 completed a serial recall task in which retention interval, presentation rate, and articulatory suppression were used to modify task difficulty. Redintegration was examined by manipulating the characteristics of the to-be-remembered items; lexicality in Experiment 1 and wordlikeness in Experiment 2. Responses were scored based on correct-in-position recall, item scoring, and order accuracy scoring. In line with the Neale and Tehan results, as the difficulty of the task increased so did the effects of redintegration. This was evident in that the advantage for words in Experiment 1 and wordlikeness in Experiment 2 decreased as task difficulty increased. This relationship was observed for item but not order memory, and findings were discussed in relation to the theory of redintegration. (PsycINFO Database Record (c) 2015 APA, all rights reserved).
ERIC Educational Resources Information Center
Engelen, Ron J. H.; And Others
Fisher's information measure for the item difficulty parameter in the Rasch model and its marginal and conditional formulations are investigated. It is shown that expected item information in the unconditional model equals information in the marginal model, provided the assumption of sampling examinees from an ability distribution is made. For the…
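The quantity investigated here can be illustrated for the simplest case. This is a minimal sketch assuming known person abilities. Under the Rasch model, the Fisher information for an item's difficulty b is the sum of P(1 - P) over examinees. Averaging P(1 - P) over an assumed normal ability distribution gives an expected information per examinee. The function names and the normal ability distribution are assumptions for illustration only, and the sketch does not reproduce the paper's marginal or conditional formulations.

```python
import math
from typing import Sequence


def rasch_prob(theta: float, b: float) -> float:
    """Rasch probability of a correct response for ability theta, difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def info_difficulty(thetas: Sequence[float], b: float) -> float:
    """Fisher information for b given known abilities: sum of P(1 - P)."""
    total = 0.0
    for t in thetas:
        p = rasch_prob(t, b)
        total += p * (1.0 - p)
    return total


def expected_info_per_examinee(b: float, mu: float = 0.0, sigma: float = 1.0,
                               n_grid: int = 2001, width: float = 8.0) -> float:
    """E[P(1 - P)] for theta ~ N(mu, sigma^2), by the trapezoidal rule.

    The normal ability distribution is an assumption of this sketch.
    """
    lo, hi = mu - width * sigma, mu + width * sigma
    h = (hi - lo) / (n_grid - 1)
    total = 0.0
    for i in range(n_grid):
        t = lo + i * h
        p = rasch_prob(t, b)
        dens = math.exp(-0.5 * ((t - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
        weight = 0.5 if i in (0, n_grid - 1) else 1.0  # trapezoid end weights
        total += weight * p * (1.0 - p) * dens
    return total * h


print(info_difficulty([-1.0, 0.0, 1.0], 0.0))
print(expected_info_per_examinee(0.0), expected_info_per_examinee(2.0))
```

Each examinee contributes at most 0.25, and contributes that maximum only when their ability equals the item difficulty. Items far from the bulk of the ability distribution therefore carry less information about b.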
Chover-Sierra, Elena; Martínez-Sabater, Antonio; Lapeña-Moñux, Yolanda Raquel
2017-01-01
Background: Palliative care is now an essential part of nursing care because of the increasing number of patients who require attention in the final stages of their lives. Nurses need to acquire specific knowledge and abilities to provide quality palliative care. The Palliative Care Quiz for Nurses (PCQN) is a questionnaire that evaluates nurses' basic knowledge about palliative care, but it has not been adapted into Spanish, and its effectiveness and utility in Spanish culture have not been analyzed. Purpose: To report the adaptation into the Spanish language and the psychometric analysis of the PCQN. Method: The Palliative Care Quiz for Nurses-Spanish Version (PCQN-SV) was obtained through a process including translation, back-translation, comparison with versions in other languages, expert review, and a pilot study. The content validity and reliability of the questionnaire were analyzed. Difficulty and discrimination indexes of each item were also calculated according to Item Response Theory (IRT). Findings: Adequate content validity was found (S-CVI = 0.83); a Cronbach's alpha coefficient of 0.67 and a KR-20 result of 0.72 reflected the reliability of the PCQN-SV. The questionnaire had a global difficulty index of 0.55, with six items that could be considered difficult or very difficult and five items that could be considered easy or very easy. The discrimination indexes of the 20 items show that eight items discriminate well or very well, while six items discriminate poorly between respondents with good and poor knowledge. 
Discussion: Although the PCQN-SV shows internal consistency, reliability, and difficulty indexes similar to those obtained by versions of the PCQN in other languages, the items with the lowest content validity or discrimination indexes, and those that proved difficult to comprehend, should be considered for reformulation to improve the PCQN-SV. Conclusion: The PCQN-SV is a useful Spanish-language instrument for measuring Spanish nurses' knowledge of palliative care, and it is adequate for establishing international comparisons. PMID:28545037
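The two reliability coefficients reported in this abstract, Cronbach's alpha and KR-20, can be sketched as follows. This minimal version assumes a 0/1 response matrix (rows are respondents, columns are items) and uses population variances throughout. The toy data and function names are hypothetical, and the abstract does not describe the study's own scoring conventions.

```python
from statistics import pvariance
from typing import List


def kr20(x: List[List[int]]) -> float:
    """Kuder-Richardson 20 for a 0/1 matrix (needs k > 1 and nonzero total variance)."""
    n, k = len(x), len(x[0])
    totals = [sum(row) for row in x]
    pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in x) / n  # proportion correct on item j
        pq += p * (1 - p)
    return k / (k - 1) * (1 - pq / pvariance(totals))


def cronbach_alpha(x: List[List[float]]) -> float:
    """Cronbach's alpha using population variances of items and totals."""
    k = len(x[0])
    item_vars = sum(pvariance([row[j] for row in x]) for j in range(k))
    return k / (k - 1) * (1 - item_vars / pvariance([sum(row) for row in x]))


# Hypothetical 5 respondents x 4 items.
data = [[1, 1, 1, 1],
        [1, 1, 1, 0],
        [1, 1, 0, 0],
        [1, 0, 0, 0],
        [0, 0, 0, 0]]
print(kr20(data), cronbach_alpha(data))  # both approximately 0.8
```

For a 0/1 item, the population variance equals p(1 - p). As a result, KR-20 is the special case of alpha for dichotomous items when both use the same variance convention.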
Hong, Ickpyo; Reistetter, Timothy A; Díaz-Venegas, Carlos; Michaels-Obregon, Alejandra; Wong, Rebeca
2018-05-10
Cross-national comparisons of patterns of population aging have emerged as comparable national micro-data have become available. This study creates a common metric using Rasch analysis and uses it to compare the functional health of American and Mexican older adult populations. We conducted a secondary data analysis of representative samples of adults aged 50 and older from the 2012 U.S. Health and Retirement Study (n = 20,554) and the 2012 Mexican Health and Aging Study (n = 14,448). We developed a functional measurement scale using Rasch analysis of 22 questions on daily tasks and physical function. We tested the psychometric properties of the scale, including factor analysis, fit statistics, internal consistency, and item difficulty. We investigated differences in function using multiple linear regression controlling for demographics. Lastly, we conducted subgroup analyses for chronic conditions. The common metric demonstrated a unidimensional structure with good item fit, acceptable precision (person reliability = 0.78), and an item difficulty hierarchy. American adults appeared less functional than adults in Mexico (β = -0.26, p < 0.0001), including within two chronic condition subgroups (arthritis, β = -0.36; lung problems, β = -0.62; all p < 0.05). However, American adults with stroke were more functional than Mexican adults with stroke (β = 0.46, p = 0.047). The Rasch model indicates that Mexican adults were more functional than Americans at the population level and across two chronic conditions (arthritis and lung problems). Future studies should elucidate other factors affecting the differences in function between the two countries.
Fractionating the Neural Substrates of Incidental Recognition Memory
ERIC Educational Resources Information Center
Greene, Ciara M.; Vidaki, Kleio; Soto, David
2015-01-01
Familiar stimuli are typically accompanied by decreases in neural response relative to the presentation of novel items, but these studies often include explicit instructions to discriminate old and new items; this creates difficulties in partialling out the contribution of top-down intentional orientation to the items based on recognition goals.…
ERIC Educational Resources Information Center
Gaitas, Sérgio; Alves Martins, Margarida
2017-01-01
This study analyses teacher perceived difficulty in implementing differentiated instructional strategies in regular classes. The participants were 273 Portuguese primary school teachers with teaching experience ranging from 1 to 33 years. A 39-item questionnaire was used to evaluate teacher perceived difficulty in relation to different…
Measuring and Predicting Graded Reader Difficulty
ERIC Educational Resources Information Center
Holster, Trevor A.; Lake, J. W.; Pellowe, William R.
2017-01-01
This study used many-faceted Rasch measurement to investigate the difficulty of graded readers using a 3-item survey. Book difficulty was compared with Kyoto Level, Yomiyasusa Level, Lexile Level, book length, mean sentence length, and mean word frequency. Word frequency and Kyoto Level were found to be ineffective in predicting students'…
Critical success factors in awareness of and choice towards low vision rehabilitation.
Fraser, Sarah A; Johnson, Aaron P; Wittich, Walter; Overbury, Olga
2015-01-01
The goal of the current study was to examine the critical factors indicative of an individual's choice to access low vision rehabilitation services. Seven hundred and forty-nine visually impaired individuals from the Montreal Barriers Study completed a structured interview and questionnaires (on visual function, coping, depression, and satisfaction with life). Seventy-five factors from the interview and questionnaires were entered into a data-driven Classification and Regression Tree Analysis to determine the best predictors of awareness group: positive personal choice (I knew and I went), negative personal choice (I knew and did not go), and lack of information (Nobody told me, and I did not know). Reporting no to moderate difficulty on item 6 (reading signs) of the Visual Function Index 14 (VF-14) indicated that the person had made a positive personal choice to seek rehabilitation, whereas reporting a great deal of difficulty on this item was associated with a lack of information on low vision rehabilitation. In addition to this factor, symptom duration of under nine years, moderate difficulty or less on item 5 (seeing steps or curbs) of the VF-14, and little difficulty or less on item 3 (reading large print) of the VF-14 further identified those who were more likely to have made a positive personal choice. Individuals in the lack-of-information group also reported greater difficulty on items 3 and 5 of the VF-14 and were more likely to be male. The duration-of-symptoms factor suggests that, even in the positive choice group, it may be best to offer rehabilitation services early. Being male and reporting moderate or greater difficulty on the VF-14 questions about far, medium-distance, and near vision situations were associated with membership in the lack-of-information group. Consequently, these individuals may need additional education about the benefits of low vision services in order to make a positive personal choice. 
© 2014 The Authors Ophthalmic & Physiological Optics © 2014 The College of Optometrists.
Hämäläinen, H Pauliina; Suni, Jaana H; Pasanen, Matti E; Malmberg, Jarmo J; Miilunpalo, Seppo I
2006-06-01
The functional independence of elderly populations deteriorates with age. Several tests of physical performance have been developed for screening elderly persons who are at risk of losing their functional independence. The purpose of the present study was to investigate whether several components of health-related fitness (HRF) are valid in predicting the occurrence of self-reported mobility difficulties (MD) among high-functioning older adults. Subjects were community-dwelling men and women, born 1917-1941, who participated in the assessment of HRF [6.1-m (20-ft) walk, one-leg stand, backwards walk, trunk side-bending, dynamic back extension, one-leg squat, 1-km walk] and who were free of MD in 1996 (no difficulties in walking 2 km, n=788; no difficulties in climbing stairs, n=647). Postal questionnaires were used to assess the prevalence of MD in 1996 and the occurrence of new MD in 2002. Logistic regression analysis was used as the statistical method. Both inability to perform the backwards walk and a poorer result in it were associated with risk of walking difficulties in the logistic model that included all statistically significant single test items. Results of the 1-km walk time and the one-leg squat strength test were also associated with risk, although the squat was statistically significant only in the two older birth cohorts. Regarding stair-climbing difficulties, poorer results in the 1-km walk, dynamic back extension, and one-leg squat tests were associated with increased risk of MD. The backwards walk, one-leg squat, dynamic back extension, and 1-km walk tests were the best predictors of MD. These tests are recommended for screening high-functioning older people at risk of MD and for targeting physical activity counseling to those components of HRF that are important for functional independence.
Increased susceptibility to proactive interference in adults with dyslexia?
Bogaerts, Louisa; Szmalec, Arnaud; Hachmann, Wibke M; Page, Mike P A; Woumans, Evy; Duyck, Wouter
2015-01-01
Recent findings show that people with dyslexia have an impairment in serial-order memory. Based on these findings, the present study aimed to test the hypothesis that people with dyslexia have difficulties dealing with proactive interference (PI) in recognition memory. A group of 25 adults with dyslexia and a group of matched controls were subjected to a 2-back recognition task, which required participants to indicate whether an item (mis)matched the item that had been presented 2 trials before. PI was elicited using lure trials in which the item matched the item in the 3-back position instead of the targeted 2-back position. Our results demonstrate that the introduction of lure trials affected 2-back recognition performance more severely in the dyslexic group than in the control group, suggesting greater difficulty in resisting PI in dyslexia.
ERIC Educational Resources Information Center
Rakkapao, Suttida; Prasitpong, Singha; Arayathanitkul, Kwan
2016-01-01
This study investigated the multiple-choice test of understanding of vectors (TUV) by applying item response theory (IRT). The difficulty, discrimination, and guessing parameters of the TUV items were fit with the three-parameter logistic model of IRT, using the parscale program. The TUV ability is an ability parameter, here estimated assuming…
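The three-parameter logistic (3PL) model named above can be sketched directly. This is a minimal illustration, not the parscale implementation. The scaling constant D is an assumption of the sketch: D = 1.7 is a common choice for approximating the normal ogive, and the abstract does not say which value was used. The item information function follows the standard 3PL form.

```python
import math


def p_3pl(theta: float, a: float, b: float, c: float, D: float = 1.0) -> float:
    """3PL probability of a correct response.

    a: discrimination, b: difficulty, c: pseudo-guessing lower asymptote,
    D: scaling constant (assumed; 1.7 is a common alternative to 1.0).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def info_3pl(theta: float, a: float, b: float, c: float, D: float = 1.0) -> float:
    """Item information: (D a)^2 * (1 - P) / P * ((P - c) / (1 - c))^2."""
    p = p_3pl(theta, a, b, c, D)
    return (D * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2


# At theta = b the probability lies halfway between c and 1.
print(p_3pl(0.0, a=1.0, b=0.0, c=0.25))  # 0.625
# A nonzero guessing parameter lowers the information the item provides.
print(info_3pl(0.0, 1.0, 0.0, 0.2), info_3pl(0.0, 1.0, 0.0, 0.0))
```

With c = 0 the information reduces to (D a)^2 P(1 - P), which is the two-parameter logistic case.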
A Graphical Approach to Item Analysis. Research Report. ETS RR-04-10
ERIC Educational Resources Information Center
Livingston, Samuel A.; Dorans, Neil J.
2004-01-01
This paper describes an approach to item analysis that is based on the estimation of a set of response curves for each item. The response curves show, at a glance, the difficulty and the discriminating power of the item and the popularity of each distractor, at any level of the criterion variable (e.g., total score). The curves are estimated by…
ERIC Educational Resources Information Center
Lee, Young-Sun; Krishnan, Anita; Park, Yoon Soo
2012-01-01
The purpose of this study was to investigate psychometric properties of the Children's Depression Inventory within a nonclinical and longitudinal sample (8th and 12th grades). Using the Rasch rating scale, most items represented one dimension. There was adequate separation among items and no overlap between ranges of item difficulties with latent…
ERIC Educational Resources Information Center
Atalmis, Erkan Hasan
2016-01-01
Multiple-choice (MC) items are commonly used in high-stakes tests. Thus, each item of such tests should be meticulously constructed to increase the accuracy of decisions based on test results. Haladyna and his colleagues (2002) presented validated item-writing guidelines for constructing high-quality MC items in order to increase test reliability and…
Rogers, Elizabeth A; Yost, Kathleen J; Rosedahl, Jordan K; Linzer, Mark; Boehm, Deborah H; Thakur, Azra; Poplau, Sara; Anderson, Roger T; Eton, David T
2017-01-01
Aims: To validate a comprehensive general measure of treatment burden, the Patient Experience with Treatment and Self-Management (PETS), in people with diabetes. Methods: We conducted a secondary analysis of a cross-sectional survey study with 120 people diagnosed with type 1 or type 2 diabetes and at least one additional chronic illness. Surveys included established patient-reported outcome measures and a 48-item version of the PETS, a new measure composed of multi-item scales assessing the burden of chronic illness treatment and self-care across nine domains: medical information, medications, medical appointments, monitoring health, interpersonal challenges, health care expenses, difficulty with health care services, role activity limitations, and physical/mental exhaustion from self-management. Internal reliability of PETS scales was determined using Cronbach’s alpha. Construct validity was determined through correlation of PETS scores with established measures (measures of chronic condition distress, medication satisfaction, self-efficacy, and global well-being), and known-groups validity through comparisons of PETS scores across clinically distinct groups. In an exploratory test of predictive validity, stepwise regressions were used to determine which PETS scales were most associated with outcomes of chronic condition distress, overall physical and mental health, and medication adherence. Results: Respondents were 37–88 years old, 59% female, 29% non-white, and 67% college-educated. PETS scales showed good reliability (Cronbach’s alphas ≥0.74). Higher PETS scale scores (greater treatment burden) were correlated with more chronic condition distress, less medication convenience, lower self-efficacy, and worse general physical and mental health. Participants less (versus more) adherent to medications and those with more (versus fewer) health care financial difficulties had higher mean PETS scores. 
Medication burden was the scale most consistently associated with well-being and patient-reported adherence. Conclusion: The PETS is a reliable and valid measure for assessing perceived treatment burden in people coping with diabetes. PMID:29184456
Mackus, Marlou; Kruijff, Deborah de; Otten, Leila S; Kraneveld, Aletta D; Garssen, Johan; Verster, Joris C
2017-04-12
Altered immune functioning has been demonstrated in individuals with autism spectrum disorder (ASD). The current study explores the relationship between perceived immune functioning and experiencing ASD traits in healthy young adults. A total of 410 students from Utrecht University completed a survey on immune functioning and autistic traits. In addition to a 1-item perceived immune functioning rating, the Immune Function Questionnaire (IFQ) was completed to assess perceived immune functioning. The Dutch translation of the Autism-Spectrum Quotient (AQ) was completed to examine variation in autistic traits, including the domains "social insights and behavior", "difficulties with change", "communication", "phantasy and imagination", and "detail orientation". The 1-item perceived immune functioning score did not significantly correlate with the total AQ score. However, a significant negative correlation was found between perceived immune functioning and the AQ subscale "difficulties with change" (r = -0.119, p = 0.019). In women, 1-item perceived immune functioning correlated significantly with the AQ subscales "difficulties with change" (r = -0.149, p = 0.029) and "communication" (r = -0.145, p = 0.032). In men, none of the AQ subscales significantly correlated with 1-item perceived immune functioning. In conclusion, a modest relationship between perceived immune functioning and several autistic traits was found.
Assessing the Conceptual Understanding about Heat and Thermodynamics at Undergraduate Level
ERIC Educational Resources Information Center
Kulkarni, Vasudeo Digambar; Tambade, Popat Savaleram
2013-01-01
In this study, a Thermodynamic Concept Test (TCT) was designed to assess students' conceptual understanding of heat and thermodynamics at the undergraduate level. Statistical indices such as the item difficulty index, the item discrimination index, and the point-biserial coefficient were used to evaluate the TCT. For each item of the test these indices were…
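The three classical indices named above can be computed directly from a matrix of 0/1 item scores. The sketch below makes several assumptions that the abstract does not state. The upper and lower groups are the top and bottom 27% by total score, which is a common convention. The point-biserial is left uncorrected, so the item is included in the total. The name `item_analysis` and the data are hypothetical.

```python
from statistics import mean, pstdev


def item_analysis(responses: list[list[int]], item: int, group_frac: float = 0.27):
    """Classical statistics for one dichotomously scored (0/1) item.

    Returns (difficulty index p, discrimination index D, point-biserial r).
    """
    totals = [sum(row) for row in responses]  # total test score per student
    x = [row[item] for row in responses]      # 0/1 scores on this item

    # Difficulty index: proportion of students answering the item correctly.
    p = mean(x)

    # Discrimination index: proportion correct in the top-scoring group minus
    # proportion correct in the bottom-scoring group.
    order = sorted(range(len(responses)), key=lambda i: totals[i])
    n = max(1, round(group_frac * len(responses)))
    d = mean(x[i] for i in order[-n:]) - mean(x[i] for i in order[:n])

    # Point-biserial: Pearson correlation between the 0/1 item score and the
    # total score (fails if everyone got the item right, or everyone wrong).
    mx, mt = mean(x), mean(totals)
    cov = mean((a - mx) * (b - mt) for a, b in zip(x, totals))
    r_pb = cov / (pstdev(x) * pstdev(totals))
    return p, d, r_pb


# Hypothetical answers: 4 students x 3 items.
answers = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(item_analysis(answers, item=0))  # p = 0.75, D = 1, r_pb ≈ 0.775
```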
Modeling Booklet Effects for Nonequivalent Group Designs in Large-Scale Assessment
ERIC Educational Resources Information Center
Hecht, Martin; Weirich, Sebastian; Siegle, Thilo; Frey, Andreas
2015-01-01
Multiple matrix designs are commonly used in large-scale assessments to distribute test items to students. These designs comprise several booklets, each containing a subset of the complete item pool. Besides reducing the test burden of individual students, using various booklets allows aligning the difficulty of the presented items to the assumed…
Effects of Using Modified Items to Test Students with Persistent Academic Difficulties
ERIC Educational Resources Information Center
Elliott, Stephen N.; Kettler, Ryan J.; Beddow, Peter A.; Kurz, Alexander; Compton, Elizabeth; McGrath, Dawn; Bruen, Charles; Hinton, Kent; Palmer, Porter; Rodriguez, Michael C.; Bolt, Daniel; Roach, Andrew T.
2010-01-01
This study investigated the effects of using modified items in achievement tests to enhance accessibility. An experiment determined whether tests composed of modified items would reduce the performance gap between students eligible for an alternate assessment based on modified achievement standards (AA-MAS) and students not eligible, and the…
Regression Effects in Angoff Ratings: Examples from Credentialing Exams
ERIC Educational Resources Information Center
Wyse, Adam E.
2018-01-01
This article discusses regression effects that are commonly observed in Angoff ratings where panelists tend to think that hard items are easier than they are and easy items are more difficult than they are in comparison to estimated item difficulties. Analyses of data from two credentialing exams illustrate these regression effects and the…
A Five-Year Evaluation of Examination Structure in a Cardiovascular Pharmacotherapy Course
Kolar, Claire; Janke, Kristin K.
2015-01-01
Objective. To evaluate the composition and effectiveness as an assessment tool of a criterion-referenced examination comprised of clinical cases tied to practice decisions, to examine the effect of varying audience response system (ARS) questions on student examination preparation, and to articulate guidelines for structuring examinations to maximize evaluation of student learning. Design. Multiple-choice items developed over 5 years were evaluated using Bloom’s Taxonomy classification, point biserial correlation, item difficulty, and grade distribution. In addition, examination items were classified into categories based on similarity to items used in ARS preparation. Assessment. As the number of items directly tied to clinical practice rose, Bloom’s Taxonomy level and item difficulty also rose. In examination years where Bloom’s levels were high but preparation was minimal, average grade distribution was lower compared with years in which student preparation was higher. Conclusion. Criterion-referenced examinations can benefit from systematic evaluation of their composition and effectiveness as assessment tools. Calculated design and delivery of classroom preparation is an asset in improving examination performance on rigorous, practice-relevant examinations. PMID:27168611
Kılıç, Aslı; Hoyer, William J; Howard, Marc W
2013-01-01
BACKGROUND/STUDY CONTEXT: Older adults exhibit an age-related deficit in item memory as a function of the length of the retention interval, but older adults and young adults usually show roughly equivalent benefits due to the spacing of item repetitions in continuous memory tasks. The current experiment investigates the seemingly paradoxical effects of retention interval and spacing in young and older adults using a continuous recognition memory procedure. Fifty young adults and 52 older adults gave memory confidence ratings to words that were presented once (P1), twice (P2), or three times (P3), and the effects of the lag length and retention interval were assessed at P2 and at P3, respectively. Response times at P2 were disproportionately longer for older adults than for younger adults as a function of the number of items occurring between P1 and P2, suggestive of age-related loss in item memory. Ratings of confidence in memory responses revealed that older adults remembered fewer items at P2 with a high degree of certainty. Confidence ratings given at P3 suggested that young and older adults derived equivalent benefits from the spacing between P1 and P2. Findings of this study support theoretical accounts that suggest that recursive reminding and/or item retrieval difficulty promote item retention in older adults.
ERIC Educational Resources Information Center
Carroll, H. C. M.
2013-01-01
Two complementary studies of poor and better attenders are presented. To measure emotional and behavioural difficulties (EBD) different teacher-completed rating scales were employed, and to determine social difficulties, the studies used sociometry and some items from the scales. One study had a longitudinal design. It revealed that, after…
Perceived Difficulty with Physical Tasks, Lifestyle, and Physical Performance in Obese Children
D'Amico, Osvaldo; Sticco, Maura; Nugnes, Rosa; Mozzillo, Enza; Franzese, Adriana
2014-01-01
We estimated perceived difficulty with physical tasks, lifestyle, and physical performance in 382 children and adolescents (163 obese, 54 overweight, and 165 normal-weight subjects) and the relationship between perceived physical difficulties and sports participation, sedentary behaviors, or physical performance. Perceived difficulty with physical tasks and lifestyle habits was assessed by interview using a structured questionnaire, while physical performance was assessed through the six-minute walking test (6MWT). Obese children had higher perceived difficulty with several activities of daily living, were less engaged in sports, and had lower physical performance than normal-weight or overweight children; on the contrary, they did not differ with regard to time spent in sedentary behaviors. Perceived difficulty in running and hopping negatively predicted sports participation (P < 0.05 and <0.01, resp.), while perceived difficulty in almost all physical activities negatively predicted the 6MWT, independently of BMI (P < 0.01). Our results indicate that perception of a task's difficulty level may reflect an actual difficulty in obese children. These findings may have practical implications for approaching physical activity in obese children. Exploring both the perception of a task's difficulty level and physical performance may be useful to design exercise programs that allow safe and successful participation. PMID:25105139
Paz, Sylvia H; Spritzer, Karen L; Morales, Leo S; Hays, Ron D
2013-03-29
To evaluate the equivalence of the PROMIS® wave 1 physical functioning item bank, by age (50 years or older versus 18-49). A total of 114 physical functioning items with 5 response choices were administered to English- (n=1504) and Spanish-language (n=640) adults. Item frequencies, means and standard deviations, item-scale correlations, and internal consistency reliability were estimated. Differential Item Functioning (DIF) by age was evaluated. Thirty of the 114 items were flagged for DIF based on an R-squared criterion of 0.02 or above. The expected total score was higher for those respondents who were 18-49 than those who were 50 or older. Those who were 50 years or older versus 18-49 years old with the same level of physical functioning responded differently to 30 of the 114 items in the PROMIS® physical functioning item bank. This study yields essential information about the equivalence of the physical functioning items in older versus younger individuals.
Middle school students' reading comprehension of mathematical texts and algebraic equations
NASA Astrophysics Data System (ADS)
Duru, Adem; Koklu, Onder
2011-06-01
In this study, middle school students' abilities to translate mathematical texts into algebraic representations and vice versa were investigated. In addition, students' difficulties in making such translations and the potential sources of these difficulties were explored. Both qualitative and quantitative methods were used to collect data for this study: a questionnaire and clinical interviews. The questionnaire consisted of two general types of items: (1) selected-response (multiple-choice) items for which the respondent selects from multiple options and (2) open-ended items for which the respondent constructs a response. To further investigate the students' strategies while they were translating the given mathematical texts into algebraic equations and vice versa, five randomly chosen students were interviewed. Data were collected in the 2007-2008 school year from 185 middle-school students in five teachers' classrooms in three different schools in the city of Adıyaman, Turkey. Analysis of the data showed that the students who participated in this study had difficulties translating mathematical texts into algebraic equations using symbols. It was also observed that these students had difficulties translating symbolic representations into mathematical texts because of their weak reading comprehension. In addition, the findings of this research revealed that students' difficulties in translating the given mathematical texts into symbolic representations, or vice versa, stem from different sources.
The degree of social difficulties experienced by cancer patients and their spouses.
Takeuchi, Takashi; Ichikura, Kanako; Amano, Kanako; Takeshita, Wakana; Hisamura, Kazuho
2018-06-08
Although recent studies have increasingly reported physical and psychological problems associated with cancer and its treatment, social problems of cancer patients and their families have not been sufficiently elucidated. The present study aimed to identify cancer-associated social problems from the perspectives of both patients and their spouses and to compare and analyze differences in their problems. This was a cross-sectional internet-based study. Subjects were 259 patients who developed cancer within the previous five years and 259 patients' spouses; the data were derived from two surveys in 2010 (patients) and 2016 (spouses) whose participants were not part of the same dyad but matched by propensity scores, estimated for age, sex, and the presence or absence of recurrence. We investigated the social difficulties of cancer patients and patients' spouses. Regarding social difficulties experienced by cancer patients and spouses, the 60 patient survey items were categorized into 14 labels by the Jiro Kawakita (KJ) method, which is a qualitative synthesis method developed by Kawakita to classify categorical data. Although patients had higher scores on most subcategories, young spouses aged 39 or younger and female spouses had difficulty scores as high as the corresponding patients on many subcategories. Health care providers should show sufficient concern for both patients and their spouses, particularly young and female spouses.
Gabay, Yafit; Karni, Avi; Banai, Karen
2017-01-01
Speech perception can improve substantially with practice (perceptual learning) even in adults. Here we compared the effects of four training protocols that differed in whether and how task difficulty was changed during a training session, in terms of the gains attained and the ability to apply (transfer) these gains to previously un-encountered items (tokens) and to different talkers. Participants trained in judging the semantic plausibility of sentences presented as time-compressed speech and were tested on their ability to reproduce, in writing, the target sentences; trial-by-trial feedback was afforded in all training conditions. In two conditions task difficulty (low or high compression) was kept constant throughout the training session, whereas in the other two conditions task difficulty was changed in an adaptive manner (incrementally from easy to difficult, or using a staircase procedure). Compared to a control group (no training), all four protocols resulted in significant post-training improvement in the ability to reproduce the trained sentences accurately. However, training in the constant-high-compression protocol elicited the smallest gains in deciphering and reproducing trained items and in reproducing novel, untrained, items after training. Overall, these results suggest that training procedures that start off with relatively little signal distortion (“easy” items, not far removed from standard speech) may be advantageous compared to conditions wherein severe distortions are presented to participants from the very beginning of the training session. PMID:28545039
Adaptive Mental Testing: The State of the Art
1979-11-01
typically vary in their psychometric properties (particularly in their difficulty), the test designer must decide what configuration of these item...psychometric properties best suits the test’s purpose. There are two extreme rationales to guide that decision. One rationale is to choose items that are...development of item response theory (Rasch, 1960; Lord, 1952, 1970, 1974a; Birnbaum, 1968) that provided the needed invariance properties for item
ERIC Educational Resources Information Center
van der Linden, Wim J.; Eggen, Theo J. H. M.
A procedure for the sequential optimization of the calibration of an item bank is given. The procedure is based on an empirical Bayes approach to a reformulation of the Rasch model as a model for paired comparisons between the difficulties of test items in which ties are allowed to occur. First, it is indicated how a paired-comparisons design…
Assessment of item-writing flaws in multiple-choice questions.
Nedeau-Cayo, Rosemarie; Laughlin, Deborah; Rus, Linda; Hall, John
2013-01-01
This study evaluated the quality of multiple-choice questions used in a hospital's e-learning system. Constructing well-written questions is fraught with difficulty, and item-writing flaws are common. Study results revealed that most items contained flaws and were written at the knowledge/comprehension level. Few items had linked objectives, and no association was found between the presence of objectives and flaws. Recommendations include education for writing test questions.
Narimoto, Tadamasa; Matsuura, Naomi; Takezawa, Tomohiro; Mitsuhashi, Yoshinori; Hiratani, Michio
2013-01-01
The authors investigated whether impaired spatial short-term memory exhibited by children with nonverbal learning disabilities is due to a problem in the encoding process. Children with or without nonverbal learning disabilities performed a simple spatial test that required them to remember 3, 5, or 7 spatial items presented simultaneously in random positions (i.e., spatial configuration) and to decide if a target item was changed or all items including the target were in the same position. The results showed that, even when the spatial positions in the encoding and probe phases were similar, the mean proportion correct of children with nonverbal learning disabilities was 0.58 while that of children without nonverbal learning disabilities was 0.84. Based on these results, the authors argue that children with nonverbal learning disabilities have difficulty encoding relational information between spatial items, and that this difficulty is responsible for their impaired spatial short-term memory.
Application of Computerized Adaptive Testing to Entrance Examination for Graduate Studies in Turkey
ERIC Educational Resources Information Center
Bulut, Okan; Kan, Adnan
2012-01-01
Problem Statement: Computerized adaptive testing (CAT) is a sophisticated and efficient way of delivering examinations. In CAT, items for each examinee are selected from an item bank based on the examinee's responses to the items. In this way, the difficulty level of the test is adjusted based on the examinee's ability level. Instead of…
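The adaptive loop described above can be sketched as follows. This is a generic maximum-information CAT under the Rasch model. The abstract does not give Bulut and Kan's item-selection rule, ability estimator, or stopping rule, so several choices here are assumptions: selection picks the unused item with the largest Fisher information at the current estimate, and ability is updated by expected a posteriori (EAP) estimation on a grid with a standard normal prior. All function names and difficulties are hypothetical.

```python
import math


def rasch_prob(theta: float, b: float) -> float:
    """Rasch probability of a correct answer for ability theta, difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_information(theta: float, b: float) -> float:
    """Fisher information of a Rasch item at theta: P * (1 - P)."""
    p = rasch_prob(theta, b)
    return p * (1.0 - p)


def next_item(theta: float, bank: list[float], used: set[int]) -> int:
    """Pick the unused item that is most informative at the current estimate."""
    candidates = [i for i in range(len(bank)) if i not in used]
    return max(candidates, key=lambda i: item_information(theta, bank[i]))


def eap_estimate(answers: list[tuple[int, int]], bank: list[float],
                 n_points: int = 81) -> float:
    """EAP ability on an evenly spaced grid over [-4, 4] with a N(0, 1) prior.

    `answers` holds (item index, 0/1 score) pairs administered so far.
    """
    num = den = 0.0
    for k in range(n_points):
        t = -4.0 + 8.0 * k / (n_points - 1)
        w = math.exp(-t * t / 2)  # unnormalized standard normal prior
        for i, x in answers:
            p = rasch_prob(t, bank[i])
            w *= p if x else 1.0 - p  # likelihood of the observed answer
        num += t * w
        den += w
    return num / den


# Hypothetical bank and simulated answers: select, score, re-estimate.
bank = [-2.0, -0.5, 0.1, 1.0, 2.2]
theta, answers = 0.0, []
for score in (1, 1, 0):
    item = next_item(theta, bank, {i for i, _ in answers})
    answers.append((item, score))
    theta = eap_estimate(answers, bank)
```

Under the Rasch model, information peaks where difficulty equals ability. This is why the test "adjusts" its difficulty to each examinee.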
Examining the Invariance of Rater and Project Calibrations Using a Multi-facet Rasch Model.
ERIC Educational Resources Information Center
O'Neill, Thomas R.; Lunz, Mary E.
To generalize test results beyond the particular test administration, an examinee's ability estimate must be independent of the particular items attempted, and the item difficulty calibrations must be independent of the particular sample of people attempting the items. This stability is a key concept of the Rasch model, a latent trait model of…
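The invariance property mentioned above can be checked numerically. Under the Rasch model, the log-odds of success equal ability minus difficulty. The difference in log-odds between two items is therefore the same for every person, whatever their ability. The sketch below uses two hypothetical item difficulties to illustrate this. It is not the multi-facet model used in the study, which adds rater and project facets.

```python
import math


def rasch_prob(theta: float, b: float) -> float:
    """P(correct) under the Rasch model: exp(theta - b) / (1 + exp(theta - b))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def log_odds(p: float) -> float:
    return math.log(p / (1.0 - p))


# Hypothetical difficulties for two items.
b_easy, b_hard = -0.5, 1.0
for theta in (-2.0, 0.0, 1.5):
    gap = log_odds(rasch_prob(theta, b_easy)) - log_odds(rasch_prob(theta, b_hard))
    print(f"theta={theta:+.1f}  log-odds gap={gap:.3f}")  # 1.500 for every theta
```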
Rasch Based Analysis of Oral Proficiency Test Data.
ERIC Educational Resources Information Center
Nakamura, Yuji
2001-01-01
This paper examines the rating scale data of oral proficiency tests analyzed by a Rasch Analysis focusing on an item map and factor analysis. In discussing the item map, the difficulty order of six items and students' answering patterns are analyzed using descriptive statistics and measures of central tendency of test scores. The data ranks the…
Investigating the Performance of Omega Index According to Item Parameters and Ability Levels
ERIC Educational Resources Information Center
Sunbul, Onder; Yormaz, Seha
2018-01-01
Purpose: Several studies in the literature investigate the performance of ω under various conditions. However, no study of the effects of item difficulty, item discrimination, and ability restrictions on the performance of ω could be found. The current study aims to investigate the performance of ω for the conditions given below.…
ERIC Educational Resources Information Center
Parish, Jane A.; Karisch, Brandi B.
2013-01-01
Item analysis can serve as a useful tool in improving multiple-choice questions used in Extension programming. It can identify gaps between instruction and assessment. An item analysis of Mississippi Master Cattle Producer program multiple-choice examination responses was performed to determine the difficulty of individual examinations, assess the…
Exploring the Manifestations of Anxiety in Children with Autism Spectrum Disorders
ERIC Educational Resources Information Center
Hallett, Victoria; Lecavalier, Luc; Sukhodolsky, Denis G.; Cipriano, Noreen; Aman, Michael G.; McCracken, James T.; McDougle, Christopher J.; Tierney, Elaine; King, Bryan H.; Hollander, Eric; Sikich, Linmarie; Bregman, Joel; Anagnostou, Evdokia; Donnelly, Craig; Katsovich, Lily; Dukes, Kimberly; Vitiello, Benedetto; Gadow, Kenneth; Scahill, Lawrence
2013-01-01
This study explores the manifestation and measurement of anxiety symptoms in 415 children with ASDs on a 20-item, parent-rated, DSM-IV referenced anxiety scale. In both high and low-functioning children (IQ above vs. below 70), commonly endorsed items assessed restlessness, tension and sleep difficulties. Items requiring verbal expression of worry…
Sensitivity of Equated Aggregate Scores to the Treatment of Misbehaving Common Items
ERIC Educational Resources Information Center
Michaelides, Michalis P.
2010-01-01
The delta-plot method (Angoff, 1972) is a graphical technique used in the context of test equating for identifying common items with aberrant changes in their item difficulties across administrations or alternate forms. This brief research report explores the effects on equated aggregate scores when delta-plot outliers are either retained in or…
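The delta-plot procedure works in three steps. Each common item's proportion correct on the two forms is converted to an Angoff delta, 13 + 4z, where z is the standard normal deviate exceeded by that proportion. The delta pairs are then plotted against each other. Finally, items lying far from the major (principal) axis of the scatter are flagged. The sketch below follows those steps. The 1.5-delta cutoff is one conventional choice and is assumed here; some practitioners base the cutoff on the spread of the distances instead. The function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, variance


def angoff_delta(p: float) -> float:
    """Angoff delta 13 + 4z, with z = Phi^-1(1 - p); harder items get larger deltas."""
    return 13.0 + 4.0 * NormalDist().inv_cdf(1.0 - p)


def delta_plot_outliers(dx: list[float], dy: list[float],
                        threshold: float = 1.5) -> list[int]:
    """Indices of common items whose perpendicular distance from the major
    axis of the (dx, dy) delta plot exceeds `threshold` delta units."""
    mx, my = mean(dx), mean(dy)
    sxx, syy = variance(dx), variance(dy)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dx, dy)) / (len(dx) - 1)
    # Major-axis slope and intercept (assumes a nonzero covariance sxy).
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    intercept = my - slope * mx
    scale = math.sqrt(slope ** 2 + 1)
    dist = [(slope * x - y + intercept) / scale for x, y in zip(dx, dy)]
    return [i for i, d in enumerate(dist) if abs(d) > threshold]


# Hypothetical deltas for nine common items on two forms; the last two
# items shifted in opposite directions between administrations.
form1 = [9, 10, 11, 12, 13, 14, 15, 12, 16]
form2 = [9, 10, 11, 12, 13, 14, 15, 16, 12]
print(delta_plot_outliers(form1, form2))  # [7, 8]
```

Whether flagged items are then dropped from, or retained in, the anchor set is the decision whose effect on equated scores the report examines.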
de Sá Junior, Antonio Reis; de Andrade, Arthur Guerra; Andrade, Laura Helena; Gorenstein, Clarice; Wang, Yuan-Pang
2018-07-01
This study examines the response pattern of depressive symptoms in a nationwide student sample, through item analyses of a rating scale by both classical test theory (CTT) and item response theory (IRT). The 21-item Beck Depression Inventory-II (BDI-II) was administered to 12,711 college students. First, the psychometric properties of the scale were described. Thereafter, the endorsement probability of the depressive symptom assessed by each scale item was analyzed through CTT and IRT. Graphical plots depicted the endorsement probability of scale items and intensity of depression. Three items of different difficulty levels were compared through the CTT and IRT approaches. Four in five students reported the presence of depressive symptoms. The BDI-II items presented good reliability and were distributed along the symptomatic continuum of depression. Similarly, in both CTT and IRT approaches, the item 'changes in sleep' was easily endorsed, 'loss of interest' moderately, and 'suicidal thoughts' rarely. Graphical representations of the BDI-II from both methods showed much equivalence in terms of item discrimination and item difficulty. The item characteristic curve of the IRT method provided an informative evaluation of item performance. The inventory was administered only to college students. Depressive symptoms were frequent psychopathological manifestations among college students. The performance of the BDI-II items indicated convergent results from both methods of analysis. While CTT was easy to understand and to apply, IRT was more complex to understand and to implement. Comprehensive assessment of the functioning of each BDI-II item might be helpful in the efficient detection of depressive conditions in college students. Copyright © 2018 Elsevier B.V. All rights reserved.
Validity of a Protocol for Adult Self-Report of Dyslexia and Related Difficulties
ERIC Educational Resources Information Center
Snowling, Margaret; Dawes, Piers; Nash, Hannah; Hulme, Charles
2012-01-01
Background: There is an increased prevalence of reading and related difficulties in children of dyslexic parents. In order to understand the causes of these difficulties, it is important to quantify the risk factors passed from parents to their offspring. Method: 417 adults completed a protocol comprising a 15-item questionnaire rating reading and…
Lynch, Andrew D; Dodds, Nathan E; Yu, Lan; Pilkonis, Paul A; Irrgang, James J
2016-05-11
The content and wording of the Patient Reported Outcome Measurement Information System (PROMIS) Physical Function and Pain Interference item banks have not been qualitatively assessed by individuals with knee joint impairments. The purpose of this investigation was to identify items in the PROMIS Physical Function and Pain Interference Item Banks that are irrelevant, unclear, or otherwise difficult to respond to for individuals with impairment of the knee and to suggest modifications based on cognitive interviews. Twenty-nine individuals with knee joint impairments qualitatively assessed items in the Pain Interference and Physical Function Item Banks in a mixed-methods cognitive interview. Field notes were analyzed to identify themes and frequency counts were calculated to identify items not relevant to individuals with knee joint impairments. Issues with clarity were identified in 23 items in the Physical Function Item Bank, resulting in the creation of 43 new or modified items, typically changing words within the item to be clearer. Interpretation issues included whether or not the knee joint played a significant role in overall health and age/gender differences in items. One quarter of the original items (31 of 124) in the Physical Function Item Bank were identified as irrelevant to the knee joint. All 41 items in the Pain Interference Item Bank were identified as clear, although individuals without significant pain substituted other symptoms which interfered with their life. The Physical Function Item Bank would benefit from additional items that are relevant to individuals with knee joint impairments and, by extension, to other lower extremity impairments. Several issues in clarity were identified that are likely to be present in other patient cohorts as well.
ERIC Educational Resources Information Center
Brese, Falk, Ed.
2012-01-01
The goal for selecting the released set of test items was to have approximately 25% of each of the full item sets for mathematics content knowledge (MCK) and mathematics pedagogical content knowledge (MPCK) that would represent the full range of difficulty, content, and item format used in the TEDS-M study. The initial step in the selection was to…
Gerrard, Paul
2013-01-01
Nursing facility patients are a population that has not previously been well studied with regard to functional status and independence. As such, the manner in which activities of daily living (ADL) relate to one another is not well understood in this population. An understanding of ADL difficulty ordering has helped to devise systems of functional independence grading in other populations, which have value in understanding patients' global levels of independence and providing expectations regarding changes in function. This study seeks to examine the hierarchy of ADL in the nursing facility population. Data were analyzed from the 2004 National Nursing Home Survey, a cross-sectional data set of 13 507 skilled nursing facility subjects with functional independence items. The ADL difficulty hierarchy was determined using Rasch analysis. Item fit values for the Rasch model using Mean-Square infit statistics were also determined. The robustness of the hierarchy was tested for each ADL. Two grading systems were devised from the results of the item difficulty ordering. One assigned a grade based on the most difficult item that a subject could perform, and the other assigned a grade based on the least difficult item that a subject could not perform. A total of 13 113 patients were included in this analysis, the majority of whom were female and white. They had an average age of 81 years. An ordered hierarchy of ADL was found with eating being the easiest and bathing the most difficult. All items in the Katz index fit the Rasch model adequately. The majority of patients able to perform any particular ADL were also able to perform all easier ADL. Cohen's κ for the 2 grading systems was 0.73. This study is the first to show the expected hierarchy of difficulty of the 6 activities of daily living proposed in the Katz index in the nursing facility population. The hierarchy found in this population matches the original hierarchy found in older adults in the community and acute care settings. 
It is also similar to the hierarchy found in the inpatient rehabilitation setting. Patients would be expected to lose or gain function based on the order of difficulty, but this remains to be confirmed. Among the 6 activities of daily living tested here, their order from easiest to most difficult is eating, maintaining continence, transferring, toileting, dressing, and bathing. In addition, the index formed by these 6 items has construct validity in the nursing facility population.
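The two grading systems described above can be illustrated with the reported easiest-to-hardest order. The abstract does not give the exact grade scales, so this scoring is an assumption: each grade is an integer from 0 to 6 and the function names are hypothetical. For a patient whose abilities follow the hierarchy, both grades coincide. For a pattern that violates the hierarchy, they diverge, which is one reason the agreement between the two systems (κ = 0.73) is below 1.

```python
# Easiest-to-hardest ADL order reported in the abstract above.
ADL_ORDER = ["eating", "continence", "transferring", "toileting", "dressing", "bathing"]


def grade_hardest_success(can_do: dict[str, bool]) -> int:
    """Grade = rank (1-6) of the most difficult ADL the patient performs; 0 if none."""
    grade = 0
    for rank, adl in enumerate(ADL_ORDER, start=1):
        if can_do[adl]:
            grade = rank
    return grade


def grade_easiest_failure(can_do: dict[str, bool]) -> int:
    """Grade = number of ADLs passed before the easiest ADL the patient cannot perform."""
    for rank, adl in enumerate(ADL_ORDER):
        if not can_do[adl]:
            return rank
    return len(ADL_ORDER)


# A pattern that follows the hierarchy: both grades agree.
ordered = dict(zip(ADL_ORDER, [True, True, True, False, False, False]))
# A pattern that violates it: the two grades diverge.
unordered = dict(zip(ADL_ORDER, [False, True, False, False, False, False]))
print(grade_hardest_success(ordered), grade_easiest_failure(ordered))      # 3 3
print(grade_hardest_success(unordered), grade_easiest_failure(unordered))  # 2 0
```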
Validation of a clinical critical thinking skills test in nursing.
Shin, Sujin; Jung, Dukyoo; Kim, Sungeun
2015-01-27
The purpose of this study was to develop a revised version of the clinical critical thinking skills test (CCTS) and to subsequently validate its performance. This study is a secondary analysis of the CCTS. Data were obtained from a convenience sample of 284 college students in June 2011. Thirty items were analyzed using item response theory, and test reliability was assessed. Test-retest reliability was measured using the results of 20 nursing college and graduate school students in July 2013. The content validity of the revised items was analyzed by calculating the degree of agreement between instrument developer intention in item development and the judgments of six experts. To analyze response process validity, qualitative data related to the response processes of nine nursing college students obtained through cognitive interviews were analyzed. Out of the initial 30 items, 11 items were excluded after the analysis of the difficulty and discrimination parameters. When the 19 items of the revised version of the CCTS were analyzed, levels of item difficulty were found to be relatively low and levels of discrimination were found to be appropriate or high. The degree of agreement between item developer intention and expert judgments equaled or exceeded 50%. From the above results, evidence of response process validity was demonstrated, indicating that subjects responded as intended by the test developer. The revised 19-item CCTS was found to have sufficient reliability and validity and therefore represents a more convenient measurement of critical thinking ability.
Validation of a clinical critical thinking skills test in nursing
2015-01-01
Purpose: The purpose of this study was to develop a revised version of the clinical critical thinking skills test (CCTS) and to subsequently validate its performance. Methods: This study is a secondary analysis of the CCTS. Data were obtained from a convenience sample of 284 college students in June 2011. Thirty items were analyzed using item response theory, and test reliability was assessed. Test-retest reliability was measured using the results of 20 nursing college and graduate school students in July 2013. The content validity of the revised items was analyzed by calculating the degree of agreement between instrument developer intention in item development and the judgments of six experts. To analyze response process validity, qualitative data related to the response processes of nine nursing college students obtained through cognitive interviews were analyzed. Results: Out of the initial 30 items, 11 items were excluded after the analysis of the difficulty and discrimination parameters. When the 19 items of the revised version of the CCTS were analyzed, levels of item difficulty were found to be relatively low and levels of discrimination were found to be appropriate or high. The degree of agreement between item developer intention and expert judgments equaled or exceeded 50%. Conclusion: From the above results, evidence of response process validity was demonstrated, indicating that subjects responded as intended by the test developer. The revised 19-item CCTS was found to have sufficient reliability and validity and therefore represents a more convenient measurement of critical thinking ability. PMID:25622716
Patient-clinician agreement on signs and symptoms of 'strep throat': a MetroNet study.
Xu, Jinping; Schwartz, Kendra; Monsur, Joseph; Northrup, Justin; Neale, Anne Victoria
2004-12-01
Despite substantial use of the telephone in health care, only a few studies have formally evaluated the appropriateness of telephone-based management for acute medical problems. The accuracy of patients' report of signs and symptoms remains unknown. We compared the agreement between patient self-assessment and clinician assessment on the typical signs and symptoms of group A beta-haemolytic Streptococcus (GABHS) to investigate the potential difficulties of using patient self-report to triage sore throat patients. In this cross-sectional study, each of 200 adult pharyngitis patients was instructed to examine him/herself and to record the symptoms and physical findings. Two clinicians independently interviewed and examined each patient and recorded their findings. Each patient then had a rapid GABHS antigen test, the results of which were blinded to both clinicians and patients. Each patient self-assessment was compared with the findings of each clinician, and the agreement and disagreement between them computed. We found varying levels of agreement (kappa=-0.05 to 0.71) between patients and clinicians on sore throat history and physical assessments. Importantly, there was fair to substantial agreement (kappa=0.20-0.71) on the key signs and symptoms used in GABHS clinical prediction rules. As expected, history items had the highest agreement (kappa=0.52-0.71). Patients were more likely than clinicians to report rather than deny a specific physical sign. Adult sore throat patients may reliably report their symptoms, but may not be able to assess and report accurately on relevant physical signs of pharyngitis. Patients have a tendency to over-report physical signs. This study indicates the potential difficulties associated with telephone triage of sore throat patients, or other illnesses that require assessment of physical signs.
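The kappa values reported above correct raw patient-clinician agreement for the agreement expected by chance from each rater's marginal frequencies: κ = (p_o - p_e) / (1 - p_e). The minimal sketch below uses a hypothetical `cohens_kappa` helper and invented yes/no reports for one physical sign, not the study's data. In the invented data the patient reports the sign more often than the clinician, echoing the over-reporting tendency noted above.

```python
from collections import Counter


def cohens_kappa(rater_a: list, rater_b: list) -> float:
    """Cohen's kappa: agreement between two raters corrected for chance.

    kappa = (p_o - p_e) / (1 - p_e); undefined when p_e == 1.
    """
    n = len(rater_a)
    # Observed agreement: share of cases where both raters chose the same category.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal category frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (p_o - p_e) / (1 - p_e)


# Hypothetical yes/no reports of one physical sign for eight patients.
patient = ["yes", "yes", "yes", "no", "yes", "no", "yes", "no"]
clinician = ["yes", "no", "yes", "no", "no", "no", "yes", "no"]
print(round(cohens_kappa(patient, clinician), 2))  # 0.53
```

Here raw agreement is 75%, but because half of that would be expected by chance alone, κ is only moderate.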
ERIC Educational Resources Information Center
Schroeders, Ulrich; Robitzsch, Alexander; Schipolowski, Stefan
2014-01-01
C-tests are a specific variant of cloze tests that are considered time-efficient, valid indicators of general language proficiency. They are commonly analyzed with models of item response theory assuming local item independence. In this article we estimated local interdependencies for 12 C-tests and compared the changes in item difficulties,…
ERIC Educational Resources Information Center
Magno, Carlo
2009-01-01
The present report demonstrates the difference between the classical test theory (CTT) and item response theory (IRT) approaches using actual test data from junior high school chemistry students. The CTT and IRT approaches were compared across two samples and two forms of the test in terms of item difficulty, internal consistency, and measurement error. The specific…
The Accuracy of Estimated Total Test Statistics. Final Report.
ERIC Educational Resources Information Center
Kleinke, David J.
In a post-mortem study of item sampling, 1,050 examinees were divided into ten groups 50 times. Each time, their papers were scored on four different sets of item samples from a 150-item test of academic aptitude. These samples were selected using (a) unstratified random sampling and stratification on (b) content, (c) difficulty, and (d) both.…
ERIC Educational Resources Information Center
Kibble, Jonathan D.; Johnson, Teresa
2011-01-01
The purpose of this study was to evaluate whether multiple-choice item difficulty could be predicted either by a subjective judgment by the question author or by applying a learning taxonomy to the items. Eight physiology faculty members teaching an upper-level undergraduate human physiology course consented to participate in the study. The…
ERIC Educational Resources Information Center
Brackenbury, Tim; Zickar, Michael J.; Munson, Benjamin; Storkel, Holly L.
2017-01-01
Purpose: Item response theory (IRT) is a psychometric approach to measurement that uses latent trait abilities (e.g., speech sound production skills) to model performance on individual items that vary by difficulty and discrimination. An IRT analysis was applied to preschoolers' productions of the words on the Goldman-Fristoe Test of…
ERIC Educational Resources Information Center
Sullins, Walter L.
Five hundred dichotomously scored response patterns were generated with sequentially independent (SI) items and 500 with dependent (SD) items for each of thirty-six combinations of sampling parameters (i.e., three test lengths, three sample sizes, and four item difficulty distributions). KR-20, KR-21, and Split-Half (S-H) reliabilities were…
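For readers unfamiliar with the two Kuder-Richardson coefficients named above, here is a minimal Python sketch of their textbook formulas, computed on a persons-by-items matrix of 0/1 scores. The function names and the five-examinee matrix are hypothetical; this is not the simulation procedure used in the study.

```python
from statistics import mean, pvariance


def kr20(matrix):
    """Kuder-Richardson 20 for a persons-by-items matrix of 0/1 scores."""
    k = len(matrix[0])
    totals = [sum(row) for row in matrix]
    # p_j = proportion answering item j correctly; item variance is p_j * (1 - p_j).
    p = [mean(row[j] for row in matrix) for j in range(k)]
    sum_pq = sum(pj * (1 - pj) for pj in p)
    return k / (k - 1) * (1 - sum_pq / pvariance(totals))


def kr21(matrix):
    """Kuder-Richardson 21: like KR-20 but assumes all items are equally difficult."""
    k = len(matrix[0])
    totals = [sum(row) for row in matrix]
    m = mean(totals)
    return k / (k - 1) * (1 - m * (k - m) / (k * pvariance(totals)))


# Hypothetical responses of five examinees to four items of decreasing easiness.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(scores), 3), round(kr21(scores), 3))  # 0.8 0.667
```

KR-21 comes out lower than KR-20 here because the four items differ in difficulty. The two coefficients coincide only when all item difficulties are equal.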
ERIC Educational Resources Information Center
Pawade, Yogesh R.; Diwase, Dipti S.
2016-01-01
Item analysis of Multiple Choice Questions (MCQs) is the process of collecting, summarizing and utilizing information from students' responses to evaluate the quality of test items. Difficulty Index (p-value), Discrimination Index (DI) and Distractor Efficiency (DE) are the parameters which help to evaluate the quality of MCQs used in an…
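The three MCQ indices named above can be sketched in a few lines of Python. The definitions below are common conventions, and they are assumptions here because the abstract does not spell them out: the difficulty index is the proportion answering correctly, the discrimination index compares the upper and lower 27% of examinees ranked by total score, and a distractor counts as functional if at least 5% of examinees choose it. All names and data are hypothetical.

```python
def difficulty_index(correct):
    """Proportion of examinees answering the item correctly (often reported as a %)."""
    return sum(correct) / len(correct)


def discrimination_index(correct, totals, frac=0.27):
    """(upper-group correct - lower-group correct) / group size, groups by total score.
    Ties in total score are broken by input order (Python's stable sort)."""
    n = max(1, round(frac * len(totals)))
    order = sorted(range(len(totals)), key=lambda i: totals[i])
    lower, upper = order[:n], order[-n:]
    return (sum(correct[i] for i in upper) - sum(correct[i] for i in lower)) / n


def distractor_efficiency(choices, key, options, cutoff=0.05):
    """Share of distractors that are 'functional' (chosen by at least `cutoff` of examinees)."""
    distractors = [o for o in options if o != key]
    functional = [d for d in distractors if choices.count(d) / len(choices) >= cutoff]
    return len(functional) / len(distractors)


# Hypothetical data for one item answered by ten examinees; the key is "A".
totals = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
choices = ["A", "A", "A", "A", "B", "A", "C", "B", "A", "B"]
correct = [1 if c == "A" else 0 for c in choices]
print(difficulty_index(correct),
      round(discrimination_index(correct, totals), 2),
      round(distractor_efficiency(choices, "A", "ABCD"), 2))  # 0.6 0.67 0.67
```

Option D is never chosen, so it is non-functional, and only two of the three distractors count toward efficiency.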
Hällgren, Monica; Nygård, Louise; Kottorp, Anders
2014-05-01
While the development and possibilities of technology today are commonly regarded to be unlimited, knowledge regarding the technological needs of people with mental retardation is fairly limited. The aim of this study was to enhance knowledge of perceived relevance and difficulty in using everyday technology (ET) such as stoves, cell phones, and elevators in adults with mental retardation. 120 participants with different levels of mental retardation were interviewed with the Everyday Technology Use Questionnaire (ETUQ) about their use of such technologies in their everyday life. Analyses of variance, post hoc tests, and regression analyses were used to explore the data. Participants with moderate and severe mental retardation differed in mean perceived difficulty from those with mild mental retardation, suggesting that increased perceived difficulty in ET use is related to the level of mental retardation. Differences between groups were also found in the proportion of items that were relevant for each person. The variables Level of Mental Retardation, Additional Disabilities, and Proportional Relevance of ET Items could together predict 67.2% of the variation in perceived difficulty in technology use. The findings also indicate that age, housing, gender, and geographical district do not covary with perceived difficulty in ET use.
CTTITEM: SAS macro and SPSS syntax for classical item analysis.
Lei, Pui-Wa; Wu, Qiong
2007-08-01
This article describes the functions of a SAS macro and an SPSS syntax that produce common statistics for conventional item analysis including Cronbach's alpha, item difficulty index (p-value or item mean), and item discrimination indices (D-index, point biserial and biserial correlations for dichotomous items and item-total correlation for polytomous items). These programs represent an improvement over the existing SAS and SPSS item analysis routines in terms of completeness and user-friendliness. To promote routine evaluations of item qualities in instrument development of any scale, the programs are available at no charge for interested users. The program codes along with a brief user's manual that contains instructions and examples are downloadable from suen.ed.psu.edu/-pwlei/plei.htm.
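As a language-neutral illustration of two statistics listed above, here is a minimal Python sketch of Cronbach's alpha and an uncorrected item-total correlation. It is not a port of the CTTITEM macro, and the function names and Likert data are hypothetical. For 0/1 items the item-total Pearson correlation equals the point-biserial correlation. The "corrected" variant, which removes the item from the total first, is not shown.

```python
import math
from statistics import pvariance


def cronbach_alpha(matrix):
    """Cronbach's alpha for a persons-by-items score matrix (dichotomous or polytomous)."""
    k = len(matrix[0])
    item_vars = [pvariance([row[j] for row in matrix]) for j in range(k)]
    total_var = pvariance([sum(row) for row in matrix])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def item_total_r(matrix, j):
    """Uncorrected Pearson correlation of item j with the total score
    (equal to the point-biserial correlation when item j is scored 0/1)."""
    x = [row[j] for row in matrix]
    y = [sum(row) for row in matrix]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical 4 respondents x 3 Likert items.
likert = [[3, 4, 3], [2, 2, 3], [4, 5, 4], [1, 2, 1]]
print(round(cronbach_alpha(likert), 2))   # 0.95
print(round(item_total_r(likert, 2), 2))  # 0.92
```

Population variances are used throughout. Alpha would be identical with sample variances, since the n/(n - 1) factors cancel in the ratio.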
Dikken, Jeroen; Hoogerduijn, Jita G; Kruitwagen, Cas; Schuurmans, Marieke J
2016-11-01
To assess the content validity and psychometric characteristics of the Knowledge about Older Patients Quiz (KOP-Q), which measures nurses' knowledge regarding older hospitalized adults and their certainty regarding this knowledge. Cross-sectional. Content validity: general hospitals. Psychometric characteristics: nursing school and general hospitals in the Netherlands. Content validity: 12 nurse specialists in geriatrics. Psychometric characteristics: 107 first-year and 78 final-year bachelor of nursing students, 148 registered nurses, and 20 nurse specialists in geriatrics. Content validity: The nurse specialists rated each item of the initial KOP-Q (52 items) on relevance. Ratings were used to calculate Item-Content Validity Index and average Scale-Content Validity Index (S-CVI/ave) scores. Items with insufficient content validity were removed. Psychometric characteristics: Ratings of students, nurses, and nurse specialists were used to test for differential item functioning (DIF) and unidimensionality before item characteristics (discrimination and difficulty) were examined using Item Response Theory. Finally, norm references were calculated and nomological validity was assessed. Content validity: Forty-three items remained after assessing content validity (S-CVI/ave = 0.90). Psychometric characteristics: Of the 43 items, two demonstrating ceiling effects and 11 distorting ability estimates (DIF) were subsequently excluded. Item characteristics were assessed for the remaining 30 items, all of which demonstrated good discrimination and difficulty parameters. Knowledge was positively correlated with certainty about this knowledge. The final 30-item KOP-Q is a valid, psychometrically sound, comprehensive instrument that can be used to assess the knowledge of nursing students, hospital nurses, and nurse specialists in geriatrics regarding older hospitalized adults. 
It can identify knowledge and certainty deficits for research purposes or serve as a tool in educational or quality improvement programs. © 2016, Copyright the Authors. Journal compilation © 2016, The American Geriatrics Society.
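The item- and scale-level content validity indices mentioned above can be sketched in Python. Two conventions are assumed here because the abstract does not state them: relevance is rated on a 4-point scale with ratings of 3 or 4 counted as "relevant", and items are dropped below an I-CVI of 0.78, a cut-off often cited in the content-validity literature. The ratings below are hypothetical, although they use 12 experts to mirror the panel size.

```python
def i_cvi(ratings, relevant_min=3):
    """Item-level CVI: share of experts rating the item relevant (>= relevant_min)."""
    return sum(r >= relevant_min for r in ratings) / len(ratings)


def s_cvi_ave(item_ratings):
    """Scale-level CVI, averaging method: mean of the item-level CVIs."""
    scores = [i_cvi(r) for r in item_ratings.values()]
    return sum(scores) / len(scores)


# Hypothetical relevance ratings (1-4) from 12 experts for three items.
ratings = {
    "item_a": [4] * 12,
    "item_b": [4] * 8 + [3] * 3 + [2],
    "item_c": [3] * 9 + [1, 2, 2],
}
# Hypothetical retention rule: keep items whose I-CVI is at least 0.78.
keep = [name for name, r in ratings.items() if i_cvi(r) >= 0.78]
print(round(s_cvi_ave(ratings), 3), keep)  # 0.889 ['item_a', 'item_b']
```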
Building an Evaluation Scale using Item Response Theory.
Lalor, John P; Wu, Hao; Yu, Hong
2016-11-01
Evaluation of NLP methods requires testing against a previously vetted gold-standard test set and reporting standard metrics (accuracy/precision/recall/F1). The current assumption is that all items in a given test set are equal with regard to difficulty and discriminating power. We propose Item Response Theory (IRT) from psychometrics as an alternative means for gold-standard test-set generation and NLP system evaluation. IRT is able to describe characteristics of individual items - their difficulty and discriminating power - and can account for these characteristics in its estimation of human intelligence or ability for an NLP task. In this paper, we demonstrate IRT by generating a gold-standard test set for Recognizing Textual Entailment. By collecting a large number of human responses and fitting our IRT model, we show that our IRT model compares NLP systems with the performance in a human population and is able to provide more insight into system performance than standard evaluation metrics. We show that a high accuracy score does not always imply a high IRT score, which depends on the item characteristics and the response pattern.
Building an Evaluation Scale using Item Response Theory
Lalor, John P.; Wu, Hao; Yu, Hong
2016-01-01
Evaluation of NLP methods requires testing against a previously vetted gold-standard test set and reporting standard metrics (accuracy/precision/recall/F1). The current assumption is that all items in a given test set are equal with regard to difficulty and discriminating power. We propose Item Response Theory (IRT) from psychometrics as an alternative means for gold-standard test-set generation and NLP system evaluation. IRT is able to describe characteristics of individual items - their difficulty and discriminating power - and can account for these characteristics in its estimation of human intelligence or ability for an NLP task. In this paper, we demonstrate IRT by generating a gold-standard test set for Recognizing Textual Entailment. By collecting a large number of human responses and fitting our IRT model, we show that our IRT model compares NLP systems with the performance in a human population and is able to provide more insight into system performance than standard evaluation metrics. We show that a high accuracy score does not always imply a high IRT score, which depends on the item characteristics and the response pattern. PMID:28004039
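The claim that equal accuracy need not mean equal IRT ability can be illustrated with a two-parameter logistic (2PL) model. The abstract does not specify the authors' exact model, so the 2PL choice, the item parameters, and the function names below are all hypothetical. This minimal Python sketch estimates ability by maximum likelihood using a simple grid search.

```python
import math


def p_correct(theta, a, b):
    """2PL item response function: probability of a correct response at ability theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def log_likelihood(theta, responses, items):
    ll = 0.0
    for x, (a, b) in zip(responses, items):
        p = p_correct(theta, a, b)
        ll += math.log(p) if x else math.log(1.0 - p)
    return ll


def estimate_theta(responses, items, lo=-4.0, hi=4.0, steps=801):
    """Maximum-likelihood ability via a simple grid search (adequate for a sketch)."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda t: log_likelihood(t, responses, items))


# Hypothetical (discrimination a, difficulty b) for an easy, a medium and a hard item.
items = [(0.5, -1.5), (1.0, 0.0), (2.0, 1.5)]
pattern_a = [1, 1, 0]  # easy + medium correct: accuracy 2/3
pattern_b = [1, 0, 1]  # easy + hard correct: accuracy 2/3
theta_a = estimate_theta(pattern_a, items)
theta_b = estimate_theta(pattern_b, items)
print(f"{theta_a:.2f} {theta_b:.2f}")
```

Both patterns have the same accuracy, yet pattern B receives the higher ability estimate. Under the 2PL model, the likelihood depends on the discrimination-weighted score (the sum of a_i * x_i), and solving the highly discriminating hard item counts for more.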
ERIC Educational Resources Information Center
Sahin, Esin; Yagbasan, Rahmi
2012-01-01
This study aims at diagnosing which subjects pre-service physics teachers have difficulty understanding in introductory physics courses and what accounts for these difficulties. A questionnaire consisting of two qualitative questions was used to collect data for this study. The questionnaire was administered to 101 pre-service physics teachers who…
A Simple Frailty Questionnaire (FRAIL) Predicts Outcomes in Middle-Aged African Americans
Morley, J.E.; Malmstrom, T.K.; Miller, D.K.
2015-01-01
Objective: To validate the FRAIL scale. Design: Longitudinal study. Setting: Community. Participants: Representative sample of African Americans aged 49 to 65 years at onset of study. Measurements: The 5-item FRAIL scale (Fatigue, Resistance, Ambulation, Illnesses, & Loss of Weight) at baseline; activities of daily living (ADLs), instrumental activities of daily living (IADLs), mortality, short physical performance battery (SPPB), gait speed, one-leg stand, grip strength and injurious falls at baseline and 9 years; and blood tests for CRP, sIL6R, sTNFR1, sTNFR2 and 25(OH) vitamin D at baseline. Results: Cross-sectionally, the FRAIL scale correlated significantly with IADL difficulties, SPPB, grip strength and one-leg stand among participants with no baseline ADL difficulties (N=703), and with those outcomes plus gait speed in those with no baseline ADL dependencies (N=883). TNFR1 was increased in pre-frail and frail subjects, as was CRP in some subgroups. Longitudinally (N=423 with no baseline ADL difficulties or N=528 with no baseline ADL dependencies), and adjusted for the baseline value for each outcome, being pre-frail at baseline significantly predicted future ADL difficulties, worse one-leg stand scores, and mortality in both groups, plus IADL difficulties in the dependence-excluded group. Being frail at baseline significantly predicted future ADL difficulties, IADL difficulties, and mortality in both groups, plus worse SPPB in the dependence-excluded group. Conclusion: This study has validated the FRAIL scale in a late middle-aged African American population. This simple 5-question scale is an excellent screening test for clinicians to identify frail persons at risk of developing disability as well as decline in health functioning and mortality. PMID:22836700
[Stress and burnout among Tunisian teachers].
Chennoufi, L; Ellouze, F; Cherif, W; Mersni, M; M'rad, M F
2012-12-01
Burnout, or professional exhaustion syndrome, is defined as a state of emotional, mental and physical exhaustion caused by excessive and prolonged stress at work. Although it is not a recognized disorder in the DSM-IV, burnout has been widely described among medical and paramedical staff. In Tunisia, all studies of this syndrome have considered only populations of doctors. However, professional exhaustion syndrome is not limited to the medical sector; it can occur in any profession that involves a helping relationship, and the teaching profession appears to be affected. In our clinical practice, we are increasingly confronted with teachers' suffering: teachers face growing difficulties in their work, and some can no longer cope and thus become vulnerable to professional exhaustion syndrome. The aim of this study was to evaluate burnout among a population of Tunisian teachers and to examine the professional stressors associated with it. This cross-sectional study was conducted over five months (October 2009 to February 2010) among teachers working in the public high schools of Manouba (Tunisia). Participants completed a self-administered questionnaire on professional stressors covering the five types identified in the literature: bad working conditions, work overload, administrative difficulties, organizational factors, and difficulties with pupils and their relatives. Burnout was assessed with the Maslach Burnout Inventory (MBI), the best-studied measure of burnout in the literature, using the French version adapted to educational settings. This scale comprises 22 items across three dimensions: emotional exhaustion (nine items), dehumanization (five items) and reduced personal accomplishment (eight items). 
In our study, a teacher was considered to be suffering from burnout when at least two of the three dimensions of this scale were pathological. Of the 876 teachers working in the public high schools of Manouba, only 398 completed our questionnaires, a participation rate of 45.4%. Participants had a mean age of 40.04 years; 52.3% were women (sex ratio = 0.91) and the great majority were married (81.8%). Burnout syndrome was found in 21% of these teachers: moderate professional exhaustion in 16.4% and severe professional exhaustion in 4.6%. High emotional exhaustion was found in 27.4% of cases and high dehumanization in 16.1%, while 45.5% were susceptible to reduced personal accomplishment. The majority of teachers (66.4%) reported being stressed at work. The professional stressors reported, in decreasing order of frequency, were bad working conditions (80.3%), work overload (75.2%), administrative difficulties (70.4%), difficulties with pupils and their relatives (64.4%) and organizational factors (57.1%). We found a strong association between burnout syndrome and three types of professional stressors: bad working conditions (p=0.0017), administrative difficulties (p=0.005) and difficulties with pupils and their relatives (p=0.005). Organizational factors and work overload were not associated with burnout syndrome. Teaching involves an accumulation of difficulties; some Tunisian teachers cannot tolerate this professional stress and develop burnout, which leads to psychological distress and a risk of increased absenteeism. We hope that this study will prompt future research on stress, coping and burnout among Tunisian teachers, with both theoretical aims and practical applications for preventing and reducing this problem. 
Copyright © 2012. Published by Elsevier Masson SAS.
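The case definition and participation figure above amount to simple arithmetic, sketched here in Python. The abstract does not report the MBI cut-off scores used to call each dimension "pathological", so the function takes one boolean per dimension instead of raw scores. Both function names are hypothetical.

```python
def has_burnout(high_exhaustion, high_dehumanization, reduced_accomplishment):
    """Study's case definition: burnout when at least two of the three MBI dimensions
    are in the pathological range (the cut-offs themselves are not given in the abstract)."""
    return sum([high_exhaustion, high_dehumanization, reduced_accomplishment]) >= 2


def participation_rate(respondents, eligible):
    """Percentage of eligible teachers who returned the questionnaires."""
    return 100 * respondents / eligible


print(has_burnout(True, False, True))          # True
print(has_burnout(False, False, True))         # False
print(round(participation_rate(398, 876), 1))  # 45.4
```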
Rasch Measurement of Collaborative Problem Solving in an Online Environment.
Harding, Susan-Marie E; Griffin, Patrick E
2016-01-01
This paper describes an approach to assessing human-to-human collaborative problem solving using a set of online interactive tasks completed by student dyads. Within each dyad, roles were nominated as either A or B, and students selected their own roles. The paper addresses whether role selection affected individual student performance measures. Process stream data were captured from 3402 students in six countries who explored the problem space by clicking, dragging the mouse, moving the cursor and collaborating with their partner through a chat box window. These data were explored to identify behavioural indicators that represented elements of a conceptual framework. The indicative behaviours were coded into a series of dichotomous items representing actions and chats performed by students. The frequency of occurrence was used as a proxy measure of item difficulty; given these difficulty estimates, student ability could then be estimated from the range of items demonstrated by each student. The Rasch simple logistic model was used to review the indicators and identify those that were consistent with the assumptions of the model and invariant across national samples, language, curriculum and student age. The data were analysed using one- and two-dimensional one-parameter models. Rasch separation reliability, fit to the model, the distribution of students and items on the underpinning construct, estimates for each country and the effect of role differences are reported. This study provides evidence that collaborative problem solving can be assessed in an online environment involving human-to-human interaction, using behavioural indicators shown to have a consistent relationship between the estimate of student ability and the probability of demonstrating the behaviour.
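The two-step logic described above can be sketched in Python: first derive item difficulties from how often each behaviour occurs, then estimate a student's ability from those difficulties under the Rasch simple logistic model. The centred logit of non-occurrence is used here as a rough, PROX-style starting approximation, and ability is found by Newton-Raphson maximum likelihood. This is an assumption for illustration only, not the authors' estimation software or procedure, and all names and frequencies are hypothetical.

```python
import math


def rasch_p(theta, b):
    """Rasch simple logistic model: probability of showing a behaviour of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def difficulty_from_frequency(freqs):
    """Rough difficulties: logit of non-occurrence, centred at 0 (rarer = harder)."""
    raw = [math.log((1 - f) / f) for f in freqs]
    centre = sum(raw) / len(raw)
    return [r - centre for r in raw]


def estimate_ability(responses, difficulties, iters=50, tol=1e-10):
    """Maximum-likelihood ability by Newton-Raphson, holding difficulties fixed."""
    score, k = sum(responses), len(responses)
    if score == 0 or score == k:
        raise ValueError("ML ability is infinite for zero or perfect scores")
    theta = 0.0
    for _ in range(iters):
        ps = [rasch_p(theta, b) for b in difficulties]
        gradient = score - sum(ps)              # observed minus expected score
        information = sum(p * (1 - p) for p in ps)
        step = gradient / information
        theta += step
        if abs(step) < tol:
            break
    return theta


# Hypothetical occurrence frequencies for three coded behaviours.
difficulties = difficulty_from_frequency([0.8, 0.5, 0.2])
print(estimate_ability([1, 1, 0], difficulties))
```

Under the Rasch model the raw score is sufficient for ability. As a result, the patterns [1, 1, 0] and [1, 0, 1] yield the same estimate once the difficulties are fixed.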
Haggerty, Jeannie L; Levesque, Jean-Frédéric
2017-04-01
Patients are the most valid source for evaluating the accessibility of services, but a previous study observed differential psychometric performance of instruments in rural and urban respondents. To validate a measure of organizational accessibility free of differential rural-urban performance that predicts consequences of difficult access for patient-initiated care. Sequential qualitative-quantitative study. Qualitative findings used to adapt or develop evaluative and reporting items. Quantitative validation study. Primary data by telephone from 750 urban, rural and remote respondents in Quebec, Canada; follow-up mailed questionnaire to a subset of 316. Items were developed for barriers along the care trajectory. We used common factor and confirmatory factor analysis to identify constructs and compare models. We used item response theory analysis to test for differential rural-urban performance; examine individual item performance; adjust response options; and exclude redundant or non-discriminatory items. We used logistic regression to examine predictive validity of the subscale on access difficulty (outcome). Initial factor resolution suggested geographic and organizational dimensions, plus consequences of access difficulty. After second administration, organizational accommodation and geographic indicators were integrated into a 6-item subscale of Effective Availability and Accommodation, which demonstrates good variability and internal consistency (α = 0.84) and no differential functioning by geographic area. Each unit increase predicts decreased likelihood of consequences of access difficulties (unmet need and problem aggravation). The new subscale is a practical, valid and reliable measure for patients to evaluate first-contact health services accessibility, yielding valid comparisons between urban and rural contexts. © 2016 The Authors. Health Expectations published by John Wiley & Sons Ltd.
Oude Voshaar, Martijn A H; Ten Klooster, Peter M; Glas, Cees A W; Vonkeman, Harald E; Taal, Erik; Krishnan, Eswar; Bernelot Moens, Hein J; Boers, Maarten; Terwee, Caroline B; van Riel, Piet L C M; van de Laar, Mart A F J
2015-12-01
To evaluate the content validity and measurement properties of the Patient-Reported Outcome Measurement Information System (PROMIS) physical function item bank and a 20-item short form in patients with RA in comparison with the HAQ disability index (HAQ-DI) and 36-item Short Form Health Survey (SF-36) physical functioning scale (PF-10). The content validity of the instruments was evaluated by linking their items to the International Classification of Functioning, Disability and Health (ICF) core set for RA. The measures were administered to 690 RA patients enrolled in the Dutch Rheumatoid Arthritis Monitoring registry. Measurement precision was evaluated using item response theory methods and construct validity was evaluated by correlating physical function scores with other clinical and patient-reported outcome measures. All 207 health concepts identified in the physical function measures referred to activities that are featured in the ICF. Twenty-three of 26 ICF RA core set domains are featured in the full PROMIS physical function item bank compared with 13 and 8 for the HAQ-DI and PF-10, respectively. As hypothesized, all three physical function instruments were highly intercorrelated (r 0.74-0.84), moderately correlated with disease activity measures (r 0.44-0.63) and weakly correlated with age (rs 0.07-0.14). Item response theory-based analysis revealed that a 20-item PROMIS physical function short form covered a wider range of physical function levels than the HAQ-DI or PF-10. The PROMIS physical function item bank demonstrated excellent measurement properties in RA. A content-driven 20-item short form may be a useful tool for assessing physical function in RA. © The Author 2015. Published by Oxford University Press on behalf of the British Society for Rheumatology. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
NASA Astrophysics Data System (ADS)
Nelson, Philip
2015-03-01
I'll describe an intermediate-level course on "Physical Models of Living Systems." The only prerequisite is first-year university physics and calculus. The course is a response to rapidly growing interest among undergraduates in a broad range of science and engineering majors. Students acquire several research skills that are often not addressed in traditional courses:
ERIC Educational Resources Information Center
Schmitt, T. A.; Sass, D. A.; Sullivan, J. R.; Walker, C. M.
2010-01-01
Imposed time limits on computer adaptive tests (CATs) can result in examinees having difficulty completing all items, thus compromising the validity and reliability of ability estimates. In this study, the effects of speededness were explored in a simulated CAT environment by varying examinee response patterns to end-of-test items. Expectedly,…
ERIC Educational Resources Information Center
Semino, Sara; Ring, Melanie; Bowler, Dermot M.; Gaigg, Sebastian B.
2018-01-01
Autism Spectrum Disorder (ASD) is generally associated with difficulties in contextual source memory but not single item memory. There are surprising inconsistencies in the literature, however, that the current study seeks to address by examining item and source memory in age and ability matched groups of 22 ASD and 21 comparison adults. Results…
2016-01-01
Purpose: To determine the agreement among the items of the Korean physical therapist licensing examination, learning objectives of class subjects, and physical therapists’ job descriptions. Methods: The main tasks of physical therapists were classified, and university courses related to the main tasks were also classified. Frequency analysis was used to determine the proportions of credits for the classified courses out of the total credits of major subjects, exam items related to the classified courses out of the total number of exam items, and universities that offer courses related to the Korean physical therapist licensing examination among the surveyed universities. Results: The proportions of credits for clinical decision making and physical therapy diagnosis-related courses out of the total number of credits for major subjects at universities were relatively low (2.06% and 2.58%, respectively). Although the main tasks of physical therapists are related to diagnosis and evaluation, the proportion of physiotherapy intervention-related items (35%) was higher than that of examination and evaluation-related items (25%) on the Korean physical therapist licensing examination. The percentages of universities that offer physical therapy diagnosis and clinical decision making-related courses were 58.62% and 68.97%, respectively. Conclusion: Both the proportion of physiotherapy diagnosis and evaluation-related items on the Korean physical therapist licensing examination, and the number of subjects related to clinical decision making and physical therapy diagnosis in the physical therapy curriculum, should be increased to ensure that the examination items and physical therapy curriculum reflect the practical tasks of physical therapists. PMID:26767720
Kang, Min-Hyeok; Kwon, Oh-Yun; Kim, Yong-Wook; Kim, Ji-Won; Kim, Tae-Ho; Oh, Tae-Young; Weon, Jong-Hyuk; Lee, Tae-Sik; Oh, Jae-Seop
2016-01-01
To determine the agreement among the items of the Korean physical therapist licensing examination, learning objectives of class subjects, and physical therapists' job descriptions. The main tasks of physical therapists were classified, and university courses related to the main tasks were also classified. Frequency analysis was used to determine the proportions of credits for the classified courses out of the total credits of major subjects, exam items related to the classified courses out of the total number of exam items, and universities that offer courses related to the Korean physical therapist licensing examination among the surveyed universities. The proportions of credits for clinical decision making and physical therapy diagnosis-related courses out of the total number of credits for major subjects at universities were relatively low (2.06% and 2.58%, respectively). Although the main tasks of physical therapists are related to diagnosis and evaluation, the proportion of physiotherapy intervention-related items (35%) was higher than that of examination and evaluation-related items (25%) on the Korean physical therapist licensing examination. The percentages of universities that offer physical therapy diagnosis and clinical decision making-related courses were 58.62% and 68.97%, respectively. Both the proportion of physiotherapy diagnosis and evaluation-related items on the Korean physical therapist licensing examination, and the number of subjects related to clinical decision making and physical therapy diagnosis in the physical therapy curriculum, should be increased to ensure that the examination items and physical therapy curriculum reflect the practical tasks of physical therapists.
Arnould, Carlyne; Vandervelde, Laure; Batcho, Charles Sèbiyo; Penta, Massimo; Thonnard, Jean-Louis
2012-01-01
Objectives: Several ABILHAND Rasch-built manual ability scales were previously developed for chronic stroke (CS), cerebral palsy (CP), rheumatoid arthritis (RA), systemic sclerosis (SSc) and neuromuscular disorders (NMD). The present study aimed to explore the applicability of a generic manual ability scale unbiased by diagnosis and to study the nature of manual ability across diagnoses. Design: Cross-sectional study. Setting: Outpatient clinic homes (CS, CP, RA), specialised centres (CP), reference centres (CP, NMD) and university hospitals (SSc). Participants: 762 patients from six diagnostic groups: 103 CS adults, 113 CP children, 112 RA adults, 156 SSc adults, 124 NMD children and 124 NMD adults. Primary and secondary outcome measures: Manual ability as measured by the ABILHAND disease-specific questionnaires, diagnosis and nature (ie, uni-manual or bi-manual involvement and proximal or distal joints involvement) of the ABILHAND manual activities. Results: The difficulties of most manual activities were diagnosis dependent. A principal component analysis highlighted that 57% of the variance in the item difficulty between diagnoses was explained by the symmetric or asymmetric nature of the disorders. A generic scale was constructed, from a metric point of view, with 11 items sharing a common difficulty among diagnoses and 41 items displaying a category-specific location (asymmetric: CS, CP; and symmetric: RA, SSc, NMD). This generic scale showed that CP and NMD children had significantly less manual ability than RA patients, who had significantly less manual ability than CS, SSc and NMD adults. However, the generic scale was less discriminative and responsive to small deficits than disease-specific instruments. Conclusions: Our finding that most of the manual item difficulties were disease-dependent emphasises the danger of using generic scales without prior investigation of item invariance across diagnostic groups. Nevertheless, a generic manual ability scale could be developed by adjusting and accounting for activities perceived differently in various disorders. PMID:23117570
Lawton IADL scale in dementia: can item response theory make it more informative?
McGrory, Sarah; Shenkin, Susan D; Austin, Elizabeth J; Starr, John M
2014-07-01
Impairment of functional abilities represents a crucial component of dementia diagnosis. Current functional measures rely on the traditional aggregate method of summing raw scores. While this summary score provides a quick representation of a person's ability, it disregards useful information at the item level. The aim was to use item response theory (IRT) methods to increase the interpretive power of the Lawton Instrumental Activities of Daily Living (IADL) scale by establishing a hierarchy of item 'difficulty' and 'discrimination'. This cross-sectional study applied IRT methods to the analysis of IADL outcomes. Participants were 202 members of the Scottish Dementia Research Interest Register (mean age = 76.39 years, range = 56-93, SD = 7.89) with complete itemised data available. A Mokken scale with good reliability (Molenaar-Sijtsma statistic 0.79) was obtained, satisfying the IRT assumption that the items comprise a single unidimensional scale. The eight items in the scale could be placed on a hierarchy of 'difficulty' (H coefficient = 0.55), with 'Shopping' being the most 'difficult' item and 'Telephone use' the least 'difficult'. 'Shopping' was the most discriminatory item, differentiating well between patients of different levels of ability. IRT methods are capable of providing more information about functional impairment than a summed score. 'Shopping' and 'Telephone use' were identified as items that reveal key information about a patient's level of ability and could be useful screening questions for clinicians. © The Author 2013. Published by Oxford University Press on behalf of the British Geriatrics Society. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
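The H coefficient reported above is Loevinger's scalability coefficient from Mokken scale analysis. The following minimal Python sketch computes the scale-level H, assuming each item is scored 0/1 (pass/fail). H equals 1 minus the ratio of observed to expected Guttman errors, where a Guttman error means failing the easier item of a pair while passing the harder one, and the expected count assumes the two items are independent. Dedicated software such as the R `mokken` package also reports item-level coefficients and standard errors, which this sketch omits. The data are hypothetical.

```python
from itertools import combinations


def loevinger_h(data):
    """Scale H for dichotomous items: 1 - (observed / expected Guttman errors)."""
    n, k = len(data), len(data[0])
    p = [sum(row[j] for row in data) / n for j in range(k)]
    observed = expected = 0.0
    for i, j in combinations(range(k), 2):
        # Order the pair so that 'easy' is the item more people pass.
        easy, hard = (i, j) if p[i] >= p[j] else (j, i)
        # Guttman error: failing the easier item while passing the harder one.
        observed += sum(1 for row in data if row[easy] == 0 and row[hard] == 1)
        # Expected number of such errors if the two items were independent.
        expected += n * (1 - p[easy]) * p[hard]
    if expected == 0:
        raise ValueError("H is undefined when no Guttman errors are possible")
    return 1 - observed / expected


# Hypothetical 0/1 data: a perfect Guttman pattern, then one extra inconsistent respondent.
guttman = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
noisy = guttman + [[0, 1, 0]]
print(round(loevinger_h(guttman), 2), round(loevinger_h(noisy), 2))  # 1.0 0.5
```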
Conceptual question response times in Peer Instruction classrooms
NASA Astrophysics Data System (ADS)
Miller, Kelly; Lasry, Nathaniel; Lukoff, Brian; Schell, Julie; Mazur, Eric
2014-12-01
Classroom response systems are widely used in interactive teaching environments as a way to engage students by asking them questions. Previous research on the time taken by students to respond to conceptual questions has yielded insights on how students think and change conceptions. We measure the amount of time students take to respond to in-class, conceptual questions [ConcepTests (CTs)] in two introductory physics courses taught using Peer Instruction and use item response theory to determine the difficulty of the CTs. We examine response time differences between correct and incorrect answers both before and after the peer discussion for CTs of varying difficulty. We also determine the relationship between response time and student performance on a standardized test of incoming physics knowledge, precourse self-efficacy, and gender. Our data reveal three results of interest. First, response time for correct answers is significantly faster than for incorrect answers, both before and after peer discussion, especially for easy CTs. Second, students with greater incoming physics knowledge and higher self-efficacy respond faster in both rounds. Third, there is no gender difference in response rate after controlling for incoming physics knowledge scores, although males register significantly more attempts before committing to a final answer than do female students. These results provide insight into effective CT pacing during Peer Instruction. In particular, in order to maintain a pace that keeps everyone engaged, students should not be given too much time to respond. When around 80% of the answers are in, the ratio of correct to incorrect responses rapidly approaches levels indicating random guessing and instructors should close the poll.
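The closing rule in the final sentence can be expressed as a simple trigger. The sketch below uses a hypothetical helper, `poll_close_time`, and is not the authors' procedure; it assumes the instructor knows the class size and treats 80% as a tunable fraction.

```python
import math


def poll_close_time(response_times, class_size, fraction=0.8):
    """Elapsed time at which `fraction` of the class has answered, or None if never reached."""
    needed = math.ceil(fraction * class_size)
    times = sorted(response_times)
    if needed < 1 or len(times) < needed:
        return None
    return times[needed - 1]


# Seconds after the question was posed, for a hypothetical class of 10 students.
times = [5, 7, 8, 12, 15, 20, 31, 45, 60]
print(poll_close_time(times, class_size=10))  # 45
```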
ERIC Educational Resources Information Center
Ackerman, Brian P.; And Others
1990-01-01
Results of four experiments show that developmental differences in elaborative conceptual processing at acquisition and retrieval contribute independently to developmental increases in recall. Item identification processes for both words and pictures constrain children's elaborative processing. The constraints are time limited. (RH)
Treatment of Not-Administered Items on Individually Administered Intelligence Tests
ERIC Educational Resources Information Center
He, Wei; Wolfe, Edward W.
2012-01-01
In administration of individually administered intelligence tests, items are commonly presented in a sequence of increasing difficulty, and test administration is terminated after a predetermined number of incorrect answers. This practice produces stochastically censored data, a form of nonignorable missing data. By manipulating four factors…
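To see why a discontinue rule produces nonignorable missing data, consider a minimal simulation. It assumes, purely for illustration, that testing stops after a fixed number of consecutive errors (actual tests differ in their rules), and the function name `administer` is hypothetical.

```python
def administer(responses, stop_after=3):
    """Simulate a discontinue rule on items ordered from easiest to hardest.

    `responses` holds the 0/1 answers the examinee would give if every item were
    presented. Testing stops after `stop_after` consecutive incorrect answers;
    items never reached are recorded as None (missing).
    """
    observed = []
    consecutive_wrong = 0
    for r in responses:
        if consecutive_wrong >= stop_after:
            observed.append(None)  # censored: never administered
            continue
        observed.append(r)
        consecutive_wrong = consecutive_wrong + 1 if r == 0 else 0
    return observed


print(administer([1, 1, 0, 1, 0, 0, 0, 1, 1]))
# [1, 1, 0, 1, 0, 0, 0, None, None]
```

Whether an item is missing depends on the examinee's own earlier responses, so the unadministered items cannot simply be treated as missing at random; here the last two items would have been answered correctly.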
Crins, Martine H P; Terwee, Caroline B; Klausch, Thomas; Smits, Niels; de Vet, Henrica C W; Westhovens, Rene; Cella, David; Cook, Karon F; Revicki, Dennis A; van Leeuwen, Jaap; Boers, Maarten; Dekker, Joost; Roorda, Leo D
2017-07-01
The objective of this study was to assess the psychometric properties of the Dutch-Flemish Patient-Reported Outcomes Measurement Information System (PROMIS) Physical Function item bank in Dutch patients with chronic pain. A bank of 121 items was administered to 1,247 Dutch patients with chronic pain. Unidimensionality was assessed by fitting a one-factor confirmatory factor analysis and evaluating the resulting fit statistics. Items were calibrated with the graded response model and its fit was evaluated. Cross-cultural validity was assessed by testing items for differential item functioning (DIF) based on language (Dutch vs. English). Construct validity was evaluated by calculating correlations between scores on the Dutch-Flemish PROMIS Physical Function measure and scores on generic and disease-specific measures. Results supported the Dutch-Flemish PROMIS Physical Function item bank's unidimensionality (Comparative Fit Index = 0.976, Tucker Lewis Index = 0.976) and model fit. Item thresholds targeted a wide range of the physical function construct (threshold parameters range: -4.2 to 5.6). Cross-cultural validity was good, as only four items showed DIF for language and their impact on item scores was minimal. Physical Function scores were strongly associated with scores on all other measures (all correlations ≤ -0.60, as expected). The Dutch-Flemish PROMIS Physical Function item bank exhibited good psychometric properties. Development of a computer adaptive test based on this large bank is warranted. Copyright © 2017 Elsevier Inc. All rights reserved.
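The graded response model used for calibration turns one discrimination and a set of ordered thresholds into category probabilities. The following is a minimal sketch of Samejima's formulation; the parameter values are made up for illustration and are not the study's estimates.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Samejima's graded response model for one item.

    theta: person location; a: discrimination; thresholds: increasing b_1 < ... < b_m.
    Returns the probabilities of response categories 0..m.
    """
    # Boundary curves P(X >= k), padded with P(X >= 0) = 1 and P(X >= m + 1) = 0.
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-1.0, 0.0, 1.0])
print([round(p, 3) for p in probs])
```

A wide spread of thresholds (such as the -4.2 to 5.6 range reported above) means the boundary curves cover a wide stretch of the latent trait.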
Correlation Between University Students' Kinematic Achievement and Learning Styles
NASA Astrophysics Data System (ADS)
Çirkinoǧlu, A. G.; Demirci, N.
2007-04-01
In the literature, some studies on kinematics have revealed that students have many difficulties in connecting graphs and physics. Other studies have shown that the method used in the classroom affects students' further learning. In this study, the correlation between university students' kinematics achievement and learning styles is investigated. For this purpose, the Kinematics Achievement Test and the Learning Style Inventory were administered to 573 students enrolled in General Physics 1 courses at Balikesir University in the fall semester of 2005-2006. The Kinematics Test, consisting of 12 multiple-choice and 6 open-ended questions, was developed by the researchers to assess students' understanding, interpreting, and drawing of graphs. The Learning Style Inventory, a 24-item test covering visual, auditory, and kinesthetic learning styles, was developed and used by Barsch. The data obtained in this study were analyzed with the necessary statistical calculations (t-test, correlation, ANOVA, etc.) using the SPSS statistical program. Based on the research findings, tentative recommendations are made.
Anesthesiology Journal club assessment by means of semantic changes.
Vieira, Joaquim Edson; Torres, Marcelo Luís Abramides; Pose, Regina Albanese; Auler, José Otávio Costa Junior
2014-01-01
The interactive approach of a journal club has been described in the medical education literature. The aim of this investigation is to present an assessment of the journal club as a tool to address the question of whether residents read more, and more critically. This study reports the performance of medical residents in anesthesiology from the Clinics Hospital - University of São Paulo Medical School. All medical residents were invited to answer five questions derived from discussed papers. The answer sheet consisted of an affirmative statement with a Likert-type scale (totally disagree-disagree-not sure-agree-totally agree), each related to one of the chosen articles. The results were evaluated by means of item analysis (difficulty index and discrimination power). Residents filled in one hundred and seventy-three evaluations in the months of December 2011 (n=51), July 2012 (n=66) and December 2012 (n=56). The first exam presented all items as straight statements; the second and third exams presented mixed items. Separating "totally agree" from "agree" increased the difficulty indices, but did not improve the discrimination power. The use of a journal club assessment with straight and inverted statements and a five-point agreement scale has been shown to increase its item difficulty and discrimination power. This may reflect involvement either with the reading or the discussion during the journal meeting. Copyright © 2013 Sociedade Brasileira de Anestesiologia. Published by Elsevier Editora Ltda. All rights reserved.
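The two item-analysis statistics named above can be sketched as follows. This assumes, as an illustration only, that each response has already been scored 0/1 (for example, agreement in the keyed direction counts as 1), and it uses the common upper/lower 27% convention for the discrimination index. The function name is hypothetical.

```python
def item_analysis(scores, item, group_fraction=0.27):
    """Classical difficulty index and upper-lower discrimination index for one item.

    scores: list of per-person 0/1 vectors (one entry per item).
    Difficulty index p = proportion scoring 1 (higher p means an easier item).
    Discrimination D = p(upper group) - p(lower group), with groups formed by total score.
    """
    n = len(scores)
    p = sum(row[item] for row in scores) / n
    ranked = sorted(scores, key=sum)  # rank persons by total score (item included)
    g = max(1, round(group_fraction * n))
    lower, upper = ranked[:g], ranked[-g:]
    d = sum(row[item] for row in upper) / g - sum(row[item] for row in lower) / g
    return p, d


scores = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
print(item_analysis(scores, item=1))  # (0.5, 1.0)
```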
Constructing three emotion knowledge tests from the invariant measurement approach
Prieto, Gerardo; Burin, Debora I.
2017-01-01
Background Psychological constructionist models like the Conceptual Act Theory (CAT) postulate that complex states such as emotions are composed of basic psychological ingredients that are more clearly respected by the brain than basic emotions. The objective of this study was the construction and initial validation of Emotion Knowledge (EK) measures from the CAT frame by means of an invariant measurement approach, the Rasch Model (RM). Psychological distance theory was used to inform item generation. Methods Three EK tests, namely emotion vocabulary (EV), close emotional situations (CES) and far emotional situations (FES), were constructed and tested with the RM in a community sample of 100 females and 100 males (age range: 18–65), both separately and conjointly. Results It was corroborated that data-RM fit was sufficient. Then, the effect of type of test and emotion on Rasch-modelled item difficulty was tested. Significant effects of emotion on EK item difficulty were found, but the only statistically significant difference was that between "happiness" and the remaining emotions; neither type of test nor interaction effects on EK item difficulty were statistically significant. The testing of gender differences was carried out after corroborating that differential item functioning (DIF) would not be a plausible alternative hypothesis for the results. No statistically significant sex-related differences were found in EV, CES, FES, or total EK. However, the sign of d indicates that female participants consistently outperformed male ones, a result that will be of interest for future meta-analyses. Discussion The three EK tests are ready to be used as components of a higher-level measurement process. PMID:28929013
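The "invariant measurement" property that motivates the Rasch Model can be shown numerically. In the sketch below, the person and item values are arbitrary. Under the model, the log-odds difference between two persons equals the difference between their abilities on every item, whatever that item's difficulty.

```python
import math


def rasch_p(theta, b):
    """Rasch model: P(correct) for ability theta on an item of difficulty b (logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def logit(p):
    return math.log(p / (1.0 - p))


# The log-odds gap between two persons is theta_a - theta_b on every item,
# so comparisons of persons do not depend on which items were used.
theta_a, theta_b = 1.2, -0.3
gaps = [logit(rasch_p(theta_a, b)) - logit(rasch_p(theta_b, b)) for b in (-1.0, 0.0, 2.5)]
print([round(g, 6) for g in gaps])  # [1.5, 1.5, 1.5]
```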
ERIC Educational Resources Information Center
Wallace, Colin S.; Prather, Edward E.; Duncan, Douglas K.
2012-01-01
This is the third of five papers detailing our national study of general education astronomy students' conceptual and reasoning difficulties with cosmology. In this paper, we use item response theory to analyze students' responses to three out of the four conceptual cosmology surveys we developed. The specific item response theory model we use is…
Tsubakita, Takashi; Shimazaki, Kazuyo; Ito, Hiroshi; Kawazoe, Nobuo
2017-10-30
The Utrecht Work Engagement Scale for Students has been used internationally to assess students' academic engagement, but it has not been analyzed via item response theory. The purpose of this study was to conduct an item response theory analysis of the Japanese version of the Utrecht Work Engagement Scale for Students translated by the authors. Using a two-parameter model and Samejima's graded response model, difficulty and discrimination parameters were estimated after confirming the factor structure of the scale. The 14 items on the scale were analyzed with a sample of 3214 university and college students majoring in medical science, nursing, or natural science in Japan. The preliminary parameter estimation was conducted with the two-parameter model and indicated that three items should be removed because they had outlier parameters. Final parameter estimation was conducted using the 11 remaining items and indicated that all difficulty and discrimination parameters were acceptable. The test information curve suggested that the scale assesses higher engagement better than average engagement. The estimated parameters provide a basis for future comparative studies. The results also suggested that a 7-point Likert scale is too broad; thus, the scale should be modified to use fewer response categories.
Item analysis of three Spanish naming tests: a cross-cultural investigation.
Marquez de la Plata, Carlos; Arango-Lasprilla, Juan Carlos; Alegret, Montse; Moreno, Alexander; Tárraga, Luis; Lara, Mar; Hewlitt, Margaret; Hynan, Linda; Cullum, C Munro
2009-01-01
Neuropsychological evaluations conducted in the United States and abroad commonly include the use of tests translated from English to Spanish. The use of translated naming tests for evaluating predominantly Spanish speakers has recently been challenged on the grounds that translating test items may compromise a test's construct validity. The Texas Spanish Naming Test (TNT) has been developed in Spanish specifically for use with Spanish speakers; however, it is unlikely that patients from diverse Spanish-speaking geographical regions will perform uniformly on a naming test. The present study evaluated and compared the internal consistency and patterns of item difficulty and item discrimination for the TNT and two commonly used translated naming tests in three countries (i.e., United States, Colombia, Spain). Two hundred fifty-two subjects (136 demented, 116 nondemented) across three countries were administered the TNT, the Modified Boston Naming Test-Spanish (MBNT-S), and the naming subtest from the CERAD. The TNT demonstrated better internal consistency than its counterparts, a better item difficulty pattern than the CERAD naming test, and a better item discrimination pattern than the MBNT-S across countries. Overall, all three Spanish naming tests differentiated nondemented and moderately demented individuals, but the results suggest that the items of the TNT are the most appropriate to use with Spanish speakers. Preliminary normative data for the three tests examined in each country are provided.
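The abstract does not name its internal-consistency coefficient. Cronbach's alpha is a common choice, so here is a minimal sketch of it, offered as an assumption rather than the study's method. The sketch computes alpha = k/(k-1) * (1 - sum of item variances / variance of total scores).

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha; rows = persons, columns = item scores."""
    k = len(scores[0])
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


scores = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
print(cronbach_alpha(scores))  # 0.75
```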
Examination of the item structure of the Alberta infant motor scale.
Liao, Pai-Jun M; Campbell, Suzann K
2004-01-01
The Alberta Infant Motor Scale (AIMS) is a screening tool for identifying delayed motor development from birth to 18 months of age. The purpose of this study was to examine the psychometric structure of the AIMS, including the hierarchical scale of items and the precision for measuring infant ability at different ages. Ninety-seven infants with varying degrees of risk of developmental disability were recruited from three hospitals or from the community in the Chicago metropolitan area. Infants were tested on the AIMS at three, six, nine, and 12 months of age. The hierarchical structure and the range and distribution of item difficulty on the AIMS were analyzed using Rasch psychometric analysis. The Rasch analysis confirmed that items for each of the four testing positions (supine, prone, sitting, and standing) were arranged in increasing order of difficulty, but a ceiling effect was present. Gaps exist at six ability levels, indicating low precision of measurement for differentiating among infants after about nine months of age. The AIMS shows a ceiling effect, measures infant ability best from three to nine months of age, and has few items available for discriminating among infants after they pass the controlled lowering through standing item. Clinical impressions should be drawn with caution at ages when the precision of measurement is low.
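The gaps and ceiling effect described above can be screened for mechanically once Rasch item difficulties and person measures are available. In this sketch, the 0.5-logit gap threshold, the data, and both function names are hypothetical choices for illustration.

```python
def difficulty_gaps(difficulties, max_gap=0.5):
    """Adjacent pairs of item difficulties (logits) separated by more than max_gap."""
    ordered = sorted(difficulties)
    return [(lo, hi) for lo, hi in zip(ordered, ordered[1:]) if hi - lo > max_gap]


def share_above_hardest(person_measures, difficulties):
    """Fraction of persons located above the hardest item (a crude ceiling indicator)."""
    top = max(difficulties)
    return sum(m > top for m in person_measures) / len(person_measures)


items = [-2.0, -1.8, -0.5, 0.1, 1.9, 2.0]
print(difficulty_gaps(items))  # [(-1.8, -0.5), (-0.5, 0.1), (0.1, 1.9)]
print(share_above_hardest([0.0, 2.5, 3.0, 1.0], items))  # 0.5
```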
Better assessment of physical function: item improvement is neglected but essential
Bruce, Bonnie; Fries, James F; Ambrosini, Debbie; Lingala, Bharathi; Gandek, Barbara; Rose, Matthias; Ware, John E
2009-01-01
Introduction Physical function is a key component of patient-reported outcome (PRO) assessment in rheumatology. Modern psychometric methods, such as Item Response Theory (IRT) and Computerized Adaptive Testing, can materially improve measurement precision at the item level. We present the qualitative and quantitative item-evaluation process for developing the Patient Reported Outcomes Measurement Information System (PROMIS) Physical Function item bank. Methods The process was stepwise: we searched extensively to identify extant Physical Function items and then classified and selectively reduced the item pool. We evaluated retained items for content, clarity, relevance and comprehension, reading level, and translation ease by experts and patient surveys, focus groups, and cognitive interviews. We then assessed items by using classical test theory and IRT, used confirmatory factor analyses to estimate item parameters, and graded response modeling for parameter estimation. We retained the 20 Legacy (original) Health Assessment Questionnaire Disability Index (HAQ-DI) and the 10 SF-36's PF-10 items for comparison. Subjects were from rheumatoid arthritis, osteoarthritis, and healthy aging cohorts (n = 1,100) and a national Internet sample of 21,133 subjects. Results We identified 1,860 items. After qualitative and quantitative evaluation, 124 newly developed PROMIS items composed the PROMIS item bank, which included revised Legacy items with good fit that met IRT model assumptions. Results showed that the clearest and best-understood items were simple, in the present tense, and straightforward. Basic tasks (like dressing) were more relevant and important than complex ones (like dancing). Revised HAQ-DI and PF-10 items with five response options had higher item-information content than did comparable original Legacy items with fewer response options.
IRT analyses showed that the Physical Function domain satisfied general criteria for unidimensionality with one-, two-, three-, and four-factor models having comparable model fits. Correlations between factors in the test data sets were > 0.90. Conclusions Item improvement must underlie attempts to improve outcome assessment. The clear, personally important and relevant, ability-framed items in the PROMIS Physical Function item bank perform well in PRO assessment. They will benefit from further study and application in a wider variety of rheumatic diseases in diverse clinical groups, including those at the extremes of physical functioning, and in different administration modes. PMID:20015354
Work factors are associated with workplace activity limitations in systemic lupus erythematosus.
Al Dhanhani, Ali M; Gignac, Monique A M; Beaton, Dorcas E; Su, Jiandong; Fortin, Paul R
2014-11-01
The objective of this study was to examine the extent of workplace activity limitations among persons with lupus and to identify factors associated with activity limitations among those employed. We conducted a cross-sectional study using a mailed survey and clinical data of persons with lupus who attended a large lupus outpatient clinic. Data were collected on demographics, health, work factors and psychosocial measures. The workplace activity limitations scale (WALS) was used to measure difficulty related to different activities at work. Multivariable analysis examined the association of health, work context, psychosocial and demographic variables with workplace activity limitations. We received 362 responses from 604 (60%) mailed surveys. Among those not employed, 52% reported not working because of lupus. A range of physical and mental tasks were reported as difficult. Each of the physical, cognitive and energy work activities was cited as difficult by more than one-third of participants. Among employed participants, 40% had medium to high WALS difficulty scores. In the multivariable analysis, factors significantly associated with workplace activity limitations were older age, greater disease activity, fatigue, poorer health status measured by the 36-item Short Form Health Survey, lower job control, greater job strain and working more than 40 h/week. People with lupus experience limitations and difficulty at work. Determinants of workplace activity limitations are mainly those related to workplace and health factors. © The Author 2014. Published by Oxford University Press on behalf of the British Society for Rheumatology. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
A Rasch measure of teachers' views of teacher-student relationships in the primary school.
Leitao, Natalie; Waugh, Russell F
2012-01-01
This study investigated teacher-student relationships from the teachers' point of view at Perth metropolitan schools in Western Australia. The study identified three key social and emotional aspects that affect teacher-student relationships, namely, Connectedness, Availability and Communication. Data were collected by questionnaire (N = 139) with stem-items answered in three perspectives: (1) Idealistic: this is what I would like to happen; (2) Capability: this is what I am capable of; and (3) Behaviour: this is what actually happens, using four ordered response categories: not at all (score 1), some of the time (score 2), most of the time (score 3), and almost always (score 4). Data were analysed with a Rasch measurement model and a uni-dimensional, linear scale with 24 items, ordered from easy to hard, was created. The data were shown to be highly reliable, so that valid inferences could be made from the scale. The Person Separation Index (akin to a reliability index) was 0.93; there was good global teacher and item fit to the measurement model; the targeting of the item difficulties against the teacher measures was good; and the response categories were answered consistently and logically. Teachers said that the ideal items were all easier than their corresponding capability items, which were in turn easier than the behaviour items (where the items fitted the model), as conceptualized. The easiest ideal items were "I like this child" and "This child and I get along well together". The hardest ideal item (but still easy) was "I am available for this child". The easiest behaviour item (but still hard) was "This child and I get along well together". The hardest behaviour item (and very hard) was "I am interested to learn about this child's personal thoughts, feelings and experiences". The difficulties of the items supported the conceptual structure of the variable.
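A Person Separation Index such as the 0.93 reported above is commonly computed as the share of variance in person measures that is not due to measurement error. This sketch assumes that formulation, which Rasch software may implement with small differences. The data are hypothetical.

```python
from statistics import fmean, pvariance


def person_separation_index(measures, standard_errors):
    """(observed variance of person measures - mean error variance) / observed variance."""
    observed = pvariance(measures)
    error = fmean([se * se for se in standard_errors])
    return (observed - error) / observed


measures = [-1.0, 0.0, 1.0, 2.0]  # hypothetical person locations (logits)
ses = [0.25, 0.25, 0.25, 0.25]    # hypothetical standard errors
print(round(person_separation_index(measures, ses), 3))  # 0.95
```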
ERIC Educational Resources Information Center
Cacchione, Trix; Indino, Marcello; Fujita, Kazuo; Itakura, Shoji; Matsuno, Toyomi; Schaub, Simone; Amici, Federica
2014-01-01
Previous research has demonstrated that adults are successful at visually tracking rigidly moving items, but experience great difficulties when tracking substance-like "pouring" items. Using a comparative approach, we investigated whether the presence/absence of the grammatical count-mass distinction influences adults and children's…
The Handling of Missing Binary Data in Language Research
ERIC Educational Resources Information Center
Pichette, François; Béland, Sébastien; Jolani, Shahab; Lesniewska, Justyna
2015-01-01
Researchers are frequently confronted with unanswered questions or items on their questionnaires and tests, due to factors such as item difficulty, lack of testing time, or participant distraction. This paper first presents results from a poll confirming previous claims (Rietveld & van Hout, 2006; Schafer & Graham, 2002) that data…
Decimal Fraction Arithmetic: Logical Error Analysis and Its Validation.
ERIC Educational Resources Information Center
Standiford, Sally N.; And Others
This report illustrates procedures of item construction for addition and subtraction examples involving decimal fractions. Using a procedural network of skills required to solve such examples, an item characteristic matrix of skills analysis was developed to describe the characteristics of the content domain by projected student difficulties. Then…
Mutual Information Item Selection in Adaptive Classification Testing
ERIC Educational Resources Information Center
Weissman, Alexander
2007-01-01
A general approach for item selection in adaptive multiple-category classification tests is provided. The approach uses mutual information (MI), a special case of the Kullback-Leibler distance, or relative entropy. MI works efficiently with the sequential probability ratio test and alleviates the difficulties encountered with using other local-…
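The idea of selecting by mutual information can be sketched for a classification test as follows. The sketch assumes a current posterior over classes (e.g. below / above a cut score) and, for each item, P(correct | class) evaluated at a representative ability per class. These modelling choices and the function names are illustrative assumptions, not Weissman's exact formulation. MI here is the Kullback-Leibler divergence between the joint distribution of response and class and the product of their marginals, computed in nats.

```python
import math


def mutual_information(posterior, p_correct):
    """I(X; C) for a 0/1 item response X and class C.

    posterior[c]: current probability of class c.
    p_correct[c]: P(X = 1 | class c).
    """
    p1 = sum(w * p for w, p in zip(posterior, p_correct))  # marginal P(X = 1)
    mi = 0.0
    for w, p in zip(posterior, p_correct):
        for px, marginal in ((p, p1), (1.0 - p, 1.0 - p1)):
            if w > 0 and px > 0:
                mi += w * px * math.log(px / marginal)
    return mi


def select_item(posterior, item_bank):
    """Index of the item with the largest mutual information."""
    return max(range(len(item_bank)), key=lambda i: mutual_information(posterior, item_bank[i]))


bank = [[0.5, 0.5], [0.2, 0.9]]  # hypothetical P(correct | class) for two items
print(select_item([0.5, 0.5], bank))  # 1
```

The first item is equally likely to be answered correctly in both classes, so its response carries no information about the class. The selection therefore favours the second item.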
Chen, Liang-Yu; Wu, Yi-Hui; Huang, Chung-Yu; Liu, Li-Kuo; Hwang, An-Chun; Peng, Li-Ning; Lin, Ming-Hsieh; Chen, Liang-Kung
2017-04-01
To identify potentially modifiable risk factors for cognitive decline among veterans' home residents in Taiwan. The present retrospective cohort study was part of the Veteran Affairs-Comprehensive Geriatric Assessment study that retrieved data of the comprehensive geriatric assessment for 946 residents living at four veterans' homes in Taiwan. The study participants were interviewed every 3-6 months between January 2012 and December 2014. Demographic characteristics, multimorbidity by the Charlson Comorbidity Index, physical function by the Barthel Index, cognition by the Mini-Mental State Examination (MMSE), depression by the five-item Geriatric Depression Scale and nutritional status by the Mini-Nutrition Assessment-Short Form were collected for analysis. A generalized estimating equation model, adjusted for age, educational level, five-item Geriatric Depression Scale score, and communication difficulty, was used to identify potential modifiable risk factors for cognitive decline. The mean age of the participants was 85.7 ± 5.2 years, with a mean follow-up period of 41 ± 21.6 weeks. The prevalence of cognitive impairment (defined by MMSE <24) was 65.6%, whereas 34% of the study participants were positive for depressive symptoms. Approximately one-fifth of the study participants were using psychotropic agents, a proportion that was higher among participants with cognitive impairment than among those without (23.6% vs 15.6%, P < 0.05). In the generalized estimating equation model, physical function, nutritional status, depressive symptoms, ex-drinker status, multimorbidity and stool incontinence were positively correlated with MMSE score, whereas advanced age, low educational level (<6 years), presence of communication difficulty and use of psychotropic agents were inversely associated with the MMSE score. 
Physical function and nutritional status were positively associated with the MMSE score, and use of psychotropic agents was negatively correlated with cognitive function. Further intervention study is required to improve the cognitive health of older adults living in the veterans' retirement communities. Geriatr Gerontol Int 2017: 17 (Suppl. 1): 7-13. © 2017 Japan Geriatrics Society.
Sadler, Philip M.; Coyle, Harold; Smith, Nancy Cook; Miller, Jaimie; Mintzes, Joel; Tanner, Kimberly; Murray, John
2013-01-01
We report on the development of an item test bank and associated instruments based on the National Research Council (NRC) K–8 life sciences content standards. Utilizing hundreds of studies in the science education research literature on student misconceptions, we constructed 476 unique multiple-choice items that measure the degree to which test takers hold either a misconception or an accepted scientific view. Tested nationally with 30,594 students, following their study of life science, and their 353 teachers, these items reveal a range of interesting results, particularly student difficulties in mastering the NRC standards. Teachers also answered test items and demonstrated a high level of subject matter knowledge reflecting the standards of the grade level at which they teach, but exhibiting few misconceptions of their own. In addition, teachers predicted the difficulty of each item for their students and which of the wrong answers would be the most popular. Teachers were found to generally overestimate their own students’ performance and to have a high level of awareness of the particular misconceptions that their students hold on the K–4 standards, but a low level of awareness of misconceptions related to the 5–8 standards. PMID:24006402
Equating with Miditests Using IRT
ERIC Educational Resources Information Center
Fitzpatrick, Joseph; Skorupski, William P.
2016-01-01
The equating performance of two internal anchor test structures--miditests and minitests--is studied for four IRT equating methods using simulated data. Originally proposed by Sinharay and Holland, miditests are anchors that have the same mean difficulty as the overall test but less variance in item difficulties. Four popular IRT equating methods…
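A miditest anchor keeps roughly the full test's mean difficulty while narrowing the spread. This is a simple heuristic sketch: take the items closest to the full-test mean. It is not Sinharay and Holland's construction, and for skewed item banks the anchor mean can drift and should be checked. The function name and item difficulties are hypothetical.

```python
from statistics import fmean, pvariance


def pick_miditest(difficulties, n_anchor):
    """Indices of the n_anchor items whose difficulties are closest to the full-test mean."""
    center = fmean(difficulties)
    by_distance = sorted(range(len(difficulties)), key=lambda i: abs(difficulties[i] - center))
    return sorted(by_distance[:n_anchor])


difficulties = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]  # hypothetical item bank (logits)
anchor = pick_miditest(difficulties, 3)
print(anchor)  # [2, 3, 4]
print(pvariance([difficulties[i] for i in anchor]), pvariance(difficulties))
```

A minitest, by contrast, would try to match both the mean and the variance of the full test's difficulties.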
A Test of the Similar Sequence Hypothesis.
ERIC Educational Resources Information Center
Silverstein, A. B.; And Others
1982-01-01
Scales for object permanence and spatial relationships were administered to 98 severely and profoundly mentally retarded children (mean age 13 years) on three occasions, 6 months apart. Differences in the difficulty of the items were quite stable, but their order of difficulty differed appreciably from that for nonretarded infants. (Author/SB)
Reproduction of Inflectional Markers in French-Speaking Children with Reading Impairment
ERIC Educational Resources Information Center
St-Pierre, Marie-Catherine; Beland, Renee
2010-01-01
Purpose: Children with reading impairment (RI) experience difficulties in oral and written production of inflectional markers. The origin of these difficulties is not well documented in French. According to some authors, acquisition of irregular items by typically developing children is predicted by token frequency, whereas acquisition of regular…
Chuang, I-Ching; Lin, Keh-Chung; Wu, Ching-Yi; Hsieh, Yu-Wei; Liu, Chien-Ting; Chen, Chia-Ling
2017-10-01
The Motor Activity Log (MAL) and Lower-Functioning MAL (LF-MAL) are used to assess the amount of use of the more impaired arm and the quality of movement during activities in real-life situations for patients with stroke. This study used Rasch analysis to examine the psychometric properties of the MAL and LF-MAL in patients with stroke. This is a methodological study. The MAL and LF-MAL include 2 scales: the amount of use (AOU) and the quality of movement (QOM). Rasch analysis was used to examine the unidimensionality, item difficulty hierarchy, targeting, reliability, and differential item functioning (DIF) of the MAL and LF-MAL. A total of 403 patients with mild or moderate stroke completed the MAL, and 134 patients with moderate/severe stroke completed the LF-MAL. Evidence of disordered thresholds and poor model fit was found in both the MAL and the LF-MAL. After the rating categories were collapsed and misfitting items were deleted, all items of the revised MAL and LF-MAL exhibited ordered thresholds and constituted unidimensional constructs. The person-item map showed that these assessments were difficult for our participants. The person reliability coefficients of these assessments ranged from .79 to .87. No items in the revised MAL and LF-MAL exhibited bias related to patients' characteristics. One limitation is that the patients recruited for the LF-MAL had relatively high functional ability. The revised MAL and LF-MAL are unidimensional scales and have good reliability. The categories function well, and responses to all items in these assessments are not biased by patients' characteristics. However, both the revised MAL and LF-MAL showed a floor effect. Further studies might add easy items for assessing the performance of activities in real-life situations for patients with stroke. © 2017 American Physical Therapy Association
Jafari, Peyman; Bagheri, Zahra; Ayatollahi, Seyyed Mohamad Taghi; Soltani, Zahra
2012-03-13
Item response theory (IRT) is extensively used to develop adaptive instruments of health-related quality of life (HRQoL). However, each IRT model has its own function for estimating item and category parameters, and hence different results may be obtained from the same response categories with different IRT models. The present study used the Rasch rating scale model (RSM) to examine and reassess the psychometric properties of the Persian version of the PedsQL™ 4.0 Generic Core Scales. The PedsQL™ 4.0 Generic Core Scales was completed by 938 Iranian school children and their parents. Convergent, discriminant and construct validity of the instrument were assessed by classical test theory (CTT). The RSM was applied to investigate person and item reliability, item statistics and ordering of response categories. The CTT method showed that the scaling success rates for convergent and discriminant validity were 100% in all domains with the exception of physical health in the child self-report. Moreover, confirmatory factor analysis supported a four-factor model similar to its original version. The RSM showed that 22 out of 23 items had acceptable infit and outfit statistics (between 0.6 and 1.4), person reliabilities were low, item reliabilities were high, and item difficulty ranged from -1.01 to 0.71 and -0.68 to 0.43 for child self-report and parent proxy-report, respectively. The RSM also showed that, for all items, successive response categories were not located in the expected order. This study revealed that, in all domains, the five response categories did not perform adequately. It is not known whether this problem is a function of the meaning of the response choices in the Persian language or an artifact of a mostly healthy population that did not use the full range of the response categories. The response categories should be evaluated in further validation studies, especially in large samples of chronically ill patients.
Hays, Ron D; Spritzer, Karen L; Amtmann, Dagmar; Lai, Jin-Shei; Dewitt, Esi Morgan; Rothrock, Nan; Dewalt, Darren A; Riley, William T; Fries, James F; Krishnan, Eswar
2013-11-01
To create upper-extremity and mobility subdomain scores from the Patient-Reported Outcomes Measurement Information System (PROMIS) physical functioning adult item bank. Expert reviews were used to identify upper-extremity and mobility items from the PROMIS item bank. Psychometric analyses were conducted to assess empirical support for scoring upper-extremity and mobility subdomains. Data were collected from the U.S. general population and multiple disease groups via self-administered surveys. The sample (N=21,773) included 21,133 English-speaking adults who participated in the PROMIS wave 1 data collection and 640 Spanish-speaking Latino adults recruited separately. Interventions: not applicable. We used English- and Spanish-language data and existing PROMIS item parameters for the physical functioning item bank to estimate upper-extremity and mobility scores. In addition, we fit graded response models to calibrate the upper-extremity items and mobility items separately, compare separate with combined calibrations, and produce subdomain scores. After eliminating items because of local dependency, 16 items remained to assess upper extremity and 17 items to assess mobility. The estimated correlation between upper extremity and mobility was .59 using existing PROMIS physical functioning item parameters (r=.60 using parameters calibrated separately for upper-extremity and mobility items). Upper-extremity and mobility subdomains shared about 35% of their variance and produced comparable scores whether calibrated separately or together. The identification of the subset of items tapping these 2 aspects of physical functioning, scored using the existing PROMIS parameters, provides the option of scoring these subdomains in addition to the overall physical functioning score. Copyright © 2013 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
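The "about 35%" of shared variance follows from squaring the reported correlation, since the proportion of variance two measures have in common is r²:

```latex
r^{2} = 0.59^{2} = 0.3481 \approx 35\%
\qquad \bigl(\text{and } 0.60^{2} = 0.36 \text{ with the separately calibrated parameters}\bigr)
```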
Engaging families in physical activity research: a family-based focus group study.
Brown, Helen Elizabeth; Schiff, Annie; van Sluijs, Esther M F
2015-11-25
Family-based interventions present a much-needed opportunity to increase children's physical activity levels. However, little is known about how best to engage parents and their children in physical activity research. This study aimed to engage with the whole family to understand how best to recruit for, and retain participation in, physical activity research. Families (including a 'target' child aged between 8 and 11 years, their parents, siblings, and others) were recruited through schools and community groups. Focus groups were conducted using a semi-structured approach (informed by a pilot session). Families were asked to order cards listing the possible benefits of, and the barriers to, being involved in physical activity research and other health promotion activities, highlighting the items they considered most relevant and suggesting additional items. Duplicate content analysis was used to identify transcript themes and develop a coding frame. Eighty-two participants from 17 families took part, including 17 'target' children (mean age 9.3 ± 1.1 years, 61.1% female), 32 other children and 33 adults (including parents, grandparents, and older siblings). Social, health and educational benefits were cited as key incentives for involvement in physical activity research, with emphasis on children experiencing new things, developing character, and increasing social contact (particularly for shy children). Children's enjoyment was also given priority. The provision of child care or financial reward was not considered sufficiently appealing. Increased time commitment or scheduling difficulties were cited as the most pertinent barriers to involvement (especially for families with several children), but parents commented that these could be overcome if the potential value for children was clear. Lessons learned from this work may contribute to the development of effective recruitment and retention strategies for children and their families.
Making the wide range of potential benefits clear to families, providing regular feedback, and carefully considering family structure may prove useful in achieving the desired research participation. This may subsequently assist in engaging families in interventions to increase physical activity in children.
ERIC Educational Resources Information Center
Palmer, D. G.
This publication presents an organized collection of biology questions, designed for use in evaluation at the secondary level in Tasmania. Each item has been tried for quality and is accompanied by its difficulty percentage as well as by its content area and the mental processes required to answer it. The content areas include: Diversity,…
Nie, Guangning; Yang, Hongyan; Liu, Jian; Zhao, ChunMei; Wang, Xiaoyun
2017-05-01
The Menopause-Specific Quality-of-Life (MENQOL) questionnaire was developed as a specific tool to measure the health-related quality-of-life of postmenopausal women. Thus far, the Chinese version of the questionnaire has not been subjected to psychometric assessment with a large sample. This study aims to evaluate the validity and reliability of the Chinese version of the MENQOL specific to postmenopausal women in China. A total of 1,137 menopausal symptomatic and 491 menopausal asymptomatic women from eight cities in China were recruited using a convenience sampling method. Psychometric properties were evaluated by descriptive statistics, validity, and reliability. Reliability was assessed for each subscale of the MENQOL through internal consistency reliability with Cronbach's α and intersubscale correlations. Item-domain correlations, principal components analysis (PCA), and confirmatory factor analysis were performed to determine construct validity. t tests were used to compare the differences between the menopausal symptomatic and asymptomatic women and to evaluate discriminant validity. Pearson correlation coefficients were calculated between MENQOL scores and the Kupperman index to assess criterion-related validity. The most common symptoms in Chinese menopausal symptomatic women were "experiencing poor memory" (94.4%), "feeling tired or worn out" (93.8%), "aching in muscle and joints" (89.4%), "low backache" (86.9%), "decrease in physical strength" (86.6%), "aches in back of neck or head" (86.2%), "difficulty sleeping" (83.6%), "accomplishing less than I used to" (83.4%), "feeling a lack of energy" (83.3%), "change in your sexual desire" (81%), and "hot flash" (80.7%) among others. The symptom "increased facial hair" was rarely reported (9.9%). The vasomotor domain, as well as the psychosocial, physical, and sexual domains, showed high reliability (Cronbach's α 0.84, 0.87, 0.89, and 0.86, respectively).
Item-domain correlation analysis showed that all items correlated more strongly with their own domains than with other domains. In the PCA, after deleting the "increased facial hair" item, items in the vasomotor, sexual, and psychosocial subscales loaded on their respective domains by and large, and items in the physical subscale divided into two factors. The PCA revealed a latent structure of the Chinese version of MENQOL nearly identical to the original MENQOL domains. The confirmatory factor analysis demonstrated that the questionnaire fits well with a four-domain model. The MENQOL discriminated between symptomatic and asymptomatic menopausal women, showing good discriminant validity. Criterion-related validity was confirmed by a significant correlation between MENQOL scores and the Kupperman index. This study showed that the Chinese version of the MENQOL has good psychometric properties and would be suitable for measuring the health-related quality-of-life of Chinese menopausal women, except for item 21 (increased facial hair).
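Cronbach's α, used above for subscale reliability, compares the sum of the item variances with the variance of the total score: α = k/(k−1) · (1 − Σσᵢ²/σ_X²). The sketch below is a minimal illustration, assuming complete numeric item scores (no missing data) with one row per respondent; `cronbach_alpha` is a hypothetical helper, not the software used in the study.

```python
import statistics


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*scores))  # transpose: one column of scores per item
    # Population variances are used throughout; the ratio is the same with n-1.
    sum_item_var = sum(statistics.pvariance(col) for col in items)
    total_var = statistics.pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - sum_item_var / total_var)


parallel = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]  # identical items
unrelated = [[1, 2], [2, 1], [1, 1], [2, 2]]             # uncorrelated items
print(round(cronbach_alpha(parallel), 3), round(cronbach_alpha(unrelated), 3))
```

Perfectly parallel items give α = 1, while items that share no variance give α = 0; real subscales, such as the MENQOL domains reported above, fall in between.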
How Task Features Impact Evidence from Assessments Embedded in Simulations and Games
ERIC Educational Resources Information Center
Almond, Russell G.; Kim, Yoon Jeon; Velasquez, Gertrudes; Shute, Valerie J.
2014-01-01
One of the key ideas of evidence-centered assessment design (ECD) is that task features can be deliberately manipulated to change the psychometric properties of items. ECD identifies a number of roles that task-feature variables can play, including determining the focus of evidence, guiding form creation, determining item difficulty and…
An Eye-Movement Study of Relational Memory in Adults with Autism Spectrum Disorder
ERIC Educational Resources Information Center
Ring, Melanie; Bowler, Dermot M.; Gaigg, Sebastian B.
2017-01-01
Persons with Autism Spectrum Disorder (ASD) demonstrate good memory for single items but difficulties remembering contextual information related to these items. Recently, we found compromised explicit but intact implicit retrieval of object-location information in ASD (Ring et al. "Autism Res" 8(5):609-619, 2015). Eye-movement data…
Cognitive Complexity in the Remote Association Test--Chinese Version
ERIC Educational Resources Information Center
Hung, Su-Pin; Huang, Po-Sheng; Chen, Hsueh-Chih
2016-01-01
The remote association test (RAT) has been applied in various fields; however, evidence of construct validity for the original version and subsequent extensions of the RAT remains limited. This study aimed to elucidate the dimensionality and the relationship between item features and item difficulties for the RAT--Chinese Version (RAT-C) using the…
Analysis of Open-Ended Statistics Questions with Many Facet Rasch Model
ERIC Educational Resources Information Center
Güler, Nese
2014-01-01
Problem Statement: The most significant disadvantage of open-ended items that allow valid measurement of upper-level cognitive behaviours, such as synthesis and evaluation, is scoring. The difficulty associated with objectively scoring the answers to the items contributes to the reduction of the reliability of the scores. Moreover, other…
Developing and Evaluating a Machine-Scorable, Constrained Constructed-Response Item.
ERIC Educational Resources Information Center
Braun, Henry I.; And Others
The use of constructed response items in large scale standardized testing has been hampered by the costs and difficulties associated with obtaining reliable scores. The advent of expert systems may signal the eventual removal of this impediment. This study investigated the accuracy with which expert systems could score a new, non-multiple choice…
A Comparison between Element Salience versus Context as Item Difficulty Factors in Raven's Matrices
ERIC Educational Resources Information Center
Perez-Salas, Claudia P.; Streiner, David L.; Roberts, Maxwell J.
2012-01-01
The nature of contextual facilitation effects for items derived from Raven's Progressive Matrices was investigated in two experiments. For these, the original matrices were modified, creating either abstract versions with high element salience, or versions which comprised realistic entities set in familiar contexts. In order to replicate and…
An Application of the Rasch Model.
ERIC Educational Resources Information Center
Veitch, William R.
The one-parameter latent trait theory of Georg Rasch has two assumptions: that student abilities can be measured on an equal-interval scale, and that the success of a student with a given item is a function of student achievement and item difficulty. The grade four Michigan Educational Assessment Program reading test was designed to measure…
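The two assumptions above correspond to the standard dichotomous Rasch model, in which the log-odds of a correct response equal ability minus item difficulty, both on the same logit scale. A minimal sketch, assuming 0/1 scoring and already-known parameters (it does not estimate θ or b from data); the function names are illustrative.

```python
import math


def rasch_probability(theta: float, b: float) -> float:
    """Dichotomous Rasch model: probability that a person of ability theta answers
    an item of difficulty b correctly, P = exp(theta - b) / (1 + exp(theta - b))."""
    # Equivalent logistic form, 1 / (1 + exp(-(theta - b))).
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def log_odds(p: float) -> float:
    """Logit of a probability; under the Rasch model this recovers theta - b."""
    return math.log(p / (1.0 - p))


for theta in (-1.0, 0.0, 1.0):
    print(theta, round(rasch_probability(theta, 0.0), 3))
```

Because only the difference θ − b enters the model, a student whose ability equals an item's difficulty has a 50% chance of success, and shifting both by the same amount leaves the probability unchanged.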
Shalev, Anat; Shor, Ron
2016-12-01
Limited research attention has been given to the needs of family caregivers of persons with mental illness in psychiatric hospitals, despite the stressors and difficulties they experience. In light of the recognized importance of helping family caregivers, a new model of consultation and support centers for family caregivers, called Meital, has been developed. The aim of this study was to examine the needs of family caregivers who receive help at Meital, at the Beer Sheva Mental Health Center. Eighty-five family caregivers participated in the research. They completed a structured questionnaire, constructed for this research, two weeks after they started receiving services from Meital. The questionnaire covered four areas of need for help, and each item measured the extent of the need for help. The 'information and knowledge' subscale had the highest mean extent of need for help. Average-to-high item means were found in the subscales relating to 'difficulties stemming from the impact of the situation of the person with mental illness on the function of the family caregiver receiving help,' 'on the function of other family members' and 'difficulties coping with the person with mental illness.' The subscale 'relationships with professionals and informal systems' had the lowest item mean. An examination of the items within the subscales indicated that items relating to the 'impact of the situation of the person with mental illness on the family caregiver who receives help' were ranked higher than items relating to the 'impact on the function of other family caregivers.' Items relating to 'relationships with professionals' were ranked higher than items relating to 'relationships with informal systems.' This research emphasizes the importance of implementing the family-centered approach, the basis of the Meital Model, in psychiatric institutions.
The focus of this approach is on the need for help of family caregivers beyond the help needed for them to function as a resource of help for the ill person. The findings also illuminate the importance of making information and knowledge accessible for family caregivers.
2015-01-01
Purpose: The situational judgment test (SJT) shows promise for assessing the non-cognitive skills of medical school applicants, but has only been used in Europe. Since the admissions processes and education levels of applicants to medical school differ between the United States and Europe, it is necessary to obtain validity evidence for the SJT from a sample of United States applicants. Methods: Ninety SJT items were developed and Kane’s validity framework was used to create a test blueprint. A total of 489 applicants selected for assessment/interview day at the University of Utah School of Medicine during the 2014-2015 admissions cycle completed one of five SJTs, which assessed professionalism, coping with pressure, communication, patient focus, and teamwork. Item difficulty, each item’s discrimination index, internal consistency, and the categorization of items by two experts were used to create the test blueprint. Results: The majority of item scores were within an acceptable range of difficulty, as measured by the difficulty index (0.50-0.85), and had fair to good discrimination. However, internal consistency was low for each domain, and 63% of items appeared to assess multiple domains. The concordance of categorization between the two educational experts ranged from 24% to 76% across the five domains. Conclusion: The results of this study will help medical school admissions departments determine how to begin constructing an SJT. Further testing with a more representative sample is needed to determine whether the SJT is a useful assessment tool for measuring the non-cognitive skills of medical school applicants. PMID:26582629
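The difficulty index above is the classical proportion-correct statistic p. The abstract does not say how the discrimination index was computed, so the sketch below assumes the common upper-lower group index D: the proportion correct among the top 27% of examinees by total score minus that among the bottom 27%. The function names, the dichotomous (0/1) scoring, and the 27% cut-off are assumptions for illustration.

```python
def item_difficulty(responses):
    """Classical difficulty index p: proportion of examinees scoring 1 on the item."""
    return sum(responses) / len(responses)


def discrimination_index(item, totals, fraction=0.27):
    """Upper-lower index D = p(upper group) - p(lower group), where the groups are
    the top and bottom `fraction` of examinees ranked by total test score."""
    if len(item) != len(totals):
        raise ValueError("item and totals must have the same length")
    n = len(item)
    g = max(1, round(n * fraction))  # size of each extreme group
    order = sorted(range(n), key=lambda i: totals[i])
    lower = [item[i] for i in order[:g]]
    upper = [item[i] for i in order[-g:]]
    return sum(upper) / g - sum(lower) / g


item = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]      # one item, ten examinees (made-up data)
totals = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]  # their total test scores
p = item_difficulty(item)
d = discrimination_index(item, totals)
print(f"p = {p:.2f}, D = {d:.2f}, within 0.50-0.85: {0.50 <= p <= 0.85}")
```

For an SJT scored on a partial-credit key rather than 0/1, p would typically be replaced by the mean item score divided by the maximum possible score; that adaptation is likewise an assumption, not something the abstract reports.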
ERIC Educational Resources Information Center
Zhang, Dake; Ding, Yi; Stegall, Joanna; Mo, Lei
2012-01-01
Students who struggle with learning mathematics often have difficulties with geometry problem solving, which requires strong visual imagery skills. These difficulties have been correlated with deficiencies in visual working memory. Cognitive psychology has shown that chunking of visual items accommodates students' working memory deficits. This…
ERIC Educational Resources Information Center
Dickey, Wayne C.; Blumberg, Stephen J.
2004-01-01
Objective: The Strengths and Difficulties Questionnaire is a 25-item instrument developed to assess emotional and behavioral problems. The current study attempted to replicate previous European structural analyses and to describe the latent dimensions that underlie responses to the parent-reported version of the Strengths and Difficulties…
Eye Movements Reveal How Task Difficulty Moulds Visual Search
ERIC Educational Resources Information Center
Young, Angela H.; Hulleman, Johan
2013-01-01
In two experiments we investigated the relationship between eye movements and performance in visual search tasks of varying difficulty. Experiment 1 provided evidence that a single process is used for search among static and moving items. Moreover, we estimated the functional visual field (FVF) from the gaze coordinates and found that its size…
Comparison of Difficulties and Reliabilities of Math-Completion and Multiple-Choice Item Formats.
ERIC Educational Resources Information Center
Oosterhof, Albert C.; Coats, Pamela K.
Instructors who develop classroom examinations that require students to provide a numerical response to a mathematical problem are often very concerned about the appropriateness of the multiple-choice format. The present study augments previous research relevant to this concern by comparing the difficulty and reliability of multiple-choice and…
Oude Voshaar, Martijn A H; Ten Klooster, Peter M; Vonkeman, Harald E; van de Laar, Mart A F J
2017-11-01
Traditional patient-reported physical function instruments often poorly differentiate patients with mild-to-moderate disability. We describe the development and psychometric evaluation of a generic item bank for measuring everyday activity limitations in outpatient populations. Seventy-two items generated from patient interviews and mapped to the International Classification of Functioning, Disability and Health (ICF) domestic life chapter were administered to 1128 adults representative of the Dutch population. The partial credit model was fitted to the item responses and evaluated with respect to its assumptions, model fit, and differential item functioning (DIF). Measurement performance of a computerized adaptive testing (CAT) algorithm was compared with the SF-36 physical functioning scale (PF-10). A final bank of 41 items was developed. All items demonstrated acceptable fit to the partial credit model and measurement invariance across age, sex, and educational level. Five- and ten-item CAT simulations were shown to have high measurement precision, which exceeded that of the SF-36 physical functioning scale across the physical function continuum. Floor effects were absent for a 10-item empirical CAT simulation, and ceiling effects were low (13.5%) compared with the SF-36 physical functioning scale (38.1%). CAT also discriminated better than the SF-36 physical functioning scale between age groups, number of chronic conditions, and respondents with or without rheumatic conditions. The Rasch assessment of everyday activity limitations (REAL) item bank will hopefully prove a useful instrument for assessing everyday activity limitations. T-scores obtained using the derived measures can be used to benchmark physical function outcomes against the general Dutch adult population.
Belief-bias reasoning in non-clinical delusion-prone individuals.
Anandakumar, T; Connaughton, E; Coltheart, M; Langdon, R
2017-03-01
It has been proposed that people with delusions have difficulty inhibiting beliefs (i.e., "doxastic inhibition") so as to reason about them as if they might not be true. We used a continuity approach to test this proposal in non-clinical adults scoring high and low in psychometrically assessed delusion-proneness. High delusion-prone individuals were expected to show greater difficulty than low delusion-prone individuals on "conflict" items of a "belief-bias" reasoning task (i.e. when required to reason logically about statements that conflicted with reality), but not on "non-conflict" items. Twenty high delusion-prone and twenty low delusion-prone participants (according to the Peters et al. Delusions Inventory) completed a belief-bias reasoning task and tests of IQ, working memory and general inhibition (Excluded Letter Fluency, Stroop and Hayling Sentence Completion). High delusion-prone individuals showed greater difficulty than low delusion-prone individuals on the Stroop and Excluded Letter Fluency tests of inhibition, but no greater difficulty on the conflict versus non-conflict items of the belief-bias task. They did, however, make significantly more errors overall on the belief-bias task, despite controlling for IQ, working memory and general inhibitory control. The study had a relatively small sample size and used non-clinical participants to test a theory of cognitive processing in individuals with clinically diagnosed delusions. Results failed to support a role for doxastic inhibitory failure in non-clinical delusion-prone individuals. These individuals did, however, show difficulty with conditional reasoning about statements that may or may not conflict with reality, independent of any general cognitive or inhibitory deficits. Copyright © 2016 Elsevier Ltd. All rights reserved.
Cross-cultural comparisons of the Mini-mental State Examination between Japanese and U.S. cohorts
Meguro, Kenichi; Ishii, Hiroshi; Yamaguchi, Satoshi; Saxton, Judith A.; Ganguli, Mary
2009-01-01
Background: The Mini-mental State Examination (MMSE) is widely used in Japan and the U.S.A. for cognitive screening in the clinical setting and in epidemiological studies. A previous Japanese community study reported distributions of the MMSE total score very similar to those in the U.S.A. Methods: Data were obtained from the Monongahela Valley Independent Elder's Study (MoVIES), a representative sample of community-dwelling elderly people aged 65 and older living near Pittsburgh, U.S.A., and from the Tajiri Project, with similar aims in Tajiri, Japan. We examined item-by-item distributions of the MMSE between the two cohorts, comparing (1) the percentage of correct answers for each item within each cohort, and (2) the relative difficulty of each item measured by Item Characteristic Curve analysis (ICC), which estimates the log odds of obtaining a correct answer adjusted for the remaining MMSE items, demographic variables (age, gender, education) and interactions of demographic variables and cohort. Results: Median MMSE scores were very similar between the two samples within the same education groups. However, the relative difficulty of each item differed substantially between the two cohorts. Specifically, recall and auditory comprehension were easier for the Tajiri group, but reading comprehension and sentence construction were easier for the MoVIES group. Conclusions: Our results reaffirm the importance of validation and examination of thresholds in each cohort to be studied when a common instrument is used as a dementia screening tool or for defining cognitive impairment. PMID:18925977
Braun, J
1994-02-01
In more than one respect, visual search for the most salient item and visual search for the least salient item in a display are different kinds of visual tasks. The present work investigated whether this difference is primarily one of perceptual difficulty, or whether it is more fundamental and relates to visual attention. Display items of different salience were produced by varying either size, contrast, color saturation, or pattern. Perceptual masking was employed and, on average, mask onset was delayed longer in search for the least salient item than in search for the most salient item. As a result, the two types of visual search presented comparable perceptual difficulty, as judged by psychophysical measures of performance, effective stimulus contrast, and stability of decision criterion. To investigate the role of attention in the two types of search, observers attempted to carry out a letter discrimination and a search task concurrently. To discriminate the letters, observers had to direct visual attention at the center of the display and, thus, leave unattended the periphery, which contained the target and distractors of the search task. In this situation, visual search for the least salient item was severely impaired while visual search for the most salient item was only moderately affected, demonstrating a fundamental difference with respect to visual attention. A qualitatively identical pattern of results was encountered by Schiller and Lee (1991), who used similar visual search tasks to assess the effect of a lesion in extrastriate area V4 of the macaque.
Wang, Xiaoli; Xuan, Yifu; Jarrold, Christopher
2016-01-01
Previous studies have examined whether difficulties in short-term memory for verbal information, that might be associated with dyslexia, are driven by problems in retaining either information about to-be-remembered items or the order in which these items were presented. However, such studies have not used process-pure measures of short-term memory for item or order information. In this work we adapt a process dissociation procedure to properly distinguish the contributions of item and order processes to verbal short-term memory in a group of 28 adults with a self-reported diagnosis of dyslexia and a comparison sample of 29 adults without a dyslexia diagnosis. In contrast to previous work that has suggested that individuals with dyslexia experience item deficits resulting from inefficient phonological representation and language-independent order memory deficits, the results showed no evidence of specific problems in short-term retention of either item or order information among the individuals with a self-reported diagnosis of dyslexia, despite this group showing expected difficulties on separate measures of word and non-word reading. However, there was some suggestive evidence of a link between order memory for verbal material and individual differences in non-word reading, consistent with other claims for a role of order memory in phonologically mediated reading. The data from the current study therefore provide empirical evidence to question the extent to which item and order short-term memory are necessarily impaired in dyslexia. PMID:26941679
Vélez, Claudia Marcela; Villada Ramírez, Adriana C; Arias, Ana Carolina Amaya; Eslava-Schmalbach, Javier H
2016-01-01
The aim of this study was to validate the PedsQL 4.0™ in Colombian children and adolescents using the Rasch model. The Paediatric Quality of Life Inventory (PedsQL 4.0™) has been shown to be reliable and sensitive to changes in health status, as well as quick and easy to use. This was a validation study of a measurement tool. The PedsQL 4.0™ was administered to a convenience sample of 375 children and adolescents aged 5 to 17 years and 500 caregivers of children aged 2 to 18 years in five Colombian cities. The psychometric properties were analysed according to the Rasch model, including fit, separation, and differential item functioning (DIF). The Rasch model provided adequate fit to the data. For both versions, the social dimension was more difficult than the physical health dimension. The items showed internal consistency, whereas for individuals the reliability and separation values were lower than the established thresholds. DIF occurred in very few variables, mainly in comparisons between cities. The item characteristic curves showed disordered thresholds. The items had adequate internal consistency. Analysis showed adequate individual separation, but disordered thresholds were found in the response categories. No DIF was observed by sex or disease, but it is noteworthy that DIF occurred between cities. Copyright © 2016 Asociación Colombiana de Psiquiatría. Published by Elsevier España. All rights reserved.
Paz, Sylvia H.; Jones, Loretta; Calderón, José L.; Hays, Ron D.
2016-01-01
Background Depression and physical function are especially important health domains for the elderly. The Geriatric Depression Scale (GDS) and the Patient-Reported Outcomes Measurement Information System (PROMIS®) Physical Function Item Bank are two surveys commonly used to measure these domains. It is unclear if these two instruments adequately measure these aspects of health in minority elderly. Objective To estimate the readability of the GDS and PROMIS® Physical Function items and to assess their comprehensibility by a sample of African American and Latino elderly. Methods Readability was estimated using the Flesch-Kincaid (F-K) and Flesch-Reading-Ease (FRE) formulae for English versions, and a Spanish adaptation of the FRE formula for the Spanish versions. Comprehension of the GDS and PROMIS items by minority elderly was evaluated with 30 cognitive interviews. Results Readability estimates of a number of items in English and Spanish of the GDS and PROMIS physical functioning items exceed the recommended 5th grade level, or were rated as fairly difficult, difficult, or very difficult to read. Cognitive interviews revealed that many participants felt that more than the two (yes/no) GDS response options were needed to answer the questions. Wording of several PROMIS items was considered confusing and responses potentially uninterpretable because they were based on physical aids. Conclusions Problems with item wording and response options of the GDS and PROMIS Physical Function items may negatively affect reliability and validity of measurement when used with minority elderly. PMID:27599978
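The two English readability formulas named above have standard published coefficients. The following is a minimal sketch in which word, sentence, and syllable counts are taken as inputs, because syllable counting is itself heuristic. The band labels are the commonly cited Flesch Reading Ease ranges (labels vary slightly between sources), and the Spanish adaptation used by the authors is not covered. All function names are illustrative.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: higher scores mean easier English text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level: approximate U.S. school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def flesch_band(score: float) -> str:
    """Map a Reading Ease score to commonly cited descriptive bands."""
    bands = [
        (90, "very easy"),
        (80, "easy"),
        (70, "fairly easy"),
        (60, "standard"),
        (50, "fairly difficult"),
        (30, "difficult"),
    ]
    for cutoff, label in bands:
        if score >= cutoff:
            return label
    return "very difficult"


if __name__ == "__main__":
    # Hypothetical counts for a short block of questionnaire items
    fre = flesch_reading_ease(words=100, sentences=10, syllables=130)
    fkgl = flesch_kincaid_grade(words=100, sentences=10, syllables=130)
    print(f"FRE = {fre:.1f} ({flesch_band(fre)}), FK grade = {fkgl:.1f}")
```

With these hypothetical counts the grade level falls below the 5th-grade threshold. Longer sentences or more polysyllabic words push both scores toward the "difficult" end.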
ITEM ANALYSIS OF THREE SPANISH NAMING TESTS: A CROSS-CULTURAL INVESTIGATION
de la Plata, Carlos Marquez; Arango-Lasprilla, Juan Carlos; Alegret, Montse; Moreno, Alexander; Tárraga, Luis; Lara, Mar; Hewlitt, Margaret; Hynan, Linda; Cullum, C. Munro
2009-01-01
Neuropsychological evaluations conducted in the United States and abroad commonly include the use of tests translated from English to Spanish. The use of translated naming tests for evaluating predominantly Spanish-speaking individuals has recently been challenged on the grounds that translating test items may compromise a test’s construct validity. The Texas Spanish Naming Test (TNT) was developed in Spanish specifically for use with Spanish speakers; however, it is unlikely that patients from diverse Spanish-speaking geographical regions will perform uniformly on a naming test. The present study evaluated and compared the internal consistency and patterns of item difficulty and item discrimination for the TNT and two commonly used translated naming tests in three countries (i.e., United States, Colombia, Spain). Two hundred fifty-two subjects (126 demented, 116 nondemented) across three countries were administered the TNT, the Modified Boston Naming Test-Spanish (MBNT-S), and the naming subtest from the CERAD. The TNT demonstrated better internal consistency than its counterparts, a better item-difficulty pattern than the CERAD naming test, and a better item-discrimination pattern than the MBNT-S across countries. Overall, all three Spanish naming tests differentiated nondemented and moderately demented individuals, but the results suggest that the items of the TNT are the most appropriate for use with Spanish speakers. Preliminary normative data for the three tests in each country are provided. PMID:19208960
Hoffman, D L; Dukes, E M
2008-01-01
Objective The current review describes how the health status profile of people with fibromyalgia (FM) compares to that of people in the general population and patients with other health conditions. Methods A review of 37 studies of FM that measured health status with the 36-item Medical Outcomes Study Short-Form Health Survey (SF-36) or the 12-item Short-Form Health Survey (SF-12). Results Studies performed worldwide showed that FM groups were significantly more impaired than people in the general population on all eight health status domains assessed. These domains include physical functioning, role functioning difficulties caused by physical problems, bodily pain, general health, vitality (energy vs. fatigue), social functioning, role functioning difficulties caused by emotional problems and mental health. FM groups had mental health summary scores that fell 1 standard deviation (SD) below the general population mean, and physical health summary scores that fell 2 SD below the general population mean. FM groups also had a poorer overall health status compared to those with other specific pain conditions. FM groups had similar or significantly lower (poorer) physical and mental health status scores compared to those with rheumatoid arthritis, osteoarthritis, osteoporosis, systemic lupus erythematosus, myofascial pain syndrome, primary Sjögren's syndrome and others. FM groups scored significantly lower than the pain condition groups mentioned above on domains of bodily pain and vitality. Health status impairments in pain and vitality are consistent with core features of FM. Conclusions People with FM had an overall health status burden that was greater in magnitude compared to people with other specific pain conditions that are widely accepted as impairing. Review Criteria Studies in this review were identified through a search of electronic databases (MEDLINE: 1990–2006; EMBASE: 1990–2006). 
Search terms included: ‘fibromyalgia’, ‘health status’, ‘quality of life’, ‘SF-36’ and ‘SF-12’. Reference lists from published articles were also searched. Studies were selected if they were published in the English language between 1990 and March 2006 and assessed health status with a validated version of the SF-36 or the SF-12. Message for the Clinic Although FM is a controversial construct, studies performed worldwide showed that the health status profile of people with FM was remarkably consistent. People with FM had significant impairments in both mental and physical health status domains. People with FM had a poorer overall health status than people with specific pain conditions that are widely accepted as impairing. PMID:18039330
Student Questionnaire. [Harvard Project Physics]
ERIC Educational Resources Information Center
Welch, Wayne W.; Ahlgren, Andrew
This 60-item questionnaire was designed to gather general background information from students who had used the Harvard Project Physics curriculum. The instrument includes three 20-item subscales: (1) attitude toward physics, (2) career interest, and (3) student characteristics. Items are multiple choice (5 options), and the introductory material…
Item-focussed Trees for the Identification of Items in Differential Item Functioning.
Tutz, Gerhard; Berger, Moritz
2016-09-01
A novel method for the identification of differential item functioning (DIF) by means of recursive partitioning techniques is proposed. We assume an extension of the Rasch model that allows for DIF being induced by an arbitrary number of covariates for each item. Recursive partitioning on the item level results in one tree for each item and leads to simultaneous selection of items and variables that induce DIF. For each item, it is possible to detect groups of subjects with different item difficulties, defined by combinations of characteristics that are not pre-specified. The way a DIF item is determined by covariates is visualized in a small tree and therefore easily accessible. An algorithm is proposed that is based on permutation tests. Various simulation studies, including the comparison with traditional approaches to identify items with DIF, show the applicability and the competitive performance of the method. Two applications illustrate the usefulness and the advantages of the new method.
Busija, L; Buchbinder, R; Osborne, R H
2016-08-01
This study reports the development of the OsteoArthritis Questionnaire (OA-Quest), a new measure designed to comprehensively capture the potentially modifiable burden of osteoarthritis. Item development was guided by the a priori conceptual framework of the Personal Burden of Osteoarthritis (PBO), which captures 8 dimensions of osteoarthritis burden (Physical distress, Fatigue, Physical limitations, Psychosocial distress, Physical de-conditioning, Financial hardship, Sleep disturbances, Lost productivity). One hundred and twenty-three candidate items were pretested in a clinical sample of 18 osteoarthritis patients. The measurement properties of the OA-Quest were assessed with exploratory factor analysis (EFA), Rasch modelling, and confirmatory factor analysis (CFA) in a community-based sample (n = 792). EFA replicated 7 of the 8 PBO domains. The exception was the PBO Fatigue domain, whose items merged into the Physical distress subscale of the OA-Quest. Following item analysis, a 42-item, 7-subscale questionnaire was constructed, measuring Physical distress (seven items, Cronbach's α = 0.93), Physical limitations (11 items, α = 0.95), Psychosocial distress (seven items, α = 0.93), Physical de-conditioning (four items, α = 0.87), Financial hardship (four items, α = 0.93), Sleep disturbances (five items, α = 0.96), and Lost productivity (four items, α = 0.90). A highly restricted 7-factor CFA model had excellent fit with the data (χ²(113) = 316.36, P < 0.001; chi-square/degrees of freedom = 2.8; comparative fit index [CFI] = 0.97; root mean square error of approximation [RMSEA] = 0.07), supporting the construct validity of the new measure. The OA-Quest is a new measure of osteoarthritis burden that is founded on a comprehensive conceptual model. It has strong evidence of construct validity and provides reliable measurement across a broad range of osteoarthritis burden. Copyright © 2016 Osteoarthritis Research Society International. Published by Elsevier Ltd. 
All rights reserved.
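The subscale reliabilities reported above are Cronbach's α values. The sketch below implements the standard formula, α = k/(k − 1) · (1 − Σ item variances / total-score variance), and applies it to a hypothetical respondents-by-items matrix for one subscale. The data and the function name are illustrative only.

```python
from statistics import variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha for one subscale.

    scores[i][j] is respondent i's score on item j. Raises
    ZeroDivisionError if the total scores do not vary.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])  # variance of sum scores
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    subscale = [  # hypothetical: 5 respondents x 4 items scored 0-4
        [3, 4, 3, 4],
        [1, 1, 2, 1],
        [2, 2, 2, 3],
        [4, 4, 3, 4],
        [0, 1, 1, 0],
    ]
    print(f"alpha = {cronbach_alpha(subscale):.2f}")
```

Because the sample variance appears in both the numerator and the denominator, switching to the population variance leaves α unchanged.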
NON-SPECIFIC SYMPTOMS AND SCREENING OF NON-PSYCHOTIC MORBIDITY IN PRIMARY CARE
Srinivasan, T.N.; Suresh, T.R.
1990-01-01
SUMMARY Much of the non-psychotic mental morbidity in primary care goes undetected by primary care health personnel. This is often because of the non-specific somatic nature of these patients' presenting complaints and the difficulty primary care physicians have in eliciting the specific emotional symptoms needed to screen for psychiatric problems. This paper describes the development of the 7-item Primary care Psychiatric Questionnaire (PPQ), which, because it requires eliciting only non-specific symptoms, could overcome this practical difficulty. This new screening method has been standardised against the 20-item version of the Self Report Questionnaire, which is commonly used in primary care. PMID:21927432
Anxiety and depression in chronic hemodialysis: some somatopsychic determinants.
Jadoulle, V; Hoyois, P; Jadoul, M
2005-02-01
Depression and anxiety are so common in hemodialysis (HD) patients that we found it useful to study the respective contributions of subjective somatic sensations and of objective medical comorbidity to psychological distress. We also hypothesized that denial has a protective effect against anxiety and depression and that alexithymia, by contrast, is a risk factor. In a cross-sectional design, we investigated the relationships between psychological distress and somatic complaints, the Charlson comorbidity index, denial, and alexithymia in a group of 54 patients on in-center HD. Patients completed self-rated psychometric questionnaires (State Anxiety Inventory, Hospital Anxiety and Depression Scale [HADS], 13-item Short Beck Depression Inventory, Kidney Disease Quality of Life Short Form, 20-item Toronto Alexithymia Scale). A principal component analysis allowed us to focus on the HADS-total score, which was confirmed to be representative of anxio-depression. Correlational analyses and a stepwise regression analysis were then performed. The HADS-total score was inversely associated with the use of denial as a psychological defence mechanism (p < 0.001) and positively correlated with difficulty identifying emotions (p < 0.001), difficulty expressing feelings (p < 0.05), and the intensity of subjective somatic complaints (p < 0.001). By contrast, it was not related to somatic comorbidity. In the stepwise regression, somatic complaints, denial, and difficulty recognizing emotions emerged as the three main variables related to the HADS-total score (p < 0.001). In chronic HD patients, subjective physical complaints were associated with psychological distress, whereas objective organic comorbidity did not appear to influence mood and anxiety status. Denial is an efficient coping style against negative emotions, but it can diminish compliance. 
Thus, the subjective perception of the disease seems to have an important impact on anxiety and mood levels, which can also be influenced by emotion-regulation abilities.
Rodríguez-Díez, María Cristina; Alegre, Manuel; Díez, Nieves; Arbea, Leire; Ferrer, Marta
2016-02-03
The main factor that determines the selection of a medical specialty in Spain after obtaining a medical degree is the MIR ("médico interno residente", internal medical resident) exam. This exam consists of 235 multiple-choice questions with five options, some of which include images provided in a separate booklet. The aim of this study was to analyze the technical quality of the multiple-choice questions included in the MIR exam over the last five years. All the questions included in the exams from 2009 to 2013 were analyzed. We studied the proportion of questions including clinical vignettes, the number of items related to an image and the presence of technical flaws in the questions. For the analysis of technical flaws, we adapted the National Board of Medical Examiners (NBME) guidelines. We looked for 18 different issues included in the manual, grouped into two categories: issues related to testwiseness and issues related to irrelevant difficulties. The final number of questions analyzed was 1,143. The percentage of items based on clinical vignettes increased from 50% in 2009 to 56-58% in the following years (2010-2013). The percentage of items based on an image increased progressively from 10% in 2009 to 15% in 2012 and 2013. The percentage of items with at least one technical flaw varied between 68 and 72%. We observed a decrease in the percentage of items with flaws related to testwiseness, from 30% in 2009 to 20% in 2012 and 2013. While most of these issues decreased dramatically or even disappeared (such as the unbalanced distribution of correct answers across option positions), the presence of non-plausible options remained frequent. With regard to technical flaws related to irrelevant difficulties, no improvement was observed; this is especially true with respect to negative stem questions and "hinged" questions. The formal quality of the MIR exam items has improved over the last five years with regard to testwiseness. 
A more detailed revision of the items submitted, checking systematically for the presence of technical flaws, could improve the validity and discriminatory power of the exam, without increasing its difficulty.
Refining a self-assessment of informatics competency scale using Mokken scaling analysis.
Yoon, Sunmoo; Shaffer, Jonathan A; Bakken, Suzanne
2015-01-01
Healthcare environments are increasingly implementing health information technology (HIT) and those from various professions must be competent to use HIT in meaningful ways. In addition, HIT has been shown to enable interprofessional approaches to health care. The purpose of this article is to describe the refinement of the Self-Assessment of Nursing Informatics Competencies Scale (SANICS) using analytic techniques based upon item response theory (IRT) and discuss its relevance to interprofessional education and practice. In a sample of 604 nursing students, the 93-item version of SANICS was examined using non-parametric IRT. The iterative modeling procedure included 31 steps comprising: (1) assessing scalability, (2) assessing monotonicity, (3) assessing invariant item ordering, and (4) expert input. SANICS was reduced to an 18-item hierarchical scale with excellent reliability. Fundamental skills for team functioning and shared decision making among team members (e.g. "using monitoring systems appropriately," "describing general systems to support clinical care") had the highest level of difficulty, and "demonstrating basic technology skills" had the lowest difficulty level. Most items reflect informatics competencies relevant to all health professionals. Further, the approaches can be applied to construct a new hierarchical scale or refine an existing scale related to informatics attitudes or competencies for various health professions.
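In Mokken scaling, scalability is usually summarized by Loevinger's H coefficient: the sum of observed inter-item covariances divided by the sum of the largest covariances that the item marginals allow. The sketch below handles dichotomous (0/1) items only. SANICS uses self-assessment ratings, so its analysis would need the polytomous generalization available in dedicated software such as the R package mokken. The data and function name are hypothetical.

```python
from itertools import combinations


def loevinger_h(responses: list[list[int]]) -> float:
    """Scale-level Loevinger H for dichotomous (0/1) items.

    responses[i][j] is person i's score on item j.
    """
    n = len(responses)
    k = len(responses[0])
    p = [sum(row[j] for row in responses) / n for j in range(k)]  # item popularities
    observed = 0.0
    maximum = 0.0
    for i, j in combinations(range(k), 2):
        p_ij = sum(row[i] * row[j] for row in responses) / n  # both items scored 1
        observed += p_ij - p[i] * p[j]            # observed covariance
        maximum += min(p[i], p[j]) - p[i] * p[j]  # largest covariance the marginals allow
    if maximum == 0:
        raise ValueError("H is undefined unless at least two items have 0 < p < 1")
    return observed / maximum


if __name__ == "__main__":
    data = [  # hypothetical: near-Guttman pattern with one error (last row)
        [0, 0, 0],
        [1, 0, 0],
        [1, 1, 0],
        [1, 1, 1],
        [0, 1, 0],
    ]
    print(f"H = {loevinger_h(data):.2f}")
```

A perfect Guttman pattern gives H = 1. A common rule of thumb treats H ≥ 0.3 as the minimum for a Mokken scale, with 0.4 and 0.5 marking medium and strong scales.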
Haggerty, Jeannie L.; Bouharaoui, Fatima; Santor, Darcy A.
2011-01-01
Evaluating the extent to which groups or subgroups of individuals differ with respect to primary healthcare experience depends on first ruling out the possibility of bias. Objective: To determine whether item or subscale performance differs systematically between French/English, high/low education subgroups and urban/rural residency. Method: A sample of 645 adult users balanced by French/English language (in Quebec and Nova Scotia, respectively), high/low education and urban/rural residency responded to six validated instruments: the Primary Care Assessment Survey (PCAS); the Primary Care Assessment Tool – Short Form (PCAT-S); the Components of Primary Care Index (CPCI); the first version of the EUROPEP (EUROPEP-I); the Interpersonal Processes of Care Survey, version II (IPC-II); and part of the Veterans Affairs National Outpatient Customer Satisfaction Survey (VANOCSS). We normalized subscale scores to a 0-to-10 scale and tested for between-group differences using ANOVA tests. We used a parametric item response model to test for differences between subgroups in item discriminability and item difficulty. We re-examined group differences after removing items with differential item functioning. Results: Experience of care was assessed more positively in the English-speaking (Nova Scotia) than in the French-speaking (Quebec) respondents. We found differential English/French item functioning in 48% of the 153 items: discriminability in 20% and differential difficulty in 28%. English items were more discriminating generally than the French. Removing problematic items did not change the differences in French/English assessments. Differential item functioning by high/low education status affected 27% of items, with items being generally more discriminating in high-education groups. Between-group comparisons were unchanged. In contrast, only 9% of items showed differential item functioning by geography, affecting principally the accessibility attribute. 
Removing problematic items reversed a previously non-significant finding, revealing poorer first-contact access in rural than in urban areas. Conclusion: Differential item functioning does not bias or invalidate French/English comparisons on subscales, but additional development is required to make French and English items equivalent. These instruments are relatively robust by educational status and geography, but results suggest potential differences in the underlying construct in low-education and rural respondents. PMID:23205035
Renormalization Group Theory, the Epsilon Expansion and Ken Wilson as I knew Him
NASA Astrophysics Data System (ADS)
Fisher, Michael E.
The tasks posed for renormalization group theory (RGT) within statistical physics by critical phenomena theory in the 1960s are set out briefly, in contradistinction to quantum field theory (QFT), which was the origin of Ken Wilson's concerns. Kadanoff's 1966 block spin scaling picture and its difficulties are presented; Wilson's early vision of flows is described from the author's perspective. The article then relates how Wilson's subsequent breakthrough ideas, published in 1971, led to the epsilon expansion and the resulting clarity. Concluding sections complete the general picture of flows in a space of Hamiltonians, universality, and scaling. The article represents a 40% condensation (but with added items) of an earlier account: Rev. Mod. Phys. 70, 653-681 (1998).
Tong, Fang; Fu, Tong
2013-01-01
Objective To evaluate the differences in fluid intelligence test results between normal children and children with learning difficulties in China. Method PubMed, MD Consult, and other Chinese journal databases were searched from their inception to November 2012. Comparative studies of Raven measurements in normal children and children with learning difficulties were identified, and full intelligence quotient (FIQ) values and the original subtest scores were extracted. The effects model was selected based on the results of heterogeneity testing, and parallel subgroup analysis was performed. Results Twelve studies were included in the meta-analysis, all performed in mainland China. Two were performed at child health clinics; the other ten were conducted at schools, with control children drawn from schoolmates or classmates. FIQ was evaluated using a random-effects model. The WMD was −13.18 (95% CI: −16.50 to −9.85). Children with learning difficulties showed significantly lower FIQ scores than controls (P<0.00001). Type of learning difficulty and gender differences were evaluated using a fixed-effects model (I² = 0%). The sites and purposes of the studies were taken into account, but the sources of heterogeneity could not be eliminated; the total IQ across all subgroups showed considerable heterogeneity (I² = 76.5%). Subtest A scores showed moderate heterogeneity across studies, and subtests AB, B, and E showed considerable heterogeneity, so a random-effects model was used for them. Individuals with learning difficulties showed heterogeneity as well. There was a moderate delay in the first three subtests (−0.5 to −0.9) and a much more pronounced delay in the last three (−1.4 to −1.6). Conclusion In mainland China, the level of fluid intelligence of children with learning difficulties was lower than that of normal children. Delayed development in subtests C, D, and E was more obvious. 
PMID:24236016
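The pooled WMD and I² values above come from inverse-variance meta-analysis. The abstract does not name the random-effects estimator, so the following minimal sketch assumes the common DerSimonian-Laird method. It uses hypothetical study-level mean differences and variances, not the twelve included studies.

```python
from math import sqrt
from statistics import NormalDist


def random_effects_meta(effects: list[float], variances: list[float], level: float = 0.95) -> dict:
    """DerSimonian-Laird random-effects pooling of mean differences.

    effects[i] is study i's mean difference; variances[i] its squared SE.
    """
    k = len(effects)
    w = [1.0 / v for v in variances]  # fixed-effect (inverse-variance) weights
    sum_w = sum(w)
    fixed = sum(wi * d for wi, d in zip(w, effects)) / sum_w
    q = sum(wi * (d - fixed) ** 2 for wi, d in zip(w, effects))  # Cochran's Q
    df = k - 1
    c = sum_w - sum(wi * wi for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0    # share of variation from heterogeneity
    w_star = [1.0 / (v + tau2) for v in variances]   # random-effects weights
    pooled = sum(wi * d for wi, d in zip(w_star, effects)) / sum(w_star)
    se = sqrt(1.0 / sum(w_star))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return {"pooled": pooled, "ci": (pooled - z * se, pooled + z * se), "tau2": tau2, "i2": i2}


if __name__ == "__main__":
    # Hypothetical FIQ differences (learning difficulties minus controls) and variances
    result = random_effects_meta([-9.0, -15.0, -12.5, -17.0], [4.0, 6.25, 2.25, 9.0])
    lo, hi = result["ci"]
    print(f"WMD = {result['pooled']:.2f} (95% CI {lo:.2f} to {hi:.2f}), I^2 = {100 * result['i2']:.1f}%")
```

When τ² is estimated as zero, the random-effects weights reduce to the fixed-effect weights, which matches the fixed-effects analysis reported for the I² = 0% subgroups.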
The Development of a Post Separation/Post Divorce Problems and Stress Scale.
ERIC Educational Resources Information Center
Raschke, Helen J.
Factors associated with the speed and level of difficulty with which individuals adjust to separation and divorce were investigated. A scale was developed to analyze these factors, and included items dealing with the subdimensions of stress and the perception of the persons involved. Factor analysis of the scale items as well as additional tests…
ERIC Educational Resources Information Center
Trace, Jonathan; Brown, James Dean; Janssen, Gerriet; Kozhevnikova, Liudmila
2017-01-01
Cloze tests have been the subject of numerous studies regarding their function and use in both first language and second language contexts (e.g., Jonz & Oller, 1994; Watanabe & Koyama, 2008). From a validity standpoint, one area of investigation has been the extent to which cloze tests measure reading ability beyond the sentence level.…
Some Considerations on the Partial Credit Model
ERIC Educational Resources Information Center
Verhelst, N. D.; Verstralen, H. H. F. M.
2008-01-01
The Partial Credit Model (PCM) is sometimes interpreted as a model for stepwise solution of polytomously scored items, where the item parameters are interpreted as difficulties of the steps. It is argued that this interpretation is not justified. A model for stepwise solution is discussed. It is shown that the PCM is suited to model sums of binary…
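For reference, the Partial Credit Model gives the probability of each score category x = 0, 1, ..., m of an item as proportional to exp(Σ_{j≤x}(θ − δ_j)), where the empty sum for x = 0 equals zero. The following is a minimal sketch in which θ (person location) and the threshold parameters δ_j are hypothetical values. In line with the argument above, the δ_j are treated here simply as category thresholds, not as difficulties of sequential solution steps.

```python
from math import exp


def pcm_probabilities(theta: float, deltas: list[float]) -> list[float]:
    """Category probabilities P(X = 0), ..., P(X = m) for one PCM item.

    theta is the person location; deltas[j-1] is threshold j (j = 1..m).
    """
    cumulative = [0.0]  # category 0: empty sum
    for d in deltas:
        cumulative.append(cumulative[-1] + (theta - d))
    top = max(cumulative)  # subtract the max before exp() to avoid overflow
    numerators = [exp(c - top) for c in cumulative]
    total = sum(numerators)
    return [num / total for num in numerators]


if __name__ == "__main__":
    # Hypothetical 3-category item (scores 0, 1, 2) with thresholds -1 and 1
    for theta in (-2.0, 0.0, 2.0):
        probs = pcm_probabilities(theta, [-1.0, 1.0])
        print(theta, [round(p, 3) for p in probs])
```

With a single threshold, the function reduces to the dichotomous Rasch model.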
Language Effects in International Testing: The Case of PISA 2006 Science Items
ERIC Educational Resources Information Center
El Masri, Yasmine H.; Baird, Jo-Anne; Graesser, Art
2016-01-01
We investigate the extent to which language versions (English, French and Arabic) of the same science test are comparable in terms of item difficulty and demands. We argue that language is an inextricable part of the scientific literacy construct, be it intended or not by the examiner. This argument has considerable implications for methodologies…
Pick-N Multiple Choice-Exams: A Comparison of Scoring Algorithms
ERIC Educational Resources Information Center
Bauer, Daniel; Holzer, Matthias; Kopp, Veronika; Fischer, Martin R.
2011-01-01
To compare different scoring algorithms for Pick-N multiple correct answer multiple-choice (MC) exams regarding test reliability, student performance, total item discrimination and item difficulty. Data from six end-of-term exams in internal medicine taken by 3rd-year medical students at Munich University from 2005 to 2008 were analysed (1,255 students,…
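The abstract is cut off before naming the scoring algorithms that were compared. To illustrate the kind of difference at stake, here are two hypothetical Pick-N scoring rules: a dichotomous rule that gives full credit only for exactly the keyed set, and a proportional partial-credit rule that subtracts wrongly picked options. Neither is claimed to be one of the algorithms in the study.

```python
def score_all_or_nothing(selected: set[str], key: set[str]) -> float:
    """Hypothetical dichotomous rule: 1 only if exactly the keyed options are picked."""
    return 1.0 if selected == key else 0.0


def score_proportional(selected: set[str], key: set[str]) -> float:
    """Hypothetical partial-credit rule: (correct picks - wrong picks) / N, floored at 0."""
    hits = len(selected & key)
    false_picks = len(selected - key)
    return max(0.0, (hits - false_picks) / len(key))


if __name__ == "__main__":
    key = {"A", "C"}  # hypothetical Pick-2 item
    for answer in ({"A", "C"}, {"A", "D"}, {"A"}, {"B", "D"}):
        print(sorted(answer), score_all_or_nothing(answer, key), score_proportional(answer, key))
```

Partial-credit rules of this kind spread scores more finely than the dichotomous rule, which is why the choice of rule can affect item difficulty, item discrimination, and reliability.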
Item Mass and Complexity and the Arithmetic Computation of Students with Learning Disabilities.
ERIC Educational Resources Information Center
Cawley, John F.; Shepard, Teri; Smith, Maureen; Parmar, Rene S.
1997-01-01
The performance of 76 students (ages 10 to 15) with learning disabilities on four tasks of arithmetic computation within each of the four basic operations was examined. Tasks varied in difficulty level and number of strokes needed to complete all items. Intercorrelations between task sets and operations were examined as was the use of…
The Golden Rule Agreement is Psychometrically Defensible.
ERIC Educational Resources Information Center
Gonzalez-Tamayo, Eulogio
The agreement between the Educational Testing Service (ETS) and the Golden Rule Insurance Company of Illinois is interpreted as setting the general principles on which items must be selected to be included in a licensure test. These principles put a limit on the difficulty level of any item, and they also limit the size of the difference in…
ERIC Educational Resources Information Center
Dai, Yunyun
2013-01-01
Mixtures of item response theory (IRT) models have been proposed as a technique to explore response patterns in test data related to cognitive strategies, instructional sensitivity, and differential item functioning (DIF). Estimation proves challenging due to difficulties in identification and questions of effect size needed to recover underlying…
Generic ABILHAND Questionnaire Can Measure Manual Ability across a Variety of Motor Impairments
ERIC Educational Resources Information Center
Simone, Anna; Rota, Viviana; Tesio, Luigi; Perucca, Laura
2011-01-01
ABILHAND is, in its original version, a 46-item, 4-level questionnaire. It measures the difficulty perceived by patients with rheumatoid arthritis as they do various daily manual tasks. ABILHAND was originally built through Rasch analysis. In a later study, it was simplified to a generic 23-item, three-level questionnaire, showing both…
Solving Graphics Problems: Student Performance in Junior Grades
ERIC Educational Resources Information Center
Lowrie, Tom; Diezmann, Carmel M.
2007-01-01
The authors investigated the performance of 172 Grade 4 students (9 to 10 years) over 12 months on a 36-item test that comprised items from 6 distinct graphical languages (e.g., maps) commonly used to convey mathematical information. Results revealed (a) difficulties in Grade 4 students' capacity to decode a variety of graphics, (b) significant…
ERIC Educational Resources Information Center
Sweller, Naomi
2015-01-01
Individuals with autism have difficulty generalising information from one situation to another, a process that requires the learning of categories and concepts. Category information may be learned through: (1) classifying items into categories, or (2) predicting missing features of category items. Predicting missing features has to this point been…
HIV/AIDS knowledge among men who have sex with men: applying the item response theory.
Gomes, Raquel Regina de Freitas Magalhães; Batista, José Rodrigues; Ceccato, Maria das Graças Braga; Kerr, Lígia Regina Franco Sansigolo; Guimarães, Mark Drew Crosland
2014-04-01
To evaluate the level of HIV/AIDS knowledge among men who have sex with men in Brazil using the latent trait model estimated by Item Response Theory. Multicenter, cross-sectional study, carried out in ten Brazilian cities between 2008 and 2009. Adult men who have sex with men were recruited (n = 3,746) through Respondent Driven Sampling. HIV/AIDS knowledge was ascertained through ten statements by face-to-face interview, and latent scores were obtained through two-parameter logistic modeling (difficulty and discrimination) using Item Response Theory. Differential item functioning was used to examine each item characteristic curve by age and schooling. Overall, the HIV/AIDS knowledge scores using Item Response Theory did not exceed 6.0 (scale 0-10), with mean and median values of 5.0 (SD = 0.9) and 5.3, respectively, and 40.7% of the sample had knowledge levels below the average. Some misconceptions persist in this population regarding transmission of the virus by insect bites, by using public restrooms, and by sharing utensils during meals. With regard to the difficulty and discrimination parameters, eight items were located below the mean of the scale and were considered very easy, and four items had very low discrimination parameters (< 0.34). The absence of difficult items contributed to the inaccuracy of the measurement of knowledge among those at or above the median knowledge level. Item Response Theory analysis, which focuses on the individual properties of each item, allows measures to be obtained that do not vary or depend on the questionnaire, which provides better ascertainment and accuracy of knowledge scores. Valid and reliable scales are essential for monitoring HIV/AIDS knowledge among the men who have sex with men population over time and in different geographic regions, and this psychometric model brings this advantage.
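The two-parameter logistic (2PL) model referred to above gives the probability of a correct answer as P(θ) = 1 / (1 + exp(−a(θ − b))), where a is discrimination and b is difficulty; some software also includes a scaling constant D ≈ 1.7. The item information, a²P(1 − P), peaks at θ = b. This is why a bank made up mostly of easy items measures higher knowledge levels imprecisely. The following is a minimal sketch with hypothetical parameters; the study's item estimates are not reported here.

```python
from math import exp


def p_correct_2pl(theta: float, a: float, b: float) -> float:
    """2PL probability of a correct response (no scaling constant D)."""
    return 1.0 / (1.0 + exp(-a * (theta - b)))


def item_information_2pl(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item; it peaks at theta = b."""
    p = p_correct_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def scale_information(theta: float, items: list[tuple[float, float]]) -> float:
    """Sum of item informations; measurement SE at theta is 1 / sqrt(information)."""
    return sum(item_information_2pl(theta, a, b) for a, b in items)


if __name__ == "__main__":
    easy_items = [(1.2, -2.0), (1.0, -1.5), (0.8, -1.0)]  # hypothetical (a, b) pairs
    for theta in (-1.5, 0.0, 2.0):
        print(theta, round(scale_information(theta, easy_items), 3))
```

For these easy items, information is highest around θ = −1.5 and falls off sharply above the mean, so standard errors grow for respondents with higher knowledge.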
ERIC Educational Resources Information Center
Kersten, Paula; Czuba, Karol; McPherson, Kathryn; Dudley, Margaret; Elder, Hinemoa; Tauroa, Robyn; Vandal, Alain
2016-01-01
This article synthesized evidence for the validity and reliability of the Strengths and Difficulties Questionnaire in children aged 3-5 years. A systematic review using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses statement guidelines was carried out. Study quality was rated using the Consensus-based Standards for the…
ERIC Educational Resources Information Center
Keller, Johannes
2007-01-01
Background: Stereotype threat research revealed that negative stereotypes can disrupt the performance of persons targeted by such stereotypes. This paper contributes to stereotype threat research by providing evidence that domain identification and the difficulty level of test items moderate stereotype threat effects on female students' maths…
Paz, Sylvia H; Jones, Loretta; Calderón, José L; Hays, Ron D
2017-02-01
Depression and physical function are particularly important health domains for the elderly. The Geriatric Depression Scale (GDS) and the Patient-Reported Outcomes Measurement Information System (PROMIS®) physical function item bank are two surveys commonly used to measure these domains. It is unclear whether these two instruments adequately measure these aspects of health in minority elderly. The aim of this study was to estimate the readability of the GDS and PROMIS® physical function items and to assess their comprehensibility using a sample of African American and Latino elderly. Readability was estimated using the Flesch-Kincaid and Flesch Reading Ease (FRE) formulae for English versions, and a Spanish adaptation of the FRE formula for the Spanish versions. Comprehension of the GDS and PROMIS® items by minority elderly was evaluated with 30 cognitive interviews. Readability estimates for a number of the English and Spanish GDS and PROMIS® physical functioning items exceeded the U.S.-recommended 5th-grade threshold for vulnerable populations, or the items were rated as 'fairly difficult', 'difficult', or 'very difficult' to read. Cognitive interviews revealed that many participants felt that more than the two (yes/no) GDS response options were needed to answer the questions. The wording of several PROMIS® items was considered confusing, and interpreting responses was problematic because they were based on using physical aids. Problems with item wording and response options of the GDS and PROMIS® physical function items may reduce the reliability and validity of measurement when used with minority elderly.
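The two English-language readability formulae named above combine average sentence length (words per sentence) and average word length (syllables per word). The following is a minimal sketch using the standard published coefficients. It assumes that word, sentence, and syllable counts have already been obtained, because automatic syllable counting is heuristic. The Spanish adaptation of the FRE uses different coefficients and is not shown. The function names are illustrative.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease for English text; higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level: approximate U.S. school grade required."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

# Hypothetical counts for a short passage.
counts = dict(words=100, sentences=5, syllables=150)
fre = flesch_reading_ease(**counts)
grade = flesch_kincaid_grade(**counts)
print(round(fre, 1), round(grade, 1))

# A grade level above 5 would exceed the 5th-grade threshold cited above.
print("exceeds 5th-grade threshold:", grade > 5)
```

Both formulae depend only on these two ratios. For that reason, short questions containing polysyllabic words, such as medical or functional terms, can still score as "difficult."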
An analysis of the masking of speech by competing speech using self-report data.
Agus, Trevor R; Akeroyd, Michael A; Noble, William; Bhullar, Navjot
2009-01-01
Many of the items in the "Speech, Spatial, and Qualities of Hearing" scale questionnaire [S. Gatehouse and W. Noble, Int. J. Audiol. 43, 85-99 (2004)] are concerned with speech understanding in a variety of backgrounds, both speech and nonspeech. To study whether these self-report data reflected informational masking, previously collected data on 414 people were analyzed. The lowest scores (greatest difficulties) were found for the two items in which there were two speech targets, with successively higher scores for competing speech (six items), energetic masking (one item), and no masking (three items). The results suggest significant masking by competing speech in everyday listening situations.
Yao, Shih-Ying; Bull, Rebecca; Khng, Kiat Hui; Rahim, Anisa
2018-01-01
Understanding a child's ability to decode emotion expressions is important to allow early interventions for potential difficulties in social and emotional functioning. This study applied the Rasch model to investigate the psychometric properties of the NEPSY-II Affect Recognition subtest, a U.S.-normed measure for 3- to 16-year-olds that assesses the ability to recognize facial expressions of emotion. Data were collected from 1222 children attending preschools in Singapore. We first performed the Rasch analysis with the raw item data, and examined the technical qualities and difficulty pattern of the studied items. We subsequently investigated the relation of the estimated affect recognition ability from the Rasch analysis to a teacher-reported measure of a child's behaviors, emotions, and relationships. Potential gender differences were also examined. The Rasch model fits our data well. Also, the NEPSY-II Affect Recognition subtest was found to have reasonable technical qualities, the expected item difficulty pattern, and the desired association with the external measure of children's behaviors, emotions, and relationships for both boys and girls. Overall, findings from this study suggest that the NEPSY-II Affect Recognition subtest is a promising measure of young children's affect recognition ability. Suggestions for future test improvement and research are discussed.
Pre-Service Physics Teachers' Comprehension of Quantum Mechanical Concepts
ERIC Educational Resources Information Center
Didis, Nilufer; Eryilmaz, Ali; Erkoc, Sakir
2010-01-01
When quantum theory caused a paradigm shift in physics, it introduced difficulties in both learning and teaching of physics. Because of its abstract, counter-intuitive and mathematical structure, students have difficulty in learning this theory, and instructors have difficulty in teaching the concepts of the theory. This case study investigates…
Chang, Kwang-Hwa; Liao, Hua-Fang; Yen, Chia-Fan; Hwang, Ai-Wen; Chi, Wen-Chou; Escorpizo, Reuben; Liou, Tsan-Hon
2015-01-01
To explore the association between muscle power impairment and each World Health Organization Disability Assessment Schedule second edition (WHODAS 2.0) domain score among subjects with physical disability. Subjects (≥ 60 years) with physical disability related to neurological diseases, including 730 subjects with brain disease (BD) and 126 subjects with non-BD, were enrolled from a data bank of persons with disabilities from 1 July 2011 to 29 February 2012. Standardized WHODAS 2.0 scores ranging from 0 (least difficulty) to 100 (greatest difficulty) points were calculated for each domain. More than 50% of subjects with physical disability had the greatest difficulty in household activities and mobility. Muscle power impairment (adjusted odds ratios range among domains, 2.75-376.42, p < 0.001), age (1.38-4.81, p < 0.05), and speech impairment (1.94-5.80, p < 0.05) were associated with BD subjects experiencing the greatest difficulty in most WHODAS 2.0 domains. However, only a few associated factors were identified for the non-BD group in the study. Although the patterns of difficulty in most daily activities were similar between the BD and non-BD groups, factors associated with the difficulties differed between the two groups. Muscle power impairment, age, and speech impairment were important factors associated with difficulties in subjects with BD-related physical disability. Older adults with physical disability often experience difficulties in household activities and mobility. Muscle power impairment is associated with difficulties in daily life in subjects with physical disability related to brain disease. Subjects with brain disease who were older, had a greater degree of muscle power impairment, and had speech impairment were at higher risk of experiencing difficulties in most daily activities.
Dalton, Megan; Davidson, Megan; Keating, Jenny
2011-01-01
Is the Assessment of Physiotherapy Practice (APP) a valid instrument for the assessment of entry-level competence in physiotherapy students? Cross-sectional study with Rasch analysis of initial (n=326) and validation samples (n=318). Students were assessed on completion of 4, 5, or 6-week clinical placements across one university semester. 298 clinical educators and 456 physiotherapy students at nine universities in Australia and New Zealand provided 644 completed APP instruments. APP data in both samples showed overall fit to a Rasch model of expected item functioning for interval scale measurement. Item 6 (Written communication) exhibited misfit in both samples, but was retained as an important element of competence. The hierarchy of item difficulty was the same in both samples with items related to professional behaviour and communication the easiest to achieve and items related to clinical reasoning the most difficult. Item difficulty was well targeted to person ability. No Differential Item Functioning was identified, indicating that the scale performed in a comparable way regardless of the student's age, gender or amount of prior clinical experience, and the educator's age, gender, or experience as an educator, or the type of facility, university, or clinical area. The instrument demonstrated unidimensionality confirming the appropriateness of summing the scale scores on each item to provide an overall score of clinical competence and was able to discriminate four levels of professional competence (Person Separation Index=0.96). Person ability and raw APP scores had a linear relationship (r(2)=0.99). Rasch analysis supports the interpretation that a student's APP score is an indication of their underlying level of professional competence in workplace practice. Copyright © 2011 Australian Physiotherapy Association. Published by .. All rights reserved.
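Rasch analysis places item difficulties and person abilities on a single logit scale. This allows statements such as "item difficulty was well targeted to person ability" to be checked by comparing the two distributions. The following is a minimal sketch of the Rasch success probability and a simple targeting summary, the mean person measure minus the mean item difficulty. It assumes that the measures have already been estimated by Rasch software. The helper names and logit values are hypothetical. The sketch shows only the model, not an estimation routine.

```python
import math
from statistics import mean

def rasch_p(theta: float, b: float) -> float:
    """Rasch (one-parameter logistic) probability of success.

    theta (person ability) and b (item difficulty) share the same logit
    scale, so only their difference matters.
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def targeting_gap(person_measures, item_difficulties) -> float:
    """Mean person measure minus mean item difficulty, in logits.

    A gap near zero suggests the items are well targeted to the sample.
    """
    return mean(person_measures) - mean(item_difficulties)

# Hypothetical logit values, for illustration only.
items = [-1.2, -0.4, 0.1, 0.6, 0.9]
persons = [-0.8, -0.1, 0.2, 0.5, 1.1, 1.4]

print(round(rasch_p(0.5, 0.5), 2))
print(round(targeting_gap(persons, items), 3))
```

A person whose ability equals an item's difficulty has a 50% chance of success on that item. Under the Rasch model, total raw scores are sufficient statistics for ability, which is consistent with the near-linear raw-score-to-measure relationship reported above.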
Peterson, Alexander C; Sutherland, Jason M; Liu, Guiping; Crump, R Trafford; Karimuddin, Ahmer A
2018-06-01
The Fecal Incontinence Quality of Life Scale (FIQL) is a commonly used patient-reported outcome measure for fecal incontinence, often used in clinical trials, yet has not been validated in English since its initial development. This study uses modern methods to thoroughly evaluate the psychometric characteristics of the FIQL and its potential for differential functioning by gender. This study analyzed prospectively collected patient-reported outcome data from a sample of patients prior to colorectal surgery. Patients were recruited from 14 general and colorectal surgeons in Vancouver Coastal Health hospitals in Vancouver, Canada. Confirmatory factor analysis was used to assess construct validity. Item response theory was used to evaluate test reliability, describe item-level characteristics, identify local item dependence, and test for differential functioning by gender. 236 patients were included for analysis, with mean age 58 and approximately half female. Factor analysis failed to identify the lifestyle, coping, depression, and embarrassment domains, suggesting lack of construct validity. Items demonstrated low difficulty, indicating that the test has the highest reliability among individuals who have low quality of life. Five items are suggested for removal or replacement. Differential test functioning was minimal. This study has identified specific improvements that can be made to each domain of the Fecal Incontinence Quality of Life Scale and to the instrument overall. Formatting, scoring, and instructions may be simplified, and items with higher difficulty developed. The lifestyle domain can be used as is. The embarrassment domain should be significantly revised before use.
Lynskey, M T; Agrawal, A
2007-09-01
DSM-IV criteria for illicit drug abuse and dependence are largely based on criteria developed for alcohol use disorders, and there is a lack of research evidence on the psychometric properties of these symptoms when applied to illicit drugs. This study utilizes data on abuse/dependence criteria for cannabis, cocaine, stimulants, sedatives, tranquilizers, opiates, hallucinogens and inhalants from the National Epidemiological Survey on Alcohol and Related Conditions (NESARC, n=43 093). Analyses included factor analysis to explore the dimensionality of illicit drug abuse and dependence criteria, calculation of item difficulty and discrimination within an item response framework and a descriptive analysis of 'diagnostic orphans': individuals meeting criteria for 1-2 dependence symptoms but not abuse. Rates of psychiatric disorders were compared across groups. Results favor a unidimensional construct for abuse/dependence on each of the eight drug classes. Factor loadings, item difficulty and discrimination were remarkably consistent across drug categories. For each drug category, between 29% and 51% of all individuals meeting criteria for at least one symptom did not receive a formal diagnosis of either abuse or dependence and were therefore classified as 'orphans'. Mean rates of disorder in these individuals suggested that illicit drug use disorders may be more adequately described along a spectrum of severity. While there were remarkable similarities across categories of illicit drugs, consideration of item difficulty suggested that some alterations to DSM regarding the relative severity of specific abuse and dependence criteria may be warranted.
ERIC Educational Resources Information Center
Missouri State Dept. of Elementary and Secondary Education, Jefferson City.
This document presents 10 released items from the Health/Physical Education Missouri Assessment Program (MAP) test given in the spring of 2000 to fifth graders. Items from the test sessions include: selected-response (multiple choice), constructed-response, and a performance event. The selected-response items consist of individual questions…
An Ethical Issue Scale for Community Pharmacy Setting (EISP): Development and Validation.
Crnjanski, Tatjana; Krajnovic, Dusanka; Tadic, Ivana; Stojkov, Svetlana; Savic, Mirko
2016-04-01
Many problems that arise when providing pharmacy services may contain ethical components. The aims of this study were to develop and validate a scale that could assess the difficulty of ethical issues, as well as how frequently they occur in the everyday practice of community pharmacists. Development and validation of the scale were conducted in three phases: (1) generating items for the initial survey instrument after qualitative analysis; (2) defining the design and format of the instrument; (3) validating the instrument. The constructed Ethical Issue Scale for community pharmacy setting has two parts containing the same 16 items, which assess the difficulty and the frequency of each issue, respectively. The 171 fully completed scales were analyzed (response rate 74.89%). Cronbach's α was 0.83 for the part of the instrument that examines the difficulty of ethical situations and 0.84 for the part that examines their frequency. Test-retest reliability for both parts of the instrument was satisfactory, with all intraclass correlation coefficient (ICC) values above 0.6 (ICC = 0.809 for the part that examines severity and ICC = 0.929 for the part that examines frequency). The 16-item scale, as a self-assessment tool, demonstrated a high degree of content, criterion, and construct validity and test-retest reliability. The results support its use as a research tool to assess the difficulty and frequency of ethical issues in the community pharmacy setting. The validated scale needs to be further employed on a larger sample of pharmacists.
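Cronbach's α, the internal-consistency statistic reported above, compares the sum of the item variances with the variance of the total score: α = k/(k−1) · (1 − Σσᵢ²/σ_X²) for k items. The following is a minimal sketch using only the Python standard library. The function name and the ratings are hypothetical. The intraclass correlation used for test-retest reliability is a separate statistic and is not shown.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents x items score matrix.

    scores: list of rows, one per respondent, each a list of item scores.
    alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*scores))                 # transpose: one tuple per item
    item_vars = [variance(col) for col in items]
    totals = [sum(row) for row in scores]      # each respondent's total score
    return k / (k - 1) * (1 - sum(item_vars) / variance(totals))

# Hypothetical 5-point ratings from four respondents on three items.
ratings = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 2],
]
print(round(cronbach_alpha(ratings), 3))
```

Because the same variance estimator is used in both the numerator and the denominator, choosing sample rather than population variance does not change α.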
The Utility of the Family Empowerment Scale With Custodial Grandmothers
Hayslip, Bert; Smith, Gregory C.; Montoro-Rodriguez, Julian; Streider, Frederick H.; Merchant, William
2016-01-01
The Family Empowerment Scale (FES) was developed specifically to assess empowerment in families with emotional disorders. Its relevance to custodial grandfamilies is reflected in the difficulties in grandchildren's social, emotional, and behavioral functioning, wherein such difficulties may be explained via either reactions to changes in their family structure or in their responses to the newly formed family unit. Utilizing 27 items derived from the 34-item version of the FES, which had represented differential levels of empowerment (family, service system, community) as indexed by one's attitudes, knowledge, and behavior, we explored the factor structure, internal consistency, construct, and convergent validity of the FES with grandparent caregivers. Three hundred forty-three (M age = 58.45, SD = 8.22, n Caucasian = 152, n African American = 149, n Hispanic = 38) custodial grandmothers caring for grandchildren between ages 4 and 12 years completed the 27 FES items and various measures of their psychological well-being, grandchild psychological difficulties, emotional support, and parenting practices. Factor analysis revealed three factors that differed slightly from the originally proposed FES subscales: Parental Self-Efficacy/Self-Confidence, Service Activism, and Service Knowledge. Each of the factors was internally consistent, and derived factor scores were moderately interrelated, speaking to the question of convergent validity. The construct validity of these three factors was evidenced by meaningful patterns of statistically significant correlations with grandmothers’ psychological well-being, grandchild psychological difficulties, emotional support, and parenting practices. These factor scores were independent of grandmother age, health, and education. These findings suggest that the newly identified FES factors are valuable in understanding empowerment among grandmother caregivers. PMID:26452627
The Utility of the Family Empowerment Scale With Custodial Grandmothers.
Hayslip, Bert; Smith, Gregory C; Montoro-Rodriguez, Julian; Streider, Frederick H; Merchant, William
2017-03-01
The Family Empowerment Scale (FES) was developed specifically to assess empowerment in families with emotional disorders. Its relevance to custodial grandfamilies is reflected in the difficulties in grandchildren's social, emotional, and behavioral functioning, wherein such difficulties may be explained via either reactions to changes in their family structure or in their responses to the newly formed family unit. Utilizing 27 items derived from the 34-item version of the FES, which had represented differential levels of empowerment (family, service system, community) as indexed by one's attitudes, knowledge, and behavior, we explored the factor structure, internal consistency, construct, and convergent validity of the FES with grandparent caregivers. Three hundred forty-three (M age = 58.45, SD = 8.22, n Caucasian = 152, n African American = 149, n Hispanic = 38) custodial grandmothers caring for grandchildren between ages 4 and 12 years completed the 27 FES items and various measures of their psychological well-being, grandchild psychological difficulties, emotional support, and parenting practices. Factor analysis revealed three factors that differed slightly from the originally proposed FES subscales: Parental Self-Efficacy/Self-Confidence, Service Activism, and Service Knowledge. Each of the factors was internally consistent, and derived factor scores were moderately interrelated, speaking to the question of convergent validity. The construct validity of these three factors was evidenced by meaningful patterns of statistically significant correlations with grandmothers' psychological well-being, grandchild psychological difficulties, emotional support, and parenting practices. These factor scores were independent of grandmother age, health, and education. These findings suggest that the newly identified FES factors are valuable in understanding empowerment among grandmother caregivers.
Chan, Kitty S; Gross, Alden L; Pezzin, Liliana E; Brandt, Jason; Kasper, Judith D
2015-12-01
To harmonize measures of cognitive performance using item response theory (IRT) across two international aging studies. Data for persons ≥65 years from the Health and Retirement Study (HRS, N = 9,471) and the English Longitudinal Study of Aging (ELSA, N = 5,444). Cognitive performance measures varied (HRS fielded 25, ELSA 13); 9 were in common. Measurement precision was examined for IRT scores based on (a) common items, (b) common items adjusted for differential item functioning (DIF), and (c) DIF-adjusted all items. Three common items (day of date, immediate word recall, and delayed word recall) demonstrated DIF by survey. Adding survey-specific items improved precision but mainly for HRS respondents at lower cognitive levels. IRT offers a feasible strategy for harmonizing cognitive performance measures across other surveys and for other multi-item constructs of interest in studies of aging. Practical implications depend on sample distribution and the difficulty mix of in-common and survey-specific items. © The Author(s) 2015.
Paz, Sylvia H; Spritzer, Karen L; Morales, Leo S; Hays, Ron D
2013-09-01
To evaluate the equivalence of the PROMIS(®) physical functioning item bank by language of administration (English versus Spanish). The PROMIS(®) wave 1 English-language physical functioning bank consists of 124 items, and 114 of these were translated into Spanish. Item frequencies, means and standard deviations, item-scale correlations, and internal consistency reliability were calculated. The IRT assumption of unidimensionality was evaluated by fitting a single-factor confirmatory factor analytic model. IRT threshold and discrimination parameters were estimated using Samejima's Graded Response Model. DIF by language of administration was evaluated. Item means ranged from 2.53 (SD = 1.36) to 4.62 (SD = 0.82). Coefficient alpha was 0.99, and item-rest correlations ranged from 0.41 to 0.89. A one-factor model fits the data well (CFI = 0.971, TLI = 0.970, and RMSEA = 0.052). The slope parameters ranged from 0.45 ("Are you able to run 10 miles?") to 4.50 ("Are you able to put on a shirt or blouse?"). The threshold parameters ranged from -1.92 ("How much do physical health problems now limit your usual physical activities (such as walking or climbing stairs)?") to 6.06 ("Are you able to run 10 miles?"). Fifty of the 114 items were flagged for DIF based on an R(2) of 0.02 or above criterion. The expected total score was higher for Spanish- than English-language respondents. English- and Spanish-speaking subjects with the same level of underlying physical function responded differently to 50 of 114 items. This study has important implications in the study of physical functioning among diverse populations.
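Samejima's Graded Response Model, used above for the polytomous physical functioning items, models the probability of responding in category k or higher with a 2PL-style curve for each threshold. The probability of each single category is then the difference between adjacent cumulative curves. The following is a minimal sketch, assuming ordered thresholds and the logistic form without a scaling constant. The function name and the example parameters are hypothetical.

```python
import math

def grm_category_probs(theta: float, a: float, thresholds):
    """Category probabilities under Samejima's graded response model.

    thresholds: ordered b_1 < b_2 < ... < b_{m-1} for an item with m categories.
    P*(X >= k) = 1 / (1 + exp(-a * (theta - b_k))); the probability of
    category k is the difference between adjacent cumulative curves.
    """
    cumulative = [1.0]  # P*(X >= lowest category) is 1 by definition
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)  # no response lies above the highest category
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]

# Hypothetical five-category item (four thresholds), respondent at theta = 0.
probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-2.0, -0.5, 0.8, 2.1])
print([round(p, 3) for p in probs])
```

Under this parameterization, a very high threshold, such as the 6.06 reported for running 10 miles, means that only respondents far above average in physical function are expected to reach the higher response categories. A larger slope a makes the transitions between categories sharper.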
Hogge, Michaël; Adam, Stéphane; Collette, Fabienne
2008-07-01
The directed forgetting effect obtained with the item method is supposed to depend on both selective rehearsal of to-be-remembered (TBR) items and attentional inhibition of to-be-forgotten (TBF) items. In this study, we investigated the locus of the directed forgetting deficit in older adults by exploring the influence of recollection and familiarity-based retrieval processes on age-related differences in directed forgetting. Moreover, we explored the influence of processing speed, short-term memory capacity, thought suppression tendencies, and sensitivity to proactive interference on performance. The results indicated that older adults' directed forgetting difficulties are due to decreased recollection of TBR items, associated with increased automatic retrieval of TBF items. Moreover, processing speed and proactive interference appeared to be responsible for the decreased recall of TBR items.
Lim, Bee Chiu; Kueh, Yee Cheng; Arifin, Wan Nor; Ng, Kok Huan
2016-01-01
Background Heart disease knowledge is an important concept for health education, yet there is a lack of evidence on properly validated instruments for measuring levels of heart disease knowledge in the Malaysian context. Methods A cross-sectional survey was conducted to examine the psychometric properties of the adapted English version of the Heart Disease Knowledge Questionnaire (HDKQ). Using proportionate cluster sampling, 788 undergraduate students at Universiti Sains Malaysia, Malaysia, were recruited and completed the HDKQ. Item analysis and confirmatory factor analysis (CFA) were used for the psychometric evaluation. Construct validity of the measurement model was also assessed. Results Most of the students were Malay (48%), female (71%), and from the field of science (51%). An acceptable range was obtained for both the difficulty and discrimination indices in the item analysis results. For the 23 retained items, the difficulty index ranged from 0.12 to 0.91 and the discrimination index was ≥ 0.20. The final CFA model showed an adequate fit to the data, yielding a 23-item, one-factor model [weighted least squares mean and variance adjusted scaled chi-square difference = 1.22, degrees of freedom = 2, P-value = 0.544, the root mean square error of approximation = 0.03 (90% confidence interval = 0.03, 0.04); close-fit P-value > 0.950]. Conclusion Adequate psychometric values were obtained for Malaysian undergraduate university students using the 23-item, one-factor model of the adapted HDKQ. PMID:27660543
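The classical item-analysis indices reported above can be computed directly from scored responses. The difficulty index is the proportion of respondents who answer an item correctly, so higher values mean easier items. A common discrimination index contrasts that proportion in the highest- and lowest-scoring groups. The following is a minimal sketch, assuming dichotomous 0/1 scoring and the frequently used 27% upper and lower groups. The study's exact grouping rule is not stated, and the function names and data are hypothetical.

```python
def item_difficulty(responses):
    """Classical difficulty index: proportion answering correctly (0/1 scores)."""
    return sum(responses) / len(responses)

def discrimination_index(item_scores, total_scores, fraction=0.27):
    """Upper-lower discrimination index D = p(upper group) - p(lower group).

    Respondents are ranked by total test score; the top and bottom `fraction`
    of the sample form the comparison groups (27% is a common convention).
    """
    n = len(total_scores)
    g = max(1, round(n * fraction))             # size of each comparison group
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower = [item_scores[i] for i in order[:g]]
    upper = [item_scores[i] for i in order[-g:]]
    return item_difficulty(upper) - item_difficulty(lower)

# Hypothetical data: one item's 0/1 scores and ten respondents' total scores.
item = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]
totals = [20, 18, 17, 16, 15, 12, 11, 9, 8, 5]

print(item_difficulty(item), discrimination_index(item, totals))
```

With these values, every respondent in the top group answers the item correctly and no one in the bottom group does, so D reaches its maximum of 1.0. Items with D below a cut-off such as the 0.20 mentioned above would typically be flagged for revision or removal.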
Lim, Bee Chiu; Kueh, Yee Cheng; Arifin, Wan Nor; Ng, Kok Huan
2016-07-01
Heart disease knowledge is an important concept for health education, yet there is a lack of evidence on properly validated instruments for measuring levels of heart disease knowledge in the Malaysian context. A cross-sectional survey was conducted to examine the psychometric properties of the adapted English version of the Heart Disease Knowledge Questionnaire (HDKQ). Using proportionate cluster sampling, 788 undergraduate students at Universiti Sains Malaysia, Malaysia, were recruited and completed the HDKQ. Item analysis and confirmatory factor analysis (CFA) were used for the psychometric evaluation. Construct validity of the measurement model was also assessed. Most of the students were Malay (48%), female (71%), and from the field of science (51%). An acceptable range was obtained for both the difficulty and discrimination indices in the item analysis results. For the 23 retained items, the difficulty index ranged from 0.12 to 0.91 and the discrimination index was ≥ 0.20. The final CFA model showed an adequate fit to the data, yielding a 23-item, one-factor model [weighted least squares mean and variance adjusted scaled chi-square difference = 1.22, degrees of freedom = 2, P-value = 0.544, the root mean square error of approximation = 0.03 (90% confidence interval = 0.03, 0.04); close-fit P-value > 0.950]. Adequate psychometric values were obtained for Malaysian undergraduate university students using the 23-item, one-factor model of the adapted HDKQ.
ERIC Educational Resources Information Center
Liao, Chi-Wen; Livingston, Samuel A.
2008-01-01
Randomly equivalent forms (REF) of tests in listening and reading for nonnative speakers of English were created by stratified random assignment of items to forms, stratifying on item content and predicted difficulty. The study included 50 replications of the procedure for each test. Each replication generated 2 REFs. The equivalence of those 2…
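Stratified random assignment of items to forms, as described above, can be sketched as follows: group items into strata defined by content and predicted difficulty, shuffle within each stratum, and deal the items out to the forms in turn. The following is a minimal sketch under stated assumptions. Predicted difficulty is assumed to be already binned into labeled bands, and the function name, item IDs, and band labels are hypothetical. The report's actual procedure is not detailed in this excerpt.

```python
import random
from collections import defaultdict

def stratified_forms(items, n_forms=2, seed=None):
    """Randomly split items into forms, stratifying on (content, difficulty band).

    items: list of (item_id, content_area, difficulty_band) tuples.
    Within each stratum, items are shuffled and dealt to forms in turn,
    starting at a random form so that leftover items do not always land
    on the same form.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item_id, content, band in items:
        strata[(content, band)].append(item_id)
    forms = [[] for _ in range(n_forms)]
    for key in sorted(strata):                 # fixed order for reproducibility
        ids = strata[key]
        rng.shuffle(ids)
        start = rng.randrange(n_forms)
        for i, item_id in enumerate(ids):
            forms[(start + i) % n_forms].append(item_id)
    return forms

# Hypothetical pool: 2 content areas x 3 difficulty bands x 2 items each.
pool = [(f"{c[0].upper()}-{b}-{j}", c, b)
        for c in ("listening", "reading")
        for b in ("easy", "medium", "hard")
        for j in range(2)]
form_a, form_b = stratified_forms(pool, seed=1)
print(form_a)
print(form_b)
```

Repeating the call with different seeds produces replications similar to those described above. Each replication yields two forms with matched content and difficulty composition.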
An Information Analysis of 2-, 3-, and 4-Word Verbal Discrimination Learning.
ERIC Educational Resources Information Center
Arima, James K.; Gray, Francis D.
Information theory was used to quantify the difficulty of verbal discrimination (VD) learning tasks and to measure VD performance. Words for VD items were selected with high background frequency and equal a priori probabilities of being selected as a first response. Three VD lists containing only 2-, 3-, or 4-word items were created and equated for…
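When each word in a VD item has an equal a priori probability of being chosen as the first response, the uncertainty of the item follows directly from Shannon's entropy. The following worked sketch assumes this uniform case and measures entropy in bits. The report's own performance measures may be defined differently.

```latex
H = -\sum_{i=1}^{n} p_i \log_2 p_i, \qquad p_i = \tfrac{1}{n}
\quad\Longrightarrow\quad H = \log_2 n
```

This gives H = 1 bit for 2-word items, H = log₂ 3 ≈ 1.585 bits for 3-word items, and H = 2 bits for 4-word items. Under this measure, the a priori uncertainty a learner must resolve grows with the number of alternatives, but less than proportionally.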
ERIC Educational Resources Information Center
Hamadneh, Iyad Mohammed
2015-01-01
This study aimed at investigating the impact of changing the position of the escape alternative in a multiple-choice test on the psychometric properties of the test and its item parameters (difficulty, discrimination & guessing), and on the estimation of examinee ability. To achieve the study objectives, a 4-alternative multiple choice type achievement test…
ERIC Educational Resources Information Center
Chen, Chieh-Yu; Chen, Ching-I; Squires, Jane; Bian, Xiaoyan; Heo, Kay H.; Filgueiras, Alberto; Kalinina, Svetlana; Samarina, Larissa; Ermolaeva, Evgeniya; Xie, Huichao; Yu, Ting-Ying; Wu, Pei-Fang; Landeira-Fernandez, Jesus
2017-01-01
Ages & Stages Questionnaires: Social-Emotional (ASQ:SE) is a widely used screening instrument for detecting social-emotional difficulties in infants and young children. To use a screening instrument across cultures and countries, it is necessary to identify potential item-level biases and ensure item equivalence. This study investigated the…
ERIC Educational Resources Information Center
Goldhammer, Frank
2015-01-01
The main challenge of ability tests relates to the difficulty of items, whereas speed tests demand that test takers complete very easy items quickly. This article proposes a conceptual framework to represent how performance depends on both between-person differences in speed and ability and the speed-ability compromise within persons. Related…
ERIC Educational Resources Information Center
Chan, David W.
2010-01-01
Data of item responses to the Impossible Figures Task (IFT) from 492 Chinese primary, secondary, and university students were analyzed using the dichotomous Rasch measurement model. Item difficulty estimates and person ability estimates located on the same logit scale revealed that the pooled sample of Chinese students, who were relatively highly…
Leonard, Laurence B; Deevy, Patricia; Fey, Marc E; Bredin-Oja, Shelley L
2013-04-01
This study examined sentence comprehension in children with specific language impairment (SLI) in a manner designed to separate the contribution of cognitive capacity from the effects of syntactic structure. Nineteen children with SLI, 19 typically developing children matched for age (TD-A), and 19 younger typically developing children (TD-Y) matched according to sentence comprehension test scores responded to sentence comprehension items that varied in either length or their demands on cognitive capacity, based on the nature of the foils competing with the target picture. The TD-A children were accurate across all item types. The SLI and TD-Y groups were less accurate than the TD-A group on items with greater length and, especially, on items with the greatest demands on cognitive capacity. The types of errors were consistent with failure to retain details of the sentence apart from syntactic structure. The difficulty in the more demanding conditions seemed attributable to interference. Specifically, the children with SLI and the TD-Y children appeared to have difficulty retaining details of the target sentence when the information reflected in the foils closely resembled the information in the target sentence.
Both younger and older adults have difficulty updating emotional memories.
Nashiro, Kaoru; Sakaki, Michiko; Huffman, Derek; Mather, Mara
2013-03-01
The main purpose of the study was to examine whether emotion impairs associative memory for previously seen items in older adults, as previously observed in younger adults. Thirty-two younger adults and 32 older adults participated. The experiment consisted of 2 parts. In Part 1, participants learned picture-object associations for negative and neutral pictures. In Part 2, they learned picture-location associations for negative and neutral pictures; half of these pictures were seen in Part 1 whereas the other half were new. The dependent measure was how many locations of negative versus neutral items in the new versus old categories participants remembered in Part 2. Both groups had more difficulty learning the locations of old negative pictures than of new negative pictures. However, this pattern was not observed for neutral items. Despite the fact that older adults showed overall decline in associative memory, the impairing effect of emotion on updating associative memory was similar between younger and older adults.
Oude Voshaar, Martijn Ah; Ten Klooster, Peter M; Taal, Erik; Krishnan, Eswar; van de Laar, Mart Afj
2012-03-05
Patient-reported physical function is an established outcome domain in clinical studies in rheumatology. To overcome the limitations of the current generation of questionnaires, the Patient-Reported Outcomes Measurement Information System (PROMIS®) project in the USA has developed calibrated item banks for measuring several domains of health status in people with a wide range of chronic diseases. The aim of this study was to translate and cross-culturally adapt the PROMIS physical function item bank to the Dutch language and to pretest it in a sample of patients with arthritis. The items of the PROMIS physical function item bank were translated using rigorous forward-backward protocols, and the translated version was subsequently cognitively pretested in a sample of Dutch patients with rheumatoid arthritis. Few issues were encountered in the forward-backward translation. Only 5 of the 124 items to be translated had to be rewritten because of culturally inappropriate content. Subsequent pretesting showed that, overall, questions of the Dutch version were understood as they were intended, and only one item required rewriting. Results suggest that the translated version of the PROMIS physical function item bank is semantically and conceptually equivalent to the original. Future work will be directed at creating a Dutch-Flemish final version of the item bank to be used in research with Dutch-speaking populations.
ERIC Educational Resources Information Center
Kearns, Devin M.; Steacy, Laura M.; Compton, Donald L.; Gilbert, Jennifer K.; Goodwin, Amanda P.; Cho, Eunsoo; Lindstrom, Esther R.; Collins, Alyson A.
2016-01-01
Comprehensive models of derived polymorphemic word recognition skill in developing readers, with an emphasis on children with reading difficulty (RD), have not been developed. The purpose of the present study was to model individual differences in polymorphemic word recognition ability at the item level among 5th-grade children (N = 173)…
ERIC Educational Resources Information Center
Palmieri, Patrick A.; Smith, Gregory C.
2007-01-01
The authors examined the structural validity of the parent informant version of the Strengths and Difficulties Questionnaire (SDQ) with a sample of 733 custodial grandparents. Three models of the SDQ's factor structure were evaluated with confirmatory factor analysis based on the item covariance matrix. Although indices of fit were good across all…
ERIC Educational Resources Information Center
Oruç Ertürk, Nesrin; Mumford, Simon E.
2017-01-01
This study, conducted by two researchers who were also multiple-choice question (MCQ) test item writers at a private English-medium university in an English as a foreign language (EFL) context, was designed to shed light on the factors that influence test-takers' perceptions of difficulty in English for academic purposes (EAP) vocabulary, with the…
NASA Technical Reports Server (NTRS)
Canuto, V.
1975-01-01
The papers deal with the role of magnetism in astrophysics and the properties of matter in the presence of unusually large magnetic fields. Topics include a quantum-mechanical treatment of high-energy charged particles radiating in a homogeneous magnetic field, the solution and properties of the Dirac equation for magnetic fields of any strength up to 10^13 gauss, experimental difficulties encountered and overcome in generating megagauss fields, the effect of strong radiation damping for an ultrarelativistic charge in an external electromagnetic field, magnetic susceptibilities of nuclei and elementary particles, and Compton scattering in strong external electromagnetic fields. Other papers examine static uniform electric and magnetic polarizabilities of the vacuum in arbitrarily strong magnetic fields, quantum-mechanical processes in neutron stars, basic ideas of mean-field magnetohydrodynamics, helical MHD turbulence, relations between cosmic and laboratory plasma physics, and insights into the nature of magnetism provided by relativity and cosmology. Individual items are announced in this issue.
An Evaluation of Different Statistical Targets for Assembling Parallel Forms in Item Response Theory
Ali, Usama S.; van Rijn, Peter W.
2015-01-01
Assembly of parallel forms is an important step in the test development process. Therefore, choosing a suitable theoretical framework to generate well-defined test specifications is critical. The performance of different statistical targets of test specifications using the test characteristic curve (TCC) and the test information function (TIF) was investigated. Test length, the number of test forms, and content specifications were considered as well. The TCC target results in forms that are parallel in difficulty, but not necessarily in terms of precision. Conversely, test forms created using a TIF target are parallel in terms of precision, but not necessarily in terms of difficulty. Because the focus is sometimes on either TIF or TCC alone, differences in either difficulty or precision can arise. Differences in difficulty can be mitigated by equating, but differences in precision cannot. In a series of simulations using a real item bank, the two-parameter logistic model, and mixed integer linear programming for automated test assembly, these differences were found to be quite substantial. When TIF and TCC are combined into one target with adjustable relative importance, these differences can be made to disappear.
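The contrast between TCC and TIF targets can be made concrete with a small sketch under the two-parameter logistic (2PL) model named in the abstract. The item parameters below are hypothetical, and the sketch only evaluates the two curves; it does not reproduce the mixed integer programming assembly step used in the study.

```python
import math

def p_2pl(theta: float, a: float, b: float) -> float:
    """2PL probability of a correct response at ability theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def tcc(theta: float, items: list[tuple[float, float]]) -> float:
    """Test characteristic curve: expected number-correct score at theta."""
    return sum(p_2pl(theta, a, b) for a, b in items)

def tif(theta: float, items: list[tuple[float, float]]) -> float:
    """Test information function: sum of 2PL item informations a^2 * P * (1 - P)."""
    total = 0.0
    for a, b in items:
        p = p_2pl(theta, a, b)
        total += a * a * p * (1.0 - p)
    return total

# Hypothetical forms: identical difficulties (b), different discriminations (a).
form_a = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0)]
form_b = [(2.0, -1.0), (2.0, 0.0), (2.0, 1.0)]

print(tcc(0.0, form_a), tcc(0.0, form_b))  # same expected score at theta = 0
print(tif(0.0, form_a), tif(0.0, form_b))  # but different precision there
```

At θ = 0 both hypothetical forms have an expected score of about 1.5, yet form_b carries nearly three times the information, which is the kind of difficulty-versus-precision mismatch the abstract describes.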
Kraft, Pål; Rise, Jostein; Sutton, Stephen; Røysamb, Espen
2005-09-01
A study was conducted to explore (a) the dimensional structure of perceived behavioural control (PBC), (b) the conceptual basis of perceived difficulty items, and (c) how PBC components and instrumental and affective attitudes, respectively, relate to intention and behaviour. The material stemmed from a two-wave study of Norwegian graduate students (N = 227 for the prediction of intention and N = 110 for the prediction of behaviour). Data were analysed using confirmatory factor analysis (CFA) and multiple regression by the application of structural equation modelling (SEM). CFA suggested that PBC could be conceived of as consisting of three separate but interrelated factors (perceived control, perceived confidence and perceived difficulty), or as two separate but interrelated factors representing self-efficacy (measured by perceived difficulty and perceived confidence or by just perceived confidence) and perceived control. However, the perceived difficulty items also overlapped substantially with affective attitude. Perceived confidence was a strong predictor of exercise intention but not of recycling intention. Perceived control, however, was a strong predictor of recycling intention but not exercise intention. Affective attitudes but not instrumental attitudes were identified as substantial predictors of intentions. The findings suggest that at least under some circumstances it may be inadequate to measure PBC by means of perceived difficulty. One possible consequence may be that the role of PBC as a predictor of intention is somewhat overestimated, whereas the role of (affective) attitude may be similarly underestimated.
Andersson, Helle Wessel; Bjørngaard, Johan Håkon; Kaspersen, Silje Lill; Wang, Catharina E A; Skre, Ingunn; Dahl, Thomas
2010-05-01
The aim was to examine the prevalence of mental health difficulties and prejudices toward mental illness among adolescents, and to analyze possible school and school class effects on these issues. The sample comprised 4,046 pupils (16-19 years) in 257 school classes from 45 Norwegian upper secondary schools. The estimated response rate among the pupils was about 96%. Self-reported mental health difficulties were measured with a four-item scale that covered emotional and behavioral difficulties. Prejudiced attitudes toward mental illness were assessed using a nine-item scale. Multilevel regression analysis was used to estimate the contribution of factors at the individual level, and at the school and class levels. Most of the variance in self-reported mental health difficulties and prejudices was accounted for by individual level factors (92-94%). However, there were statistically significant school and class level effects (P < 0.01), confounded by socioeconomic factors. Mental health difficulties were commonly reported, more often by females than males (P < 0.01). Difficulties with emotions and attention were the two main problem areas, with definite to severe difficulties being reported by 19 and 21% of the females, and by 9 and 16% of the males, respectively. Prejudices were reported more often by males than females (P < 0.01). Both self-reported mental health difficulties and prejudiced attitudes were related to educational program, living situation, and parental education (P < 0.01). The relatively high prevalences of mental health difficulties and prejudiced attitudes toward mental illness among adolescents indicate a need for effective mental health intervention programs. Targeted intervention strategies should be considered when there is evidence of a high number of risk factors in schools and school classes. Furthermore, the gender differences found in self-reported mental health difficulties and prejudices suggest a need for gender-differentiated programs.
Melin, Eva O; Svensson, Ralph; Thunander, Maria; Hillman, Magnus; Thulesius, Hans O; Landin-Olsson, Mona
2017-01-01
Obesity is linked to cardiovascular diseases and has become increasingly common in type 1 diabetes mellitus (T1DM) since the introduction of intensified insulin therapy. Our main aim was to explore associations between obesity and depression, anxiety, alexithymia and self-image measures and to control for lifestyle variables in a sample of persons with T1DM. Secondary aims were to explore associations between abdominal and general obesity and cardiovascular complications in T1DM. Cross-sectional study of 284 persons with T1DM (age 18-59 years, men 56%), consecutively recruited from one secondary care hospital diabetes clinic in Sweden. Assessments were performed with self-report instruments (Hospital Anxiety and Depression Scale, Toronto Alexithymia Scale-20 items and Structural Analysis of Social Behavior). Anthropometrics and blood samples were collected for this study and supplemented with data from the patients' medical records. Abdominal obesity was defined as waist circumference ≥1.02 m for men and ≥0.88 m for women, and general obesity as BMI ≥30 kg/m² for both men and women. Abdominal obesity was chosen for the analyses because of its strong association with cardiovascular complications. Different explanatory logistic regression models were developed for the associations and were calibrated and validated for goodness of fit with the data variables. The prevalence of abdominal obesity was 49/284 (17%); men/women: 8%/29% (P < 0.001). Abdominal obesity was associated with female sex (AOR 4.9), physical inactivity (AOR 3.1), alexithymia (AOR 2.6) and age (per year) (AOR 1.04). One of the three alexithymia subfactors, "difficulty identifying feelings" (AOR 3.1), was associated with abdominal obesity. Gender-stratified analyses showed that abdominal obesity in men was associated with "difficulty identifying feelings" (AOR 7.7), and in women with use of antidepressants (AOR 4.3) and physical inactivity (AOR 3.6). Cardiovascular complications were associated with abdominal obesity (AOR 5.2).
Alexithymia (particularly the alexithymia subfactor "difficulty identifying feelings"), physical inactivity, female sex, and cardiovascular complications were associated with abdominal obesity. As abdominal obesity is detrimental in diabetes due to its association with cardiovascular complications, our results suggest two risk factor treatment targets: increased emotional awareness and increased physical activity.
An analysis of the DuPage County Regional Office of Education physics exam
NASA Astrophysics Data System (ADS)
Muehsler, Hans
In 2009, the DuPage County Regional Office of Education (ROE) tasked volunteer physics teachers with creating a basic skills physics exam reflecting what the participants valued and shared in common across curricula. Mechanics, electricity & magnetism (E&M), and wave phenomena emerged as the primary constructs. The resulting exam was intended for first-exposure physics students. The most recently completed version was assessed psychometrically for reliability and for unidimensionality within each construct, the latter using a robust weighted least squares (WLS) structural equation model. An item analysis was performed using a 3PL IRT model for the mechanics items and a 2PL IRT model for the E&M and waves items; a distractor analysis was also performed on all items. Lastly, differential item functioning (DIF) and differential test functioning (DTF) analyses, using the Mantel-Haenszel procedure, were performed with gender, ethnicity, year in school, ELL status, physics level, and math level as groupings.
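The Mantel-Haenszel DIF procedure mentioned above pools 2x2 tables (group by correct/incorrect) across matched total-score levels. The sketch below shows the standard common odds ratio estimator and its conversion to the ETS delta scale; the counts are hypothetical, and the study's exact matching variable and software are not stated in the abstract.

```python
import math

def mantel_haenszel_dif(strata):
    """Mantel-Haenszel common odds ratio and ETS delta for one studied item.

    strata: iterable of 2x2 tables (A, B, C, D), one per matched total-score level:
        A = reference group correct, B = reference group incorrect,
        C = focal group correct,     D = focal group incorrect.
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty score level contributes nothing
        num += a * d / n
        den += b * c / n
    alpha_mh = num / den                   # > 1 means the item favors the reference group
    delta_mh = -2.35 * math.log(alpha_mh)  # MH D-DIF on the ETS delta scale
    return alpha_mh, delta_mh

# Hypothetical counts at three matched score levels (e.g., grouping by gender).
tables = [(20, 10, 10, 20), (30, 5, 25, 10), (15, 15, 15, 15)]
alpha, delta = mantel_haenszel_dif(tables)
print(round(alpha, 3), round(delta, 3))
```

Matching on total score is what separates DIF from a simple group difference in ability: the odds ratio is computed only among examinees with comparable overall performance.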
Nie, Guangning; Yang, Hongyan; Liu, Jian; Zhao, ChunMei; Wang, Xiaoyun
2017-01-01
Objective: The Menopause-Specific Quality-of-Life (MENQOL) questionnaire was developed as a specific tool to measure the health-related quality-of-life of postmenopausal women. Thus far, the Chinese version of the questionnaire has not been subjected to psychometric assessment with a large sample. This study aims to evaluate the validity and reliability of the Chinese version of the MENQOL specific to postmenopausal women in China. Methods: A total of 1,137 menopausal symptomatic and 491 menopausal asymptomatic women from eight cities in China were recruited using a convenience sampling method. Psychometric properties were evaluated in terms of descriptive statistics, validity, and reliability. Reliability was assessed for each subscale of the MENQOL through internal consistency reliability with Cronbach's α and intersubscale correlations. Item-domain correlations, principal components analysis (PCA), and confirmatory factor analysis were performed to determine construct validity. t tests were used to compare the menopausal symptomatic and asymptomatic women and to evaluate discriminant validity. Pearson correlation coefficients were calculated between MENQOL scores and the Kupperman index to assess criterion-related validity. Results: The most common symptoms in Chinese menopausal symptomatic women were “experiencing poor memory” (94.4%), “feeling tired or worn out” (93.8%), “aching in muscle and joints” (89.4%), “low backache” (86.9%), “decrease in physical strength” (86.6%), “aches in back of neck or head” (86.2%), “difficulty sleeping” (83.6%), “accomplishing less than I used to” (83.4%), “feeling a lack of energy” (83.3%), “change in your sexual desire” (81%), and “hot flash” (80.7%), among others. The symptom “increased facial hair” was rarely reported (9.9%). The vasomotor, psychosocial, physical, and sexual domains showed high reliability (Cronbach's α 0.84, 0.87, 0.89, and 0.86, respectively).
Item-domain correlation analysis showed that all items correlated more strongly with their own domains than with other domains. In the PCA, after the “increased facial hair” item was deleted, items in the vasomotor, sexual, and psychosocial subscales loaded on their respective domains by and large, and items in the physical subscale split into two factors. The PCA revealed a latent structure of the Chinese version of MENQOL nearly identical to the original MENQOL domains. The confirmatory factor analysis demonstrated that the questionnaire fits a four-domain model well. The MENQOL showed good discriminant validity, distinguishing menopausal symptomatic women from asymptomatic women. Criterion-related validity was confirmed by a significant correlation between MENQOL scores and the Kupperman index. Conclusions: This study showed that the Chinese version of MENQOL has good psychometric properties and would be suitable for measuring the health-related quality-of-life of Chinese menopausal women, except for item 21 (increased facial hair). PMID:27922934
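The subscale reliabilities above are Cronbach's α values. As a reminder of what that statistic computes, here is a minimal sketch of the textbook formula α = k/(k−1) · (1 − Σ item variances / total-score variance); the score matrix is hypothetical and unrelated to the MENQOL data.

```python
from statistics import variance  # sample variance (n - 1 denominator)

def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)

# Hypothetical 3-item subscale answered by 5 respondents.
subscale = [
    [1, 2, 1],
    [3, 3, 4],
    [5, 4, 5],
    [6, 6, 7],
    [8, 7, 8],
]
print(round(cronbach_alpha(subscale), 3))
```

Because α rises when items covary strongly relative to their individual variances, values in the 0.84 to 0.89 range indicate that the items within each domain tend to move together.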
Chow, Ronald; Tsao, May; Pulenzas, Natalie; Zhang, Liying; Sahgal, Arjun; Cella, David; Soliman, Hany; Danjoux, Cyril; DeAngelis, Carlo; Vuong, Sherlyn; Chow, Edward
2016-01-01
The purpose was to examine the baseline characteristics, symptoms and quality of life (QOL) in patients who receive different treatments for brain metastases. Eligible patients were divided and analysed based on their treatment: whole brain radiotherapy (WBRT) alone versus stereotactic radiosurgery (SRS) or neurosurgery with or without WBRT. The Functional Assessment of Cancer Therapy-Brain (FACT-Br) items were grouped according to different domains for summary scores. The domains used for summary scores were physical, social/family, emotional, functional well-being (FWB) and additional concerns. A total of 120 patients were enrolled, with 37 treated with WBRT alone and 83 with SRS or neurosurgery with or without WBRT. Of the 50 baseline FACT-Br items, only five items (I feel ill; I get support from my friends; I worry about dying; I have difficulty expressing my thoughts; I am able to put my thoughts into action) were statistically worse in patients treated with WBRT alone (P<0.05). Patients who received SRS or surgery with or without WBRT had statistically (P<0.05) higher scores for the FWB domain, additional concerns domain, and FACT-G total scores, indicating better QOL. Patients selected for WBRT alone reported statistically different baseline QOL compared with patients who were treated with SRS or neurosurgery (with or without WBRT).
2012-01-01
Background: Item response theory (IRT) is extensively used to develop adaptive instruments of health-related quality of life (HRQoL). However, each IRT model has its own function to estimate item and category parameters, and hence different results may be found using the same response categories with different IRT models. The present study used the Rasch rating scale model (RSM) to examine and reassess the psychometric properties of the Persian version of the PedsQL™ 4.0 Generic Core Scales. Methods: The PedsQL™ 4.0 Generic Core Scales were completed by 938 Iranian school children and their parents. Convergent, discriminant, and construct validity of the instrument were assessed by classical test theory (CTT). The RSM was applied to investigate person and item reliability, item statistics, and ordering of response categories. Results: The CTT method showed that the scaling success rate for convergent and discriminant validity was 100% in all domains with the exception of physical health in the child self-report. Moreover, confirmatory factor analysis supported a four-factor model similar to the original version. The RSM showed that 22 of 23 items had acceptable infit and outfit statistics (between 0.6 and 1.4), person reliabilities were low, item reliabilities were high, and item difficulty ranged from -1.01 to 0.71 and from -0.68 to 0.43 for the child self-report and parent proxy-report, respectively. The RSM also showed that successive response categories for all items were not located in the expected order. Conclusions: This study revealed that, in all domains, the five response categories did not perform adequately. It is not known whether this problem is a function of the meaning of the response choices in the Persian language or an artifact of a mostly healthy population that did not use the full range of the response categories. The response categories should be evaluated in further validation studies, especially in large samples of chronically ill patients. PMID:22414135
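The infit and outfit statistics quoted above are mean squares of model residuals. The sketch below simplifies to the dichotomous Rasch model (the rating scale model adds category thresholds, which are omitted here) and assumes person measures and item difficulty have already been estimated; the 0.6 to 1.4 band mirrors the abstract, while the strict inequalities are an assumption.

```python
import math

def rasch_p(theta: float, b: float) -> float:
    """Dichotomous Rasch probability of a correct/endorsed response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_fit(responses, thetas, b):
    """Infit and outfit mean squares for one item.

    responses: 0/1 responses of each person to the item
    thetas:    person measures in logits (assumed already estimated)
    b:         item difficulty in logits
    """
    sum_sq_z = 0.0      # sum of squared standardized residuals
    sum_sq_resid = 0.0  # sum of squared raw residuals
    sum_var = 0.0       # sum of model variances P * (1 - P)
    for x, theta in zip(responses, thetas):
        p = rasch_p(theta, b)
        var = p * (1.0 - p)
        sum_sq_resid += (x - p) ** 2
        sum_var += var
        sum_sq_z += (x - p) ** 2 / var
    outfit = sum_sq_z / len(responses)  # unweighted: sensitive to outlying responses
    infit = sum_sq_resid / sum_var      # information-weighted
    return infit, outfit

def acceptable(infit: float, outfit: float, low: float = 0.6, high: float = 1.4) -> bool:
    """Check both mean squares against an acceptance band (0.6-1.4 here)."""
    return low < infit < high and low < outfit < high
```

Outfit weights every response equally, so a single surprising answer from a far-off person inflates it; infit weights residuals by their model variance and so reacts mainly to misfit among persons near the item's difficulty.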
Differential Item Functioning Analysis of the 2003-04 NHANES Physical Activity Questionnaire
ERIC Educational Resources Information Center
Gao, Yong; Zhu, Weimo
2011-01-01
Using differential item functioning (DIF) analyses, this study examined whether there were any DIF items in the National Health and Nutrition Examination Survey (NHANES) physical activity (PA) questionnaire. A subset of adult data from the 2003-04 NHANES study (n = 3,083) was used. PA items related to respondents' occupational, transportation,…
ERIC Educational Resources Information Center
Missouri State Dept. of Elementary and Secondary Education, Jefferson City.
This document presents 10 released items from the Health/Physical Education Missouri Assessment Program (MAP) test given in the spring of 2000 to ninth graders. Items from the test sessions include: selected-response (multiple choice), constructed-response, and a performance event. The selected-response items consist of individual questions…
Shen, Linjun; Li, Feiming; Wattleworth, Roberta; Filipetto, Frank
2010-10-01
The Comprehensive Osteopathic Medical Licensing Examination conducted a trial of multimedia items in the 2008-2009 Level 3 testing cycle to determine (1) whether multimedia items were able to test additional elements of medical knowledge and skills and (2) how to develop effective multimedia items. Forty-four content-matched multimedia and text multiple-choice items were randomly delivered to Level 3 candidates. Logistic regression and paired-samples t tests were used for pairwise and group-level comparisons, respectively. Nine pairs showed significant differences in difficulty, discrimination, or both. Content analysis found that, if text narrations were less direct, multimedia materials could make items easier. When textbook terminologies were replaced by multimedia presentations, multimedia items could become more difficult. Moreover, one multimedia item was found not to be uniformly difficult for candidates at different ability levels, possibly because the multimedia and text items tested different elements of the same concept. Multimedia items may be capable of measuring some constructs different from those that text items can measure. Effective multimedia items with reasonable psychometric properties can be intentionally developed.
Crins, Martine H P; van der Wees, Philip J; Klausch, Thomas; van Dulmen, Simone A; Roorda, Leo D; Terwee, Caroline B
2018-01-01
The Patient-Reported Outcomes Measurement Information System (PROMIS) is a universally applicable set of instruments, including item banks, short forms and computer adaptive tests (CATs), measuring patient-reported health across different patient populations. PROMIS CATs are highly efficient, and their use in practice is considered feasible because they require little administration time, offering standardized and routine patient monitoring. Before an item bank can be used as a CAT, its psychometric properties have to be examined. Therefore, the objective was to assess the psychometric properties of the Dutch-Flemish PROMIS Physical Function item bank (DF-PROMIS-PF) in Dutch patients receiving physical therapy. Cross-sectional study. 805 patients older than 18 years, who had received any kind of physical therapy in primary care in the past year, completed the full DF-PROMIS-PF (121 items). Unidimensionality was examined by confirmatory factor analysis, and local dependence and monotonicity were evaluated. A graded response model was fitted. Construct validity was examined with correlations between DF-PROMIS-PF T-scores and scores on two legacy instruments (SF-36 Health Survey Physical Functioning scale [SF36-PF10] and the Health Assessment Questionnaire Disability-Index [HAQ-DI]). Reliability (standard errors of theta) was assessed. The results for unidimensionality were mixed (scaled CFI = 0.924, TLI = 0.923, RMSEA = 0.045; the first factor explained 61.5% of variance). Some local dependence was found (8.2% of item pairs). The item bank showed broad coverage of the physical function construct (threshold parameters ranged from -4.28 to 2.33) and good construct validity (correlation with SF36-PF10 = 0.84 and with HAQ-DI = -0.85). Furthermore, the DF-PROMIS-PF showed greater reliability over a broader score range than the SF36-PF10 and HAQ-DI. The psychometric properties of the DF-PROMIS-PF item bank are sufficient.
The DF-PROMIS-PF can now be used as short forms or CAT to measure the level of physical function of physiotherapy patients.
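The graded response model fitted above describes each polytomous item by a discrimination a and ordered thresholds (the b values whose range is reported). A minimal sketch of Samejima's category probabilities follows; the item parameters are hypothetical, not DF-PROMIS-PF estimates, and the conversion of θ to T-scores is not shown.

```python
import math

def grm_category_probs(theta: float, a: float, thresholds: list[float]) -> list[float]:
    """Samejima graded response model: probabilities of categories 0..m-1.

    thresholds: ordered b_1 < ... < b_{m-1}, one per boundary between categories.
    """
    if any(b1 >= b2 for b1, b2 in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # Boundary curves P*(X >= k); P*(X >= 0) = 1 and P*(X >= m) = 0 by definition.
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum.append(0.0)
    # Each category probability is the difference of adjacent boundary curves.
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]

# Hypothetical 5-category physical-function item evaluated at theta = 0.
probs = grm_category_probs(theta=0.0, a=2.0, thresholds=[-2.0, -1.0, 0.5, 1.5])
print([round(p, 3) for p in probs])
```

A wide spread of thresholds across items, like the -4.28 to 2.33 range reported, means the bank has categories that are informative at both low and high levels of physical function.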
Marfeo, Elizabeth E.; Ni, Pengsheng; Haley, Stephen M.; Jette, Alan M.; Bogusz, Kara; Meterko, Mark; McDonough, Christine M.; Chan, Leighton; Brandt, Diane E.; Rasch, Elizabeth K.
2014-01-01
Objectives: To develop a broad set of claimant-reported items to assess behavioral health functioning relevant to the Social Security disability determination processes, and to evaluate the underlying structure of behavioral health functioning for use in development of a new functional assessment instrument. Design: Cross-sectional. Setting: Community. Participants: Item pools of behavioral health functioning were developed, refined, and field-tested in a sample of persons applying for Social Security disability benefits (N=1015) who reported difficulties working due to mental or both mental and physical conditions. Interventions: None. Main Outcome Measure: Social Security Administration Behavioral Health (SSA-BH) measurement instrument. Results: Confirmatory factor analysis (CFA) specified that a 4-factor model (self-efficacy, mood and emotions, behavioral control, and social interactions) had the optimal fit with the data and was also consistent with our hypothesized conceptual framework for characterizing behavioral health functioning. When the items within each of the four scales were tested in CFA, the fit statistics indicated adequate support for characterizing behavioral health as a unidimensional construct along these four distinct scales of function. Conclusion: This work represents a significant advance both conceptually and psychometrically in assessment methodologies for work-related behavioral health. The measurement of behavioral health functioning relevant to the context of work requires the assessment of multiple dimensions of behavioral health functioning. Specifically, we identified a 4-factor model solution that represented key domains of work-related behavioral health functioning. These results guided the development and scale formation of a new SSA-BH instrument. PMID:23548542
Marfeo, Elizabeth E; Ni, Pengsheng; Haley, Stephen M; Jette, Alan M; Bogusz, Kara; Meterko, Mark; McDonough, Christine M; Chan, Leighton; Brandt, Diane E; Rasch, Elizabeth K
2013-09-01
Objectives: To develop a broad set of claimant-reported items to assess behavioral health functioning relevant to the Social Security disability determination processes, and to evaluate the underlying structure of behavioral health functioning for use in development of a new functional assessment instrument. Design: Cross-sectional. Setting: Community. Participants: Item pools of behavioral health functioning were developed, refined, and field tested in a sample of persons applying for Social Security disability benefits (N=1015) who reported difficulties working because of mental or both mental and physical conditions. Interventions: None. Main Outcome Measure: Social Security Administration Behavioral Health (SSA-BH) measurement instrument. Results: Confirmatory factor analysis (CFA) specified that a 4-factor model (self-efficacy, mood and emotions, behavioral control, social interactions) had the optimal fit with the data and was also consistent with our hypothesized conceptual framework for characterizing behavioral health functioning. When the items within each of the 4 scales were tested in CFA, the fit statistics indicated adequate support for characterizing behavioral health as a unidimensional construct along these 4 distinct scales of function. Conclusions: This work represents a significant advance both conceptually and psychometrically in assessment methodologies for work-related behavioral health. The measurement of behavioral health functioning relevant to the context of work requires the assessment of multiple dimensions of behavioral health functioning. Specifically, we identified a 4-factor model solution that represented key domains of work-related behavioral health functioning. These results guided the development and scale formation of a new SSA-BH instrument. Copyright © 2013 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Modifying the test of understanding graphs in kinematics
NASA Astrophysics Data System (ADS)
Zavala, Genaro; Tejeda, Santa; Barniol, Pablo; Beichner, Robert J.
2017-12-01
In this article, we present several modifications to the Test of Understanding Graphs in Kinematics. The most significant changes are (i) the addition and removal of items to achieve parallelism in the objectives (dimensions) of the test, allowing comparisons of students' performance that were not possible with the original version, and (ii) changes to the distractors of some of the original items to represent the most frequent alternative conceptions. The final modified version (after an iterative process involving four administrations of test variations over two years) was administered to 471 students of an introductory university physics course at a large private university in Mexico. Analysis of the final modified version showed that the added items satisfied the statistical tests of difficulty, discriminatory power, and reliability; that the great majority of the modified distractors were effective in terms of their selection frequency and discriminatory power; and that the final modified version of the test satisfied the reliability and discriminatory power criteria as well as the original test did. We also demonstrate the use of the new version of the test by presenting an analysis of students' understanding that was not possible with the original version, specifically regarding the objectives and items that are parallel in the new version. Finally, we make the final modified version of the test available through the PhysPort project (physport.org). It can be used by teachers and researchers to assess students' understanding of graphs in kinematics, as well as their learning about them.
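The abstract reports that added items met criteria for difficulty and discriminatory power but does not name the indices. Two common classical choices are the difficulty index (proportion correct) and the point-biserial correlation between an item and the total score; the sketch below assumes those two, uses hypothetical responses, and computes the uncorrected point-biserial (the item is included in the total).

```python
import math

def difficulty_index(item: list[int]) -> float:
    """Classical difficulty index: proportion of students answering correctly."""
    return sum(item) / len(item)

def point_biserial(item: list[int], totals: list[float]) -> float:
    """Pearson correlation between a 0/1 item score and the total test score."""
    n = len(item)
    mean_i = sum(item) / n
    mean_t = sum(totals) / n
    # Sums of cross-products and squares; the 1/n factors cancel in the ratio.
    s_it = sum((x - mean_i) * (t - mean_t) for x, t in zip(item, totals))
    s_ii = sum((x - mean_i) ** 2 for x in item)
    s_tt = sum((t - mean_t) ** 2 for t in totals)
    return s_it / math.sqrt(s_ii * s_tt)

# Hypothetical responses: rows are students, columns are items (1 = correct).
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
totals = [sum(row) for row in responses]
for j in range(len(responses[0])):
    item = [row[j] for row in responses]
    print(j, difficulty_index(item), round(point_biserial(item, totals), 3))
```

An item that every student answers identically has zero variance, so its point-biserial is undefined; such items would need to be screened out before calling point_biserial.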
Durning, Steven J; Dong, Ting; Artino, Anthony R; van der Vleuten, Cees; Holmboe, Eric; Schuwirth, Lambert
2015-08-01
An ongoing debate exists in the medical education literature regarding the potential benefits of pattern recognition (non-analytic reasoning), actively comparing and contrasting diagnostic options (analytic reasoning), or using a combination approach. Studies have not, however, explicitly explored faculty's thought processes while tackling clinical problems through the lens of dual process theory to inform this debate. Further, these thought processes have not been studied in relation to the difficulty of the task or to other potential mediating influences such as personal factors and fatigue, which could in turn be influenced by sleep deprivation. We therefore sought to determine which reasoning process(es) were used when answering clinically oriented multiple-choice questions (MCQs) and whether these processes differed based on the dual process theory characteristics (accuracy, reading time, and answering time) as well as psychometrically determined item difficulty and sleep deprivation. We performed a think-aloud procedure to explore faculty's thought processes while taking these MCQs, coding think-aloud data by reasoning process (analytic, nonanalytic, guessing, or a combination of processes) as well as word count, number of stated concepts, reading time, answering time, and accuracy. We also included questions about the amount of work in the recent past. We then conducted statistical analyses to examine the associations between these measures, such as correlations between frequencies of reasoning processes and item accuracy and difficulty. We also tallied the total frequencies of the different reasoning processes for correct and incorrect answers. Regardless of whether the questions were classified as 'hard' or 'easy', non-analytical reasoning led to the correct answer more often than to an incorrect answer.
Significant correlations were found between the self-reported number of hours recently worked and both think-aloud word count and the number of concepts used in reasoning, but not item accuracy. When all MCQs were included, 19% of the variance of correctness could be explained by the frequency of expression of these three think-aloud processes (analytic, nonanalytic, or combined). We found evidence to support the notion that the difficulty of an item in a test is not a systematic feature of the item itself but is always a result of the interaction between the item and the candidate. Use of analytic reasoning did not appear to improve accuracy. Our data suggest that individuals do not apply either System 1 or System 2 but instead fall along a continuum, with some individuals falling at one end of the spectrum.
The "Finding Physics" Project: Recognizing and Exploring Physics outside the Classroom
ERIC Educational Resources Information Center
Beck, Judith; Perkins, James
2016-01-01
Students in introductory physics classes often have difficulty recognizing the relevance of physics concepts outside the confines of the physics classroom, lab, and textbook. Even though textbooks and instructors often provide examples of physics applications from a wide array of areas, students have difficulty relating physics to their own lives.…
How patients and clinicians make meaning of physical suffering in mental health evaluations.
Carson, Nicholas J; Katz, Arlene M; Alegría, Margarita
2016-10-01
Clinicians in community mental health settings frequently evaluate individuals suffering from physical health problems. How patients make meaning of such "comorbidity" can affect mental health in ways that may be influenced by cultural expectations and by the responses of clinicians, with implications for delivering culturally sensitive care. A sample of 30 adult mental health intakes exemplifying physical illness assessment was identified from a larger study of patient-provider communication. The recordings of patient-provider interactions were coded using an information checklist containing 21 physical illness items. Intakes were analyzed for themes of meaning making by patients and responses by clinicians. Post-diagnostic interviews with these patients and clinicians were analyzed in similar fashion. Clinicians facilitated disclosures of physical suffering to varying degrees and formulated them in the context of the culture of mental health services. Patients discussed their perceptions of what was at stake in their experience of physical illness: existential loss, embodiment, and limits on the capacity to work and on their sense of agency. The experiences of physical illness, mental health difficulties, and social stressors were described as mutually reinforcing. In mental health intakes, patients attributed meaning to the negative effects of physical health problems in relation to mental health functioning and social stressors. Decreased capacity to work was a particularly salient concern. The complexity of these patient-provider interactions may best be captured by a sociosomatic formulation that addresses the meaning of physical and mental illness in relation to social stressors. © The Author(s) 2016.
Wan, Li-ping; He, Run-lian; Ai, Yong-mei; Zhang, Hui-min; Xing, Min; Yang, Lin; Song, Yan-long; Yu, Hong-mei
2013-07-01
To introduce the item function analysis (IFA) of the Chinese version of the Quality of Life-Alzheimer's Disease (QOL-AD) scale and to explore the feasibility of its application to Chinese patients with AD. Two hundred AD patients, selected by stratified cluster sampling, were interviewed and assessed with the QOL-AD. Multilog 7.03 was used for the item function analysis. The discrimination parameter (a), difficulty parameters (b), and item characteristic curve (ICC) of each QOL-AD item were reported. The discrimination parameters of items 1 and 7 were below 0.6, while those of all other items were above 0.6. As for the ICCs, for all items except items 1 and 7, the first and last category curves were monotonic, whereas the two intermediate curves were inverted-V-shaped, with very steep slopes. Results from the IFA showed that the QOL-AD is applicable to Chinese patients with AD.
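The abstract does not say which IRT model was fitted in Multilog. The curve shapes it describes (monotonic first and last curves, inverted-V curves in between) are what category response curves look like under Samejima's graded response model for a four-category item, so the sketch below assumes that model. The discrimination and threshold values are hypothetical, not estimates from the study.

```python
import math


def boundary_prob(theta: float, a: float, b: float) -> float:
    """P(response in category k or higher | theta) for one boundary (logistic form)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def grm_category_probs(theta: float, a: float, thresholds: list[float]) -> list[float]:
    """Category probabilities under the graded response model.

    `thresholds` must be strictly increasing; the result has len(thresholds) + 1 entries.
    """
    cumulative = [1.0] + [boundary_prob(theta, a, b) for b in thresholds] + [0.0]
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


# Hypothetical four-category item (discrimination 1.8, thresholds -1.5, 0, 1.5)
A, THRESHOLDS = 1.8, [-1.5, 0.0, 1.5]
for theta in (-3.0, -1.0, 0.0, 1.0, 3.0):
    probs = grm_category_probs(theta, A, THRESHOLDS)
    print(f"theta={theta:+.1f}", [round(p, 3) for p in probs])
```

Lowering `A` flattens every curve, which is the kind of pattern one would expect for items with small discrimination parameters such as items 1 and 7.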
The PROactive innovative conceptual framework on physical activity
Dobbels, Fabienne; de Jong, Corina; Drost, Ellen; Elberse, Janneke; Feridou, Chryssoula; Jacobs, Laura; Rabinovich, Roberto; Frei, Anja; Puhan, Milo A.; de Boer, Willem I.; van der Molen, Thys; Williams, Kate; Pinnock, Hillary; Troosters, Thierry; Karlsson, Niklas; Kulich, Karoly; Rüdell, Katja; Brindicci, Caterina; Higenbottam, Tim; Troosters, Thierry; Dobbels, Fabienne; Decramer, Marc; Tabberer, Margaret; Rabinovich, Roberto A; MacNee, William; Vogiatzis, Ioannis; Polkey, Michael; Hopkinson, Nick; Garcia-Aymerich, Judith; Puhan, Milo; Frei, Anja; van der Molen, Thys; de Jong, Corina; de Boer, Pim; Jarrod, Ian; McBride, Paul; Kamel, Nadia; Rudell, Katja; Wilson, Frederick J.; Ivanoff, Nathalie; Kulich, Karoly; Glendenning, Alistair; Karlsson, Niklas X.; Corriol-Rohou, Solange; Nikai, Enkeleida; Erzen, Damijan
2014-01-01
Although physical activity is considered an important therapeutic target in chronic obstructive pulmonary disease (COPD), what “physical activity” means to COPD patients and how their perspective is best measured is poorly understood. We designed a conceptual framework, guiding the development and content validation of two patient reported outcome (PRO) instruments on physical activity (PROactive PRO instruments). 116 patients from four European countries with diverse demographics and COPD phenotypes participated in three consecutive qualitative studies (63% male, age mean±sd 66±9 years, 35% Global Initiative for Chronic Obstructive Lung Disease stage III–IV). 23 interviews and eight focus groups (n = 54) identified the main themes and candidate items of the framework. 39 cognitive debriefings allowed the clarity of the items and instructions to be optimised. Three themes emerged, i.e. impact of COPD on amount of physical activity, symptoms experienced during physical activity, and adaptations made to facilitate physical activity. The themes were similar irrespective of country, demographic or disease characteristics. Iterative rounds of appraisal and refinement of candidate items resulted in 30 items with a daily recall period and 34 items with a 7-day recall period. For the first time, our approach provides comprehensive insight on physical activity from the COPD patients’ perspective. The PROactive PRO instruments’ content validity represents the pivotal basis for empirically based item reduction and validation. PMID:25034563
The PROactive innovative conceptual framework on physical activity.
Dobbels, Fabienne; de Jong, Corina; Drost, Ellen; Elberse, Janneke; Feridou, Chryssoula; Jacobs, Laura; Rabinovich, Roberto; Frei, Anja; Puhan, Milo A; de Boer, Willem I; van der Molen, Thys; Williams, Kate; Pinnock, Hillary; Troosters, Thierry; Karlsson, Niklas; Kulich, Karoly; Rüdell, Katja
2014-11-01
Although physical activity is considered an important therapeutic target in chronic obstructive pulmonary disease (COPD), what "physical activity" means to COPD patients and how their perspective is best measured is poorly understood. We designed a conceptual framework, guiding the development and content validation of two patient reported outcome (PRO) instruments on physical activity (PROactive PRO instruments). 116 patients from four European countries with diverse demographics and COPD phenotypes participated in three consecutive qualitative studies (63% male, age mean±sd 66±9 years, 35% Global Initiative for Chronic Obstructive Lung Disease stage III-IV). 23 interviews and eight focus groups (n = 54) identified the main themes and candidate items of the framework. 39 cognitive debriefings allowed the clarity of the items and instructions to be optimised. Three themes emerged, i.e. impact of COPD on amount of physical activity, symptoms experienced during physical activity, and adaptations made to facilitate physical activity. The themes were similar irrespective of country, demographic or disease characteristics. Iterative rounds of appraisal and refinement of candidate items resulted in 30 items with a daily recall period and 34 items with a 7-day recall period. For the first time, our approach provides comprehensive insight on physical activity from the COPD patients' perspective. The PROactive PRO instruments' content validity represents the pivotal basis for empirically based item reduction and validation. ©ERS 2014.
2014-01-01
Background Chewing khat leaves is often accompanied by tobacco use. We assessed aspects of tobacco use and explored factors associated with tobacco use patterns (frequency of use per week) among khat chewers who used tobacco only when chewing khat (“simultaneous tobacco and khat users”, STKU). Methods A sample of 204 male khat chewers was recruited during random visits to khat outlets. Data collected included socio-demographic items, tobacco use and khat chewing behaviours. Both psychological and physical dependence on khat were assessed using the Severity of Psychological Dependence on Khat (SDS-Khat) Scale, the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV) and adapted items from the Fagerström Test for Nicotine Dependence (chewing even when ill, and difficulty in abstaining from khat chewing for an entire week). Descriptive statistics and non-parametric analyses were conducted. Results Of the 204 khat chewers, 35% were khat chewers only, 20% were STKU, and the remainder were daily cigarette smokers. The mean age of STKU was 38.12 (±14.05) years. Fifty-seven percent of STKU smoked tobacco and chewed khat for two days per week and 43% smoked and chewed more frequently (three to six days: 33%, daily: 10%). Three quarters (74%) were former daily tobacco users. Khat chewing initiated tobacco smoking among 45% of STKU and 71% reported attempts to quit tobacco smoking during khat chew. Among STKU, smoking tobacco for more than two days per week was significantly associated (p < 0.05) with psychological dependence (increased levels of SDS-Khat), physical dependence (increased levels of DSM-IV symptoms, chewing even when ill, difficulty in abstaining from chewing for an entire week and self-reported health conditions) and behavioural factors (e.g. amount of khat chewed in a typical khat session). Conclusions Khat chewing may promote different patterns of tobacco smoking, initiate and sustain tobacco smoking, and trigger tobacco cessation relapses among STKU. 
Increased frequency of tobacco smoking among STKU was linked to psycho-physical and behavioural factors. Further investigation within large and representative samples of both sexes of STKU in different contexts should be considered for health research and policy development. Khat chewing should be considered when designing tobacco uptake prevention, cessation interventions and relapse prevention programmes. PMID:24885131
Li, Kin-Kit; Cardinal, Bradley J; Vuchinich, Samuel
2009-03-01
This study examined the effect of health worry (i.e., cognitive aspect of anxiety resulting from concern for health) on walking difficulty in a nationally representative sample (N = 7,527) of older adults (M age = 76.83 years). The study further tested whether physical activity mediates the effect of health worry on walking difficulty in a 6-year follow-up design. Results of a mediation analysis using structural equation modeling showed that people with a high degree of health worry engaged in less physical activity (beta = -.24, p < .001), and people who participated in less physical activity were more likely to report walking difficulty at the 6-year follow-up (beta = -.22, p < .001). There was a significant indirect effect from health worry to walking difficulty through physical activity (beta = .05, p < .001), controlling for demographic, psychosocial, and health related factors. Results suggested that inducing threat and worry may not be effective for physical activity promotion in the older population. More promising coping and regulation strategies are discussed.
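In a simple mediation model the indirect effect is commonly estimated as the product of the two path coefficients. Multiplying the two standardized paths reported above reproduces the reported indirect effect of about .05. This is only a back-of-the-envelope check under that product-of-coefficients assumption; the authors' structural equation model also adjusted for covariates and may have computed the estimate differently.

```python
def indirect_effect(a_path: float, b_path: float) -> float:
    """Product-of-coefficients estimate of an indirect (mediated) effect."""
    return a_path * b_path


a_path = -0.24  # health worry -> physical activity (from the abstract)
b_path = -0.22  # physical activity -> walking difficulty (from the abstract)
print(round(indirect_effect(a_path, b_path), 3))  # 0.053
```

The sketch does not test significance; the reported p < .001 would normally come from a Sobel test or a bootstrap confidence interval for the product.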
ERIC Educational Resources Information Center
Schlingman, Wayne M.; Prather, Edward E.; Wallace, Colin S.; Brissenden, Gina; Rudolph, Alexander L.
2012-01-01
This paper is the first in a series of investigations into the data from the recent national study using the Light and Spectroscopy Concept Inventory (LSCI). In this paper, we use classical test theory to form a framework of results that will be used to evaluate individual item difficulties, item discriminations, and the overall reliability of the…
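In classical test theory, item difficulty is usually the proportion of examinees answering correctly, and discrimination is often a point-biserial item-total correlation. The excerpt does not say which discrimination index the LSCI study used, so the sketch below assumes the corrected item-total correlation (the item is excluded from the total) as one common choice, applied to hypothetical data.

```python
from statistics import mean, pstdev


def item_difficulty(scores: list[list[int]], j: int) -> float:
    """CTT difficulty: proportion of examinees answering item j correctly."""
    return mean(row[j] for row in scores)


def corrected_item_total(scores: list[list[int]], j: int) -> float:
    """Point-biserial correlation of item j with the total score on the other items."""
    item = [row[j] for row in scores]
    rest = [sum(row) - row[j] for row in scores]
    m_item, m_rest = mean(item), mean(rest)
    cov = mean((x - m_item) * (y - m_rest) for x, y in zip(item, rest))
    # Raises ZeroDivisionError if everyone (or no one) answered item j correctly.
    return cov / (pstdev(item) * pstdev(rest))


# Hypothetical 0/1 responses: 6 students x 4 items
scores = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 0, 0, 0],
]
for j in range(4):
    print(j, round(item_difficulty(scores, j), 2), round(corrected_item_total(scores, j), 2))
```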
Kerner, Matthew S; Kalinski, Michael I
2002-08-01
Using the Theory of Planned Behavior as a framework, the Attitude to Leisure-time Physical Activity, Expectations of Others, Perceived Control, and Intention to Engage in Leisure-time Physical Activity scales were developed for use among high school students. The study population included 20 boys and 68 girls 13 to 17 years of age (boys: M = 15.1 yr., SD = 1.0; girls: M = 15.0 yr., SD = 1.1). Item generation and content validation were performed by professionals in exercise physiology, physical education, and clinical psychology. Each scale item was phrased in a Likert-type format, and both unipolar and bipolar scales with seven response choices were developed. Following pilot testing and subsequent revisions, 32 items were retained in the Attitude to Leisure-time Physical Activity scale, 10 in the Expectations of Others scale, 3 in the Perceived Control scale, and 24 in the Intention to Engage in Leisure-time Physical Activity scale. Coefficients indicated adequate stability and internal consistency, with alpha ranging from .81 to .96. Validity studies are underway, after which the scales will be made available to those interested in intervention techniques for promoting positive attitudes toward physical fitness, perception of control over engaging in leisure-time physical activities, and good intentions to engage in leisure-time physical activities. The present results are encouraging.
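The abstract reports alpha coefficients from .81 to .96. For reference, Cronbach's alpha for k items is α = k/(k − 1) · (1 − Σσᵢ²/σ_X²), where σᵢ² are the item variances and σ_X² is the variance of the total score. The minimal sketch below applies it to hypothetical 7-point responses; population variances are used throughout, which leaves alpha unchanged as long as item and total variances share the same denominator.

```python
from statistics import pvariance


def cronbach_alpha(rows: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents x items score matrix."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [pvariance([row[j] for row in rows]) for j in range(k)]
    total_var = pvariance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 7-point Likert responses: 5 students x 3 items
responses = [
    [6, 7, 6],
    [5, 5, 6],
    [2, 3, 2],
    [4, 4, 5],
    [7, 6, 7],
]
print(round(cronbach_alpha(responses), 2))
```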
NASA Astrophysics Data System (ADS)
Balta, Nuri; Mason, Andrew J.; Singh, Chandralekha
2016-06-01
Students' attitudes and approaches to physics problem solving can impact how well they learn physics and how successful they are in solving physics problems. Prior research in the U.S. using a validated Attitude and Approaches to Problem Solving (AAPS) survey suggests that there are major differences between students in introductory physics and astronomy courses and physics experts in terms of their attitudes and approaches to physics problem solving. Here we discuss the validation, administration, and analysis of data for the Turkish version of the AAPS survey for high school and university students in Turkey. After the validation and administration of the Turkish version of the survey, the analysis of the data was conducted by grouping the data by grade level, school type, and gender. While there are no statistically significant differences between the averages of various groups on the survey, overall, the university students in Turkey were more expertlike than vocational high school students. On an item-by-item basis, there are statistically significant differences between the averages of the groups on many items. For example, on average, the university students demonstrated less expertlike attitudes about the role of equations and formulas in problem solving, in solving difficult problems, and in knowing when the solution is not correct, whereas they displayed more expertlike attitudes and approaches on items related to metacognition in physics problem solving. A principal component analysis on the data yields item clusters into which the student responses on various survey items can be grouped. A comparison of the responses of the Turkish and American university students enrolled in algebra-based introductory physics courses shows that on more than half of the items, the responses of these two groups were statistically significantly different, with the U.S. students on average responding to the items in a more expertlike manner.
2018-01-01
Objective To investigate the psychometric properties of the activities of daily living (ADL) instrument used in the analysis of the Korean Longitudinal Study of Ageing (KLoSA) dataset. Methods A retrospective study was carried out involving 2006 KLoSA records of community-dwelling adults diagnosed with stroke. The ADL instrument used for the analysis of KLoSA included 17 items, which were analyzed using Rasch modeling to develop a robust outcome measure. The unidimensionality of the ADL instrument was examined based on confirmatory factor analysis with a one-factor model. Item-level psychometric analysis of the ADL instrument included fit statistics, internal consistency, precision, and the item difficulty hierarchy. Results The study sample included a total of 201 community-dwelling adults (1.5% of the Korean population with an age over 45 years; mean age=70.0 years, SD=9.7) having a history of stroke. The ADL instrument demonstrated a unidimensional construct. Two misfit items, money management (mean square [MnSq]=1.56, standardized Z-statistics [ZSTD]=2.3) and phone use (MnSq=1.78, ZSTD=2.3) were removed from the analysis. The remaining 15 items demonstrated good item fit, high internal consistency (person reliability=0.91), and good precision (person strata=3.48). The instrument precisely estimated person measures within a wide range of theta (−4.75 logits < θ < 3.97 logits) and a reliability of 0.9, with a conceptual hierarchy of item difficulty. Conclusion The findings indicate that the 15 ADL items met Rasch expectations of unidimensionality and demonstrated good psychometric properties. It is proposed that the validated ADL instrument can be used as a primary outcome measure for assessing longitudinal disability trajectories in the Korean adult population and can be employed for comparative analysis of international disability across national aging studies. PMID:29765888
Psychometrics of the self-report safe driving behavior measure for older adults.
Classen, Sherrilene; Wen, Pey-Shan; Velozo, Craig A; Bédard, Michel; Winter, Sandra M; Brumback, Babette; Lanford, Desiree N
2012-01-01
We investigated the psychometric properties of the 68-item Safe Driving Behavior Measure (SDBM) with 80 older drivers, 80 caregivers, and 2 evaluators from two sites. Using Rasch analysis, we examined unidimensionality and local dependence; rating scale; item- and person-level psychometrics; and item hierarchy of older drivers, caregivers, and driving evaluators who had completed the SDBM. The evidence suggested the SDBM is unidimensional, but pairs of items showed local dependency. Across the three rater groups, the data showed good person (≥3.4) and item (≥3.6) separation as well as good person (≥.93) and item reliability (≥.92). Cronbach's α was ≥.96, and few items were misfitting. Some of the items did not follow the hypothesized order of item difficulty. The SDBM classified the older drivers into six ability levels, but to fully calibrate the instrument it must be refined in terms of its items (e.g., item exclusion) and then tested among participants of lesser ability. Copyright © 2012 by the American Occupational Therapy Association, Inc.
Procedures to develop a computerized adaptive test to assess patient-reported physical functioning.
McCabe, Erin; Gross, Douglas P; Bulut, Okan
2018-06-07
The purpose of this paper is to demonstrate the procedures to develop and implement a computerized adaptive patient-reported outcome (PRO) measure using secondary analysis of a dataset and items from fixed-format legacy measures. We conducted secondary analysis of a dataset of responses from 1429 persons with work-related lower extremity impairment. We calibrated three measures of physical functioning on the same metric, based on item response theory (IRT). We evaluated efficiency and measurement precision of various computerized adaptive test (CAT) designs using computer simulations. IRT and confirmatory factor analyses support combining the items from the three scales for a CAT item bank of 31 items. The item parameters for IRT were calculated using the generalized partial credit model. CAT simulations show that reducing the test length from the full 31 items to a maximum of 8 or 20 items is possible without a significant loss of information (95% and 99% correlations with legacy measure scores, respectively). We demonstrated the feasibility and efficiency of using CAT for PRO measurement of physical functioning. The procedures we outlined are straightforward and can be applied to other PRO measures. Additionally, we have included all the information necessary to implement the CAT of physical functioning in the electronic supplementary material of this paper.
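A core step in a CAT is choosing the next item. A common rule, assumed here rather than taken from the paper, is maximum Fisher information at the current ability estimate. The study calibrated its items with the generalized partial credit model; for brevity this sketch uses dichotomous 2PL items instead, whose information is a²P(1 − P). The item bank is hypothetical, and ability re-estimation, exposure control and stopping rules are omitted.

```python
import math


def p_2pl(theta: float, a: float, b: float) -> float:
    """Probability of endorsing a dichotomous 2PL item."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def info_2pl(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at theta: a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def next_item(theta: float, bank: list[tuple[float, float]], administered: set[int]) -> int:
    """Index of the unused item with maximum information at the current theta estimate."""
    candidates = [i for i in range(len(bank)) if i not in administered]
    if not candidates:
        raise ValueError("item bank exhausted")
    return max(candidates, key=lambda i: info_2pl(theta, *bank[i]))


# Hypothetical (a, b) pairs; the real bank had 31 polytomous items
bank = [(1.0, -2.0), (1.5, 0.0), (1.2, 1.0), (2.0, 0.2)]
print(next_item(0.0, bank, set()))  # 3
```

In a full CAT this selection would run inside a loop that re-estimates theta after each response and stops at a maximum length (such as the 8 or 20 items compared in the study) or at a precision target.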
Madanat, Hala; Merrill, Ray M
2006-01-01
The purpose of this study was to investigate physical activity levels across the five stages of change for physical activity and to identify motivational factors for physical activity according to these stages of change among college students in Amman, Jordan. Analyses were based on a cross-sectional survey of 431 students (mean age 21.1 years, SD = 0.16; 67.5% female). Based on the recommendation of at least 30 minutes of physical activity on 3 or more days per week, men were more likely than women to classify themselves in later stages: 7.3% vs. 9.5% in the precontemplation stage, 17.4% vs. 14.7% in the contemplation stage, 50.0% vs. 63.5% in the preparation stage, 9.4% vs. 5.6% in the action stage, and 15.9% vs. 6.7% in the maintenance stage [χ²(4) = 14.04, p = 0.0072]. Seven potential motivational items for physical activity were assessed using factor analysis: experience better self-worth, prevent chronic disease, relieve stress, stay in shape, longevity, recreation/fun, and social benefits. Two factors were identified from these items. The first factor included the first five items and was labeled "Physical and Mental"; the second included the last two items and was labeled "Social and Recreational." Across the stages of change, "Physical and Mental" items were more likely than "Social and Recreational" items to motivate physical activity. The strongest motivator of physical activity was staying in shape, and the weakest was social reasons. The influence of the intermediate motivational factors varied slightly with the students' stage of change for physical activity. Motivators for physical activity did not differ according to sex. These results provide important information about the motivational factors for physical activity among college-aged students in Jordan that can be useful in developing effective physical activity intervention programs.
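The reported p value can be checked from the test statistic. For an even number of degrees of freedom the chi-square upper-tail probability has a closed form, P(X > x) = e^(−x/2) · Σ_{i=0}^{df/2 − 1} (x/2)^i / i!, so the standard library suffices for the df = 4 case here. This is a verification sketch only; for odd df a routine such as SciPy's `scipy.stats.chi2.sf` would be the usual choice.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square statistic; valid only for even df."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(df // 2))


print(round(chi2_sf_even_df(14.04, 4), 4))  # 0.0072
```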
Canada, Brice; Stephan, Yannick; Jaconelli, Alban; Duberstein, Paul R
2016-01-01
Prior studies of age-restricted samples have demonstrated that, in older adulthood, neuroticism is negatively associated with difficulties performing specific daily activities. No studies of neuroticism and physical functioning have been conducted on life-span samples. This study tested the hypothesis that the relationship between neuroticism and physical functioning is stronger in older people compared with younger and middle-aged adults. Data were obtained from 2 independent French samples (n = 1,132 and 1,661 for Samples 1 and 2, respectively) ranging in age from 18 to 97. In addition to reporting sociodemographics, participants completed the Big Five Inventory, the physical functioning scale of the 36-Item Short Form Health Survey, and measures of disease burden. In both samples, regression analysis indicated that neuroticism is more negatively associated with physical functioning with advancing age, controlling for gender, marital status, disease burden, and educational attainment. In life-span samples of more than 2,700 adults, neuroticism was more strongly associated with worse physical functioning among older people compared with younger and middle-aged adults. Longitudinal research is needed to confirm this finding and to identify potential mediators. © The Author 2014. Published by Oxford University Press on behalf of The Gerontological Society of America. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Land, Stephanie R; Warren, Graham W; Crafts, Jennifer L; Hatsukami, Dorothy K; Ostroff, Jamie S; Willis, Gordon B; Chollette, Veronica Y; Mitchell, Sandra A; Folz, Jasmine N M; Gulley, James L; Szabo, Eva; Brandon, Thomas H; Duffy, Sonia A; Toll, Benjamin A
2016-06-01
To the authors' knowledge, there are currently no standardized measures of tobacco use and secondhand smoke exposure in patients diagnosed with cancer, and this gap hinders the conduct of studies examining the impact of tobacco on cancer treatment outcomes. The objective of the current study was to evaluate and refine questionnaire items proposed by an expert task force to assess tobacco use. Trained interviewers conducted cognitive testing with cancer patients aged ≥21 years with a history of tobacco use and a cancer diagnosis of any stage and organ site who were recruited at the National Institutes of Health Clinical Center in Bethesda, Maryland. Iterative rounds of testing and item modification were conducted to identify and resolve cognitive issues (comprehension, memory retrieval, decision/judgment, and response mapping) and instrument navigation issues until no items warranted further significant modification. Thirty participants (6 current cigarette smokers, 1 current cigar smoker, and 23 former cigarette smokers) were enrolled from September 2014 to February 2015. The majority of items functioned well. However, qualitative testing identified wording ambiguities related to cancer diagnosis and treatment trajectory, such as "treatment" and "surgery"; difficulties with lifetime recall; errors in estimating quantities; and difficulties with instrument navigation. Revisions to item wording, format, order, response options, and instructions resulted in a questionnaire that demonstrated navigational ease as well as good question comprehension and response accuracy. The Cancer Patient Tobacco Use Questionnaire (C-TUQ) can be used as a standardized item set to accelerate the investigation of tobacco use in the cancer setting. Cancer 2016;122:1728-34. © 2016 American Cancer Society.
ERIC Educational Resources Information Center
Jackson, Allen W.; Morrow, James R., Jr.; Bowles, Heather R.; FitzGerald, Shannon J.; Blair, Steven N.
2007-01-01
Valid measurement of physical activity is important for studying the risks for morbidity and mortality. The purpose of this study was to examine evidence of construct validity of two similar single-response items assessing physical activity via self-report. Both items are based on the stages of change model. The sample was 687 participants (men =…
A New Clinical Pain Knowledge Test for Nurses: Development and Psychometric Evaluation.
Bernhofer, Esther I; St Marie, Barbara; Bena, James F
2017-08-01
All nurses care for patients with pain, and pain management knowledge and attitude surveys for nurses have been around since 1987. However, no validated knowledge test exists to measure postlicensure clinicians' knowledge of the core competencies of pain management in current complex patient populations. To develop and test the psychometric properties of an instrument designed to measure pain management knowledge of postlicensure nurses. Psychometric instrument validation. Four large Midwestern U.S. hospitals. Registered nurses employed full time or part time from August 2015 to April 2016 (mean age 43.25 years; mean time as an RN 16.13 years). Prospective survey design using e-mail to invite nurses to take an electronic multiple-choice pain knowledge test. Content validity of the initial 36-item test was "very good" (95.1% agreement). A total of 747 completed tests met the analysis criteria. The mean initial test score was 69.4% correct (range 27.8%-97.2%). After revision or removal of 13 unacceptable questions, the mean test score was 50.4% correct (range 8.7%-82.6%). On the initial test, item percent difficulty ranged from 15.2% to 98.1% and discrimination values from 0.03 to 0.50; on the final test, item percent difficulty ranged from 17.6% to 91.1% and discrimination values from -0.04 to 1.04. Split-half reliability of the final test was 0.66. A high decision consistency reliability was identified with a test cut score of 75%. The final 23-item Clinical Pain Knowledge Test has acceptable discrimination, difficulty, decision consistency, reliability, and validity in the general clinical inpatient nurse population. This instrument will be useful in assessing pain management knowledge of clinical nurses to determine gaps in education, evaluate knowledge after pain management education, and measure research outcomes. Copyright © 2017 American Society for Pain Management Nursing. Published by Elsevier Inc. All rights reserved.
Extending the floor and the ceiling for assessment of physical function
Fries, James F.; Lingala, Bharathi; Siemons, Liseth; Glas, Cees A. W.; Cella, David; Hussain, Yusra N; Bruce, Bonnie; Krishnan, Eswar
2014-01-01
Objective The objective of the current study was to improve the assessment of physical function by improving the precision of assessment at the floor (extremely poor function) and at the ceiling (extremely good health) of the health continuum. Methods Under the NIH PROMIS program, we developed new physical function floor and ceiling items to supplement the existing item bank. Using item response theory (IRT) and the standard PROMIS methodology, we developed 30 floor items and 26 ceiling items and administered them during a 12-month prospective observational study of 737 individuals at the extremes of health status. Change over time was compared across anchor instruments and across items by means of effect sizes. Using the observed changes in scores, we back-calculated sample size requirements for the new and comparison measures. Results We studied 444 subjects with chronic illness and/or extreme age, and 293 generally fit subjects including athletes in training. IRT analyses confirmed that the new floor and ceiling items outperformed reference items (p<0.001). The estimated post-hoc sample size requirements were reduced by a factor of two to four at the floor and a factor of two at the ceiling. Conclusion Extending the range of physical function measurement can substantially improve measurement quality, can reduce sample size requirements and improve research efficiency. The paradigm shift from Disability to Physical Function includes the entire spectrum of physical function, signals improvement in the conceptual base of outcome assessment, and may be transformative as medical goals more closely approach societal goals for health. PMID:24782194
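The abstract does not give the formula used to back-calculate sample sizes. Under a standard normal-approximation assumption for detecting a standardized within-person change d (two-sided α, power 1 − β), n = ((z_{1−α/2} + z_{1−β}) / d)². Because n scales with 1/d², a two- to fourfold reduction in required sample size would correspond, under this formula, to effect sizes roughly 1.4 to 2 times larger. The sketch is illustrative only, not the authors' calculation.

```python
import math
from statistics import NormalDist


def required_n(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> float:
    """Normal-approximation sample size to detect a standardized change (paired/one-sample)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return ((z_alpha + z_beta) / effect_size) ** 2


for d in (0.25, 0.5, 1.0):
    print(d, math.ceil(required_n(d)))
```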
Dima, Alexandra Lelia; Schulz, Peter Johannes
2017-01-01
Background The eHealth Literacy Scale (eHEALS) is a tool to assess consumers’ comfort and skills in using information technologies for health. Although evidence exists of reliability and construct validity of the scale, less agreement exists on structural validity. Objective The aim of this study was to validate the Italian version of the eHealth Literacy Scale (I-eHEALS) in a community sample with a focus on its structural validity, by applying psychometric techniques that account for item difficulty. Methods Two Web-based surveys were conducted among a total of 296 people living in the Italian-speaking region of Switzerland (Ticino). After examining the latent variables underlying the observed variables of the Italian scale via principal component analysis (PCA), fit indices for two alternative models were calculated using confirmatory factor analysis (CFA). The scale structure was examined via parametric and nonparametric item response theory (IRT) analyses accounting for differences between items regarding the proportion of answers indicating high ability. Convergent validity was assessed by correlations with theoretically related constructs. Results CFA showed a suboptimal model fit for both models. IRT analyses confirmed all items measure a single dimension as intended. Reliability and construct validity of the final scale were also confirmed. The contrasting results of factor analysis (FA) and IRT analyses highlight the importance of considering differences in item difficulty when examining health literacy scales. Conclusions The findings support the reliability and validity of the translated scale and its use for assessing Italian-speaking consumers’ eHealth literacy. PMID:28400356
Echeverri, Margarita; Anderson, David; Nápoles, Anna María
2016-01-01
This article describes the adaptation and initial validation of the Cancer Health Literacy Test (CHLT) for Spanish speakers. A cross-sectional field test of the Spanish version of the CHLT (CHLT-30-DKspa) was conducted among healthy Latinos in Louisiana. Diagonally weighted least squares was used to confirm the factor structure. Item response analysis using 2-parameter logistic estimates was used to identify questions that may require modification to avoid bias. Cronbach's alpha coefficients estimated scale internal consistency reliability. Analysis of variance was used to test for significant differences in CHLT-30-DKspa scores by gender, origin, age and education. The mean CHLT-30-DKspa score (N = 400) was 17.13 (range = 0-30, SD = 6.65). Results confirmed a unidimensional structure, χ²(405) = 461.55, p = .027, comparative fit index = .993, Tucker-Lewis index = .992, root mean square error of approximation = .0180. Cronbach's alpha was .88. Items Q1-High Calorie and Q15-Tumor Spread had the lowest item-scale correlations (.148 and .288, respectively) and standardized factor loadings (.152 and .302, respectively). Items Q19-Smoking Risk, Q8-Palliative Care, and Q1-High Calorie had the highest item difficulty parameters (difficulty = 1.12, 1.21, and 2.40, respectively). Results generally support the applicability of the CHLT-30-DKspa for healthy Spanish-speaking populations, with the exception of 4 items that need to be deleted or revised and further studied: Q1, Q8, Q15, and Q19.
Jeong, Eunju; Lesiuk, Teresa L
2011-01-01
Impairments in attention are commonly seen in individuals with traumatic brain injury (TBI). While visual attention assessment measurements have been rigorously developed and frequently used in cognitive neurorehabilitation, there is a paucity of auditory attention assessment measurements for patients with TBI. The purpose of this study was to field-test a researcher-developed Music-based Attention Assessment (MAA), a melodic contour identification test designed to assess three different types of attention (i.e., sustained attention, selective attention, and divided attention), for patients with TBI. Additionally, this study aimed to evaluate the readability and comprehensibility of the test items and to examine the preliminary psychometric properties of the scale and test items. Fifteen patients diagnosed with TBI completed 3 different series of tasks in which they were required to identify melodic contours. The resulting data showed that (a) test items in each of the 3 subtests had an easy to moderate level of item difficulty and an acceptable to high level of item discrimination, (b) the musical characteristics (i.e., contour, congruence, and pitch interference) were associated with the level of item difficulty, and (c) the internal consistency of the MAA as computed by Cronbach's alpha was .95. Subsequent studies using a larger sample of typical participants, along with individuals with TBI, are needed to confirm construct validity and internal consistency of the MAA. In addition, the authors recommend examination of criterion validity of the MAA as correlated with current neuropsychological attention assessment measurements.
... of items, gradual buildup of clutter in living spaces and difficulty discarding things are usually the first ... for which there is no immediate need or space. By middle age, symptoms are often severe and ...
NASA Astrophysics Data System (ADS)
Marshall, Jill A.; Hagedorn, Eric A.; O'Connor, Jerry
2009-06-01
We report the results of an analysis of the Texas Assessment of Knowledge and Skills (TAKS) designed to determine whether the TAKS is a valid indicator of whether students know and can do physics at the level necessary for success in future coursework, STEM careers, and life in a technological society. We categorized science items from the 2003 and 2004 10th- and 11th-grade TAKS by content area(s) covered, knowledge and skills required to select the correct answer, and overall quality. We also analyzed a 5000-student sample of item-level results from the 2004 11th-grade exam, performing full-information factor analysis, calculating classical test indices, and determining each item's response curve using item response theory. Triangulation of our results revealed strengths and weaknesses of the different methods of analysis. The TAKS was found to be only weakly indicative of physics preparation, and we make recommendations for increasing the validity of standardized physics testing.
Detecting Gender Bias Through Test Item Analysis
NASA Astrophysics Data System (ADS)
González-Espada, Wilson J.
2009-03-01
Many physical science and physics instructors might not be trained in pedagogically appropriate test construction methods. This could lead to test items that do not measure what they are intended to measure. A subgroup of these items might show bias against some groups of students. This paper describes how the author became aware of items in his examinations that were potentially biased against females, which led to the exploration of fundamental issues related to item validity, gender bias, and differential item functioning (DIF). A brief discussion of DIF in the context of university courses follows, along with practical suggestions for detecting possibly gender-biased items.
Herbolsheimer, Florian; Riepe, Matthias W; Peter, Richard
2018-02-21
Numerous studies have reported weak or moderate correlations between self-reported and accelerometer-assessed physical activity. One explanation is that self-reported physical activity might be biased by demographic, cognitive or other factors. Cognitive function is one factor that could be associated with either overreporting or underreporting of daily physical activity: difficulties in remembering past physical activities might result in recall bias. Thus, the current study examines whether cognitive function is associated with differences between self-reported and accelerometer-assessed physical activity. Cross-sectional data from the population-based Activity and Function in the Elderly in Ulm study (ActiFE) were used. A total of 1172 community-dwelling older adults (aged 65-90 years) wore a uniaxial accelerometer (activPAL unit) for a week. Additionally, self-reported physical activity was assessed using the LASA Physical Activity Questionnaire (LAPAQ). Cognitive function was measured with four items (immediate memory, delayed memory, recognition memory, and semantic fluency) from the Consortium to Establish a Registry for Alzheimer's Disease Total Score (CERAD-TS). Mean differences between self-reported and accelerometer-assessed physical activity (MPA) were associated with cognitive function in men (r_s = -.12, p = .002) but not in women. Sex-stratified multiple linear regression analyses showed that MPA declined with high cognitive function in men (β = -.13; p = .015). Results suggest that self-reported physical activity should be interpreted with caution in older populations, as cognitive function was one factor that explained the differences between objective and subjective physical activity measurements.
Evaluation of five guidelines for option development in multiple-choice item-writing.
Martínez, Rafael J; Moreno, Rafael; Martín, Irene; Trigo, M Eva
2009-05-01
This paper evaluates certain guidelines for writing multiple-choice test items. The analysis of the responses of 5013 subjects to 630 items from 21 university classroom achievement tests suggests that an option should not differ from the other options in content (i.e., options should be content-homogeneous), because such an error has a slight but harmful effect on item discrimination. The same harmful effect occurs with the "None of the above" option when it is the correct one. In contrast, the results do not show the supposedly negative effects of an option of different length, the use of specific determiners, or the use of the "All of the above" option, which not only decreases difficulty but also improves discrimination when it is the correct option.
Kerner, Matthew S
2005-06-01
Using the theory of planned behavior as a conceptual framework, scales assessing Attitude to Leisure-time Physical Activity, Expectations of Others, Perceived Control, and Intention to Engage in Leisure-time Physical Activity were developed for use among middle-school students. The study sample included 349 boys and 400 girls, 10 to 14 years of age (M = 11.9 yr., SD = .9). Unipolar and bipolar scales with seven response choices were developed, with each scale item phrased in a Likert-type format. Following revisions, 22 items were retained in the Attitude to Leisure-time Physical Activity Scale, 10 items in the Expectations of Others Scale, 3 items in the Perceived Control Scale, and 17 items in the Intention to Engage in Leisure-time Physical Activity Scale. Standardized coefficients alpha ranging from .75 to .89 indicated adequate internal consistency. These results must be extended by assessing discriminant and predictive validity and by checking reliability with new samples; intervention techniques for promoting positive attitudes about leisure-time physical activity, including perceived control and intentions to engage in leisure-time physical activity, can then be evaluated.
Three controversies over item disclosure in medical licensure examinations.
Park, Yoon Soo; Yang, Eunbae B
2015-01-01
In response to views on the public's right to know, there is growing attention to item disclosure (the release of items, answer keys, and performance data to the public) in medical licensure examinations and to its potential impact on the test's ability to measure competence and select qualified candidates. Recent debates on this issue have sparked legislative action internationally, including in South Korea, with prior discussions among North American countries dating back over three decades. The purpose of this study is to identify and analyze three issues associated with item disclosure in medical licensure examinations, namely 1) fairness and validity, 2) impact on passing levels, and 3) utility of item disclosure, by synthesizing existing literature in relation to standards in testing. Historically, the controversy over item disclosure has centered on fairness and validity. Proponents of item disclosure stress test takers' right to know, while opponents argue from a validity perspective. Item disclosure may bias item characteristics, such as difficulty and discrimination, and has consequences for setting passing levels. To date, there has been limited research on the utility of item disclosure for large-scale testing. These issues require ongoing and careful consideration.
Applying automatic item generation to create cohesive physics testlets
NASA Astrophysics Data System (ADS)
Mindyarto, B. N.; Nugroho, S. E.; Linuwih, S.
2018-03-01
Computer-based testing has created a demand for large numbers of items. This paper discusses the production of cohesive physics testlets using automatic item generation concepts and procedures. The testlets were composed by restructuring physics problems to reveal deeper understanding of the underlying physical concepts, inserting a qualitative question followed by a question on its scientific reasoning. A template-based testlet generator was used to generate the testlet variants. Using this methodology, 1248 testlet variants were effectively generated from 25 testlet templates. Some issues related to the effective application of the generated physics testlets in practical assessments are discussed.
ERIC Educational Resources Information Center
Kizilcik, Hasan Sahin; Yavas, Pervin Ünlü
2017-01-01
The aim of this study is to identify the opinions of pre-service physics teachers about the difficulties in introductory quantum physics topics. In this study conducted with twenty-five pre-service physics teachers, the case study method was used. The participants were interviewed about introductory quantum physics topics. The interviews were…
Influence of cognitive function on quality of life in anorexia nervosa patients.
Hamatani, Sayo; Tomotake, Masahito; Takeda, Tomoya; Kameoka, Naomi; Kawabata, Masashi; Kubo, Hiroko; Tada, Yukio; Tomioka, Yukiko; Watanabe, Shinya; Inoshita, Masatoshi; Kinoshita, Makoto; Ohta, Masashi; Ohmori, Tetsuro
2017-05-01
The purpose of this study was to elucidate determinants of quality of life (QOL) in anorexia nervosa (AN) patients. Twenty-one female patients with AN participated in the study. QOL was assessed with the 36-Item Short Form Health Survey (SF-36), and cognitive function was evaluated using the Wisconsin Card Sorting Test Keio version, the Rey Complex Figure Test, and the Social Cognition Screening Questionnaire. Clinical symptoms were evaluated with the Beck Depression Inventory-II, the State-Trait Anxiety Inventory-Form JYZ (STAI-JYZ), and the Maudsley Obsessive Compulsive Inventory. The Difficulty Maintaining Set score of the Wisconsin Card Sorting Test Keio version was negatively correlated with the SF-36 Physical Component Summary. Scores of the Beck Depression Inventory-II and the STAI-JYZ State and Trait were negatively correlated with the SF-36 Mental Component Summary (MCS), and the Central Coherence Index 30-min Delayed Recall score of the Rey Complex Figure Test was positively correlated with the MCS. Stepwise regression analysis showed that the Difficulty Maintaining Set score was an independent predictor of the Physical Component Summary, and that the Central Coherence Index 30-min Delayed Recall and STAI-JYZ Trait scores predicted the MCS. These results suggest that not only trait anxiety but also poor central coherence and an impaired ability to maintain a new rule worsen AN patients' QOL. © 2016 The Authors. Psychiatry and Clinical Neurosciences © 2016 Japanese Society of Psychiatry and Neurology.
Sharkey, J; Johnson, C M; Dean, W R
2012-08-01
Although homebound older adults are at increased risk for poor nutritional health and adverse nutrition-related outcomes, little attention has focused on the tasks involved in meal preparation and consumption and the influence of those tasks on dietary intake. We examined self-reported dietary intake from three 24-h dietary recalls and physical limitations in meal preparation and consumption (LMPC) activities in a randomly recruited sample of 345 homebound older men and women. Ordered logistic regression was used to examine the association of demographic characteristics and 6 activities with relative intakes of key musculoskeletal nutrients (calcium, vitamin D, magnesium, and phosphorus). At least 70% reported not meeting ⅔ of recommended intakes for calcium and vitamin D; 12.5% failed to achieve ⅔ of recommended intakes for at least three of the four nutrients. More than 12% of the sample reported that it was very difficult, or that they were unable, to perform at least 3 LMPC tasks. Regression results indicated that reporting the greatest LMPC increased the odds of lower intake of musculoskeletal nutrients. Independent of sociodemographic characteristics, self-reported difficulty in meal preparation and consumption was associated with lower dietary intakes of musculoskeletal nutrients. These results suggest the need to assess difficulty in meal preparation and consumption for the growing population of homebound older adults who participate in supplemental nutrition programs. This brief, 6-item measure may help identify older adults at risk of poor nutritional health and declining function.
Influence of burnout and sleep difficulties on the quality of life among medical students.
Pagnin, Daniel; de Queiroz, Valéria
2015-01-01
This study assessed the influence of burnout dimensions and sleep difficulties on the quality of life of preclinical-phase medical school students. Data were collected from 193 students through their completion of the World Health Organization Quality of Life Instrument, the Maslach Burnout Inventory-Student Survey, the Mini-Sleep Questionnaire, the Social Readjustment Rating Scale, and the Beck Depression Inventory. Hierarchical multiple regressions were performed to quantify the effects of emotional exhaustion, cynicism, academic efficacy, and sleep difficulties on the physical, psychological, social, and environmental components of an individual's quality of life. The influence of confounding variables, such as gender, stress load, and depressive symptoms, was controlled in the statistical analyses. Physical health decreased when emotional exhaustion and sleep difficulties increased. Psychological well-being also decreased when cynicism and sleep difficulties increased. Together, burnout and sleep difficulties explained 22% and 21% of the variance in physical and psychological well-being, respectively. On the other hand, physical health, psychological well-being, and social relationships increased when the sense of academic efficacy increased. Physical and psychological well-being are negatively associated with emotional exhaustion, cynicism, and sleep difficulties in students in the early phase of medical school. To improve the quality of life of these students, a significant effort should be directed towards burnout and sleep difficulties.
Correlates of physical function among stroke survivors: an examination of the 2015 BRFSS.
Ilunga Tshiswaka, D; Seals, S R; Raghavan, P
2018-02-01
To identify the characteristics of stroke survivors with poor physical function. Cross-sectional. Secondary data analyses were performed with the 2015 Behavioral Risk Factor Surveillance System data set. Unadjusted and adjusted logistic regressions were employed to determine the correlates of poor physical function in stroke survivors. Self-reported difficulty with walking and stairs was used as a proxy for physical function. Characteristics such as age, race, sex, difficulty doing errands alone, difficulty dressing or bathing alone, health care coverage, time since last routine checkup, and reported financial difficulty with regard to health care access were examined as contributing factors to physical function. Approximately half of all stroke survivors reported having difficulty with walking and stairs (50.3%). As expected, the odds of reporting difficulty with walking and stairs were higher among stroke survivors aged 40 years and above (p < 0.0001). Interestingly, black/African American and multiracial respondents had higher odds of reporting difficulty with walking and stairs than whites, whereas Hispanic respondents had lower odds than whites (p < 0.0001). Further analyses revealed that the disparity in physical function persisted (p < 0.0001) after adjusting for age, race, sex, education level, family income, marital status, employment status, health insurance status, affordability of health care, and length of time since the last doctor's visit. There were racial/ethnic disparities in physical function. Specifically, blacks/African Americans had 5.6% higher odds of reporting difficulty with walking and stairs than whites. Moreover, Hispanics reported significantly fewer problems than whites. Overall, similar sociocultural patterns in non-stroke and stroke populations were observed in this study. Copyright © 2017 The Royal Society for Public Health. Published by Elsevier Ltd. All rights reserved.
Selective loss of verbal imagery.
Mehta, Z; Newcombe, F
1996-05-01
This single case study of the ability to generate verbal and non-verbal imagery in a woman who sustained a gunshot wound to the brain reports a significant difficulty in generating images of word shapes but not a significant problem in generating object images. Further dissociation, however, was observed in her ability to generate images of living vs non-living material. She made more errors in imagery and factual information tasks for non-living items than for living items. This pattern contrasts with our previous report of the agnosic patient, M.S., who had severe difficulty in generating images of living material, whereas his ability to image the shape of words was comparable to that of normal control subjects. Furthermore, with regard to the generation of images of living compared with non-living material, M.S. shows more errors with living than nonliving items. In contrast, the present patient, S.M., made significantly more errors with non-living relative to living items. There appear to be two types of double dissociation which reinforce the growing evidence of dissociable impairments in the ability to generate images for different types of verbal and non-verbal material. Such dissociations, presumably related to sensory and cognitive processing demands, address the problem of the neural basis of imagery.
Psychometric properties of a scale to measure alexithymia.
Blanchard, E B; Arena, J G; Pallmeyer, T P
1981-01-01
Four studies were conducted on a sample of 230 undergraduates to determine the psychometric properties of a measure of alexithymia, the Schalling-Sifneos Scale. In the first study, scores on the scale were found to be approximately normally distributed for each sex, with 8.2% of males and 1.8% of females in the alexithymia range. In the second study, a factor analysis of the scale revealed three distinct factors: (1) 'difficulty in expression of feelings'; (2) 'the importance of feelings especially about people'; (3) 'day-dreaming or introspection'. In the second factor analytic study, scores from several standard psychological tests on the same subjects were introduced with the scale items. Two factors in this analysis were composed almost entirely of the other test scores: a 'general psychological distress factor' and a 'concerns about physical symptoms factor'. The other two factors were similar to factors 1 and 2 above in terms of items. The Rathus Assertiveness Scale loaded positively on the equivalent of factor 1. In the last study, the Schalling-Sifneos Scale score was shown to be relatively orthogonal to other psychological tests, with the exception of a Psychosomatic Symptom Checklist, and thus to measure something other than depression, anxiety, etc.
de Castro, A B; Rue, Tessa; Takeuchi, David T
2010-01-01
This study examined the associations between employment frustration and both self-rated physical health (SRPH) and self-rated mental health (SRMH) among Asian American immigrants. A cross-sectional quantitative analysis was conducted utilizing data from 1,181 Asian immigrants participating in the National Latino and Asian American Study. Employment frustration was measured by self-report of having difficulty finding the work one wants because of being of Asian descent. SRPH and SRMH were each assessed using a global one-item measure, with responses ranging from poor to excellent. Control variables included gender, age, ethnicity, education, occupation, income, whether immigrated for employment, years in the United States, English proficiency, and a general measure for everyday discrimination. Ordered logistic regression showed that employment frustration was negatively associated with SRPH. This relationship, however, was no longer significant in multivariate models including English proficiency. The negative association between employment frustration and SRMH persisted even when including all control variables. The findings suggest that Asian immigrants in the United States who experience employment frustration report lower levels of both physical and mental health. However, English proficiency may attenuate the relationship of employment frustration with physical health. © 2010 Wiley Periodicals, Inc.
An Analysis of the Connectedness to Nature Scale Based on Item Response Theory.
Pasca, Laura; Aragonés, Juan I; Coello, María T
2017-01-01
The Connectedness to Nature Scale (CNS) is used as a measure of the subjective cognitive connection between individuals and nature. However, to date, it has not been analyzed at the item level to confirm its quality. In the present study, we conduct such an analysis based on Item Response Theory. We employed data from previous studies using the Spanish-language version of the CNS, analyzing a sample of 1008 participants. The results show that seven items presented appropriate indices of discrimination and difficulty, in addition to a good fit. The remaining six have inadequate discrimination indices and do not present a good fit. A second study with 321 participants shows that the seven-item scale has adequate levels of reliability and validity. Therefore, it would be appropriate to use a reduced version of the scale after eliminating the items that display inappropriate behavior, since they may interfere with research results on connectedness to nature.
Short-term memory in autism spectrum disorder.
Poirier, Marie; Martin, Jonathan S; Gaigg, Sebastian B; Bowler, Dermot M
2011-02-01
Three experiments examined verbal short-term memory in comparison and autism spectrum disorder (ASD) participants. Experiment 1 involved forward and backward digit recall. Experiment 2 used a standard immediate serial recall task in which, unlike the digit-span task, items (words) were not repeated from list to list; hence, this task called more heavily on item memory. Experiment 3 tested short-term order memory with an order recognition test: each word list was repeated with or without the positions of 2 adjacent items swapped. The ASD group showed poorer performance in all 3 experiments. Experiments 1 and 2 showed that group differences were due to memory for the order of the items, not to memory for the items themselves. Confirming these findings, the results of Experiment 3 showed that the ASD group had more difficulty detecting a change in the temporal sequence of the items. (c) 2010 APA, all rights reserved.
A new item response theory model to adjust data allowing examinee choice
Costa, Marcelo Azevedo; Braga Oliveira, Rivert Paulo
2018-01-01
In a typical questionnaire testing situation, examinees are not allowed to choose which items they answer because of a technical issue in obtaining satisfactory statistical estimates of examinee ability and item difficulty. This paper introduces a new item response theory (IRT) model that incorporates information from a novel representation of questionnaire data using network analysis. Three scenarios in which examinees select a subset of items were simulated. In the first scenario, the assumptions required to apply the standard Rasch model are met, thus establishing a reference for parameter accuracy. The second and third scenarios include five increasing levels of violating those assumptions. The results show substantial improvements over the standard model in item parameter recovery. Furthermore, the accuracy was closer to the reference in almost every evaluated scenario. To the best of our knowledge, this is the first proposal to obtain satisfactory IRT statistical estimates in the last two scenarios. PMID:29389996
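The abstract uses the standard Rasch model as its reference point. The authors' new network-based model is not reproduced here; the sketch below shows only the standard dichotomous Rasch response probability, P(X = 1 | θ, b) = exp(θ − b) / (1 + exp(θ − b)), and an examinee's log-likelihood when some items were not chosen. Coding unchosen items as `None` and skipping them is an illustrative assumption: it treats the examinee's choice as ignorable, which is exactly the kind of assumption the paper's later simulation scenarios are described as violating. All names are hypothetical.

```python
import math
from typing import Optional, Sequence


def rasch_probability(theta: float, b: float) -> float:
    """P(correct) under the dichotomous Rasch model: exp(theta - b) / (1 + exp(theta - b))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def rasch_log_likelihood(theta: float,
                         difficulties: Sequence[float],
                         responses: Sequence[Optional[int]]) -> float:
    """Log-likelihood of one examinee's responses given fixed item difficulties.

    Items the examinee did not choose are coded as None and skipped, which treats
    the choice as ignorable (unrelated to the unobserved responses).
    """
    ll = 0.0
    for b, x in zip(difficulties, responses):
        if x is None:  # unanswered (not chosen) item contributes nothing
            continue
        p = rasch_probability(theta, b)
        ll += math.log(p) if x == 1 else math.log(1.0 - p)
    return ll


# Hypothetical examinee who chose items 1 and 3 out of three
print(rasch_log_likelihood(0.5, [-1.0, 0.0, 1.0], [1, None, 0]))
```

When θ equals b, the probability of a correct response is exactly 0.5, which is how Rasch item difficulty is usually interpreted.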
Development of the Serenity Scale.
Roberts, K T; Aspy, C B
1993-01-01
Serenity is a sustained inner peace. Nurses can use knowledge about serenity to help clients cope with harsh circumstances. The Serenity Scale is a 40-item self-report, summated scale that evaluates clients' serenity status. Critical attributes, identified by serenity experts, served as the theoretical framework. Sixty-five items were given to 542 male and female subjects aged 20 to 95 (73% Caucasian and 27% minority) from varying income and educational levels, yielding an alpha of .93. Forty items (SS.V2) were extracted for further analysis. The alpha coefficient was .92, with item-to-total correlations ranging from .25 to .67. Item means ranged from 2.6 to 3.7 (grand mean = 3.4). A principal components factor analysis with varimax rotation revealed nine factors explaining 58.2% of the variance. Limitations are that SS.V2 has not been tested with an independent sample and that subjects with low educational levels had difficulty with some items.
Introductory Physics Students' Physics and Mathematics Epistemologies
ERIC Educational Resources Information Center
Scanlon, Erin M.
2017-01-01
The purpose of this three-study dissertation is to investigate why students enrolled in introductory physics courses experience difficulties in being successful; one possible source of their difficulties is related to their epistemology. In order to investigate students' epistemologies about mathematics and physics, students were observed…
Psychological distress, television viewing, and physical activity in children aged 4 to 12 years.
Hamer, Mark; Stamatakis, Emmanuel; Mishra, Gita
2009-05-01
Sedentary behavior and physical activity may be independent risk factors for psychological distress in adolescents, although there is no existing information for children. We examined the cross-sectional association between psychological distress, television and screen entertainment time, and physical activity levels among a representative sample of children aged 4 to 12 years from the 2003 Scottish Health Survey. Participants were 1486 boys and girls (mean age: 8.5 ± 2.3 years). Parents answered on behalf of children, who were required to be present. The parents completed the Strengths and Difficulties Questionnaire and provided information on their children's television and screen entertainment time, physical activity, and dietary intake. An abnormally high Strengths and Difficulties Questionnaire total difficulties score (20-40) was found in 4.2% of the sample. Approximately 25% of the children were exposed to television and screen entertainment at least 3 hours/day. In general linear models, television and screen entertainment time per week and physical activity levels were independently associated with the Strengths and Difficulties Questionnaire total difficulties score after adjustment for age, gender, area deprivation level, single-parent status, medical conditions, and various dietary intake indicators. There was also an additive interaction effect showing that the combination of high television and screen entertainment time and low physical activity was associated with the highest Strengths and Difficulties Questionnaire score. Higher television and screen entertainment exposure (>2.7 hours/day) alone resulted in a 24% increase in the Strengths and Difficulties Questionnaire score in comparison with lower television and screen entertainment exposure (<1.6 hours/day), although when combined with low physical activity this resulted in a 46% increase. Higher levels of television and screen entertainment time and low physical activity levels interact to increase psychological distress in young children.
Reading Ability and Print Exposure: Item Response Theory Analysis of the Author Recognition Test
Moore, Mariah; Gordon, Peter C.
2015-01-01
In the Author Recognition Test (ART) participants are presented with a series of names and foils and are asked to indicate which ones they recognize as authors. The test is a strong predictor of reading skill, with this predictive ability generally explained as occurring because author knowledge is likely acquired through reading or other forms of print exposure. This large-scale study (1012 college student participants) used Item Response Theory (IRT) to analyze item (author) characteristics to facilitate identification of the determinants of item difficulty, provide a basis for further test development, and to optimize scoring of the ART. Factor analysis suggests a potential two factor structure of the ART differentiating between literary vs. popular authors. Effective and ineffective author names were identified so as to facilitate future revisions of the ART. Analyses showed that the ART is a highly significant predictor of time spent encoding words as measured using eye-tracking during reading. The relationship between the ART and time spent reading provided a basis for implementing a higher penalty for selecting foils, rather than the standard method of ART scoring (names selected minus foils selected). The findings provide novel support for the view that the ART is a valid indicator of reading volume. Further, they show that frequency data can be used to select items of appropriate difficulty and that frequency data from corpora based on particular time periods and types of text may allow test adaptation for different populations. PMID:25410405
Reading ability and print exposure: item response theory analysis of the author recognition test.
Moore, Mariah; Gordon, Peter C
2015-12-01
In the author recognition test (ART), participants are presented with a series of names and foils and are asked to indicate which ones they recognize as authors. The test is a strong predictor of reading skill, and this predictive ability is generally explained as occurring because author knowledge is likely acquired through reading or other forms of print exposure. In this large-scale study (1,012 college student participants), we used item response theory (IRT) to analyze item (author) characteristics in order to facilitate identification of the determinants of item difficulty, provide a basis for further test development, and optimize scoring of the ART. Factor analysis suggested a potential two-factor structure of the ART, differentiating between literary and popular authors. Effective and ineffective author names were identified so as to facilitate future revisions of the ART. Analyses showed that the ART is a highly significant predictor of the time spent encoding words, as measured using eyetracking during reading. The relationship between the ART and time spent reading provided a basis for implementing a higher penalty for selecting foils, rather than the standard method of ART scoring (names selected minus foils selected). The findings provide novel support for the view that the ART is a valid indicator of reading volume. Furthermore, they show that frequency data can be used to select items of appropriate difficulty, and that frequency data from corpora based on particular time periods and types of texts may allow adaptations of the test for different populations.
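The abstract contrasts the standard ART scoring rule (names selected minus foils selected) with a rule that penalizes foil selections more heavily, but it does not state the penalty weight. The sketch below expresses both rules as one function; `foil_penalty` is a hypothetical parameter, and the value 2.0 in the usage example is illustrative, not the authors' value. The author and foil names are placeholders.

```python
def art_score(selected: set[str], authors: set[str], foils: set[str],
              foil_penalty: float = 1.0) -> float:
    """Author Recognition Test score: hits minus weighted false alarms.

    foil_penalty = 1.0 gives the standard rule (names selected minus foils selected);
    a larger value penalizes foil selections more heavily.
    """
    hits = len(selected & authors)        # real authors checked
    false_alarms = len(selected & foils)  # foils checked
    return hits - foil_penalty * false_alarms


authors = {"Author A", "Author B", "Author C"}
foils = {"Foil X", "Foil Y"}
checked = {"Author A", "Author B", "Foil X"}
print(art_score(checked, authors, foils))                    # 1.0
print(art_score(checked, authors, foils, foil_penalty=2.0))  # 0.0
```

Raising the foil penalty lowers the score of a participant who guesses liberally, which is the intuition behind weighting false alarms more than the standard rule does.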
Sousa, Renata M; Dewey, Michael E; Acosta, Daisy; Jotheeswaran, AT; Castro-Costa, Erico; Ferri, Cleusa P; Guerra, Mariella; Huang, Yueqin; Jacob, KS; Pichardo, Juana Guillermina Rodriguez; Ramírez, Nayeli Garcia; Rodriguez, Juan Llibre; Rodriguez, Marina Calvo; Salas, Aquiles; Sosa, Ana Luisa; Williams, Joseph; Prince, Martin J
2010-01-01
We evaluated the psychometric properties of the 12-item interviewer-administered screener version of the World Health Organization Disability Assessment Schedule – version II (WHODAS II) among older people living in seven low- and middle-income countries. Principal component analysis (PCA), confirmatory factor analysis (CFA) and Mokken analyses were carried out to test for unidimensionality, hierarchical structure, and measurement invariance across 10/66 Dementia Research Group sites. PCA generated a one-factor solution in most sites. In CFA, the two-factor solution generated in the Dominican Republic fitted better for all sites other than rural China. The two factors were not easily interpretable, and may have been an artefact of differing item difficulties. Strong internal consistency and high factor loadings for the one-factor solution supported unidimensionality. Furthermore, the WHODAS II was found to be a ‘strong’ Mokken scale. Measurement invariance was supported by the similarity of factor loadings across sites, and by the high between-site correlations in item difficulties. The Mokken results strongly support the conclusion that the WHODAS II 12-item screener is a unidimensional and hierarchical scale conforming to item response theory (IRT) principles, at least at the monotone homogeneity model level. More work is needed to assess the generalizability of our findings to different populations. Copyright © 2010 John Wiley & Sons, Ltd. PMID:20104493
Powell, Sarah R.; Fuchs, Lynn S.
2014-01-01
According to national mathematics standards, algebra instruction should begin at kindergarten and continue through elementary school. Most often, teachers address algebra in the elementary grades with problems related to solving equations or understanding functions. With 789 2nd-grade students, we administered (a) measures of calculations and word problems in the fall and (b) an assessment of pre-algebraic reasoning, with items that assessed solving equations and functions, in the spring. Based on the calculation and word-problem measures, we placed 148 students into 1 of 4 difficulty status categories: typically performing, calculation difficulty, word-problem difficulty, or difficulty with calculations and word problems. Analyses of variance were conducted on the 148 students; path analytic mediation analyses were conducted on the larger sample of 789 students. Across analyses, results corroborated the finding that word-problem difficulty is more strongly associated with difficulty in pre-algebraic reasoning than calculation difficulty is. As an indicator of later algebra difficulty, word-problem difficulty may be a more useful predictor than calculation difficulty, and students with word-problem difficulty may require a different level of algebraic reasoning intervention than students with calculation difficulty. PMID:25309044
Methodology for the development and calibration of the SCI-QOL item banks
Tulsky, David S.; Kisala, Pamela A.; Victorson, David; Choi, Seung W.; Gershon, Richard; Heinemann, Allen W.; Cella, David
2015-01-01
Objective To develop a comprehensive, psychometrically sound, and conceptually grounded patient reported outcomes (PRO) measurement system for individuals with spinal cord injury (SCI). Methods Individual interviews (n = 44) and focus groups (n = 65 individuals with SCI and n = 42 SCI clinicians) were used to select key domains for inclusion and to develop PRO items. Verbatim items from other cutting-edge measurement systems (i.e. PROMIS, Neuro-QOL) were included to facilitate linkage and cross-population comparison. Items were field tested in a large sample of individuals with traumatic SCI (n = 877). Dimensionality was assessed with confirmatory factor analysis. Local item dependence and differential item functioning were assessed, and items were calibrated using the item response theory (IRT) graded response model. Finally, computer adaptive tests (CATs) and short forms were administered in a new sample (n = 245) to assess test-retest reliability and stability. Participants and Procedures A calibration sample of 877 individuals with traumatic SCI across five SCI Model Systems sites and one Department of Veterans Affairs medical center completed SCI-QOL items in interview format. Results We developed 14 unidimensional calibrated item banks and 3 calibrated scales across physical, emotional, and social health domains. When combined with the five Spinal Cord Injury – Functional Index physical function banks, the final SCI-QOL system consists of 22 IRT-calibrated item banks/scales. Item banks may be administered as CATs or short forms. Scales may be administered in a fixed-length format only. Conclusions The SCI-QOL measurement system provides SCI researchers and clinicians with a comprehensive, relevant and psychometrically robust system for measurement of physical-medical, physical-functional, emotional, and social outcomes. All SCI-QOL instruments are freely available on Assessment Center℠. PMID:26010963
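The graded response model used for calibration treats each ordered category boundary as a logistic curve, P(X ≥ k | θ) = 1 / (1 + exp(-a(θ - b_k))), and obtains each category's probability as the difference between adjacent boundary curves. A minimal sketch, assuming a single item with made-up discrimination and threshold values (not SCI-QOL parameters); the function name is hypothetical.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities under Samejima's graded response model.

    theta      -- person location on the latent trait
    a          -- item discrimination
    thresholds -- increasing category boundaries b_1 < ... < b_m
    Returns m + 1 probabilities for categories 0..m.
    """
    # Cumulative probabilities P(X >= k) for k = 1..m, padded with 1 and 0
    cumulative = (
        [1.0]
        + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
        + [0.0]
    )
    # P(X = k) = P(X >= k) - P(X >= k + 1)
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


# Hypothetical 5-category item: discrimination 1.8, four ordered thresholds
probs = grm_category_probs(theta=0.0, a=1.8, thresholds=[-2.0, -0.5, 0.5, 2.0])
print([round(p, 3) for p in probs])
```

Because the boundaries must be ordered, every category probability stays non-negative and the probabilities sum to one.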
Exploratory Item Classification Via Spectral Graph Clustering
Chen, Yunxiao; Li, Xiaoou; Liu, Jingchen; Xu, Gongjun; Ying, Zhiliang
2017-01-01
Large-scale assessments are supported by a large item pool. An important task in test development is to assign items into scales that measure different characteristics of individuals, and a popular approach is cluster analysis of items. Classical methods in cluster analysis, such as hierarchical clustering, the K-means method, and latent-class analysis, often induce a high computational overhead and have difficulty handling missing data, especially in the presence of high-dimensional responses. In this article, the authors propose a spectral clustering algorithm for exploratory item cluster analysis. The method is computationally efficient, effective for data with missing or incomplete responses, easy to implement, and often outperforms traditional clustering algorithms in the context of high dimensionality. The spectral clustering algorithm is based on graph theory, a branch of mathematics that studies the properties of graphs. The algorithm first constructs a graph of items, characterizing the similarity structure among items. It then extracts item clusters based on the graphical structure, grouping similar items together. The proposed method is evaluated through simulations and an application to the revised Eysenck Personality Questionnaire. PMID:29033476
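To make the graph-based idea concrete, here is a minimal two-cluster sketch of generic normalized spectral clustering; it is not the authors' algorithm. Items are nodes, a symmetric non-negative similarity matrix supplies the edge weights (assumed to be precomputed, for example from absolute inter-item correlations), and the sign of the second eigenvector of the normalized graph Laplacian (the Fiedler vector), found here by power iteration, splits the items into two groups. The function name and the example matrix are hypothetical.

```python
import math


def spectral_bipartition(similarity, iterations=500):
    """Split items into two clusters with the Fiedler vector of the
    symmetric normalized Laplacian L = I - D^-1/2 W D^-1/2.

    similarity -- symmetric n x n list of non-negative item similarities,
                  every row summing to a positive degree
    Returns a 0/1 cluster label per item.
    """
    n = len(similarity)
    degree = [sum(row) for row in similarity]
    inv_sqrt = [1.0 / math.sqrt(d) for d in degree]
    # Normalized affinity A = D^-1/2 W D^-1/2; the largest eigenvectors of
    # M = I + A = 2I - L are the smallest eigenvectors of L.
    affinity = [[inv_sqrt[i] * similarity[i][j] * inv_sqrt[j] for j in range(n)]
                for i in range(n)]
    # The eigenvector of L for eigenvalue 0 is proportional to sqrt(degree).
    total = math.sqrt(sum(degree))
    trivial = [math.sqrt(d) / total for d in degree]

    vec = [float(i + 1) for i in range(n)]  # deterministic start vector
    for _ in range(iterations):
        # Remove the trivial component, then apply M (power iteration).
        dot = sum(v * t for v, t in zip(vec, trivial))
        vec = [v - dot * t for v, t in zip(vec, trivial)]
        vec = [vec[i] + sum(affinity[i][j] * vec[j] for j in range(n))
               for i in range(n)]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    # The sign pattern of the Fiedler vector defines the two clusters.
    return [0 if v < 0 else 1 for v in vec]


# Hypothetical similarities: items 0-1 and items 2-3 form two tight groups
W = [[1.0, 0.9, 0.1, 0.1],
     [0.9, 1.0, 0.1, 0.1],
     [0.1, 0.1, 1.0, 0.9],
     [0.1, 0.1, 0.9, 1.0]]
print(spectral_bipartition(W))
```

For more than two clusters, the usual generalization takes the K leading eigenvectors and runs k-means on their rows; a linear-algebra library would normally replace the hand-written power iteration.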
Identifying items to assess methodological quality in physical therapy trials: a factor analysis.
Armijo-Olivo, Susan; Cummings, Greta G; Fuentes, Jorge; Saltaji, Humam; Ha, Christine; Chisholm, Annabritt; Pasichnyk, Dion; Rogers, Todd
2014-09-01
Numerous tools and individual items have been proposed to assess the methodological quality of randomized controlled trials (RCTs). The frequency of use of these items varies according to health area, which suggests a lack of agreement regarding their relevance to trial quality or risk of bias. The objectives of this study were: (1) to identify the underlying component structure of items and (2) to determine relevant items to evaluate the quality and risk of bias of trials in physical therapy by using an exploratory factor analysis (EFA). A methodological research design was used, and an EFA was performed. Randomized controlled trials used for this study were randomly selected from searches of the Cochrane Database of Systematic Reviews. Two reviewers used 45 items gathered from 7 different quality tools to assess the methodological quality of the RCTs. An exploratory factor analysis was conducted using the principal axis factoring (PAF) method followed by varimax rotation. Principal axis factoring identified 34 items loading on 9 common factors: (1) selection bias; (2) performance and detection bias; (3) eligibility, intervention details, and description of outcome measures; (4) psychometric properties of the main outcome; (5) contamination and adherence to treatment; (6) attrition bias; (7) data analysis; (8) sample size; and (9) control and placebo adequacy. Because of the exploratory nature of the results, a confirmatory factor analysis is needed to validate this model. To the authors' knowledge, this is the first factor analysis to explore the underlying component items used to evaluate the methodological quality or risk of bias of RCTs in physical therapy. The items and factors represent a starting point for evaluating the methodological quality and risk of bias in physical therapy trials. Empirical evidence of the association among these items with treatment effects and a confirmatory factor analysis of these results are needed to validate these items. 
© 2014 American Physical Therapy Association.
Ginieri-Coccossis, M; Triantafillou, E; Tomaras, V; Soldatos, C; Mavreas, V; Christodoulou, G
2012-01-01
The present study examines the main psychometric properties of the World Health Organisation (WHO) quality of life (QoL) instrument, the WHOQOL-BREF, with the inclusion of four national items. Participants were 425 adult native Greek speakers, grouped into patients with physical disorders, patients with psychiatric disorders, and healthy individuals. Participants were administered the WHOQOL-BREF and 23 national items, the General Health Questionnaire (GHQ-28) and the Life Satisfaction Index (LSI). Confirmatory factor analysis produced acceptable fit values for the original model of 26 items within the four WHOQOL domains: physical health, psychological health, social relationships and environment. Testing the fit of the national items within this model identified four new items with the most satisfactory fit indices; these were included, forming a 30-item version. The national items refer to: (a) nutrition, (b) satisfaction with work (both loaded in the physical health domain), (c) home life and (d) social life (both loaded in the social relationships domain). Statistical tests were applied to the 26- and 30-item versions, producing satisfactory results, with the 30-item version showing slightly better values. Furthermore, results on the 30-item version included: (a) internal consistency, which was found satisfactory, with alpha values ranging from α = 0.67 to 0.81, while the inclusion of new items produced higher alpha values in the physical health and social relationships domains, (b) construct validity, with good item-domain correlations as well as strong correlations between domain scores, (c) convergent validity, which was very satisfactory, showing good correlations with the GHQ-28 and LSI, (d) discriminant validity, showing the instrument's ability to detect QoL differences between healthy and unhealthy participants, and between physically ill and psychiatric patients, and (e) test-retest reliability, with ICC scores exceeding 0.80 for all domains. 
The WHOQOL-BREF Greek version was found to perform well with sick and healthy participants, demonstrating satisfactory psychometric properties. Use of the instrument may be recommended for clinical and general populations, for service or intervention evaluation, as well as for cross-cultural clinical trials.
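Internal consistency above is reported as Cronbach's alpha, α = k/(k - 1) · (1 - Σσᵢ² / σ²_total), for k items. A minimal standard-library sketch, assuming a complete respondents-by-items score matrix with no missing data; the function name and the example scores are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents x items score matrix.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)
    Population variances are used throughout; the ratio is the same if
    sample variances are used consistently instead.
    """
    k = len(scores[0])
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Made-up scores: 4 respondents x 3 items on a 1-5 scale
scores = [[4, 5, 4], [2, 3, 2], [3, 3, 4], [5, 4, 5]]
print(round(cronbach_alpha(scores), 2))
```

Alpha reaches 1 when all items are perfectly correlated and falls to 0 when item scores are uncorrelated.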
NASA Astrophysics Data System (ADS)
Williamson, Kathryn
2014-01-01
The topic of Newtonian gravity offers a unique perspective from which to investigate and encourage conceptual change because it is something with which everyone has daily experience, and because it is taught in two courses that reach a variety of students - introductory college astronomy (‘Astro 101’) and physics (‘Phys 101’). Informed by the constructivist theory of learning, this study characterizes and measures Astro 101 and Phys 101 students’ understanding of Newtonian gravity within four conceptual domains - Directionality, Force Law, Independence of Other Forces, and Threshold. A phenomenographic analysis of student-supplied responses to open-ended questions about gravity resulted in characterization of students’ alternative models and misapplications of the scientific model. These student difficulties informed the development of a multiple-choice assessment instrument, the Newtonian Gravity Concept Inventory (NGCI). Classical Test Theory (CTT), student interviews, and expert review show that the NGCI is a reliable and valid tool for assessing both Astro 101 and Phys 101 students’ understanding of gravity. Furthermore, the NGCI can provide extensive and robust information about differences between Astro 101 and Phys 101 students and curricula. Comparing and contrasting CTT values and response patterns shows qualitative differences in each of the four conceptual domains. Additionally, performing an Item Response Theory (IRT) analysis calibrates item parameters for all Astro 101 and Phys 101 courses and provides Newtonian gravity ability estimates for each student. Physics students show significantly higher pre- and post-instruction IRT abilities than astronomy students, but they show approximately equal gains. 
Linear regression models that control for student characteristics and classroom dynamics show that: (1) differences in post-instruction abilities are most influenced by students’ pre-instruction abilities and the level of interactivity in the classroom, and (2) there is no differential effect of the astronomy curriculum compared to the physics curriculum on students’ overall post-instruction Newtonian gravity abilities.
Quantifying the physical, social and attitudinal environment of children with cerebral palsy.
Dickinson, Heather O; Colver, Allan
2011-01-01
To develop an instrument to represent the availability of needed environmental features (EFs) in the physical, social and attitudinal environment of home, school and community for children with cerebral palsy. Following a literature review and qualitative studies, the European Child Environment Questionnaire (ECEQ) was developed to capture whether EFs needed by children with cerebral palsy were available to them: 24, 24 and 12 items related to the physical, social and attitudinal environments, respectively. The ECEQ was administered to parents of 818 children with cerebral palsy aged 8-12 years, in seven European countries. A domain structure was developed using factor analysis. Parents responded to 98% of items. Seven items were omitted from statistical models as the EFs they referred to were available to most children who needed them; two items were omitted as they did not fit well into plausible domains. The final domains, based on 51 items, were: Transport, Physical - home, Physical - community, Physical - school, Social support - home, Social support - community, Attitudes - family and friends, Attitudes - teachers and therapists, Attitudes - classmates. ECEQ was acceptable to parents and can be used to assess both the access children with cerebral palsy have to the EFs that they need and how available individual EFs are.
Chien, Chi-Wen; Brown, Ted; McDonald, Rachael
2012-04-01
The Assessment of Children's Hand Skills is a new assessment that utilises a naturalistic observational method to capture children's real-life hand skill performance when engaged in various types of daily activities in everyday living contexts. The Assessment of Children's Hand Skills is designed for use with 2- to 12-year-old children with a range of disabilities or health conditions. The study aimed to investigate the construct validity of the Assessment of Children's Hand Skills in Australian children. Rasch analysis was used to examine internal construct validity of the Assessment of Children's Hand Skills in a mixed sample of 53 children with disabilities (including autism spectrum disorder, developmental/genetic disorders and physical disabilities) and 85 typically developing children. External construct validity was examined by correlating scores with three questionnaires evaluating daily living skills and hand skills. Rasch goodness-of-fit analysis suggested that all 22 activity items and 19 of 20 hand skill items in the Assessment of Children's Hand Skills measured a single construct. The Assessment of Children's Hand Skills items were placed in a clinically meaningful hierarchy from easy to hard, and the difficulty range of the items also matched the majority of children with disabilities and typically developing preschool-aged children. Moderate to high correlations (0.59 ≤ Spearman's ρ coefficients ≤ 0.89, P < 0.01) were found with the assessments of daily living and fine motor skills. This study provided preliminary evidence supporting the construct validity of the Assessment of Children's Hand Skills for its clinical application in assessing children's real-life hand skill performance in Australian contexts. © 2012 The Authors Australian Occupational Therapy Journal © 2012 Occupational Therapy Australia.
Choi, Bongkyoo; Kurowski, Alicia; Bond, Meg; Baker, Dean; Clays, Els; De Bacquer, Dirk; Punnett, Laura
2012-01-01
The construct validity of the Job Content Questionnaire (JCQ) psychological demands scale in relationship to physical demands has been inconsistent. This study aims to test quantitatively and qualitatively whether the scale validity differs by occupation. Hierarchical clustering analyses of 10 JCQ psychological and physical demands items were conducted in 61 occupations from two datasets: one of non-faculty workers at a university in the United States (6 occupations with 208 total workers) and the other of a Belgian working population (55 occupations with 13,039 total workers). The psychological and physical demands items overlapped in 13 of 61 occupation-stratified clustering analyses. Most of the overlaps occurred in physically demanding occupations and involved the two psychological demands items, 'work fast' and 'work hard'. Generally, the scale reliability was low in such occupations. Additionally, interviews with eight university workers revealed that workers interpreted the two psychological demands items differently depending on the nature of their tasks. The scale validity was occupation-differential. The JCQ psychological job demands scale has been used worldwide as a job demand measure in many studies. This study indicates that the 'work fast' and 'work hard' items of the scale need to be reworded so that they differentiate mental from physical job demands and measure 'psychological' demands as intended.
ERIC Educational Resources Information Center
Kawahara, Jun-ichiro; Enns, James T.
2009-01-01
When observers try to identify successive targets in a visual stream at a rate of 100 ms per item, accuracy for the 2nd target is impaired for intertarget lags of 100-500 ms. Yet, when the same stream is presented more rapidly (e.g., 50 ms per item), this pattern reverses and a 1st-target deficit is obtained. M. C. Potter, A. Staub, and D. H.…
Validation of Physics Standardized Test Items
NASA Astrophysics Data System (ADS)
Marshall, Jill
2008-10-01
The Texas Physics Assessment Team (TPAT) examined the Texas Assessment of Knowledge and Skills (TAKS) to determine whether it is a valid indicator of physics preparation for future course work and employment, and of the knowledge and skills needed to act as an informed citizen in a technological society. We categorized science items from the 2003 and 2004 10th and 11th grade TAKS by content area(s) covered, knowledge and skills required to select the correct answer, and overall quality. We also analyzed a 5,000-student sample of item-level results from the 2004 11th grade exam using standard statistical methods employed by test developers (factor analysis and Item Response Theory). Triangulation of our results revealed strengths and weaknesses of the different methods of analysis. The TAKS was found to be only weakly indicative of physics preparation and we make recommendations for increasing the validity of standardized physics testing.
When students can choose easy, medium, or hard homework problems
NASA Astrophysics Data System (ADS)
Teodorescu, Raluca E.; Seaton, Daniel T.; Cardamone, Caroline N.; Rayyan, Saif; Abbott, Jonathan E.; Barrantes, Analia; Pawl, Andrew; Pritchard, David E.
2012-02-01
We investigate student-chosen, multi-level homework in our Integrated Learning Environment for Mechanics [1] built using the LON-CAPA [2] open-source learning system. Multi-level refers to problems categorized as easy, medium, and hard. Problem levels were determined a priori based on the knowledge needed to solve them [3]. We analyze these problems using three measures: time-per-problem, LON-CAPA difficulty, and item difficulty measured by item response theory. Our analysis of student behavior in this environment suggests that time-per-problem is strongly dependent on problem category, unlike either of the two score-based measures. We also found trends in student choice of problems, overall effort, and efficiency across the student population. Allowing students choice in problem solving seems to improve their motivation; 70% of students worked additional problems for which no credit was given.
Sources of Interactional Problems in a Survey of Racial/Ethnic Discrimination
Johnson, Timothy P.; Shariff-Marco, Salma; Willis, Gordon; Cho, Young Ik; Breen, Nancy; Gee, Gilbert C.; Krieger, Nancy; Grant, David; Alegria, Margarita; Mays, Vickie M.; Williams, David R.; Landrine, Hope; Liu, Benmei; Reeve, Bryce B.; Takeuchi, David; Ponce, Ninez A.
2014-01-01
Cross-cultural variability in respondent processing of survey questions may bias results from multiethnic samples. We analyzed behavior codes, which identify difficulties in the interactions of respondents and interviewers, from a discrimination module contained within a field test of the 2007 California Health Interview Survey. In all, 553 (English) telephone interviews yielded 13,999 interactions involving 22 items. Multilevel logistic regression modeling revealed that respondent age and several item characteristics (response format, customized questions, length, and first item with new response format), but not race/ethnicity, were associated with interactional problems. These findings suggest that item function within a multi-cultural, albeit English language, survey may be largely influenced by question features, as opposed to respondent characteristics such as race/ethnicity. PMID:26166949
Nathan, Nicole; Wolfenden, Luke; Morgan, Philip J; Bell, Andrew C; Barker, Daniel; Wiggers, John
2013-06-13
Valid tools measuring characteristics of the school environment associated with the physical activity and dietary behaviours of children are needed to accurately evaluate the impact of initiatives to improve school environments. The aim of this study was to assess the validity of Principal self-report of primary school healthy eating and physical activity environments. Primary school Principals (n = 42) in New South Wales, Australia were invited to complete a telephone survey of the school environment, the School Environment Assessment Tool (SEAT). Equivalent observational data were collected by pre-service teachers located within the school. The SEAT involved 65 items that assessed food availability via canteens, vending machines and fundraisers and the presence of physical activity facilities, equipment and organised physical activities. Kappa statistics were used to assess agreement between the two measures. Almost 70% of the survey items demonstrated moderate to almost perfect agreement. Substantial agreement was found for 10 of 13 items assessing foods sold for fundraising, 3 of 6 items assessing physical activity facilities of the school, and both items assessing organised physical activities that occurred at recess and lunch and school sport. Limited agreement was found for items assessing foods sold through canteens and access to small screen recreation. The SEAT provides researchers and policy makers with a valid tool for assessing aspects of the school food and physical activity environment.
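Cohen's kappa corrects raw percent agreement for the agreement expected by chance from each rater's marginal frequencies. A minimal sketch comparing a principal's report with an observer's record for one hypothetical yes/no item across five schools (data made up; the function name is also hypothetical). Labels such as "moderate" and "almost perfect" conventionally follow the Landis and Koch bands (0.41 to 0.60 moderate, 0.61 to 0.80 substantial, 0.81 to 1.00 almost perfect).

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two sets of categorical ratings of the same units.

    kappa = (p_observed - p_expected) / (1 - p_expected), where p_expected is
    the agreement expected by chance from each rater's marginal frequencies.
    Undefined (division by zero) if both raters use one identical category.
    """
    n = len(rater_a)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical yes/no answers for one item across five schools
principal = ["yes", "yes", "no", "yes", "no"]
observer = ["yes", "no", "no", "yes", "no"]
print(round(cohens_kappa(principal, observer), 2))  # 0.62
```

Here raw agreement is 80%, but because 48% agreement would be expected by chance, kappa is noticeably lower.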
Spinal cord injury rehabilitation patient and physical therapist perspective: a pilot study.
Sliwinski, Martha M; Smith, Ryan; Wood, Andrea
2016-01-01
The objectives of this retrospective observational study were, first, to explore physical therapists' perceived involvement of patients with SCI in physical therapy (PT) rehabilitation; second, to explore how individuals with SCI perceive their own involvement in PT rehabilitation; third, to compare how patients and physical therapists perceive involvement in PT rehabilitation; and last, to explore the relationship between patients' perceived involvement and satisfaction with life (SWL). This study was conducted in the United States. Two 11-item questionnaires were designed, one for physical therapists and one for patients. The items were rated on a Likert-type agreement scale. Thirty physical therapists completed the patient involvement questionnaire for physical therapists, and nine individuals with SCI completed the patient involvement questionnaire and the SWL scale. We certify that all applicable governmental and institutional guidelines were followed during the course of this research. The results indicated that both physical therapists and patients were overall in agreement that patients were involved in their PT rehabilitation on most items. The two items that received the lowest Likert scores from both therapists and patients were friends and family involvement in therapy and gender-related issues. The item on individualized patient goals showed the largest discrepancy between therapists and patients. The sample size was too small to observe a trend between SWL and perceived involvement. Overall, patients and PTs in this pilot agree that patients are included in treatment; however, the discrepancy in scores related to individualized goals requires further research.
Diviani, Nicola; Dima, Alexandra Lelia; Schulz, Peter Johannes
2017-04-11
The eHealth Literacy Scale (eHEALS) is a tool to assess consumers' comfort and skills in using information technologies for health. Although evidence exists of reliability and construct validity of the scale, less agreement exists on structural validity. The aim of this study was to validate the Italian version of the eHealth Literacy Scale (I-eHEALS) in a community sample with a focus on its structural validity, by applying psychometric techniques that account for item difficulty. Two Web-based surveys were conducted among a total of 296 people living in the Italian-speaking region of Switzerland (Ticino). After examining the latent variables underlying the observed variables of the Italian scale via principal component analysis (PCA), fit indices for two alternative models were calculated using confirmatory factor analysis (CFA). The scale structure was examined via parametric and nonparametric item response theory (IRT) analyses accounting for differences between items regarding the proportion of answers indicating high ability. Convergent validity was assessed by correlations with theoretically related constructs. CFA showed a suboptimal model fit for both models. IRT analyses confirmed that all items measure a single dimension as intended. Reliability and construct validity of the final scale were also confirmed. The contrasting results of factor analysis (FA) and IRT analyses highlight the importance of considering differences in item difficulty when examining health literacy scales. The findings support the reliability and validity of the translated scale and its use for assessing Italian-speaking consumers' eHealth literacy. ©Nicola Diviani, Alexandra Lelia Dima, Peter Johannes Schulz. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 11.04.2017.
Rodrigues-Bigaton, Delaine; de Castro, Ester M; Pires, Paulo F
Rasch analysis has been used in recent studies to test the psychometric properties of questionnaires. The conditions for use of the Rasch model are unidimensionality (assessed via prior factor analysis) and local independence (the probability of getting a particular item right or wrong should not be conditioned upon success or failure on another). The aim was to evaluate the dimensionality and the psychometric properties of the Fonseca anamnestic index (FAI), such as the fit of the data to the model, the degree of difficulty of the items, and the ability to respond in patients with myogenous temporomandibular disorder (TMD). The sample consisted of 94 women with myogenous TMD, diagnosed by the Research Diagnostic Criteria for Temporomandibular Disorders (RDC/TMD), who answered the FAI. For the factor analysis, we applied the Kaiser-Meyer-Olkin test, Bartlett's test of sphericity, Spearman's correlation, and the determinant of the correlation matrix. For extraction of the factors/dimensions, an eigenvalue >1.0 criterion was used, followed by oblique oblimin rotation. The Rasch analysis was conducted on the dimension that showed the highest proportion of variance explained. An adequate sample size and FAI multidimensionality were observed. Dimension 1 (primary) consisted of items 1, 2, 3, 6, and 7. All items of dimension 1 showed adequate fit to the model; ordered from most difficult to easiest, they were items 2, 1, 3, 6, and 7. The FAI presented multidimensionality, with its main dimension consisting of five reliable items with adequate fit to the composition of its structure. Copyright © 2017 Associação Brasileira de Pesquisa e Pós-Graduação em Fisioterapia. Published by Elsevier Editora Ltda. All rights reserved.
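Under the dichotomous Rasch model, the probability of endorsing an item depends only on the difference between the person location θ and the item difficulty b: P = exp(θ - b) / (1 + exp(θ - b)). A crude first approximation to item difficulty, commonly used as a starting value before iterative estimation, is the centered log-odds of non-endorsement. The sketch below assumes 0/1-scored responses (FAI items have three response options, so this is a simplification) and is not the estimation procedure used in the study; the function names and data are hypothetical.

```python
import math


def rasch_probability(theta, b):
    """Dichotomous Rasch model: probability of endorsing an item of
    difficulty b for a person at location theta (both in logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def initial_difficulties(responses):
    """Centered log-odds starting values for Rasch item difficulties.

    responses -- list of respondent rows of 0/1 scores; every item needs at
    least one 0 and one 1 (extreme items have infinite log-odds).
    Harder items (fewer endorsements) receive larger values.
    """
    n = len(responses)
    k = len(responses[0])
    raw = []
    for i in range(k):
        endorsed = sum(row[i] for row in responses)
        raw.append(math.log((n - endorsed) / endorsed))
    mean = sum(raw) / k
    return [d - mean for d in raw]  # center the scale at mean difficulty 0


# Hypothetical responses (rows = respondents, columns = items)
data = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 1, 0]]
difficulties = initial_difficulties(data)
hardest_first = sorted(range(len(difficulties)), key=lambda i: difficulties[i], reverse=True)
print(hardest_first)
```

Sorting items by estimated difficulty produces the kind of most-to-least-difficult ordering reported in the abstract.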
Sakakibara, Brodie M.; Miller, William C.; Backman, Catherine L.
2012-01-01
Objective To explore shortened response formats for use with the Activities-specific Balance Confidence scale and then: 1) evaluate the unidimensionality of the scale; 2) evaluate the item difficulty; 3) evaluate the scale for redundancy and content gaps; and 4) evaluate the item standard error of measurement (SEM) and internal consistency reliability among aging individuals (≥50 years) with a lower-limb amputation living in the community. Design Secondary analysis of cross-sectional survey and chart review data. Setting Out-patient amputee clinics, Ontario, Canada. Participants Four hundred forty-eight community-living adults, at least 50 years old (mean = 68 years), who have used a prosthesis for at least 6 months for a major unilateral lower-limb amputation. Three hundred twenty-five (72.5%) were men. Intervention N/A Main Outcome Measure(s) Activities-specific Balance Confidence Scale. Results A 5-option response format outperformed 4- and 6-option formats. Factor analyses confirmed a unidimensional scale. The distance between response options is not the same for all items on the scale, as evidenced by the Partial Credit Model (PCM) having a better fit to the data than the Rating Scale Model. Two items, however, did not fit the PCM within acceptable statistical limits. Revising the wording of the two items may resolve the misfit, improve the construct validity, and lower the SEM. Overall, the difficulty of the scale’s items is appropriate for use with aging individuals with lower-limb amputation, and the scale is most reliable (Cronbach's α = 0.94) for use with individuals with moderately low balance confidence levels. Conclusions The ABC-scale with a simplified 5-option response format is a valid and reliable measure of balance confidence for use with individuals aging with a lower-limb amputation. PMID:21704978
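The Partial Credit Model gives each item its own step difficulties, whereas the Rating Scale Model forces all items to share one set of thresholds offset by an item location; a better PCM fit, as reported above, suggests that the spacing between response options varies by item. A minimal sketch of both models, with made-up step values and hypothetical function names:

```python
import math


def pcm_probs(theta, steps):
    """Partial Credit Model category probabilities for one item.

    steps -- item-specific step difficulties delta_1..delta_m (logits).
    P(X = x) is proportional to exp(sum_{j<=x} (theta - delta_j)), with the
    empty sum for x = 0 equal to 0.
    """
    numerators = [0.0]
    for delta in steps:
        numerators.append(numerators[-1] + (theta - delta))
    weights = [math.exp(v) for v in numerators]
    total = sum(weights)
    return [w / total for w in weights]


def rsm_probs(theta, location, taus):
    """Rating Scale Model: a PCM whose steps are location + tau_j, with the
    thresholds tau_j shared by every item on the scale."""
    return pcm_probs(theta, [location + tau for tau in taus])


# Hypothetical 5-option item: the PCM allows unevenly spaced steps
print([round(p, 3) for p in pcm_probs(0.0, [-1.5, -1.0, 0.5, 2.0])])
```

With a single step, the PCM reduces to the dichotomous Rasch model.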
Cairnduff, Victoria; Dean, Moira; Koidis, Anastasios
2016-09-01
Food preparation and storage behaviors in the home deviating from the "best practice" food safety recommendations may result in foodborne illnesses. Currently, there are limited tools available to fully evaluate the consumer knowledge, perceptions, and behavior in the area of refrigerator safety. The current study aimed to develop a valid and reliable tool in the form of a questionnaire, the Consumer Refrigerator Safety Questionnaire (CRSQ), for systematically assessing all these aspects. Items relating to refrigerator safety knowledge (n = 17), perceptions (n = 46), and reported behavior (n = 30) were developed and pilot tested by an expert reference group and various consumer groups to assess face and content validity (n = 20), item difficulty and consistency (n = 55), and construct validity (n = 23). The findings showed that the CRSQ has acceptable face and content validity with acceptable levels of item difficulty. Item consistency was observed for 12 of 15 refrigerator safety knowledge items. Further, all 5 of the subscales of consumer perceptions of refrigerator safety practices relating to risk of developing foodborne disease showed acceptable internal consistency (Cronbach's α value > 0.8). Construct validity of the CRSQ was shown to be very good (P = 0.022). The CRSQ exhibited acceptable test-retest reliability at 14 days with the majority of knowledge items (93.3%) and reported behavior items (96.4%) having correlation coefficients of greater than 0.70. Overall, the CRSQ was deemed valid and reliable in assessing refrigerator safety knowledge and behavior; therefore, it has the potential for future use in identifying groups of individuals at increased risk of deviating from recommended refrigerator safety practices, as well as the assessment of refrigerator safety knowledge and behavior for use before and after an intervention.
Fries, J F; Bruce, B; Bjorner, J; Rose, M
2006-01-01
Objectives Patient reported outcomes (PROs) have become standard study endpoints. However, little attention has been given to using item improvement to advance PRO performance, which could improve the precision, clarity, patient relevance, and information content of “physical function/disability” items and thus the performance of resulting instruments. Methods The present study included 1860 physical function/disability items from 165 instruments. Item formulations were assessed by frequency of use, modified Delphi consensus, respondent judgement of clarity and importance, and item response theory (IRT). Data from 1100 rheumatoid arthritis, osteoarthritis, and normal ageing subjects, using qualitative item review, focus groups, cognitive interviews, and patient survey, were used to achieve a unique item pool that was clear, reliable, sensitive to change, readily translatable, devoid of floor and ceiling limitations, contained unidimensional subdomains, and had maximal information content. Results A “present tense” time frame was used most frequently, better understood, more readily translated, and more directly estimated the latent trait of disability. Items in the “past tense” had 80–90% false negatives (p<0.001). The best items were brief, clear, and contained a single construct. Responses with four to five options were preferred by both experts and respondents. The term physical function may be preferable to the term disability because of fewer floor effects. IRT analyses of “disability” suggest four independent subdomains (mobility, dexterity, axial, and compound) with factor loadings of 0.81–0.99. Conclusions Major improvement in performance of items and instruments is possible, and may have the effect of substantially reducing sample size requirements for clinical trials. PMID:17038464
Barile, John P; Horner-Johnson, Willi; Krahn, Gloria; Zack, Matthew; Miranda, David; DeMichele, Kimberly; Ford, Derek; Thompson, William W
2016-10-01
The Short Form Health Survey (SF-36) and the Centers for Disease Control and Prevention (CDC) Healthy Days items are well-known measures of health-related quality of life. The validity of the SF-36 for older adults and those with disabilities has been questioned. The objective was to assess the extent to which the SF-36 and the Centers for Disease Control and Prevention (CDC) Healthy Days items measure the same aspects of health; whether the SF-36 and the CDC unhealthy days items are invariant across gender, functional status, or the presence of chronic health conditions of older adults; and whether each of the SF-36's eight subscales is independently associated with the CDC Healthy Days items. We analyzed data from 66,269 adult Medicare Advantage members age 65 and older. We used confirmatory factor analyses and regression modeling to test associations between the CDC Healthy Days items and subscales of the SF-36. The CDC Healthy Days items were associated with the SF-36 global measures of physical and mental health. The CDC physically unhealthy days item was associated with the SF-36 subscales for bodily pain, physical role limitations, and general health, while the CDC mentally unhealthy days item was associated with the SF-36 subscales for mental health, emotional role limitations, vitality, and social functioning. The SF-36 physical functioning subscale was not independently associated with either of the CDC Healthy Days items. The CDC Healthy Days items measure similar domains as the SF-36 but appear to assess HRQOL without regard to limitations in functioning. Copyright © 2016 Elsevier Inc. All rights reserved.
Barile, John P.; Horner-Johnson, Willi; Krahn, Gloria; Zack, Matthew; Miranda, David; DeMichele, Kimberly; Ford, Derek; Thompson, William W.
2017-01-01
Background The Short Form Health Survey (SF-36) and the Centers for Disease Control and Prevention (CDC) Healthy Days items are well-known measures of health-related quality of life. The validity of the SF-36 for older adults and those with disabilities has been questioned. Objective Assess the extent to which the SF-36 and the Centers for Disease Control and Prevention (CDC) Healthy Days items measure the same aspects of health; whether the SF-36 and the CDC unhealthy days items are invariant across gender, functional status, or the presence of chronic health conditions of older adults; and whether each of the SF-36’s eight subscales is independently associated with the CDC Healthy Days items. Methods We analyzed data from 66,269 adult Medicare Advantage members age 65 and older. We used confirmatory factor analyses and regression modeling to test associations between the CDC Healthy Days items and subscales of the SF-36. Results The CDC Healthy Days items were associated with the SF-36 global measures of physical and mental health. The CDC physically unhealthy days item was associated with the SF-36 subscales for bodily pain, physical role limitations, and general health, while the CDC mentally unhealthy days item was associated with the SF-36 subscales for mental health, emotional role limitations, vitality, and social functioning. The SF-36 physical functioning subscale was not independently associated with either of the CDC Healthy Days items. Conclusions The CDC Healthy Days items measure similar domains as the SF-36 but appear to assess HRQOL without regard to limitations in functioning. PMID:27259343
Houston, Megan N; Hoch, Johanna M; Van Lunen, Bonnie L; Hoch, Matthew C
2015-11-01
The Disablement in the Physically Active scale (DPA) is a generic patient-reported outcome designed to evaluate constructs of disability in physically active populations. The purpose of this study was to analyze the DPA scale structure for summary components. Four hundred and fifty-six collegiate athletes completed a demographic form and the DPA. A principal component analysis (PCA) was conducted with oblique rotation. Factors with eigenvalues >1 that explained >5% of the variance were retained. The PCA revealed a two-factor structure consistent with paradigms used to develop the original DPA. Items 1-12 loaded on Factor 1 and Items 13-16 loaded on Factor 2. Items 1-12 pertain to impairment, activity limitations, and participation restrictions. Items 13-16 address psychosocial and emotional well-being. Consideration of item content suggested Factor 1 concerned physical function, while Factor 2 concerned mental well-being. Thus, items clustered around Factors 1 and 2 were identified as physical (DPA-PSC) and mental (DPA-MSC) summary components, respectively. Together, the factors accounted for 65.1% of the variance. The PCA revealed a two-factor structure for the DPA that resulted in DPA-PSC and DPA-MSC. Analyzing the DPA as separate constructs may provide distinct information that could help to prescribe treatment and rehabilitation strategies.
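The retention rule stated here (eigenvalue > 1 and more than 5% of variance explained) can be applied directly to the eigenvalues of the item correlation matrix. A minimal NumPy sketch, assuming a precomputed correlation matrix; it covers only the retention step, not the oblique rotation used in the study, and `retained_components` is a hypothetical name.

```python
import numpy as np

def retained_components(corr, min_eigen=1.0, min_share=0.05):
    """Eigenvalues (largest first) passing both retention rules,
    plus the share of total variance each one explains."""
    corr = np.asarray(corr, dtype=float)
    eigvals = np.linalg.eigvalsh(corr)[::-1]   # eigvalsh returns ascending order
    share = eigvals / eigvals.sum()            # trace of a correlation matrix = number of items
    keep = (eigvals > min_eigen) & (share > min_share)
    return eigvals[keep], share[keep]

# Illustrative data: two independent blocks of three items, r = 0.6 within each block.
block = np.full((3, 3), 0.6)
np.fill_diagonal(block, 1.0)
zeros = np.zeros((3, 3))
corr = np.block([[block, zeros], [zeros, block]])
values, shares = retained_components(corr)
```

For this toy matrix the eigenvalues are 2.2, 2.2, and four values of 0.4, so exactly two components pass both rules, each explaining about 36.7% of the variance.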
Takasaki, Hiroshi; Treleaven, Julia; Johnston, Venerina; Jull, Gwendolen
2013-08-15
Cross-sectional study. To conduct a preliminary analysis of the physical, cognitive, and psychological domains contributing to self-reported driving difficulty after adjusting for neck pain, dizziness, and relevant demographics in chronic whiplash-associated disorders (WAD) using hierarchical regression modeling. Pain is a risk factor for car crashes, and dizziness may affect fitness to drive. Both symptoms are common in chronic WAD, and difficulty driving is a common complaint in this group. Chronic WAD is often accompanied by physical, cognitive, and psychological impairments, which may contribute to self-reported driving difficulty beyond neck pain, dizziness, and relevant demographics. Forty individuals with chronic WAD participated. The dependent variables were the magnitude of self-reported driving difficulty at the strategic, tactical, and operational levels of the Neck Pain Driving Index. Three models were developed to assess the contributions of the independent variables (physical, cognitive, and psychological domains) to each of the 3 dependent variables after adjusting for neck pain intensity, dizziness, and driving demographics. The measures were as follows. Physical domain: range and maximum speed of head rotation, and performance during gaze stability, eye-head coordination, and visual dependency tests; cognitive domain: self-reported cognitive symptoms including fatigue, and the trail making tests; psychological domain: general stress, traumatic stress, depression, and fear of neck movements and driving. Symptom duration was relevant to driving difficulty at the strategic and tactical levels. The cognitive domain increased statistical power to estimate the strategic and operational levels (P < 0.1) beyond other contributors. The physical domain increased statistical power to estimate the tactical level (P < 0.1) beyond other contributors.
Physical and cognitive impairments independently contributed to self-reported driving difficulty in chronic WAD beyond neck pain, dizziness, and symptom duration. Level of Evidence: 3.
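The hierarchical modeling described here enters the adjustment variables first and then a domain block, judging the block by the increase in R². A minimal NumPy sketch, assuming ordinary least squares with an intercept and no missing data; it reports only ΔR² and omits the F test for the change. The function names and toy data are hypothetical.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an OLS fit of y on X (1-D or 2-D) plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)

def hierarchical_delta_r2(covariates, block, y):
    """Step 1: covariates only. Step 2: covariates plus the block of interest."""
    r2_step1 = r_squared(covariates, y)
    r2_step2 = r_squared(np.column_stack([covariates, block]), y)
    return r2_step1, r2_step2, r2_step2 - r2_step1

# Toy data: y depends on a covariate x1 and on a "domain" variable x2.
x1 = np.arange(1, 7, dtype=float)
x2 = np.array([1, 0, 1, 0, 1, 0], dtype=float)
y = x1 + 2 * x2
r2_base, r2_full, delta = hierarchical_delta_r2(x1, x2, y)
```

Here the covariate alone explains R² ≈ 0.687, and adding the block raises R² to 1, so ΔR² ≈ 0.313 is the share of variance attributable to the block beyond the covariate.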
Bansal, Minakshi; Sharma, Kamlesh K; Vatsa, Manju; Bakhshi, Sameer
2013-05-01
Data on quality of life (QOL) specifically in maintenance therapy of acute lymphoblastic leukemia (ALL) are scarce. This study was done to assess various items listed in domains of QOL (physical, emotional, social and school health domains) of children with ALL during maintenance therapy, and compare the same with those of their siblings and other healthy children. Forty children on maintenance therapy of ALL, 40 siblings and 40 healthy children were assessed for QOL by child self-report using PedsQL 4.0 Generic Core in the local language. Means were computed and compared for each domain with one-way analysis of variance (ANOVA), wherein higher values reflected better QOL. Overall QOL of children with ALL in maintenance therapy (77.16 ± 10.98) was significantly poorer than that of siblings (93.56 ± 4.41) and healthy children (93.02 ± 3.76) (p < 0.001), but their abilities of self-care, household work, exercise, attentiveness, memory and homework were unaffected. There was significantly higher absenteeism due to sickness and hospital visits, and increased emotional problems (fear, anger, sleeping problems) among children with ALL. In the social health domain, children with ALL reported difficulty in maintaining friendships and competing. QOL of siblings was as good as that of healthy children in physical, social and school health domains, but they had increased emotional problems such as anger and sadness. Healthy children reported significantly more future worries and bullying than children with ALL and siblings. This study showed that the QOL of children with ALL during maintenance therapy was significantly poorer than that of siblings and healthy children. The study identified various items in each domain of QOL that were affected in these children, and thus would assist in guiding healthcare professionals to focus on these specific items so as to improve their overall QOL.
Three controversies over item disclosure in medical licensure examinations
Park, Yoon Soo; Yang, Eunbae B.
2015-01-01
In response to views on the public's right to know, there is growing attention to item disclosure (the release of items, answer keys, and performance data to the public) in medical licensure examinations and to its potential impact on the test's ability to measure competence and select qualified candidates. Recent debates on this issue have sparked legislative action internationally, including in South Korea, with prior discussions among North American countries dating back over three decades. The purpose of this study is to identify and analyze three issues associated with item disclosure in medical licensure examinations, namely 1) fairness and validity, 2) impact on passing levels, and 3) utility of item disclosure, by synthesizing existing literature in relation to standards in testing. Historically, the controversy over item disclosure has centered on fairness and validity. Proponents of item disclosure stress test takers’ right to know, while opponents argue from a validity perspective. Item disclosure may bias item characteristics, such as difficulty and discrimination, and has consequences for setting passing levels. To date, there has been limited research on the utility of item disclosure for large-scale testing. These issues require ongoing and careful consideration. PMID:26374693
[Perceptions on item disclosure for the Korean medical licensing examination].
Yang, Eunbae B
2015-09-01
This study analyzed the perceptions of medical students and faculty regarding disclosure of test items on the Korean medical licensing examination. I conducted a survey of medical students from medical colleges and professional medical schools nationwide. Responses from 718 participants, as well as from 69 faculty members who participated in creating the medical licensing examination item sets, were analyzed. Data were analyzed using descriptive statistics and the chi-square test. It is important to maintain test quality and to keep the test items unavailable to the public. There are also concerns among students that disclosure of test items would lead to increased item difficulty (48.3%). Further, only 28.5% of students considered it desirable to disclose test items unconditionally. The professors, who had experience in designing the test items, also expressed opposition to test item disclosure (60.9%). It is desirable not to disclose the test items of the Korean medical licensing examination to the public, on the condition that students are provided with a sufficient amount of information regarding the examination, so that the exam can appropriately identify candidates with the required qualifications.
Rose, M; Bjorner, J B; Becker, J; Fries, J F; Ware, J E
2008-01-01
The Patient-Reported Outcomes Measurement Information System (PROMIS) was initiated to improve precision, reduce respondent burden, and enhance the comparability of health outcomes measures. We used item response theory (IRT) to construct and evaluate a preliminary item bank for physical function assuming four subdomains. Data from seven samples (N=17,726) using 136 items from nine questionnaires were evaluated. A generalized partial credit model was used to estimate item parameters, which were normed to a mean of 50 (SD=10) in the US population. Item bank properties were evaluated through Computerized Adaptive Test (CAT) simulations. IRT requirements were fulfilled by 70 items covering activities of daily living, lower extremity, and central body functions. The original item context partly affected parameter stability. Items on upper body function and on need for aids or devices did not fit the IRT model. In simulations, a 10-item CAT eliminated floor effects and decreased ceiling effects, achieving a small standard error (< 2.2) across scores from 20 to 50 (reliability >0.95 for a representative US sample). This precision was not achieved over a similar range by any comparable fixed-length item sets. The methods of the PROMIS project are likely to substantially improve measures of physical function and to increase the efficiency of their administration using CAT.
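On a T-score metric (SD = 10), a standard error can be translated into an approximate reliability with ρ ≈ 1 − (SE/SD)². Assuming that conversion (a common convention in CAT reporting, not something this abstract spells out), an SE below 2.2 corresponds to reliability above 0.95. The helper names below are hypothetical.

```python
import math

def reliability_from_se(se, sd=10.0):
    """Approximate reliability implied by a standard error on a metric with the given SD."""
    return 1.0 - (se / sd) ** 2

def se_for_reliability(reliability, sd=10.0):
    """Largest standard error compatible with a target reliability."""
    return sd * math.sqrt(1.0 - reliability)
```

For example, `reliability_from_se(2.2)` gives 1 − 0.0484 = 0.9516, and `se_for_reliability(0.95)` is 10·√0.05 ≈ 2.24, so any SE under 2.2 clears the 0.95 threshold.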
Development of an Easy-to-Use Tool for the Assessment of Emergency Department Physical Design.
Majidi, Alireza; Tabatabaey, Ali; Motamed, Hassan; Motamedi, Maryam; Forouzanfar, Mohammad Mehdi
2014-01-01
Physical design of the emergency department (ED) has an important effect on its role and function. To date, no guidelines have been introduced to set standards for the construction of EDs in Iran. In this study, we aimed to devise an easy-to-use tool, based on the available literature and expert opinion, for the quick and effective assessment of EDs with regard to their physical design. For this purpose, a comprehensive checklist was developed based on the current literature on emergency department design. This checklist was then reviewed by a panel consisting of the heads of three major EDs, who resolved conflicting items. In total, 178 crude items were derived from the available literature. The items were categorized into three major domains: physical space, equipment, and accessibility. The final checklist approved by the panel consisted of 163 items categorized into six domains. Each item was phrased as a "Yes or No" question for ease of analysis, meaning that the criterion is either met or not.
Physics 300 Provincial Examination.
ERIC Educational Resources Information Center
Manitoba Dept. of Education and Training, Winnipeg.
This document consists of the physics 300 provincial examination (English version), a separate "provincial summary report" on the results of giving the test, and a separate French language version of the examination. This physics examination contains a 53-item multiple-choice section and a 12-item free-response section. Subsections of…
Changes in prevalence of subjective fatigue during 14-day 6° head-down bed rest
NASA Astrophysics Data System (ADS)
Hirayanagi, Kaname; Natsuno, Toyoki; Shiozawa, Tomoki; Yamaguchi, Nobuhisa; Watanabe, Yoriko; Suzuki, Satomi; Iwase, Satoshi; Mano, Tadaaki; Yajima, Kazuyoshi
2009-06-01
The present study examines the prevalence of subjective fatigue in young healthy males during 14 days of 6° head-down bed rest (HDBR) by using a multidimensional questionnaire. Forty-one subjects completed the Subjective Fatigue Scale questionnaire to assess fatigue-related complaints and symptoms. The questionnaire is composed of three sections, with 10 items each. The sections measured drowsiness and dullness (Section 1), difficulty in concentration (Section 2), and the projection of physical disintegration (Section 3). The subjects answered simple questions between 14:00 and 17:00 on 6 measurement days before and during the HDBR period. The prevalence rate of low back pain was markedly high (80.5%) on the second day and more than 50% in the first half of the HDBR period, and complaints related to either a lack of sleep or a deterioration in sleep quality continued until the end of the HDBR period. Our findings may be useful in developing preventive strategies against physical and mental fatigue associated with prolonged HDBR, horizontal bed rest, and microgravity environments.
NASA Astrophysics Data System (ADS)
Rakkapao, Suttida; Prasitpong, Singha; Arayathanitkul, Kwan
2016-12-01
This study investigated the multiple-choice test of understanding of vectors (TUV) by applying item response theory (IRT). The difficulty, discrimination, and guessing parameters of the TUV items were fitted with the three-parameter logistic model of IRT, using the PARSCALE program. Examinee ability on the TUV was estimated as an IRT ability parameter, assuming unidimensionality and local independence. Moreover, all distractors of the TUV were analyzed using item response curves (IRC), a simplified form of IRT. Data were gathered from 2392 science and engineering freshmen at three universities in Thailand. The results showed IRT analysis to be useful in assessing the test because its item parameters are independent of the ability parameters. The IRT framework reveals item-level information and indicates appropriate ability ranges for the test. Moreover, the IRC analysis can be used to assess the effectiveness of the test's distractors. Both IRT and IRC approaches reveal test characteristics beyond those revealed by the classical analysis methods of tests. Test developers can apply these methods to diagnose and evaluate the features of items at various ability levels of test takers.
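The three-parameter logistic model gives the probability of a correct response as P(θ) = c + (1 − c) / (1 + exp(−D·a·(θ − b))), where a is discrimination, b difficulty, and c the lower (guessing) asymptote. Item information, I(θ) = D²a² · (Q/P) · ((P − c)/(1 − c))² with Q = 1 − P, shows the ability range where an item measures most precisely. This minimal sketch assumes the scaling constant D = 1.7, a common convention; the abstract does not state which value was used in PARSCALE, and the function names are hypothetical.

```python
import math

def p_correct_3pl(theta, a, b, c, D=1.7):
    """3PL probability that an examinee at ability theta answers correctly."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def item_information_3pl(theta, a, b, c, D=1.7):
    """Fisher information of a 3PL item at ability theta."""
    p = p_correct_3pl(theta, a, b, c, D)
    q = 1.0 - p
    return (D * a) ** 2 * (q / p) * ((p - c) / (1.0 - c)) ** 2
```

At θ = b the probability is c + (1 − c)/2, e.g. 0.6 when c = 0.2; with c = 0 the information at θ = b reduces to D²a²/4. A nonzero guessing parameter lowers information and shifts its peak slightly above b.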
Is Your Neighborhood Designed to Support Physical Activity? A Brief Streetscape Audit Tool.
Sallis, James F; Cain, Kelli L; Conway, Terry L; Gavand, Kavita A; Millstein, Rachel A; Geremia, Carrie M; Frank, Lawrence D; Saelens, Brian E; Glanz, Karen; King, Abby C
2015-09-03
Macrolevel built environment factors (e.g., street connectivity, walkability) are correlated with physical activity. Less studied but more modifiable microscale elements of the environment (e.g., crosswalks) may also affect physical activity, but short audit measures of microscale elements are needed to promote wider use. This study compared a 15-item neighborhood environment audit tool with the full version of the tool for assessing the relation of neighborhood design to physical activity in 4 age groups. From the 120-item Microscale Audit of Pedestrian Streetscapes (MAPS) measure of street design, sidewalks, and street crossings, we developed the 15-item version (MAPS-Mini) on the basis of associations with physical activity and attribute modifiability. As a sample of a likely walking route, MAPS-Mini was conducted on a 0.25-mile route from participant residences toward the nearest nonresidential destination for children (n = 758), adolescents (n = 897), younger adults (n = 1,655), and older adults (n = 367). Active transportation and leisure physical activity were measured with age-appropriate surveys, and accelerometers provided objective physical activity measures. Mixed-model regressions were conducted for each MAPS item and a total environment score, adjusted for demographics, participant clustering, and macrolevel walkability. Total scores of MAPS-Mini and the 120-item MAPS correlated at r = .85. Total microscale environment scores were significantly related to active transportation in all age groups. Items related to active transport in 3 age groups were presence of sidewalks, curb cuts, street lights, benches, and buffer between street and sidewalk. The total score was related to leisure physical activity and accelerometer measures only in children. The MAPS-Mini environment measure is short enough to be practical for use by community groups and planning agencies and is a valid substitute for the full version, which is 8 times longer.
Mueller, Evelyn A; Bengel, Juergen; Wirtz, Markus A
2013-12-01
This study aimed to develop a self-description assessment instrument to measure work performance in patients with musculoskeletal diseases. In terms of the International Classification of Functioning, Disability and Health (ICF), work performance is defined as the degree of meeting the work demands (activities) at the actual workplace (environment). To account for the fact that work performance depends on the work demands of the job, we strived to develop item banks that allow a flexible use of item subgroups depending on the specific work demands of the patients' jobs. Item development included the collection of work tasks from literature and content validation through expert surveys and patient interviews. The resulting 122 items were answered by 621 patients with musculoskeletal diseases. Exploratory factor analysis to ascertain dimensionality and Rasch analysis (partial credit model) for each of the resulting dimensions were performed. Exploratory factor analysis resulted in four dimensions, and subsequent Rasch analysis led to the following item banks: 'impaired productivity' (15 items), 'impaired cognitive performance' (18), 'impaired coping with stress' (13) and 'impaired physical performance' (low physical workload 20 items, high physical workload 10 items). The item banks exhibited person separation indices (reliability) between 0.89 and 0.96. The assessment of work performance adds the activities component to the more commonly employed participation component of the ICF-model. The four item banks can be adapted to specific jobs where necessary without losing comparability of person measures, as the item banks are based on Rasch analysis.
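A Rasch person separation reliability (index) compares the observed variance of person measures with the average measurement-error variance: R = (σ²_observed − mean SE²) / σ²_observed. A minimal sketch, assuming person measures and their standard errors in logits; Rasch programs differ in details (for example sample versus population variance), so this illustrates the idea rather than any specific program's output. The function name is hypothetical.

```python
from statistics import fmean, pvariance

def person_separation_reliability(measures, standard_errors):
    """Share of observed person variance that is not measurement error."""
    observed_var = pvariance(measures)
    error_var = fmean(se ** 2 for se in standard_errors)
    return (observed_var - error_var) / observed_var
```

For instance, measures of −1 and +1 logits (observed variance 1) with standard errors of 0.5 (error variance 0.25) give a reliability of 0.75.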
Saudek, Kris; Treat, Robert
2015-01-01
Purpose At our institution, there is speculation among medical students and faculty as to whether team-based learning (TBL) can improve scores on high-stakes examinations compared with traditional didactic lectures. Faculty with experience using TBL developed and piloted a required TBL blood disorders (BD) module for third-year medical students on their pediatric clerkship. The purpose of this study is to analyze the BD scores from the NBME subject exams before and after the introduction of the module. Methods We analyzed institutional and national item difficulties for BD items from the NBME pediatrics content area item analysis reports from 2011 to 2014, before (pre) and after (post) the pilot (October 2012). Total scores of 590 NBME subject examination students from examinee performance profiles were analyzed pre/post. t-Tests and Cohen's d effect sizes were used to analyze item difficulties for institutional versus national scores and pre/post comparisons of item difficulties and total scores. Results BD scores for our institution were 0.65 (±0.19) compared to 0.62 (±0.15) nationally (P=0.346; Cohen's d=0.15). The average of post-consecutive BD scores for our students was 0.70 (±0.21) compared to examinees nationally [0.64 (±0.15)], a significant mean difference (P=0.031; Cohen's d=0.43). Our institution's BD scores trended higher from pre [0.65 (±0.19)] to post [0.70 (±0.21)] (P=0.391; Cohen's d=0.27). Conclusions Institutional BD scores were higher than national BD scores for both pre and post, with an effect size that tripled from pre to post scores. Institutional BD scores increased after the use of the TBL module, while overall exam scores remained steadily above national norms.
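The abstract reports Cohen's d without saying which standard deviation was used. One common variant divides the mean difference by the pooled, (n − 1)-weighted standard deviation. The sketch below assumes that variant and raw score lists, so it is not guaranteed to reproduce the published values; `cohens_d` is a hypothetical helper name.

```python
import math
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)
```

For example, `cohens_d([2, 3, 4], [1, 2, 3])` is 1.0: the means differ by 1 and both groups have a standard deviation of 1. Swapping the groups flips the sign.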
USDA-ARS's Scientific Manuscript database
Theoretically, increased levels of physical activity self-efficacy (PASE) should lead to increased physical activity, but few studies have reported this effect among youth. This failure may be at least partially attributable to measurement limitations. In this study, Item Response Modeling (IRM) was...
Construct Validation of the Physics Metacognition Inventory
ERIC Educational Resources Information Center
Taasoobshirazi, Gita; Farley, John
2013-01-01
The 24-item Physics Metacognition Inventory was developed to measure physics students' metacognition for problem solving. Items were classified into eight subcomponents subsumed under two broader components: knowledge of cognition and regulation of cognition. The students' scores on the inventory were found to be reliable and related to students'…
Assessment Results Following Inquiry and Traditional Physics Laboratory Activities
ERIC Educational Resources Information Center
Bryan, Joel Arthur
2006-01-01
Preservice elementary teachers in a conceptual physics course were given multiple resources to use during several inquiry activities in order to investigate how materials were chosen, used, and valued. These students performed significantly better on assessment items related to the inquiry physics activities than on items related to traditional…
ERIC Educational Resources Information Center
Chen, Hanwei; Cui, Zhongmin; Zhu, Rongchun; Gao, Xiaohong
2010-01-01
The most critical feature of a common-item nonequivalent groups equating design is that the average score difference between the new and old groups can be accurately decomposed into a group ability difference and a form difficulty difference. Two widely used observed-score linear equating methods, the Tucker and the Levine observed-score methods,…
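The decomposition described in this abstract can be made concrete with the Tucker method. Below is a minimal sketch, assuming only summary statistics are available and using the standard synthetic-population formulas for Tucker observed-score linear equating; `GroupStats`, `tucker_equate`, and the default equal weights (w1 = 0.5) are illustrative choices, not taken from the study. The Levine method uses the same synthetic-population structure but defines the slopes γ under a congeneric true-score model, so it is not shown.

```python
import math
from dataclasses import dataclass

@dataclass
class GroupStats:
    """Summary statistics for one group in a common-item (anchor) design."""
    mean_total: float        # mean on the form this group took (X for group 1, Y for group 2)
    var_total: float         # variance on that form
    mean_anchor: float       # mean on the common items V
    var_anchor: float        # variance on the common items V
    cov_total_anchor: float  # covariance between the form score and V

def tucker_equate(x, g1, g2, w1=0.5):
    """Map a Form X score x onto the Form Y scale (Tucker linear equating).

    g1: group that took Form X; g2: group that took Form Y.
    w1: weight of group 1 in the synthetic population (w2 = 1 - w1).
    """
    w2 = 1.0 - w1
    # Regression slopes of total score on the anchor, assumed to hold in both groups.
    gamma1 = g1.cov_total_anchor / g1.var_anchor
    gamma2 = g2.cov_total_anchor / g2.var_anchor
    d_mean = g1.mean_anchor - g2.mean_anchor  # anchor difference reflects group ability difference
    d_var = g1.var_anchor - g2.var_anchor
    # Synthetic-population moments for X and Y.
    mu_x = g1.mean_total - w2 * gamma1 * d_mean
    mu_y = g2.mean_total + w1 * gamma2 * d_mean
    var_x = g1.var_total - w2 * gamma1**2 * d_var + w1 * w2 * gamma1**2 * d_mean**2
    var_y = g2.var_total + w1 * gamma2**2 * d_var + w1 * w2 * gamma2**2 * d_mean**2
    # Linear equating: match synthetic means and standard deviations.
    return mu_y + math.sqrt(var_y / var_x) * (x - mu_x)
```

When both groups have identical anchor statistics, the group ability difference is zero and the function reduces to ordinary linear equating; for instance, with Form X mean 20 (variance 16) and Form Y mean 22 (variance 25), a score of 24 maps to 22 + 1.25 × 4 = 27.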
Monclús Cols, Ester; Nicolás Ocejo, David; Sánchez Sánchez, Miquel; Ortega Romero, Mar
2015-02-01
To identify the problems that hospital emergency room staff have when prescribing and administering antibiotics. A 14-item questionnaire was designed to assess staff members' knowledge of the importance of starting antibiotic treatment promptly, assigning appropriate dosing intervals, adjusting for renal function, and switching to oral therapy. Agreement with each item was expressed on a 5-point Likert scale. Items with a rate of appropriate response of less than 75% were targeted for specific attention. Two hundred questionnaires were distributed to the staff and 150 were returned completed (response rate, 75%). The following items were targeted for attention based on rates of appropriate response of less than 75%: clear medical orders (65%), understanding the implication of early empirical antibiotic therapy for prognosis in serious infections (67%), estimation of the prevalence of renal insufficiency (42%), the assumption that a serum creatinine level below 1.6 mg/dL is safe (33%), use of glomerular filtration rate to adjust dose according to renal function (47%), and an understanding of switching from intravenous to oral treatment (60%). This study revealed the difficulties medical and nursing staff have in prescribing and administering antibiotics in a hospital emergency department. The results can facilitate improvements in antibiotic therapy by pinpointing areas to target for specific training interventions or the design of electronic prescribing aids.
Improving measures of work-related physical functioning.
McDonough, Christine M; Ni, Pengsheng; Peterik, Kara; Marfeo, Elizabeth E; Marino, Molly E; Meterko, Mark; Rasch, Elizabeth K; Brandt, Diane E; Jette, Alan M; Chan, Leighton
2017-03-01
To expand content of the physical function domain of the Work Disability Functional Assessment Battery (WD-FAB), developed for the US Social Security Administration's (SSA) disability determination process. Newly developed questions were administered to 3532 recent SSA applicants for work disability benefits and 2025 US adults. Factor analyses and item response theory (IRT) methods were used to calibrate and link the new items to the existing WD-FAB, and computer-adaptive test simulations were conducted. Factor and IRT analyses supported integration of 44 new items into three existing WD-FAB scales and the addition of a new 11-item scale (Community Mobility). The final physical function domain, consisting of Basic Mobility (56 items), Upper Body Function (34 items), Fine Motor Function (45 items), and Community Mobility (11 items), demonstrated acceptable psychometric properties. The WD-FAB offers an important tool for enhancement of work disability determination. The FAB could provide relevant information about work-related functioning for initial assessment of claimants; identifying denied applicants who may benefit from interventions to improve work and health outcomes; enhancing periodic review of work disability beneficiaries; and assessing outcomes for policies, programs and services targeting people with work disability.
Improving Measures of Work-Related Physical Functioning
McDonough, Christine M.; Ni, Pengsheng; Peterik, Kara; Marfeo, Elizabeth E.; Marino, Molly E.; Meterko, Mark; Rasch, Elizabeth K; Brandt, Diane E.; Jette, Alan M; Chan, Leighton
2016-01-01
Purpose To expand content of the physical function domain of the Work Disability Functional Assessment Battery (WD-FAB), developed for the US Social Security Administration’s (SSA) disability determination process. Methods Newly developed questions were administered to 3,532 recent SSA applicants for work disability benefits and 2,025 US adults. Factor analyses and item response theory (IRT) methods were used to calibrate and link the new items to the existing WD-FAB, and computer-adaptive test simulations were conducted. Results Factor and IRT analyses supported integration of 44 new items into 3 existing WD-FAB scales and the addition of a new 11-item scale (Community Mobility). The final physical function domain, consisting of Basic Mobility (56 items), Upper Body Function (34 items), Fine Motor Function (45 items), and Community Mobility (11 items), demonstrated acceptable psychometric properties. Conclusions The WD-FAB offers an important tool for enhancement of work disability determination. The FAB could provide relevant information about work-related functioning for initial assessment of claimants; identifying denied applicants who may benefit from interventions to improve work and health outcomes; enhancing periodic review of work disability beneficiaries; and assessing outcomes for policies, programs and services targeting people with work disability. PMID:28005243
ERIC Educational Resources Information Center
Rosenblatt, Rebecca
2012-01-01
Here I present my work identifying and addressing student difficulties with several materials science and physics topics. In the first part of this thesis, I present my work identifying student difficulties and misconceptions about the directional relationships between net force, velocity, and acceleration in one dimension. This is accomplished…
Perez, Kathryn E.; Hiatt, Anna; Davis, Gregory K.; Trujillo, Caleb; French, Donald P.; Terry, Mark; Price, Rebecca M.
2013-01-01
The American Association for the Advancement of Science 2011 report Vision and Change in Undergraduate Biology Education encourages the teaching of developmental biology as an important part of teaching evolution. Recently, however, we found that biology majors often lack the developmental knowledge needed to understand evolutionary developmental biology, or “evo-devo.” To assist in efforts to improve evo-devo instruction among undergraduate biology majors, we designed a concept inventory (CI) for evolutionary developmental biology, the EvoDevoCI. The CI measures student understanding of six core evo-devo concepts using four scenarios and 11 multiple-choice items, all inspired by authentic scientific examples. Distracters were designed to represent the common conceptual difficulties students have with each evo-devo concept. The tool was validated by experts and administered at four institutions to 1191 students during preliminary (n = 652) and final (n = 539) field trials. We used student responses to evaluate the readability, difficulty, discriminability, validity, and reliability of the EvoDevoCI, which included items ranging in difficulty from 0.22–0.55 and in discriminability from 0.19–0.38. Such measures suggest the EvoDevoCI is an effective tool for assessing student understanding of evo-devo concepts and the prevalence of associated common conceptual difficulties among both novice and advanced undergraduate biology majors. PMID:24297293
Interaction between numbers and size during visual search.
Krause, Florian; Bekkering, Harold; Pratt, Jay; Lindemann, Oliver
2017-05-01
The current study investigates an interaction between numbers and physical size (i.e. size congruity) in visual search. In three experiments, participants had to detect a physically large (or small) target item among physically small (or large) distractors in a search task comprising single-digit numbers. The relative numerical size of the digits was varied, such that the target item was either among the numerically large or small numbers in the search display and the relation between numerical and physical size was either congruent or incongruent. Perceptual differences of the stimuli were controlled by a condition in which participants had to search for a differently coloured target item with the same physical size and by the usage of LCD-style numbers that were matched in visual similarity by shape transformations. The results of all three experiments consistently revealed that detecting a physically large target item is significantly faster when the numerical size of the target item is large as well (congruent), compared to when it is small (incongruent). This novel finding of a size congruity effect in visual search demonstrates an interaction between numerical and physical size in an experimental setting beyond typically used binary comparison tasks, and provides important new evidence for the notion of shared cognitive codes for numbers and sensorimotor magnitudes. Theoretical consequences for recent models on attention, magnitude representation and their interactions are discussed.
Kulich, Károly; Keininger, Dorothy L; Tiplady, Brian; Banerji, Donald
2015-01-01
Symptoms (particularly dyspnea) and activity limitation affect health status and the ability to function normally in patients with chronic obstructive pulmonary disease (COPD). To develop an electronic patient diary (eDiary), qualitative patient interviews were conducted from 2009 to 2010 to identify relevant symptoms and the degree of bother they cause. The eDiary was completed by a subset of 209 patients with moderate-to-severe COPD in the 26-week QVA149 SHINE study. Two morning assessments (since awakening and since the last assessment) and one evening assessment were made each day. Assessments covered five symptoms ("shortness of breath," "phlegm/mucus," "chest tightness," "wheezing," and "coughing") and two impact items ("bothered by COPD" and "difficulty with activities") and were scored on a 10-point numeric scale. Patient compliance with the eDiary was 90.4% at baseline and 81.3% at week 26. Correlations between shortness of breath and the impact items were >0.95. Regression analysis showed that shortness of breath was a highly significant (P<0.0001) predictor of the impact items. Exploratory factor analysis yielded a single factor comprising all eDiary items, including both symptoms and impact items. Shortness of breath, the total score (including five symptoms and two impact items), and the five-item symptom score from the eDiary performed well, with good consistency and reliability. The eDiary showed good sensitivity to change, with a 0.6-point reduction in the symptom scores (on a 0-10 point scale) representing a meaningful change. The eDiary was found to be valid, reliable, and responsive. The high correlations between "shortness of breath" and the ratings of "bother" and "difficulty with activities" confirmed the relevance of this symptom in patients with COPD. Future studies will be required to explore further psychometric properties of the eDiary and its ability to differentiate between COPD treatments.
The stroke impairment assessment set: its internal consistency and predictive validity.
Tsuji, T; Liu, M; Sonoda, S; Domen, K; Chino, N
2000-07-01
To study the scale quality and predictive validity of the Stroke Impairment Assessment Set (SIAS) developed for stroke outcome research. Rasch analysis of the SIAS; stepwise multiple regression analysis to predict discharge functional independence measure (FIM) raw scores from demographic data, the SIAS scores, and the admission FIM scores; cross-validation of the prediction rule. Tertiary rehabilitation center in Japan. One hundred ninety stroke inpatients for the study of the scale quality and the predictive validity; a second sample of 116 stroke inpatients for the cross-validation study. Mean square fit statistics to study the degree of fit to the unidimensional model; logits to express item difficulties; discharge FIM scores for the study of predictive validity. The degree of misfit was acceptable except for the shoulder range of motion (ROM), pain, visuospatial function, and speech items; and the SIAS items could be arranged on a common unidimensional scale. The difficulty patterns were identical at admission and at discharge except for the deep tendon reflexes, ROM, and pain items. They were also similar for the right- and left-sided brain lesion groups except for the speech and visuospatial items. For the prediction of the discharge FIM scores, the independent variables selected were age, the SIAS total scores, and the admission FIM scores; and the adjusted R2 was .64 (p < .0001). Stability of the predictive equation was confirmed in the cross-validation sample (R2 = .68, p < .001). The unidimensionality of the SIAS was confirmed, and the SIAS total scores proved useful for stroke outcome prediction.
McDonough, Christine M.; Jette, Alan M.; Ni, Pengsheng; Bogusz, Kara; Marfeo, Elizabeth E; Brandt, Diane E; Chan, Leighton; Meterko, Mark; Haley, Stephen M.; Rasch, Elizabeth K.
2014-01-01
Objectives To build a comprehensive item pool representing work-relevant physical functioning and to test the factor structure of the item pool. These developmental steps represent initial outcomes of a broader project to develop instruments for the assessment of function within the context of Social Security Administration (SSA) disability programs. Design Comprehensive literature review; gap analysis; item generation with expert panel input; stakeholder interviews; cognitive interviews; cross-sectional survey administration; and exploratory and confirmatory factor analyses to assess item pool structure. Setting In-person and semi-structured interviews; internet and telephone surveys. Participants A sample of 1,017 SSA claimants and a normative sample of 999 adults from the US general population. Interventions Not Applicable. Main Outcome Measure Model fit statistics. Results The final item pool consisted of 139 items. Within the claimant sample 58.7% were white; 31.8% were black; 46.6% were female; and the mean age was 49.7 years. Initial factor analyses revealed a 4-factor solution which included more items and allowed separate characterization of: 1) Changing and Maintaining Body Position, 2) Whole Body Mobility, 3) Upper Body Function and 4) Upper Extremity Fine Motor. The final 4-factor model included 91 items. Confirmatory factor analyses for the 4-factor models for the claimant and the normative samples demonstrated very good fit. Fit statistics for claimant and normative samples respectively were: Comparative Fit Index = 0.93 and 0.98; Tucker-Lewis Index = 0.92 and 0.98; Root Mean Square Error of Approximation = 0.05 and 0.04. Conclusions The factor structure of the Physical Function item pool closely resembled the hypothesized content model. The four scales relevant to work activities offer promise for providing reliable information about claimant physical functioning relevant to work disability. PMID:23542402
McDonough, Christine M; Jette, Alan M; Ni, Pengsheng; Bogusz, Kara; Marfeo, Elizabeth E; Brandt, Diane E; Chan, Leighton; Meterko, Mark; Haley, Stephen M; Rasch, Elizabeth K
2013-09-01
To build a comprehensive item pool representing work-relevant physical functioning and to test the factor structure of the item pool. These developmental steps represent initial outcomes of a broader project to develop instruments for the assessment of function within the context of Social Security Administration (SSA) disability programs. Comprehensive literature review; gap analysis; item generation with expert panel input; stakeholder interviews; cognitive interviews; cross-sectional survey administration; and exploratory and confirmatory factor analyses to assess item pool structure. In-person and semistructured interviews and Internet and telephone surveys. Sample of SSA claimants (n=1017) and a normative sample of adults from the U.S. general population (n=999). Not applicable. Model fit statistics. The final item pool consisted of 139 items. Within the claimant sample, 58.7% were white; 31.8% were black; 46.6% were women; and the mean age was 49.7 years. Initial factor analyses revealed a 4-factor solution, which included more items and allowed separate characterization of: (1) changing and maintaining body position, (2) whole body mobility, (3) upper body function, and (4) upper extremity fine motor. The final 4-factor model included 91 items. Confirmatory factor analyses for the 4-factor models for the claimant and the normative samples demonstrated very good fit. Fit statistics for claimant and normative samples, respectively, were: Comparative Fit Index=.93 and .98; Tucker-Lewis Index=.92 and .98; and root mean square error of approximation=.05 and .04. The factor structure of the physical function item pool closely resembled the hypothesized content model. The 4 scales relevant to work activities offer promise for providing reliable information about claimant physical functioning relevant to work disability. Copyright © 2013 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Improved Classification of Mammograms Following Idealized Training
Hornsby, Adam N.; Love, Bradley C.
2014-01-01
People often make decisions by stochastically retrieving a small set of relevant memories. This limited retrieval implies that human performance can be improved by training on idealized category distributions (Giguère & Love, 2013). Here, we evaluate whether the benefits of idealized training extend to categorization of real-world stimuli, namely classifying mammograms as normal or tumorous. Participants in the idealized condition were trained exclusively on items that, according to a norming study, were relatively unambiguous. Participants in the actual condition were trained on a representative range of items. Despite being exclusively trained on easy items, idealized-condition participants were more accurate than those in the actual condition when tested on a range of item types. However, idealized participants experienced difficulties when test items were very dissimilar from training cases. The benefits of idealization, attributable to reducing noise arising from cognitive limitations in memory retrieval, suggest ways to improve real-world decision making. PMID:24955325
Improved Classification of Mammograms Following Idealized Training.
Hornsby, Adam N; Love, Bradley C
2014-06-01
People often make decisions by stochastically retrieving a small set of relevant memories. This limited retrieval implies that human performance can be improved by training on idealized category distributions (Giguère & Love, 2013). Here, we evaluate whether the benefits of idealized training extend to categorization of real-world stimuli, namely classifying mammograms as normal or tumorous. Participants in the idealized condition were trained exclusively on items that, according to a norming study, were relatively unambiguous. Participants in the actual condition were trained on a representative range of items. Despite being exclusively trained on easy items, idealized-condition participants were more accurate than those in the actual condition when tested on a range of item types. However, idealized participants experienced difficulties when test items were very dissimilar from training cases. The benefits of idealization, attributable to reducing noise arising from cognitive limitations in memory retrieval, suggest ways to improve real-world decision making.
Informed choice: understanding knowledge in the context of screening uptake.
Michie, Susan; Dormandy, Elizabeth; Marteau, Theresa M
2003-07-01
This study evaluates a scale measuring knowledge about a screening test and investigates the association between knowledge, uptake and attitudes towards screening. One thousand four hundred ninety-nine pregnant women completed the knowledge scale of the multidimensional measure of informed choice (MMIC). Three hundred forty-five of these women and 152 professionals providing antenatal care also rated the importance of the knowledge items. Item characteristic curves show that, with one exception, the knowledge items reflect a spread of difficulty and are able to discriminate between people. All items were seen as essential or helpful by both women and health professionals, with two items seen as particularly important and one as unimportant. There were some differences between health professionals, women with low risk results and women with high risk results. Knowledge was not associated with uptake, attitude, or the extent to which uptake was consistent with women's attitudes towards undergoing the test.
Development and initial evaluation of the SCI-FI/AT
Jette, Alan M.; Slavin, Mary D.; Ni, Pengsheng; Kisala, Pamela A.; Tulsky, David S.; Heinemann, Allen W.; Charlifue, Susie; Tate, Denise G.; Fyffe, Denise; Morse, Leslie; Marino, Ralph; Smith, Ian; Williams, Steve
2015-01-01
Objectives To describe the domain structure and calibration of the Spinal Cord Injury Functional Index for samples using Assistive Technology (SCI-FI/AT) and report the initial psychometric properties of each domain. Design Cross-sectional survey followed by computerized adaptive test (CAT) simulations. Setting Inpatient and community settings. Participants A sample of 460 adults with traumatic spinal cord injury (SCI) stratified by level of injury, completeness of injury, and time since injury. Interventions None. Main outcome measure SCI-FI/AT. Results Confirmatory factor analysis (CFA) and item response theory (IRT) analyses identified 4 unidimensional SCI-FI/AT domains: Basic Mobility (41 items), Self-care (71 items), Fine Motor Function (35 items), and Ambulation (29 items). High correlations of the full item banks with 10-item simulated CATs indicated that each CAT estimated a person's function accurately, and measurement reliability of the simulated CAT scales was high compared with the full item bank. In the Self-care, Fine Motor Function, and Ambulation domains, SCI-FI/AT items were less difficult than the same items in the original SCI-FI item banks. Conclusion With the development of the SCI-FI/AT, clinicians and investigators now have multidimensional assessment scales that evaluate function in users of AT, complementing the scales available in the original SCI-FI. PMID:26010975
Development and initial evaluation of the SCI-FI/AT.
Jette, Alan M; Slavin, Mary D; Ni, Pengsheng; Kisala, Pamela A; Tulsky, David S; Heinemann, Allen W; Charlifue, Susie; Tate, Denise G; Fyffe, Denise; Morse, Leslie; Marino, Ralph; Smith, Ian; Williams, Steve
2015-05-01
To describe the domain structure and calibration of the Spinal Cord Injury Functional Index for samples using Assistive Technology (SCI-FI/AT) and report the initial psychometric properties of each domain. Cross-sectional survey followed by computerized adaptive test (CAT) simulations. Inpatient and community settings. A sample of 460 adults with traumatic spinal cord injury (SCI) stratified by level of injury, completeness of injury, and time since injury. Interventions: none. Main outcome measure: SCI-FI/AT. Confirmatory factor analysis (CFA) and item response theory (IRT) analyses identified 4 unidimensional SCI-FI/AT domains: Basic Mobility (41 items), Self-care (71 items), Fine Motor Function (35 items), and Ambulation (29 items). High correlations of the full item banks with 10-item simulated CATs indicated that each CAT estimated a person's function accurately, and measurement reliability of the simulated CAT scales was high compared with the full item bank. In the Self-care, Fine Motor Function, and Ambulation domains, SCI-FI/AT items were less difficult than the same items in the original SCI-FI item banks. With the development of the SCI-FI/AT, clinicians and investigators now have multidimensional assessment scales that evaluate function in users of AT, complementing the scales available in the original SCI-FI.
Wolfe, Edward W; McGill, Michael T
2011-01-01
This article summarizes a simulation study of the performance of five item quality indicators: the weighted and unweighted versions of the mean square and standardized mean square fit indices, and the point-measure correlation. The indicators were examined under relatively high and low amounts of missing data, with both random and conditional missingness patterns, in testing contexts such as those encountered in operational administrations of a computerized adaptive certification or licensure examination. The results suggest that the weighted fit indices (particularly the standardized mean square index) and the point-measure correlation provide the most consistent information across random and conditional missing data patterns, and that these indices perform more comparably for items near the passing score than for items with extreme difficulty values.
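For readers unfamiliar with the indices compared above, the following is a minimal Python sketch of the unweighted (outfit) and weighted (infit) mean square statistics and the point-measure correlation for dichotomous items under the Rasch model. It assumes that person abilities and item difficulties (in logits) have already been estimated, handles missing responses (encoded as NaN) simply by skipping them, and omits the standardized (ZSTD) transformations; the function names are hypothetical and not taken from the study.

```python
import numpy as np

def rasch_prob(theta, b):
    """P(X = 1) under the dichotomous Rasch model (ability theta, difficulty b, in logits)."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def item_fit(responses, theta, b):
    """Unweighted (outfit) and weighted (infit) mean squares for each item.

    responses: persons x items array of 0/1 scores, np.nan marking missing data.
    theta: person abilities (n_persons,); b: item difficulties (n_items,).
    Missing responses are excluded from every sum (an assumption of this sketch).
    """
    p = rasch_prob(theta[:, None], b[None, :])   # expected score for each response
    w = p * (1.0 - p)                              # model variance of each response
    resid = responses - p                          # NaN wherever the response is missing
    observed = ~np.isnan(responses)

    # Outfit: mean of squared standardized residuals over observed responses.
    z2 = np.where(observed, resid ** 2 / w, 0.0)
    outfit = z2.sum(axis=0) / observed.sum(axis=0)

    # Infit: squared residuals weighted by their variances (information-weighted).
    num = np.where(observed, resid ** 2, 0.0).sum(axis=0)
    den = np.where(observed, w, 0.0).sum(axis=0)
    infit = num / den
    return outfit, infit

def point_measure_corr(responses, theta):
    """Pearson correlation between each item's observed responses and the person measures."""
    corrs = []
    for j in range(responses.shape[1]):
        mask = ~np.isnan(responses[:, j])
        corrs.append(np.corrcoef(responses[mask, j], theta[mask])[0, 1])
    return np.array(corrs)
```

Values near 1 indicate responses about as noisy as the model expects; a single surprising response, such as a person two logits above an item's difficulty answering it incorrectly, contributes e^2 (about 7.4) to the squared standardized residuals. Outfit is more sensitive to such off-target surprises than infit, which down-weights responses far from the item's difficulty.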
Somatic complaints in children and community violence exposure.
Bailey, Beth Nordstrom; Delaney-Black, Virginia; Hannigan, John H; Ager, Joel; Sokol, Robert J; Covington, Chandice Y
2005-10-01
Somatic complaints of children in primary care settings often go unexplained despite attempts to determine a cause. Recent research has linked violence exposure to stress symptomatology and associated somatic problems. Unknown, however, is whether specific physical symptom complaints can be attributed, at least in part, to violence exposure. Urban African-American 6- and 7-year-old children (N = 268), residing with their biological mothers, recruited before birth, and without prenatal exposure to hard illicit drugs participated. Children and mothers were evaluated in our hospital-based research laboratory, with teacher data collected by mail. Community violence exposure (Things I Have Seen and Heard), stress symptomatology (Levonn), and somatic complaints (teacher- and self-report items) were assessed. Additional data collected included prenatal alcohol exposure, socioeconomic status, domestic violence, maternal age, stress, somatic complaints and psychopathology, and child depression, abuse, and gender. Community violence witnessing and victimization were associated with stress symptoms (r = .26 and .25, respectively, p < .001); violence victimization was related to decreased appetite (r = .16, p < .01), difficulty sleeping (r = .21, p < .001), and stomachache complaints (r = .13, p < .05); witnessed violence was associated with difficulty sleeping (r = .13, p < .05) and headaches (r = .12, p < .05). All associations remained significant after controlling for confounders. Community violence exposure accounted for 10% of the variance in child stress symptoms, and children who had experienced community violence victimization had a 28% increased risk of appetite problems, a 94% increased risk of sleeping problems, a 57% increased risk of headaches, and a 174% increased risk of stomachaches. Results provide yet another possibility for clinicians to explore when treating these physical symptoms in children.
Screening for Moral Injury: The Moral Injury Symptom Scale - Military Version Short Form.
Koenig, Harold G; Ames, Donna; Youssef, Nagy A; Oliver, John P; Volk, Fred; Teng, Ellen J; Haynes, Kerry; Erickson, Zachary D; Arnold, Irina; O'Garo, Keisha; Pearce, Michelle
2018-03-26
To develop a short form (SF) of the 45-item multidimensional Moral Injury Symptom Scale - Military Version (MISS-M) to use when screening for moral injury and monitoring treatment response in veterans and active duty military with PTSD. A total of 427 veterans and active duty military with PTSD symptoms were recruited from VA Medical Centers in Augusta, GA; Los Angeles, CA; Durham, NC; Houston, TX; and San Antonio, TX; and from Liberty University, Lynchburg, Virginia. The sample was randomly split in two. In the first half (n = 214), exploratory factor analysis identified the highest loading item on each of the 10 MISS scales (guilt, shame, moral concerns, loss of meaning, difficulty forgiving, loss of trust, self-condemnation, religious struggle, and loss of religious faith) to form the 10-item MISS-M-SF; confirmatory factor analysis was then performed to replicate results in the second half of the sample (n = 213). Internal reliability, test-retest reliability, and convergent, discriminant, and concurrent validity were examined in the overall sample. The study was approved by the institutional review boards and the Research & Development (R&D) Committees at Veterans Administration medical centers in Durham, Los Angeles, Augusta, Houston, and San Antonio, and the Liberty University and Duke University Medical Center institutional review boards. The 10-item MISS-M-SF had a median of 50 and a range of 12-91 (possible range 10-100). Over 70% scored a 9 or 10 (highest possible) on at least one item. Cronbach's alpha was 0.73 (95% CI 0.69-0.76), and test-retest reliability was 0.87 (95% CI 0.79-0.92). Convergent validity with the 45-item MISS-M was r = 0.92. Discriminant validity was demonstrated by relatively weak correlations with social, religious, and physical health constructs (r = 0.21-0.35), and concurrent validity was indicated by strong correlations with PTSD, depression, and anxiety symptoms (r = 0.54-0.58). 
The MISS-M-SF is a reliable and valid measure of MI symptoms that can be used to screen for MI and monitor response to treatment in veterans and active duty military with PTSD.
Yang, Sook Ja; Chee, Yeon Kyung; An, Jisook; Park, Min Hee; Jung, Sunok
2016-05-01
The purpose of this study was to obtain an independent evaluation of the factor structure of the 12-item Health Literacy Index for Female Marriage Immigrants (HLI-FMI), the first measure for assessing health literacy for FMIs in Korea. Participants were 250 Asian women who migrated from China, Vietnam, and the Philippines to marry. The HLI-FMI was originally developed and administered in Korean, and other questionnaires were translated into participants' native languages. The HLI-FMI consisted of 2 factors: (1) Access-Understand Health Literacy (7 items) and (2) Appraise-Apply Health Literacy (5 items); Cronbach's α = .73. Confirmatory factor analysis indicated adequate fit for the 2-factor model. HLI-FMI scores were positively associated with time since immigration and Korean proficiency. Analyses based on classical test theory and item response theory provided strong support for the adequacy of item discrimination and item difficulty. Findings suggested that the HLI-FMI is an easily administered, reliable, and valid scale. © 2016 APJPH.
An Analysis of the Connectedness to Nature Scale Based on Item Response Theory
Pasca, Laura; Aragonés, Juan I.; Coello, María T.
2017-01-01
The Connectedness to Nature Scale (CNS) is used as a measure of the subjective cognitive connection between individuals and nature. However, to date, it has not been analyzed at the item level to confirm its quality. In the present study, we conduct such an analysis based on Item Response Theory. We employed data from previous studies using the Spanish-language version of the CNS, analyzing a sample of 1008 participants. The results show that seven items presented appropriate indices of discrimination and difficulty, in addition to a good fit. The remaining six have inadequate discrimination indices and fit poorly. A second study with 321 participants shows that the seven-item scale has adequate levels of reliability and validity. Therefore, it would be appropriate to use a reduced version of the scale after eliminating the items that display inappropriate behavior, since they may interfere with research results on connectedness to nature. PMID:28824509
Cabanas-Sánchez, Verónica; Tejero-González, Carlos M; Veiga, Oscar L
2012-01-01
One of the main health problems in the developed world is increasing physical inactivity. In this respect, adolescence has been identified as a critical period with a steep decline in physical activity, so understanding this social phenomenon is a relevant line of research. The aim of this study was to design a scale to assess perceived barriers to physical activity in adolescents. A convenience sample of 160 Spanish adolescents (84 girls), aged 12 to 18 years, was recruited for this study. First, 40 items were designed, and their relevance was evaluated through expert content validation. Next, the participants were randomly divided into two groups, and exploratory and confirmatory factor analyses were performed to define a short scale of 12 items. Cronbach's alpha coefficient was used to evaluate the internal consistency of the instrument. The scale comprises four dimensions: incompatibility barriers (2 items), self-concept barriers (4 items), amotivation barriers (4 items) and social barriers (2 items). The scale showed adequate construct validity (χ2=60.78; d.f.=48; p=0.100; GFI=0.88; CFI=0.94; RMSEA=0.58) and high internal reliability (α=0.80). Moreover, the scale explained 67% of the variance in the data. The Short Scale of Perceived Barriers to Physical Activity in Adolescents is a valid and reliable instrument.
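Cronbach's alpha, reported above (α=0.80) and in several other abstracts in this list as the internal-consistency statistic, compares the sum of the item variances with the variance of the total score. A minimal Python sketch, assuming a complete respondents-by-items score matrix with no missing values and at least two items (the function name is hypothetical):

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents x items matrix of numeric item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score),
    using sample variances (ddof=1). Assumes no missing values and k >= 2 items.
    """
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)       # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of the summed scale score
    return k / (k - 1) * (1.0 - item_vars.sum() / total_var)
```

For example, two perfectly parallel items, cronbach_alpha([[1, 1], [2, 2], [3, 3]]), give 1.0, while two uncorrelated items such as [[1, 1], [1, 3], [3, 1], [3, 3]] give 0, because the total-score variance then equals the sum of the item variances.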
Colorado Learning Difficulties Questionnaire: Validation of a parent-report screening measure
Willcutt, Erik G.; Boada, Richard; Riddle, Margaret W.; Chhabildas, Nomita; DeFries, John C.; Pennington, Bruce F.
2011-01-01
This study evaluated the internal structure and convergent and discriminant evidence for the Colorado Learning Difficulties Questionnaire (CLDQ), a 20-item parent-report rating scale that was developed to provide a brief screening measure for learning difficulties. CLDQ ratings were obtained from parents of children in two large community samples and two samples from clinics that specialize in the assessment of learning disabilities and related disorders (total N = 8,004). Exploratory and confirmatory factor analyses revealed five correlated but separable dimensions that were labeled reading, math, social cognition, social anxiety, and spatial difficulties. Results revealed strong convergent and discriminant evidence for the CLDQ Reading scale, suggesting that this scale may provide a useful method to screen for reading difficulties in both research studies and clinical settings. Results are also promising for the other four CLDQ scales, but additional research is needed to refine each of these measures. PMID:21574721
ERIC Educational Resources Information Center
Backman, Erik; Larsson, Håkan
2016-01-01
Background: Research indicates that physical education teacher education (PETE) has only limited impact on how physical education (PE) is taught in schools. In this paper, our starting point is that the difficulties of challenging the dominating subject traditions in PE could be due to difficulties of challenging certain epistemological…
ERIC Educational Resources Information Center
Ekici, Erhan
2016-01-01
The aim of this study is to develop a valid and reliable instrument to assess why physics courses are perceived as one of the most difficult courses among high school students and to investigate the reasons why students have difficulty in learning physics through this scale. This study includes the development and validation studies of the…
ERIC Educational Resources Information Center
Sengoren, Serap Kaya; Tanel, Rabia; Kavcar, Nevzat
2006-01-01
The superposition principle is used to explain many phenomena in physics. Incomplete knowledge of this topic at a basic level causes problems for physics students later on. As long as prospective physics teachers have difficulties with the subject, it is inevitable that high school students will have the same difficulties. The aim of…
Rasch Mixture Models for DIF Detection
Strobl, Carolin; Zeileis, Achim
2014-01-01
Rasch mixture models can be a useful tool when checking the assumption of measurement invariance for a single Rasch model. They provide advantages compared to manifest differential item functioning (DIF) tests when the DIF groups are only weakly correlated with the available manifest covariates. Unlike single Rasch models, Rasch mixture models are sensitive in their estimation to the specification of the ability distribution, even when the conditional maximum likelihood approach is used. A simulation study demonstrates how differences in ability can influence the latent classes of a Rasch mixture model. If the aim is only DIF detection, uncovering such ability differences is not of interest, since one is only interested in a latent group structure regarding the item difficulties. To avoid any confounding effect of ability differences (or impact), a new score distribution for the Rasch mixture model is introduced here. It makes the estimation of the Rasch mixture model independent of the ability distribution and thus restricts the mixture to be sensitive only to latent structure in the item difficulties. Its usefulness is demonstrated in a simulation study, and its application is illustrated in a study of verbal aggression. PMID:29795819
Teaching Physical Geography with Toys, Household Items, and Food
ERIC Educational Resources Information Center
Carnahan, Laura; Pankratz, Mary Jo; Alberts, Heike
2014-01-01
While many college physical geography instructors already use a wide variety of creative teaching approaches in their classes, others have not yet been exposed to teaching with toys, household items, or food. The goal in this article is to present some ideas for teaching college-level physical geography (weather/climate and geomorphology) for…
ERIC Educational Resources Information Center
Kallemeyn, LeRoy Willard
This study determined the kinds of physics study items used, and the emphasis placed on each item, by both Physical Science Study Committee (PSSC) teachers and teachers of traditional physics materials in the state of Nebraska. A questionnaire was sent to teachers from the 100 largest schools, ranked by total enrollment, and to fifty other teachers…
Teachers' experiences supporting children after traumatic exposure.
Alisic, Eva; Bus, Marissa; Dulack, Wendel; Pennings, Lenneke; Splinter, Jessica
2012-02-01
Teachers can be instrumental in supporting children's recovery after trauma, but some work suggests that elementary school teachers are uncertain about their role and about what to do to assist children effectively after their students have been exposed to traumatic stressors. This study examined the extent to which teachers working with children from ages 8 to 12 years report similar concerns. A random sample of teachers in the Netherlands (N = 765) completed a questionnaire that included 9 items measuring difficulties on a 6-point Likert scale (potential range of total scores: 9-54). The mean total difficulty score was 29.8 (ranging from 10 to 50; SD = 7.37). On individual items, the fraction of teachers scoring 4 or more varied between 25 and 63%. A multiple regression analysis showed that teachers' total scores depended on amount of teaching experience, attendance at trauma-focused training, and the number of traumatized children they had worked with. The model explained 4% of the variance, a small effect. Because traumatic exposure in children is rather common, the findings point to a need to better understand what influences teachers' difficulties and develop trauma-informed practice in elementary schools. Copyright © 2012 International Society for Traumatic Stress Studies.
Schmitter-Edgecombe, Maureen; Parsey, Carolyn; Lamb, Richard
2014-01-01
The Instrumental Activities of Daily Living – Compensation (IADL-C) scale was developed to capture early functional difficulties and to quantify compensatory strategy use that may mitigate functional decline in the aging population. The IADL-C was validated in a sample of cognitively healthy older adults (N=184) and individuals with mild cognitive impairment (MCI; N=92) and dementia (N=24). Factor analysis and Rasch item analysis led to the 27-item IADL-C informant questionnaire with four functional domain subscales (money and self-management, home daily living, travel and event memory, and social skills). The subscales demonstrated good internal consistency (Rasch reliability 0.80 to 0.93) and test-retest reliability (Spearman coefficients 0.70 to 0.91). The IADL-C total score and subscales showed convergent validity with other IADL measures, discriminant validity with psychosocial measures, and the ability to discriminate between diagnostic groups. The money and self-management subscale showed notable difficulties for individuals with MCI, whereas difficulties with home daily living became more prominent for dementia participants. Compensatory strategy use increased in the MCI group and decreased in the dementia group. PMID:25344901
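Test-retest reliability in this abstract is reported as Spearman coefficients. As a minimal sketch, Spearman's rho can be computed as the Pearson correlation of the two sets of ranks, with tied scores given their average rank. The function names and the example scores are hypothetical and not taken from the IADL-C study.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their sorted positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired samples of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical subscale totals for six informants at test and retest.
    test = [12, 30, 25, 18, 40, 22]
    retest = [15, 28, 27, 20, 41, 16]
    print(round(spearman_rho(test, retest), 3))
```

For example, `spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])` gives 0.8 up to floating-point rounding, the same value as the no-ties shortcut 1 - 6Σd²/(n(n² - 1)) with Σd² = 4 and n = 5. Using ranks rather than raw scores makes the coefficient insensitive to monotone rescaling of the questionnaire totals.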
Helping Students Draw Correct Free-Body Diagrams
ERIC Educational Resources Information Center
Lee, Albert
2017-01-01
As physics instructors, we try to help our students learn physics. But most of us begin to realize that our students are not learning as much as we would hope. As we listen to our students, we begin to see some of their difficulties. Some of their difficulties are expected, but some are unexpected. One such difficulty is drawing the force…
ERIC Educational Resources Information Center
Gonzalez-Roma, Vicente; Tomas, Ines; Ferreres, Doris; Hernandez, Ana
2005-01-01
The aims of this study were to investigate whether the 6 items of the Physical Appearance Scale (Marsh, Richards, Johnson, Roche, & Tremayne, 1994) show differential item functioning (DIF) across gender groups of adolescents, and to show how this can be done using the multigroup mean and covariance structure (MG-MACS) analysis model. Two samples…
McFarland, Daniel C; Shaffer, Kelly M; Polizzi, Heather; Mascarenhas, John; Kremyanskaya, Marina; Holland, Jimmie; Hoffman, Ronald
2018-01-31
The physical symptom burden of patients with myeloproliferative neoplasms (MPNs) may last for extended periods during their disease trajectories and lead to psychologic distress, anxiety, depression, or a combination of these. This study evaluated the relationship between physical symptom burden captured by the Physical Problem List (PPL) on the Distress Thermometer and Problem List and psychologic outcomes (distress, anxiety, and depression) in the MPN setting. Patients (N = 117) with MPNs completed questionnaires containing the Distress Thermometer and Problem List and the Hospital Anxiety and Depression Scale in a dedicated MPN clinic within an academic medical center. They reported symptoms from any of 22 physical problems on the PPL. Items endorsed by more than 10% of participants were assessed for their associations with distress (Distress Thermometer and Problem List), anxiety (Hospital Anxiety and Depression Scale-Anxiety), and depression (Hospital Anxiety and Depression Scale-Depression). The total number of endorsed PPL items per participant was also evaluated. Nine of 22 PPL items (fatigue, sleep, pain, dry skin/pruritus, memory/concentration, feeling swollen, breathing, and sexual) were reported by >10% of participants. In univariate analyses, all PPL items but one were associated with distress and depression, and all but two were associated with anxiety. In multivariate analyses, the total number of PPL items was associated with depression only (p < 0.001) when controlling for covariates. Physical symptom burden in MPN patients was clearly associated with psychologic symptoms. Depression was uniquely associated with overall physical symptom burden. As such, the endorsement of multiple PPL items on the Distress Thermometer and Problem List should prompt an evaluation for psychologic symptoms to improve MPN patients' overall morbidity and quality of life. Copyright © 2018 The Academy of Psychosomatic Medicine. Published by Elsevier Inc. All rights reserved.
ITEMS Project: An online sequence for teaching mathematics and astronomy
NASA Astrophysics Data System (ADS)
Martínez, Bernat; Pérez, Josep
2010-10-01
This work describes an e-learning sequence for teaching geometry and astronomy in lower secondary school, created within the ITEMS (Improving Teacher Education in Mathematics and Science) project. It is based on results from astronomy education research about students' difficulties in understanding elementary astronomical observations and models. The sequence consists of a set of computer animations embedded in an e-learning environment aimed at supporting students in learning astronomy ideas that require the use of geometrical concepts and visual-spatial reasoning.
Item response theory analysis of the mechanics baseline test
NASA Astrophysics Data System (ADS)
Cardamone, Caroline N.; Abbott, Jonathan E.; Rayyan, Saif; Seaton, Daniel T.; Pawl, Andrew; Pritchard, David E.
2012-02-01
Item response theory is useful in both the development and evaluation of assessments and in computing standardized measures of student performance. In item response theory, individual parameters (difficulty, discrimination) for each item or question are fit by item response models. These parameters provide a means for evaluating a test and offer a better measure of student skill than a raw test score, because each skill calculation considers not only the number of questions answered correctly, but the individual properties of all questions answered. Here, we present the results from an analysis of the Mechanics Baseline Test given at MIT during 2005-2010. Using the item parameters, we identify questions on the Mechanics Baseline Test that are not effective in discriminating between MIT students of different abilities. We show that a limited subset of the highest quality questions on the Mechanics Baseline Test returns accurate measures of student skill. We compare student skills as determined by item response theory to the more traditional measurement of the raw score and show that a comparable measure of learning gain can be computed.
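The claim that an IRT skill estimate uses the individual properties of all questions answered can be made concrete. The abstract does not name the model or estimator, so the sketch below assumes a two-parameter logistic (2PL) model, P_i(θ) = 1 / (1 + exp(-a_i(θ - b_i))), with discrimination a_i and difficulty b_i, and finds the maximum-likelihood ability θ by damped Newton-Raphson. The item parameters and response patterns are hypothetical.

```python
import math


def p_correct(theta, a, b):
    """2PL probability of a correct response (assumed model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def estimate_ability(responses, items, tol=1e-10, max_iter=100):
    """Maximum-likelihood theta for one student via damped Newton-Raphson.

    responses: list of 0/1 scores; items: list of (a, b) parameter pairs.
    """
    if all(r == 1 for r in responses) or all(r == 0 for r in responses):
        raise ValueError("the MLE is infinite for all-correct or all-wrong patterns")
    theta = 0.0
    for _ in range(max_iter):
        grad = 0.0
        info = 0.0
        for x, (a, b) in zip(responses, items):
            p = p_correct(theta, a, b)
            grad += a * (x - p)            # first derivative of the log-likelihood
            info += a * a * p * (1.0 - p)  # Fisher information (minus the second derivative)
        step = max(-1.0, min(1.0, grad / info))  # cap each step at 1 logit for stability
        theta += step
        if abs(step) < tol:
            break
    return theta


if __name__ == "__main__":
    items = [(0.5, -1.0), (1.0, 0.0), (2.0, 0.5), (1.5, 1.0)]  # hypothetical (a, b)
    student_a = [1, 1, 0, 0]  # raw score 2, correct on low-discrimination items
    student_b = [0, 0, 1, 1]  # raw score 2, correct on high-discrimination items
    print("theta A:", estimate_ability(student_a, items))
    print("theta B:", estimate_ability(student_b, items))
```

Both hypothetical students answer two of four items correctly, but student B's correct answers are on the more discriminating items, so Σ a_i x_i is larger (3.5 versus 1.5) and the estimate of θ is higher. Under the 2PL, this weighted sum, not the raw count, is the sufficient statistic for θ.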
Item-Writing Guidelines for Physics
ERIC Educational Resources Information Center
Regan, Tom
2015-01-01
A teacher learning how to write test questions (test items) will almost certainly encounter item-writing guidelines--lists of item-writing do's and don'ts. Item-writing guidelines usually are presented as applicable across all assessment settings. Table I shows some guidelines that I believe to be generally applicable and two will be briefly…
Hinton, Pamela S; Johnstone, Brick; Blaine, Edward; Bodling, Angela
2011-09-01
To determine the relative influence of current exercise and diet on the late-life cognitive health of former Division I collision-sport collegiate athletes (ie, football players) compared with noncollision-sport athletes and nonathletes. Graduates (n = 400) of a Midwestern university (average age, 64.09 years; standard deviation, 13.32) completed a self-report survey to assess current demographics/physical characteristics, exercise, diet, cognitive difficulties, and physical and mental health. Former football players reported more cognitive difficulties, as well as worse physical and mental health, than controls. Among former football players, greater intake of total and saturated fat and cholesterol and lower overall diet quality were significantly correlated with cognitive difficulties; current dietary intake was not associated with cognitive health for the noncollision-sport athletes or nonathletes. Hierarchical regressions predicting cognitive difficulties indicated that higher income was associated with fewer cognitive difficulties and predicted 8% of the variance; status as a former football player predicted an additional 2% of the variance; and the interaction between being a football player and total dietary fat intake significantly predicted an additional 6% of the total variance (the total model predicted 16% of the variance). Greater intake of dietary fat was associated with increased cognitive difficulties, but only in the former football players, and not in the controls. Prior participation in football was associated with worse physical and mental health, while more frequent vigorous exercise was associated with higher physical and mental health ratings. Former football players reported more late-life cognitive difficulties and worse physical and mental health than former noncollision-sport athletes and nonathletes.
A novel finding of the present study is that current dietary fat was associated with more cognitive difficulties, but only in the former football players. These results suggest the need for educational interventions to encourage healthy dietary habits to promote the long-term cognitive health of collision-sport athletes.
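The hierarchical regression in this abstract enters predictors in blocks and credits each block with the increase in R² it produces, so the increments add up to the final model: 8% + 2% + 6% = 16%. A minimal NumPy sketch of that bookkeeping follows. The variables and data are synthetic and hypothetical, and entering the fat main effect together with the interaction in the last block is an assumption, since the abstract does not list the exact terms in each step.

```python
import numpy as np


def r_squared(X, y):
    """R^2 of an ordinary least squares fit of y on X (an intercept is added here)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = residuals @ residuals
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot


def hierarchical_r2(blocks, y):
    """Enter predictor blocks one at a time; return cumulative R^2 and per-step increments."""
    cumulative, increments = [], []
    X = np.empty((len(y), 0))
    previous = 0.0
    for block in blocks:
        X = np.column_stack([X, block])  # add this block's columns to the model
        r2 = r_squared(X, y)
        cumulative.append(r2)
        increments.append(r2 - previous)
        previous = r2
    return cumulative, increments


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 400
    income = rng.normal(size=n)                           # hypothetical standardized income
    football = rng.integers(0, 2, size=n).astype(float)   # hypothetical 0/1 group flag
    fat = rng.normal(size=n)                              # hypothetical dietary-fat score
    y = -0.3 * income + 0.2 * football + 0.3 * football * fat + rng.normal(size=n)
    blocks = [income, football, np.column_stack([fat, football * fat])]
    cumulative, increments = hierarchical_r2(blocks, y)
    print("cumulative R^2:", np.round(cumulative, 3))
    print("increments:", np.round(increments, 3))
```

Because each step refits the model with all earlier blocks included, the increment for a block depends on the order of entry; a different order would split the same total R² differently.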
Ten Issues in Criterion-Referenced Testing: A Response to Commonly Heard Criticisms.
ERIC Educational Resources Information Center
Curlette, William L.; Stallings, William M.
1979-01-01
The 10 criticisms of criterion-referenced tests addressed in this paper are: the domains tested; pedagogical influence; difficulty of items; cumbersome reports; reliability; arbitrary criteria; local objectives; labeling; predictive validity; and repeated testing. (SJL)