test item difficulty: Topics by Science.gov

Sample records for test item difficulty

Unidimensional IRT Item Parameter Estimates across Equivalent Test Forms with Confounding Specifications within Dimensions

ERIC Educational Resources Information Center

Matlock, Ki Lynn; Turner, Ronna

2016-01-01

When constructing multiple test forms, the number of items and the total test difficulty are often equivalent. Not all test developers match the number of items and/or average item difficulty within subcontent areas. In this simulation study, six test forms were constructed having an equal number of items and average item difficulty overall.…
An Analysis of Factors Affecting the Difficulty of Dialogue Items in TOEFL Listening Comprehension. TOEFL Research Reports, 51.

ERIC Educational Resources Information Center

Nissan, Susan; And Others

One of the item types in the Listening Comprehension section of the Test of English as a Foreign Language (TOEFL) test is the dialogue. Because the dialogue item pool needs to have an appropriate balance of items at a range of difficulty levels, test developers have examined items at various difficulty levels in an attempt to identify their…
Development and psychometric characteristics of the SCI-QOL Bladder Management Difficulties and Bowel Management Difficulties item banks and short forms and the SCI-QOL Bladder Complications scale.

PubMed

Tulsky, David S; Kisala, Pamela A; Tate, Denise G; Spungen, Ann M; Kirshblum, Steven C

2015-05-01

This article describes the development and psychometric properties of the Spinal Cord Injury--Quality of Life (SCI-QOL) Bladder Management Difficulties and Bowel Management Difficulties item banks and Bladder Complications scale. Using a mixed-methods design, a pool of items assessing bladder and bowel-related concerns was developed using focus groups with individuals with spinal cord injury (SCI) and SCI clinicians, cognitive interviews, and item response theory (IRT) analytic approaches, including tests of model fit and differential item functioning. Thirty-eight bladder items and 52 bowel items were tested at the University of Michigan, Kessler Foundation Research Center, the Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital, and the James J. Peters VA Medical Center, Bronx, NY. Participants were 757 adults with traumatic SCI. The final item banks demonstrated unidimensionality (Bladder Management Difficulties CFI=0.965; RMSEA=0.093; Bowel Management Difficulties CFI=0.955; RMSEA=0.078) and acceptable fit to a graded response IRT model. The final calibrated Bladder Management Difficulties bank includes 15 items, and the final Bowel Management Difficulties item bank consists of 26 items. Additionally, 5 items related to urinary tract infections (UTI) did not fit with the larger Bladder Management Difficulties item bank but performed relatively well independently (CFI=0.992, RMSEA=0.050) and were thus retained as a separate scale. The SCI-QOL Bladder Management Difficulties and Bowel Management Difficulties item banks are psychometrically robust and are available as computer adaptive tests or short forms. The SCI-QOL Bladder Complications scale is a brief, fixed-length outcomes instrument for individuals with a UTI.
Examining Differential Item Functions of Different Item Ordered Test Forms According to Item Difficulty Levels

ERIC Educational Resources Information Center

Çokluk, Ömay; Gül, Emrah; Dogan-Gül, Çilem

2016-01-01

The study aims to examine whether differential item function is displayed in three different test forms that have item orders of random and sequential versions (easy-to-hard and hard-to-easy), based on Classical Test Theory (CTT) and Item Response Theory (IRT) methods and bearing item difficulty levels in mind. In the correlational research, the…
The Confounding Effects of Ability, Item Difficulty, and Content Balance within Multiple Dimensions on the Estimation of Unidimensional Thetas

ERIC Educational Resources Information Center

Matlock, Ki Lynn

2013-01-01

When test forms that have equal total test difficulty and number of items vary in difficulty and length within sub-content areas, an examinee's estimated score may vary across equivalent forms, depending on how well his or her true ability in each sub-content area aligns with the difficulty of items and number of items within these areas.…
The Effect of the Position of an Item within a Test on the Item Difficulty Value.

ERIC Educational Resources Information Center

Rubin, Lois S.; Mott, David E. W.

This study investigated the effect of an item's position within a test on its difficulty value. Using a 60-item operational test composed of 5 subtests, 60 items were placed as experimental items on a number of spiralled test forms in three different positions (first, middle, last) within the subtest composed of like items.…
A Comparison of Three Test Formats to Assess Word Difficulty

ERIC Educational Resources Information Center

Culligan, Brent

2015-01-01

This study compared three common vocabulary test formats, the Yes/No test, the Vocabulary Knowledge Scale (VKS), and the Vocabulary Levels Test (VLT), as measures of vocabulary difficulty. Vocabulary difficulty was defined as the item difficulty estimated through Item Response Theory (IRT) analysis. Three tests were given to 165 Japanese students,…
Identifying predictors of physics item difficulty: A linear regression approach

NASA Astrophysics Data System (ADS)

Mesic, Vanes; Muratovic, Hasnija

2011-06-01

Large-scale assessments of student achievement in physics are often approached with an intention to discriminate students based on the attained level of their physics competencies. Therefore, for purposes of test design, it is important that items display an acceptable discriminatory behavior. To that end, it is recommended to avoid extraordinarily difficult and very easy items. Knowing the factors that influence physics item difficulty makes it possible to model the item difficulty even before the first pilot study is conducted. Thus, by identifying predictors of physics item difficulty, we can improve the test-design process. Furthermore, we get additional qualitative feedback regarding the basic aspects of student cognitive achievement in physics that are directly responsible for the obtained, quantitative test results. In this study, we conducted a secondary analysis of data that came from two large-scale assessments of student physics achievement at the end of compulsory education in Bosnia and Herzegovina. Foremost, we explored the concept of “physics competence” and performed a content analysis of 123 physics items that were included within the above-mentioned assessments. Thereafter, an item database was created. Items were described by variables which reflect some basic cognitive aspects of physics competence. For each of the assessments, Rasch item difficulties were calculated in separate analyses. In order to make the item difficulties from different assessments comparable, a virtual test equating procedure had to be implemented. Finally, a regression model of physics item difficulty was created. It has been shown that 61.2% of item difficulty variance can be explained by factors which reflect the automaticity, complexity, and modality of the knowledge structure that is relevant for generating the most probable correct solution, as well as by the divergence of required thinking and interference effects between intuitive and formal physics knowledge structures. Identified predictors point out the fundamental cognitive dimensions of student physics achievement at the end of compulsory education in Bosnia and Herzegovina, whose level of development influenced the test results within the conducted assessments.
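
As a rough illustration of the kind of analysis this abstract describes, the sketch below regresses item difficulties on item-level predictors with ordinary least squares and reports the share of variance explained (R²). The predictor names (`complexity`, `automaticity`), the coefficients, and all data are synthetic assumptions made for this example; they are not values from the study, and the study's actual model specification is not given in the abstract.

```python
import numpy as np

def r_squared(X, y):
    """Fit OLS with an intercept and return (coefficients, R^2)."""
    X1 = np.column_stack([np.ones(len(y)), X])      # prepend an intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)   # least-squares solution
    resid = y - X1 @ beta
    ss_res = float(resid @ resid)                   # unexplained variation
    ss_tot = float(((y - y.mean()) ** 2).sum())     # total variation
    return beta, 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
n_items = 123  # item count taken from the abstract; everything below is synthetic
complexity = rng.integers(1, 4, n_items)      # hypothetical coded predictor
automaticity = rng.integers(0, 2, n_items)    # hypothetical coded predictor
X = np.column_stack([complexity, automaticity])
# Synthetic "Rasch difficulties" generated from an assumed linear rule plus noise.
difficulty = 0.8 * complexity - 0.6 * automaticity + rng.normal(0, 0.7, n_items)

beta, r2 = r_squared(X, difficulty)
print(f"R^2 = {r2:.3f}")
```

An R² of 0.612, as reported in the abstract, would mean that the fitted predictors account for 61.2% of the variance in item difficulty.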
Factors Affecting Item Difficulty in English Listening Comprehension Tests

ERIC Educational Resources Information Center

Sung, Pei-Ju; Lin, Su-Wei; Hung, Pi-Hsia

2015-01-01

Task difficulty is a critical issue affecting test developers. Controlling or balancing the item difficulty of an assessment improves its validity and discrimination. Test developers construct tests from the cognitive perspective, by making the test constructing process more scientific and efficient; thus, the scores obtained more precisely…
Using Differential Item Functioning Procedures to Explore Sources of Item Difficulty and Group Performance Characteristics.

ERIC Educational Resources Information Center

Scheuneman, Janice Dowd; Gerritz, Kalle

1990-01-01

Differential item functioning (DIF) methodology for revealing sources of item difficulty and performance characteristics of different groups was explored. A total of 150 Scholastic Aptitude Test items and 132 Graduate Record Examination general test items were analyzed. DIF was evaluated for males and females and Blacks and Whites. (SLD)
Item Structural Properties as Predictors of Item Difficulty and Item Association.

ERIC Educational Resources Information Center

Solano-Flores, Guillermo

1993-01-01

Studied the ability of logical test design (LTD) to predict student performance in reading Roman numerals for 211 sixth graders in Mexico City tested on Roman numeral items varying on LTD-related and non-LTD-related variables. The LTD-related variable item iterativity was found to be the best predictor of item difficulty. (SLD)
Predicting Item Difficulty in a Reading Comprehension Test with an Artificial Neural Network.

ERIC Educational Resources Information Center

Perkins, Kyle; And Others

1995-01-01

This article reports the results of using a three-layer back propagation artificial neural network to predict item difficulty in a reading comprehension test. Three classes of variables were examined: text structure, propositional analysis, and cognitive demand. Results demonstrate that the networks can consistently predict item difficulty. (JL)
Statistical Approaches to the Study of Item Difficulty.

ERIC Educational Resources Information Center

Olson, John F.; And Others

Traditionally, item difficulty has been defined in terms of the performance of examinees. For test development purposes, a more useful concept would be some kind of intrinsic item difficulty, defined in terms of the item's content, context, or characteristics and the task demands set by the item. In this investigation, the measurement literature…
Repeated retrieval practice and item difficulty: does criterion learning eliminate item difficulty effects?

PubMed

Vaughn, Kalif E; Rawson, Katherine A; Pyc, Mary A

2013-12-01

A wealth of previous research has established that retrieval practice promotes memory, particularly when retrieval is successful. Although successful retrieval promotes memory, it remains unclear whether successful retrieval promotes memory equally well for items of varying difficulty. Will easy items still outperform difficult items on a final test if all items have been correctly recalled equal numbers of times during practice? In two experiments, normatively difficult and easy Lithuanian-English word pairs were learned via test-restudy practice until each item had been correctly recalled a preassigned number of times (from 1 to 11 correct recalls). Despite equating the numbers of successful recalls during practice, performance on a delayed final cued-recall test was lower for difficult than for easy items. Experiment 2 was designed to diagnose whether the disadvantage for difficult items was due to deficits in cue memory, target memory, and/or associative memory. The results revealed a disadvantage for the difficult versus the easy items only on the associative recognition test, with no differences on cue recognition, and even an advantage on target recognition. Although successful retrieval enhanced memory for both difficult and easy items, equating retrieval success during practice did not eliminate normative item difficulty differences.
Varying levels of difficulty index of skills-test items randomly selected by examinees on the Korean emergency medical technician licensing examination.

PubMed

Koh, Bongyeun; Hong, Sunggi; Kim, Soon-Sim; Hyun, Jin-Sook; Baek, Milye; Moon, Jundong; Kwon, Hayran; Kim, Gyoungyong; Min, Seonggi; Kang, Gu-Hyun

2016-01-01

The goal of this study was to characterize the difficulty index of the items in the skills test components of the class I and II Korean emergency medical technician licensing examination (KEMTLE), which requires examinees to select items randomly. The results of 1,309 class I KEMTLE examinations and 1,801 class II KEMTLE examinations in 2013 were subjected to analysis. Items from the basic and advanced skills test sections of the KEMTLE were compared to determine whether some were significantly more difficult than others. In the class I KEMTLE, all 4 of the items on the basic skills test showed significant variation in difficulty index (P<0.01), as well as 4 of the 5 items on the advanced skills test (P<0.05). In the class II KEMTLE, 4 of the 5 items on the basic skills test showed significantly different difficulty index (P<0.01), as well as all 3 of the advanced skills test items (P<0.01). In the skills test components of the class I and II KEMTLE, the procedure in which examinees randomly select questions should be revised to require examinees to respond to a set of fixed items in order to improve the reliability of the national licensing examination.
Rasch Measurement and Item Banking: Theory and Practice.

ERIC Educational Resources Information Center

Nakamura, Yuji

The Rasch Model is a one-parameter item response theory model stating that the probability of a correct response on a test is a function of the difficulty of the item and the ability of the candidate. Item banking is useful for language testing. The Rasch Model provides estimates of item difficulties that are meaningful,…
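
The relationship this abstract states can be sketched in a few lines. The function below uses the standard logistic form of the Rasch model, with ability and difficulty on the same logit scale; the function and parameter names are illustrative choices, not taken from the report.

```python
import math

def rasch_probability(theta: float, b: float) -> float:
    """P(correct) under the Rasch model for ability theta and item difficulty b (logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# When ability equals difficulty, the probability of success is exactly 0.5.
print(rasch_probability(0.0, 0.0))  # 0.5
# For a fixed ability, a harder item gives a lower probability of success.
print(rasch_probability(1.0, 0.0) > rasch_probability(1.0, 2.0))  # True
```

Because only the difference `theta - b` enters the formula, item difficulties estimated this way can be compared on a common scale, which is what makes the model convenient for item banking.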
Predicting Item Difficulty of Science National Curriculum Tests: The Case of Key Stage 2 Assessments

ERIC Educational Resources Information Center

El Masri, Yasmine H.; Ferrara, Steve; Foltz, Peter W.; Baird, Jo-Anne

2017-01-01

Predicting item difficulty is highly important in education for both teachers and item writers. Despite identifying a large number of explanatory variables, predicting item difficulty remains a challenge in educational assessment with empirical attempts rarely exceeding 25% of variance explained. This paper analyses 216 science items of key stage…
An Improved Internal Consistency Reliability Estimate.

ERIC Educational Resources Information Center

Cliff, Norman

1984-01-01

The proposed coefficient is derived by assuming that the average Goodman-Kruskal gamma between items of identical difficulty would be the same for items of different difficulty. An estimate of covariance between items of identical difficulty leads to an estimate of the correlation between two tests with identical distributions of difficulty.…
An Investigation of the Impact of Guessing on Coefficient α and Reliability

PubMed Central

2014-01-01

Guessing is known to influence the test reliability of multiple-choice tests. Although there are many studies that have examined the impact of guessing, they used rather restrictive assumptions (e.g., parallel test assumptions, homogeneous inter-item correlations, homogeneous item difficulty, and homogeneous guessing levels across items) to evaluate the relation between guessing and test reliability. Based on the item response theory (IRT) framework, this study investigated the extent of the impact of guessing on reliability under more realistic conditions where item difficulty, item discrimination, and guessing levels actually vary across items with three different test lengths (TL). By accommodating multiple item characteristics simultaneously, this study also focused on examining interaction effects between guessing and other variables entered in the simulation to be more realistic. The simulation of the more realistic conditions and calculations of reliability and classical test theory (CTT) item statistics were facilitated by expressing CTT item statistics, coefficient α, and reliability in terms of IRT model parameters. In addition to the general negative impact of guessing on reliability, results showed interaction effects between TL and guessing and between guessing and test difficulty.
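
Two ingredients named in this abstract can be sketched briefly: an IRT response function with a guessing parameter, and coefficient α computed from a scored response matrix. This is a minimal sketch assuming the three-parameter logistic (3PL) model as the guessing model; it does not reproduce the study's analytic expressions linking α to IRT parameters, and the tiny response matrix is made up for the example.

```python
import math
from statistics import pvariance

def p_3pl(theta, a, b, c):
    """3PL probability of a correct response; c is the guessing (lower-asymptote) parameter."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def coefficient_alpha(scores):
    """Coefficient alpha for a list of examinee rows, each a list of item scores."""
    k = len(scores[0])                                             # number of items
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])            # variance of total scores
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)

# Hypothetical 4 examinees x 3 items, scored 0/1.
scores = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(coefficient_alpha(scores))
print(p_3pl(0.0, 1.0, 0.0, 0.25))
```

With `c > 0`, even very low-ability examinees answer correctly with probability near `c`, which adds noise to total scores; that is the mechanism by which guessing is expected to lower reliability.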
Varying levels of difficulty index of skills-test items randomly selected by examinees on the Korean emergency medical technician licensing examination

PubMed Central

2016-01-01

Purpose: The goal of this study was to characterize the difficulty index of the items in the skills test components of the class I and II Korean emergency medical technician licensing examination (KEMTLE), which requires examinees to select items randomly. Methods: The results of 1,309 class I KEMTLE examinations and 1,801 class II KEMTLE examinations in 2013 were subjected to analysis. Items from the basic and advanced skills test sections of the KEMTLE were compared to determine whether some were significantly more difficult than others. Results: In the class I KEMTLE, all 4 of the items on the basic skills test showed significant variation in difficulty index (P<0.01), as well as 4 of the 5 items on the advanced skills test (P<0.05). In the class II KEMTLE, 4 of the 5 items on the basic skills test showed significantly different difficulty index (P<0.01), as well as all 3 of the advanced skills test items (P<0.01). Conclusion: In the skills test components of the class I and II KEMTLE, the procedure in which examinees randomly select questions should be revised to require examinees to respond to a set of fixed items in order to improve the reliability of the national licensing examination. PMID:26883810

When Listening Is Better Than Reading: Performance Gains on Cardiac Auscultation Test Questions.

PubMed

Short, Kathleen; Bucak, S Deniz; Rosenthal, Francine; Raymond, Mark R

2018-05-01

In 2007, the United States Medical Licensing Examination embedded multimedia simulations of heart sounds into multiple-choice questions. This study investigated changes in item difficulty as determined by examinee performance over time. The data reflect outcomes obtained following initial use of multimedia items from 2007 through 2012, after which an interface change occurred. A total of 233,157 examinees responded to 1,306 cardiology test items over the six-year period; 138 items included multimedia simulations of heart sounds, while 1,168 text-based items without multimedia served as controls. The authors compared changes in difficulty of multimedia items over time with changes in difficulty of text-based cardiology items over time. Further, they compared changes in item difficulty for both groups of items between graduates of Liaison Committee on Medical Education (LCME)-accredited and non-LCME-accredited (i.e., international) medical schools. Examinee performance on cardiology test items with multimedia heart sounds improved by 12.4% over the six-year period, while performance on text-based cardiology items improved by approximately 1.4%. These results were similar for graduates of LCME-accredited and non-LCME-accredited medical schools. Examinees' ability to interpret auscultation findings in test items that include multimedia presentations increased from 2007 to 2012.
A Study of Inference in Standardized Reading Test Items and Its Relationship to Difficulty.

ERIC Educational Resources Information Center

Marzano, Robert J.

To study the relationship between inferences made on standardized reading tests and item difficulty, 50 items on the reading comprehension section of the Metropolitan Achievement Test were analyzed independently in this study by two raters using four general categories of inferences: (1) reference inferences, (2) between proposition inferences,…
Is the Factor Observed in Investigations on the Item-Position Effect Actually the Difficulty Factor?

ERIC Educational Resources Information Center

Schweizer, Karl; Troche, Stefan

2018-01-01

In confirmatory factor analysis quite similar models of measurement serve the detection of the difficulty factor and the factor due to the item-position effect. The item-position effect refers to the increasing dependency among the responses to successively presented items of a test whereas the difficulty factor is ascribed to the wide range of…
Difficulty and Discriminability of Introductory Psychology Test Items.

ERIC Educational Resources Information Center

Scialfa, Charles; Legare, Connie; Wenger, Larry; Dingley, Louis

2001-01-01

Analyzes multiple-choice questions provided in test banks for introductory psychology textbooks. Study 1 offered a consistent picture of the objective difficulty of multiple-choice tests for introductory psychology students, while both studies 1 and 2 indicated that test items taken from commercial test banks have poor psychometric properties.…
Predicting Item Difficulty in a Reading Comprehension Test with an Artificial Neural Network.

ERIC Educational Resources Information Center

Perkins, Kyle; And Others

This paper reports the results of using a three-layer backpropagation artificial neural network to predict item difficulty in a reading comprehension test. Two network structures were developed, one with and one without a sigmoid function in the output processing unit. The data set, which consisted of a table of coded test items and corresponding…
Selecting Items for Criterion-Referenced Tests.

ERIC Educational Resources Information Center

Mellenbergh, Gideon J.; van der Linden, Wim J.

1982-01-01

Three item selection methods for criterion-referenced tests are examined: the classical theory of item difficulty and item-test correlation; the latent trait theory of item characteristic curves; and a decision-theoretic approach for optimal item selection. Item contribution to the standardized expected utility of mastery testing is discussed. (CM)
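
The first of the three approaches, classical item difficulty and item-test correlation, can be sketched as follows. This is a minimal illustration assuming dichotomous (0/1) scoring; the response matrix is hypothetical, and the correlation shown is the uncorrected one (the item is included in the total score).

```python
from statistics import mean, pstdev

def item_difficulty(responses, j):
    """Classical difficulty (p-value): proportion of examinees answering item j correctly."""
    return mean(row[j] for row in responses)

def item_test_correlation(responses, j):
    """Pearson correlation between item j (0/1) and the total test score."""
    item = [row[j] for row in responses]
    total = [sum(row) for row in responses]
    mi, mt = mean(item), mean(total)
    cov = mean((x - mi) * (t - mt) for x, t in zip(item, total))
    # Undefined (division by zero) if every examinee gives the same response to item j.
    return cov / (pstdev(item) * pstdev(total))

# Hypothetical 4 examinees x 3 items.
responses = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(item_difficulty(responses, 0))
print(item_test_correlation(responses, 1))
```

Note that a higher classical "difficulty" value means an easier item, since it is the proportion answering correctly.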
Is the Factor Observed in Investigations on the Item-Position Effect Actually the Difficulty Factor?

PubMed

Schweizer, Karl; Troche, Stefan

2018-02-01

In confirmatory factor analysis quite similar models of measurement serve the detection of the difficulty factor and the factor due to the item-position effect. The item-position effect refers to the increasing dependency among the responses to successively presented items of a test whereas the difficulty factor is ascribed to the wide range of item difficulties. The similarity of the models of measurement hampers the dissociation of these factors. Since the item-position effect should theoretically be independent of the item difficulties, the statistical ex post manipulation of the difficulties should enable the discrimination of the two types of factors. This method was investigated in two studies. In the first study, Advanced Progressive Matrices (APM) data of 300 participants were investigated. As expected, the factor thought to be due to the item-position effect was observed. In the second study, using data simulated to show the major characteristics of the APM data, the wide range of items with various difficulties was set to zero to reduce the likelihood of detecting the difficulty factor. Despite this reduction, however, the factor, now identified as the item-position factor, was observed in virtually all simulated datasets.
Stereotype threat in classroom settings: the interactive effect of domain identification, task difficulty and stereotype threat on female students' maths performance.

PubMed

Keller, Johannes

2007-06-01

Stereotype threat research revealed that negative stereotypes can disrupt the performance of persons targeted by such stereotypes. This paper contributes to stereotype threat research by providing evidence that domain identification and the difficulty level of test items moderate stereotype threat effects on female students' maths performance. The study was designed to test theoretical ideas derived from stereotype threat theory and assumptions outlined in the Yerkes-Dodson law proposing a nonlinear relationship between arousal, task difficulty and performance. Participants were 108 high school students attending secondary schools. Participants worked on a test comprising maths problems of different difficulty levels. Half of the participants learned that the test had been shown to produce gender differences (stereotype threat). The other half learned that the test had been shown not to produce gender differences (no threat). The degree to which participants identify with the domain of maths was included as a quasi-experimental factor. Maths-identified female students showed performance decrements under conditions of stereotype threat. Moreover, the stereotype threat manipulation had different effects on low and high domain identifiers' performance depending on test item difficulty. On difficult items, low identifiers showed higher performance under threat (vs. no threat) whereas the reverse was true in high identifiers. This interaction effect did not emerge on easy items. Domain identification and test item difficulty are two important factors that need to be considered in the attempt to understand the impact of stereotype threat on performance.
What Does a Verbal Test Measure? A New Approach to Understanding Sources of Item Difficulty.

ERIC Educational Resources Information Center

Berk, Eric J. Vanden; Lohman, David F.; Cassata, Jennifer Coyne

Assessing the construct relevance of mental test results continues to present many challenges, and it has proven to be particularly difficult to assess the construct relevance of verbal items. This study was conducted to gain a better understanding of the conceptual sources of verbal item difficulty using a unique approach that integrates…
On Maximizing Item Information and Matching Difficulty with Ability.

ERIC Educational Resources Information Center

Bickel, Peter; Buyske, Steven; Chang, Huahua; Ying, Zhiliang

2001-01-01

Examined the assumption that matching difficulty levels of test items with an examinee's ability makes a test more efficient and challenged this assumption through a class of one-parameter item response theory models. Found the validity of the fundamental assumption to be closely related to the van Zwet tail ordering of symmetric distributions (W.…
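
For orientation, the conventional assumption that this abstract examines can be seen directly in the textbook logistic (Rasch) case. The derivation below covers only that standard case; the broader class of one-parameter models studied in the paper is not reproduced here.

```latex
P(\theta) = \frac{1}{1 + e^{-(\theta - b)}}, \qquad
I(\theta) = P(\theta)\,\bigl(1 - P(\theta)\bigr) \le \tfrac{1}{4},
\quad \text{with equality iff } P(\theta) = \tfrac{1}{2} \iff \theta = b .
```

In this case item information peaks when item difficulty b equals examinee ability θ, which is the basis for matching difficulty to ability in adaptive testing.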
Detecting a Gender-Related Differential Item Functioning Using Transformed Item Difficulty

ERIC Educational Resources Information Center

Abedalaziz, Nabeel; Leng, Chin Hai; Alahmadi, Ahlam

2014-01-01

The purpose of the study was to examine gender differences in performance on a multiple-choice mathematical ability test administered within the context of a high school graduation test that was designed to match the eleventh grade curriculum. The transformed item difficulty (TID) method was used to detect gender-related DIF. A random sample of 1400 eleventh…
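
The abstract does not give the transformation it used. A minimal sketch, assuming TID refers to the conventional delta-plot approach, converts each group's proportion correct to the delta scale (mean 13, standard deviation 4 on the normal deviate). The proportions below are hypothetical.

```python
from statistics import NormalDist

def delta(p_correct: float) -> float:
    """Delta-scale difficulty: 13 + 4 * z, where z is the normal deviate for the proportion incorrect."""
    # p_correct must lie strictly between 0 and 1.
    return 13.0 + 4.0 * NormalDist().inv_cdf(1.0 - p_correct)

# Hypothetical proportions correct for the same three items in two groups.
p_group_a = [0.80, 0.50, 0.30]
p_group_b = [0.75, 0.52, 0.15]
pairs = [(delta(pa), delta(pb)) for pa, pb in zip(p_group_a, p_group_b)]
for d_a, d_b in pairs:
    print(f"{d_a:.2f}  {d_b:.2f}")
```

In a delta plot, the paired values are plotted against each other, and items that fall far from the main trend line are candidates for DIF; the flagging rule itself is not shown here.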
International Semiotics: Item Difficulty and the Complexity of Science Item Illustrations in the PISA-2009 International Test Comparison

ERIC Educational Resources Information Center

Solano-Flores, Guillermo; Wang, Chao; Shade, Chelsey

2016-01-01

We examined multimodality (the representation of information in multiple semiotic modes) in the context of international test comparisons. Using Programme for International Student Assessment (PISA)-2009 data, we examined the correlation of the difficulty of science items and the complexity of their illustrations. We observed statistically…
The Effects of Judgment-Based Stratum Classifications on the Efficiency of Stratum Scored CATs.

ERIC Educational Resources Information Center

Finney, Sara J.; Smith, Russell W.; Wise, Steven L.

Two operational item pools were used to investigate the performance of stratum computerized adaptive tests (CATs) when items were assigned to strata based on empirical estimates of item difficulty or human judgments of item difficulty. Items from the first data set consisted of 54 5-option multiple choice items from a form of the ACT mathematics…
Item difficulty and item validity for the Children's Group Embedded Figures Test.

PubMed

Rusch, R R; Trigg, C L; Brogan, R; Petriquin, S

1994-02-01

The validity and reliability of the Children's Group Embedded Figures Test was reported for students in Grade 2 by Cromack and Stone in 1980; however, a search of the literature indicates no evidence for internal consistency or item analysis. Hence the purpose of this study was to examine the item difficulty and item validity of the test with children in Grades 1 and 2. Confusion in the literature over development and use of this test was seemingly resolved through analysis of these descriptions and through an interview with the test developer. One early-appearing item was unreasonably difficult. Two or three other items were quite difficult and made little contribution to the total score. Caution is recommended, however, in any reordering or elimination of items based on these findings, given the limited number of subjects (n = 84).
Component Identification and Item Difficulty of Raven's Matrices Items.

ERIC Educational Resources Information Center

Green, Kathy E.; Kluever, Raymond C.

Item components that might contribute to the difficulty of items on the Raven Colored Progressive Matrices (CPM) and the Standard Progressive Matrices (SPM) were studied. Subjects providing responses to CPM items were 269 children aged 2 years 9 months to 11 years 8 months, most of whom were referred for testing as potentially gifted. A second…
Readability Level of Standardized Test Items and Student Performance: The Forgotten Validity Variable

ERIC Educational Resources Information Center

Hewitt, Margaret A.; Homan, Susan P.

2004-01-01

Test validity issues considered by test developers and school districts rarely include individual item readability levels. In this study, items from a major standardized test were examined for individual item readability level and item difficulty. The Homan-Hewitt Readability Formula was applied to items across three grade levels. Results of…
Item Response Theory Modeling of the Philadelphia Naming Test.

PubMed

Fergadiotis, Gerasimos; Kellough, Stacey; Hula, William D

2015-06-01

In this study, we investigated the fit of the Philadelphia Naming Test (PNT; Roach, Schwartz, Martin, Grewal, & Brecher, 1996) to an item-response-theory measurement model, estimated the precision of the resulting scores and item parameters, and provided a theoretical rationale for the interpretation of PNT overall scores by relating explanatory variables to item difficulty. This article describes the statistical model underlying the computer adaptive PNT presented in a companion article (Hula, Kellough, & Fergadiotis, 2015). Using archival data, we evaluated the fit of the PNT to 1- and 2-parameter logistic models and examined the precision of the resulting parameter estimates. We regressed the item difficulty estimates on three predictor variables: word length, age of acquisition, and contextual diversity. The 2-parameter logistic model demonstrated marginally better fit, but the fit of the 1-parameter logistic model was adequate. Precision was excellent for both person ability and item difficulty estimates. Word length, age of acquisition, and contextual diversity all independently contributed to variance in item difficulty. Item-response-theory methods can be productively used to analyze and quantify anomia severity in aphasia. Regression of item difficulty on lexical variables supported the validity of the PNT and interpretation of anomia severity scores in the context of current word-finding models.
Enhancing the Equating of Item Difficulty Metrics: Estimation of Reference Distribution. Research Report. ETS RR-14-07

ERIC Educational Resources Information Center

Ali, Usama S.; Walker, Michael E.

2014-01-01

Two methods are currently in use at Educational Testing Service (ETS) for equating observed item difficulty statistics. The first method involves the linear equating of item statistics in an observed sample to reference statistics on the same items. The second method, or the item response curve (IRC) method, involves the summation of conditional…
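
The first method mentioned here can be illustrated with a generic linear (mean and standard deviation) rescaling. This is a sketch under the assumption that the equating matches the first two moments of the observed statistics to those of the reference statistics on the same items; the report's exact procedure is not given in the abstract, and the numbers are invented.

```python
from statistics import mean, pstdev

def linear_equate(observed, reference):
    """Rescale observed item statistics to the mean and SD of the reference statistics."""
    mo, so = mean(observed), pstdev(observed)
    mr, sr = mean(reference), pstdev(reference)
    slope = sr / so                      # ratio of reference SD to observed SD
    return [mr + slope * (x - mo) for x in observed]

# Hypothetical difficulty statistics for the same three items in two samples.
observed = [10, 12, 14]
reference = [11, 13, 15]
print(linear_equate(observed, reference))
```

After rescaling, the equated statistics have the same mean and standard deviation as the reference set, so items calibrated in different samples can be compared on one metric.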
Do Images Influence Assessment in Anatomy? Exploring the Effect of Images on Item Difficulty and Item Discrimination

ERIC Educational Resources Information Center

Vorstenbosch, Marc A. T. M.; Klaassen, Tim P. F. M.; Kooloos, Jan G. M.; Bolhuis, Sanneke M.; Laan, Roland F. J. M.

2013-01-01

Anatomists often use images in assessments and examinations. This study aims to investigate the influence of different types of images on item difficulty and item discrimination in written assessments. A total of 210 of 460 students volunteered for an extra assessment in a gross anatomy course. This assessment contained 39 test items grouped in…
Multiple choice questions can be designed or revised to challenge learners' critical thinking.

PubMed

Tractenberg, Rochelle E; Gushta, Matthew M; Mulroney, Susan E; Weissinger, Peggy A

2013-12-01

Multiple choice (MC) questions from a graduate physiology course were evaluated by cognitive-psychology (but not physiology) experts, and analyzed statistically, in order to test the independence of content expertise and cognitive complexity ratings of MC items. Integration of higher order thinking into MC exams is important, but widely known to be challenging, perhaps especially when content experts must think like novices. Expertise in the domain (content) may actually impede the creation of higher-complexity items. Three cognitive psychology experts independently rated cognitive complexity for 252 multiple-choice physiology items using a six-level cognitive complexity matrix that was synthesized from the literature. Rasch modeling estimated item difficulties. The complexity ratings and difficulty estimates were then analyzed together to determine the relative contributions (and independence) of complexity and difficulty to the likelihood of correct answers on each item. Cognitive complexity was found to be statistically independent of difficulty estimates for 88% of items. Using the complexity matrix, modifications were identified to increase some item complexities by one level, without affecting the item's difficulty. Cognitive complexity can effectively be rated by non-content experts. The six-level complexity matrix, if applied by faculty peer groups trained in cognitive complexity and without domain-specific expertise, could lead to improvements in the complexity targeted with item writing and revision. Targeting higher order thinking with MC questions can be achieved without changing item difficulties or other test characteristics, but this may be less likely if the content expert is left to assess items within their domain of expertise.

A Review of Classical Methods of Item Analysis.

ERIC Educational Resources Information Center

French, Christine L.

Item analysis is a very important consideration in the test development process. It is a statistical procedure to analyze test items that combines methods used to evaluate the important characteristics of test items, such as difficulty, discrimination, and distractibility of the items in a test. This paper reviews some of the classical methods for…
Development and Psychometric Evaluation of the Gay Male Sexual Difficulties Scale.

PubMed

McDonagh, Lorraine K; Stewart, Ian; Morrison, Melanie A; Morrison, Todd G

2016-08-01

Sexual difficulties (i.e., disturbances in normal sexual responding) have the potential to significantly and negatively affect men's social and psychological well-being. However, a review of published measurement tools indicates that most have limited applicability to gay men, and none offer a nuanced understanding of sexual difficulties, as experienced by members of this population. To address this omission, the Gay Male Sexual Difficulties Scale (GMSDS) was developed using a sequential mixed-methods approach. The 25-item GMSDS uses a 6-point frequency Likert-type response format and examines: difficulties with receptive and insertive anal intercourse (5 items each); erectile difficulties (4 items); foreskin difficulties (4 items); body embarrassment (4 items); and seminal fluid concerns (3 items). The measure's scale score dimensionality, assessed using both exploratory and confirmatory factor analyses, as well as scale score reliability and validity (e.g., known-groups and convergent) was tested and deemed to be satisfactory. Limitations of the current series of studies and directions for future research are discussed.
A Comparison of Three Types of Test Development Procedures Using Classical and Latent Trait Methods.

ERIC Educational Resources Information Center

Benson, Jeri; Wilson, Michael

Three methods of item selection were used to select sets of 38 items from a 50-item verbal analogies test and the resulting item sets were compared for internal consistency, standard errors of measurement, item difficulty, biserial item-test correlations, and relative efficiency. Three groups of 1,500 cases each were used for item selection. First…
Detecting a Gender-Related DIF Using Logistic Regression and Transformed Item Difficulty

ERIC Educational Resources Information Center

Abedlaziz, Nabeel; Ismail, Wail; Hussin, Zaharah

2011-01-01

Test items are designed to provide information about the examinees. Difficult items are designed to be more demanding and easy items are less so. However, test items sometimes carry demands other than those intended by the test developer (Scheuneman & Gerritz, 1990). When personal attributes such as gender systematically affect…
Relevance of Item Analysis in Standardizing an Achievement Test in Teaching of Physical Science in B.Ed Syllabus

ERIC Educational Resources Information Center

Marie, S. Maria Josephine Arokia; Edannur, Sreekala

2015-01-01

This paper focused on the analysis of test items constructed in the paper of teaching Physical Science for B.Ed. class. It involved the analysis of difficulty level and discrimination power of each test item. Item analysis allows selecting or omitting items from the test, but more importantly item analysis is a tool to help the item writer improve…
Bilingual health literacy assessment using the Talking Touchscreen/la Pantalla Parlanchina: Development and pilot testing.

PubMed

Yost, Kathleen J; Webster, Kimberly; Baker, David W; Choi, Seung W; Bode, Rita K; Hahn, Elizabeth A

2009-06-01

Current health literacy measures are too long, imprecise, or have questionable equivalence of English and Spanish versions. The purpose of this paper is to describe the development and pilot testing of a new bilingual computer-based health literacy assessment tool. We analyzed literacy data from three large studies. Using a working definition of health literacy, we developed new prose, document and quantitative items in English and Spanish. Items were pilot tested on 97 English- and 134 Spanish-speaking participants to assess item difficulty. Items covered topics relevant to primary care patients and providers. English- and Spanish-speaking participants understood the tasks involved in answering each type of question. The English Talking Touchscreen was easy to use and the English and Spanish items provided good coverage of the difficulty continuum. Qualitative and quantitative results provided useful information on computer acceptability and initial item difficulty. After the items have been administered on the Talking Touchscreen (la Pantalla Parlanchina) to 600 English-speaking (and 600 Spanish-speaking) primary care patients, we will develop a computer adaptive test. This health literacy tool will enable clinicians and researchers to more precisely determine the level at which low health literacy adversely affects health and healthcare utilization.
Interpretation of the Rasch Ability and Difficulty Scales for Educational Purposes.

ERIC Educational Resources Information Center

Woodcock, Richard W.

Though many test developers have utilized item response theory in their work, few have taken advantage of the potential of item response theory for providing new interpretation procedures that accentuate the educational implications to be drawn from test scores. This paper describes several features, based upon the Rasch difficulty and ability…
The Effect of Anchor Test Construction on Scale Drift

ERIC Educational Resources Information Center

Antal, Judit; Proctor, Thomas P.; Melican, Gerald J.

2014-01-01

In common-item equating the anchor block is generally built to represent a miniature form of the total test in terms of content and statistical specifications. The statistical properties frequently reflect equal mean and spread of item difficulty. Sinharay and Holland (2007) suggested that the requirement for equal spread of difficulty may be too…
Sources of difficulty in assessment: example of PISA science items

NASA Astrophysics Data System (ADS)

Le Hebel, Florence; Montpied, Pascale; Tiberghien, Andrée; Fontanieu, Valérie

2017-03-01

The understanding of what makes a question difficult is a crucial concern in assessment. To study the difficulty of test questions, we focus on the case of PISA, which assesses to what degree 15-year-old students have acquired knowledge and skills essential for full participation in society. Our research question is to identify PISA science item characteristics that could influence the item's proficiency level. It is based on an a-priori item analysis and a statistical analysis. Results show that only the cognitive complexity and the format out of the different characteristics of PISA science items determined in our a-priori analysis have an explanatory power on an item's proficiency levels. The proficiency level cannot be explained by the dependence/independence of the information provided in the unit and/or item introduction and the competence. We conclude that in PISA, it appears possible to anticipate a high proficiency level, that is, students' low scores for items displaying a high cognitive complexity. In the case of a middle or low cognitive complexity level item, the cognitive complexity level is not sufficient to predict item difficulty. Other characteristics play a crucial role in item difficulty. We discuss anticipating the difficulties in assessment in a broader perspective.
Effects of Item Exposure for Conventional Examinations in a Continuous Testing Environment.

ERIC Educational Resources Information Center

Hertz, Norman R.; Chinn, Roberta N.

This study explored the effect of item exposure on two conventional examinations administered as computer-based tests. A principal hypothesis was that item exposure would have little or no effect on average difficulty of the items over the course of an administrative cycle. This hypothesis was tested by exploring conventional item statistics and…
Using Reliability and Item Analysis to Evaluate a Teacher-Developed Test in Educational Measurement and Evaluation

ERIC Educational Resources Information Center

Quaigrain, Kennedy; Arhin, Ato Kwamina

2017-01-01

Item analysis is essential in improving items which will be used again in later tests; it can also be used to eliminate misleading items in a test. The study focused on item and test quality and explored the relationship between difficulty index (p-value) and discrimination index (DI) with distractor efficiency (DE). The study was conducted among…
The quadratic relationship between difficulty of intelligence test items and their correlations with working memory.

PubMed

Smolen, Tomasz; Chuderski, Adam

2015-01-01

Fluid intelligence (Gf) is a crucial cognitive ability that involves abstract reasoning in order to solve novel problems. Recent research demonstrated that Gf strongly depends on the individual effectiveness of working memory (WM). We investigated a popular claim that if the storage capacity underlay the WM-Gf correlation, then such a correlation should increase with an increasing number of items or rules (load) in a Gf-test. As often no such link is observed, on that basis the storage-capacity account is rejected, and alternative accounts of Gf (e.g., related to executive control or processing speed) are proposed. Using both analytical inference and numerical simulations, we demonstrated that the load-dependent change in correlation is primarily a function of the amount of floor/ceiling effect for particular items. Thus, the item-wise WM correlation of a Gf-test depends on its overall difficulty, and the difficulty distribution across its items. When the early test items yield huge ceiling, but the late items do not approach floor, that correlation will increase throughout the test. If the early items locate themselves between ceiling and floor, but the late items approach floor, the respective correlation will decrease. For a hallmark Gf-test, the Raven-test, whose items span from ceiling to floor, the quadratic relationship is expected, and it was shown empirically using a large sample and two types of WMC tasks. In consequence, no changes in correlation due to varying WM/Gf load, or lack of them, can yield an argument for or against any theory of WM/Gf. Moreover, as the mathematical properties of the correlation formula make it relatively immune to ceiling/floor effects for overall moderate correlations, only minor changes (if any) in the WM-Gf correlation should be expected for many psychological tests.
Do Reading Experts Agree with MCAT Verbal Reasoning Item Classifications?

ERIC Educational Resources Information Center

Jackson, Evelyn W.; And Others

1994-01-01

Examined whether expert raters (n=5) could agree about classification of Medical College Admission Test (MCAT) items and whether they agreed with MCAT student manual in labeling skill being measured by each test item. Results revealed difficulties in replicating authors' labeling of skills for reading items on practice test provided with 1991 MCAT…
Do the Guideline Violations Influence Test Difficulty of High-Stake Test?: An Investigation on University Entrance Examination in Turkey

ERIC Educational Resources Information Center

Atalmis, Erkan Hasan

2016-01-01

Multiple-choice (MC) items are commonly used in high-stake tests. Thus, each item of such tests should be meticulously constructed to increase the accuracy of decisions based on test results. Haladyna and his colleagues (2002) addressed the valid item-writing guidelines to construct high quality MC items in order to increase test reliability and…
An Investigation of Gender Differences in the Components Influencing the Difficulty of Spatial Ability Items.

ERIC Educational Resources Information Center

Kramer, Gene A.; Smith, Richard M.

2001-01-01

Examined the role that gender differences play in the determination of the components influencing the difficulty of spatial ability items. Results for 2,245 examinees taking a spatial ability test that is part of the Dental School Admission Battery show that component difficulties show little variation across gender. (SLD)
Item analysis of the Spanish version of the Boston Naming Test with a Spanish speaking adult population from Colombia.

PubMed

Kim, Stella H; Strutt, Adriana M; Olabarrieta-Landa, Laiene; Lequerica, Anthony H; Rivera, Diego; De Los Reyes Aragon, Carlos Jose; Utria, Oscar; Arango-Lasprilla, Juan Carlos

2018-02-23

The Boston Naming Test (BNT) is a widely used measure of confrontation naming ability that has been criticized for its questionable construct validity for non-English speakers. This study investigated item difficulty and construct validity of the Spanish version of the BNT to assess cultural and linguistic impact on performance. Subjects were 1298 healthy Spanish speaking adults from Colombia. They were administered the 60- and 15-item Spanish version of the BNT. A Rasch analysis was computed to assess dimensionality, item hierarchy, targeting, reliability, and item fit. Both versions of the BNT satisfied requirements for unidimensionality. Although internal consistency was excellent for the 60-item BNT, order of difficulty did not increase consistently with item number and there were a number of items that did not fit the Rasch model. For the 15-item BNT, a total of 5 items changed position on the item hierarchy with 7 poor fitting items. Internal consistency was acceptable. Construct validity of the BNT remains a concern when it is administered to non-English speaking populations. Similar to previous findings, the order of item presentation did not correspond with increasing item difficulty, and both versions were inadequate at assessing high naming ability.
Measuring the Instructional Sensitivity of ESL Reading Comprehension Items.

ERIC Educational Resources Information Center

Brutten, Sheila R.; And Others

A study attempted to estimate the instructional sensitivity of items in three reading comprehension tests in English as a second language (ESL). Instructional sensitivity is a test-item construct defined as the tendency for a test item to vary in difficulty as a function of instruction. Similar tasks were given to readers at different proficiency…
Efforts Toward the Development of Unbiased Selection and Assessment Instruments.

ERIC Educational Resources Information Center

Rudner, Lawrence M.

Investigations into item bias provide an empirical basis for the identification and elimination of test items which appear to measure different traits across populations or cultural groups. The psychometric rationales for six approaches to the identification of biased test items are reviewed: (1) Transformed item difficulties: within-group…
Faster on Easy Items, More Accurate on Difficult Ones: Cognitive Ability and Performance on a Task of Varying Difficulty

ERIC Educational Resources Information Center

Dodonova, Yulia A.; Dodonov, Yury S.

2013-01-01

Using more complex items than those commonly employed within the information-processing approach, but still easier than those used in intelligence tests, this study analyzed how the association between processing speed and accuracy level changes as the difficulty of the items increases. The study involved measuring cognitive ability using Raven's…
Fitting the Rasch Model to Account for Variation in Item Discrimination

ERIC Educational Resources Information Center

Weitzman, R. A.

2009-01-01

Building on the Kelley and Gulliksen versions of classical test theory, this article shows that a logistic model having only a single item parameter can account for varying item discrimination, as well as difficulty, by using item-test correlations to adjust incorrect-correct (0-1) item responses prior to an initial model fit. The fit occurs…

Item analysis of examinations in the Faculty of Medicine of Tunis.

PubMed

Hermi, Amene; Achour, Wafa

2016-04-01

Introduction: Item analysis is the process of collecting, summarizing and using information from students' responses to assess test items' quality. This study used this approach to evaluate the quality of items and examinations given in the Faculty of Medicine of Tunis (FMT). Methods: This study concerned the examinations of 2012-2013 (principal session). It analyzed 3138 items from 66 examinations, of which 46 were multidisciplinary (187 disciplines). A total of 2515 students took the examinations. The "AnItem.xls" file was used for the analysis, which focused on difficulty, discrimination and internal consistency. Results: Mean difficulty for all examinations was optimum (mean difficulty index: 0.59). The majority of items (89.17%) were either easy or of acceptable difficulty. Mean discrimination for all examinations was moderate (mean item discrimination coefficient: 0.28), with poor discrimination in 23.62% of items. Maximal discrimination occurred in disciplines with a difficulty index between 0.4 and 0.6. "Ideal" items represented 27.02%. Mean internal consistency for all examinations was acceptable (Cronbach's alpha: 0.79). Disciplines with nonacceptable internal consistency (68.45%) each contained a maximum of 33 items and showed a positive correlation between their alpha and their number of questions. Distributions were mostly (72.73%) platykurtic and negatively asymmetric (89.39%). The first year of studies had the best parameters. Conclusion: Our examinations had acceptable internal consistency and a good level of difficulty and discrimination. They tended to be easy and discriminated mainly among students of medium level. Item analysis is useful as a guide to item writers to improve the overall quality of questions in the future.
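Two of the statistics reported in the abstract above, the difficulty index and Cronbach's alpha, can be sketched with the textbook classical-test-theory formulas. This is a minimal sketch and not the computation performed by the "AnItem.xls" file; the score matrix and function names are hypothetical.

```python
from statistics import pvariance

# Hypothetical 0/1 score matrix: rows are examinees, columns are items.
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]

def difficulty_index(matrix):
    """Proportion of examinees answering each item correctly (higher = easier)."""
    n = len(matrix)
    return [sum(row[j] for row in matrix) / n for j in range(len(matrix[0]))]

def cronbach_alpha(matrix):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total-score variance)."""
    k = len(matrix[0])
    item_vars = [pvariance([row[j] for row in matrix]) for j in range(k)]
    total_var = pvariance([sum(row) for row in matrix])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

p_values = difficulty_index(responses)
alpha = cronbach_alpha(responses)
```

Because alpha depends only on the ratio of variances, using population or sample variance gives the same result as long as one convention is used throughout.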
An Alternate Definition of the ETS Delta Scale of Item Difficulty. Program Statistics Research.

ERIC Educational Resources Information Center

Holland, Paul W.; Thayer, Dorothy T.

An alternative definition has been developed of the delta scale of item difficulty used at Educational Testing Service. The traditional delta scale uses an inverse normal transformation based on normal ogive models developed years ago. However, no use is made of this fact in typical uses of item deltas. It is simply one way to make the probability…
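For orientation, the traditional delta scale mentioned above is conventionally written as delta = 13 - 4 * z, where z is the standard normal deviate corresponding to the proportion of examinees answering correctly. The sketch below implements only that conventional form; it does not reproduce the alternative definition the report proposes, and the function name is an assumption.

```python
from statistics import NormalDist

def ets_delta(p_correct):
    """Conventional delta: 13 - 4 * inverse-normal(proportion correct).

    Harder items (lower proportion correct) receive larger delta values.
    p_correct must lie strictly between 0 and 1.
    """
    return 13.0 - 4.0 * NormalDist().inv_cdf(p_correct)

delta_mid = ets_delta(0.5)    # an item answered correctly by half the group
delta_easy = ets_delta(0.8)
delta_hard = ets_delta(0.2)
```

An item answered correctly by half the group sits at delta = 13, and each unit of the normal deviate moves delta by 4 points.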
Demonstrating the Difference between Classical Test Theory and Item Response Theory Using Derived Test Data

ERIC Educational Resources Information Center

Magno, Carlo

2009-01-01

The present report demonstrates the difference between the classical test theory (CTT) and item response theory (IRT) approaches using actual test data from junior high school chemistry students. CTT and IRT were compared across two samples and two forms of the test on their item difficulty, internal consistency, and measurement errors. The specific…
The role of difficulty and gender in numbers, algebra, geometry and mathematics achievement

NASA Astrophysics Data System (ADS)

Rabab'h, Belal Sadiq Hamed; Veloo, Arsaythamby; Perumal, Selvan

2015-05-01

This study aims to identify the role of difficulty and gender in numbers, algebra, geometry and mathematics achievement among secondary school students in Jordan. The respondents were 337 students from eight public secondary schools in Alkoura district, selected by stratified random sampling. The sample comprised 179 (53%) male and 158 (47%) female students. The mathematics test comprises 30 items: eight items for numbers, 14 items for algebra and eight items for geometry. Based on difficulties among male and female students, the findings showed that in numbers, item 4 (fractions - 0.34) was most difficult for male students and item 6 (square roots - 0.39) for female students. In algebra, item 11 (inequality - 0.23) was most difficult for male students and item 6 (algebraic expressions - 0.35) for female students. In geometry, item 3 (reflection - 0.34) was most difficult for male students and item 8 (volume - 0.33) for female students. Based on gender differences, female students showed higher achievement in numbers and algebra compared to male students. On the other hand, there was no difference between male and female students' achievement in the geometry test. This study suggests that teachers need to give more attention to numbers and algebra when teaching mathematics.
Assessing the Conceptual Understanding about Heat and Thermodynamics at Undergraduate Level

ERIC Educational Resources Information Center

Kulkarni, Vasudeo Digambar; Tambade, Popat Savaleram

2013-01-01

In this study, a Thermodynamic Concept Test (TCT) was designed to assess students' conceptual understanding of heat and thermodynamics at the undergraduate level. Different statistical indices, such as the item difficulty index, item discrimination index, and point biserial coefficient, were used for assessing the TCT. For each item of the test these indices were…
Effects of Using Modified Items to Test Students with Persistent Academic Difficulties

ERIC Educational Resources Information Center

Elliott, Stephen N.; Kettler, Ryan J.; Beddow, Peter A.; Kurz, Alexander; Compton, Elizabeth; McGrath, Dawn; Bruen, Charles; Hinton, Kent; Palmer, Porter; Rodriguez, Michael C.; Bolt, Daniel; Roach, Andrew T.

2010-01-01

This study investigated the effects of using modified items in achievement tests to enhance accessibility. An experiment determined whether tests composed of modified items would reduce the performance gap between students eligible for an alternate assessment based on modified achievement standards (AA-MAS) and students not eligible, and the…
A Comparison of Traditional Test Blueprinting and Item Development to Assessment Engineering in a Licensure Context

ERIC Educational Resources Information Center

Masters, James S.

2010-01-01

With the need for larger and larger banks of items to support adaptive testing and to meet security concerns, large-scale item generation is a requirement for many certification and licensure programs. As part of the mass production of items, it is critical that the difficulty and the discrimination of the items be known without the need for…
A Comparison of Different Psychometric Approaches to Modeling Testlet Structures: An Example with C-Tests

ERIC Educational Resources Information Center

Schroeders, Ulrich; Robitzsch, Alexander; Schipolowski, Stefan

2014-01-01

C-tests are a specific variant of cloze tests that are considered time-efficient, valid indicators of general language proficiency. They are commonly analyzed with models of item response theory assuming local item independence. In this article we estimated local interdependencies for 12 C-tests and compared the changes in item difficulties,…
Application of Computerized Adaptive Testing to Entrance Examination for Graduate Studies in Turkey

ERIC Educational Resources Information Center

Bulut, Okan; Kan, Adnan

2012-01-01

Problem Statement: Computerized adaptive testing (CAT) is a sophisticated and efficient way of delivering examinations. In CAT, items for each examinee are selected from an item bank based on the examinee's responses to the items. In this way, the difficulty level of the test is adjusted based on the examinee's ability level. Instead of…
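The adaptive selection idea described above can be sketched with one common rule: under the Rasch model, choose the unused item with the greatest information at the current ability estimate. The excerpt does not state which selection rule this study used, so the rule, the item bank, and the function names below are all illustrative assumptions.

```python
import math

def rasch_p(theta, b):
    """Rasch probability of a correct response for ability theta and difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_information(theta, b):
    """Fisher information of a Rasch item at theta: P * (1 - P)."""
    p = rasch_p(theta, b)
    return p * (1.0 - p)

def next_item(theta, bank, administered):
    """Return the id of the unused item that is most informative at theta."""
    candidates = {k: b for k, b in bank.items() if k not in administered}
    return max(candidates, key=lambda k: item_information(theta, candidates[k]))

# Hypothetical item bank mapping item ids to difficulty parameters.
bank = {"easy": -1.5, "medium": 0.0, "hard": 1.5}

first_pick = next_item(0.2, bank, administered=set())
```

Rasch information peaks where difficulty equals ability, so this rule amounts to picking the remaining item whose difficulty is closest to the current estimate.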
Rasch Based Analysis of Oral Proficiency Test Data.

ERIC Educational Resources Information Center

Nakamura, Yuji

2001-01-01

This paper examines the rating scale data of oral proficiency tests analyzed by a Rasch Analysis focusing on an item map and factor analysis. In discussing the item map, the difficulty order of six items and students' answering patterns are analyzed using descriptive statistics and measures of central tendency of test scores. The data ranks the…
Adaptive Mental Testing: The State of the Art

DTIC Science & Technology

1979-11-01

typically vary in their psychometric properties--particularly in their difficulty--the test designer must decide what configuration of these item...psychometric properties best suits the test’s purpose. There are two extreme rationales to guide that decision. One rationale is to choose items that are...development of item response theory (Rasch, 1960; Lord, 1952, 1970, 1974a; Birnbaum, 1968) that provided the needed invariance properties for item
Item analysis of three Spanish naming tests: a cross-cultural investigation.

PubMed

Marquez de la Plata, Carlos; Arango-Lasprilla, Juan Carlos; Alegret, Montse; Moreno, Alexander; Tárraga, Luis; Lara, Mar; Hewlitt, Margaret; Hynan, Linda; Cullum, C Munro

2009-01-01

Neuropsychological evaluations conducted in the United States and abroad commonly include the use of tests translated from English to Spanish. The use of translated naming tests for evaluating predominately Spanish-speakers has recently been challenged on the grounds that translating test items may compromise a test's construct validity. The Texas Spanish Naming Test (TNT) has been developed in Spanish specifically for use with Spanish-speakers; however, it is unlikely patients from diverse Spanish-speaking geographical regions will perform uniformly on a naming test. The present study evaluated and compared the internal consistency and patterns of item-difficulty and -discrimination for the TNT and two commonly used translated naming tests in three countries (i.e., United States, Colombia, Spain). Two hundred fifty-two subjects (136 demented, 116 nondemented) across three countries were administered the TNT, Modified Boston Naming Test-Spanish, and the naming subtest from the CERAD. The TNT demonstrated superior internal consistency to its counterparts, a superior item difficulty pattern to the CERAD naming test, and a superior item discrimination pattern to the MBNT-S across countries. Overall, all three Spanish naming tests differentiated nondemented and moderately demented individuals, but the results suggest the items of the TNT are most appropriate to use with Spanish-speakers. Preliminary normative data for the three tests examined in each country are provided.
Item development process and analysis of 50 case-based items for implementation on the Korean Nursing Licensing Examination.

PubMed

Park, In Sook; Suh, Yeon Ok; Park, Hae Sook; Kang, So Young; Kim, Kwang Sung; Kim, Gyung Hee; Choi, Yeon-Hee; Kim, Hyun-Ju

2017-01-01

The purpose of this study was to improve the quality of items on the Korean Nursing Licensing Examination by developing and evaluating case-based items that reflect integrated nursing knowledge. We conducted a cross-sectional observational study to develop new case-based items. The methods for developing test items included expert workshops, brainstorming, and verification of content validity. After a mock examination of undergraduate nursing students using the newly developed case-based items, we evaluated the appropriateness of the items through classical test theory and item response theory. A total of 50 case-based items were developed for the mock examination, and content validity was evaluated. The question items integrated 34 discrete elements of integrated nursing knowledge. The mock examination was taken by 741 baccalaureate students in their fourth year of study at 13 universities. Their average score on the mock examination was 57.4, and the examination showed a reliability of 0.40. According to classical test theory, the average level of item difficulty of the items was 57.4% (80%-100% for 12 items; 60%-80% for 13 items; and less than 60% for 25 items). The mean discrimination index was 0.19, and was above 0.30 for 11 items and 0.20 to 0.29 for 15 items. According to item response theory, the item discrimination parameter (in the logistic model) was none for 10 items (0.00), very low for 20 items (0.01 to 0.34), low for 12 items (0.35 to 0.64), moderate for 6 items (0.65 to 1.34), high for 1 item (1.35 to 1.69), and very high for 1 item (above 1.70). The item difficulty was very easy for 24 items (below -2.0), easy for 8 items (-2.0 to -0.5), medium for 6 items (-0.5 to 0.5), hard for 3 items (0.5 to 2.0), and very hard for 9 items (2.0 or above). The goodness-of-fit test in terms of the 2-parameter item response model between the range of 2.0 to 0.5 revealed that 12 items had an ideal correct answer rate. 
We surmised that the low reliability of the mock examination was influenced by the timing of the test for the examinees and the inappropriate difficulty of the items. Our study suggested a methodology for the development of future case-based items for the Korean Nursing Licensing Examination.
Modeling Booklet Effects for Nonequivalent Group Designs in Large-Scale Assessment

ERIC Educational Resources Information Center

Hecht, Martin; Weirich, Sebastian; Siegle, Thilo; Frey, Andreas

2015-01-01

Multiple matrix designs are commonly used in large-scale assessments to distribute test items to students. These designs comprise several booklets, each containing a subset of the complete item pool. Besides reducing the test burden of individual students, using various booklets allows aligning the difficulty of the presented items to the assumed…
Two-item same/different discrimination in rhesus monkeys (Macaca mulatta).

PubMed

Basile, Benjamin M; Moylan, Emily J; Charles, David P; Murray, Elisabeth A

2015-11-01

Almost all nonhuman animals can recognize when one item is the same as another item. It is less clear whether nonhuman animals possess abstract concepts of "same" and "different" that can be divorced from perceptual similarity. Pigeons and monkeys show inconsistent performance, and often surprising difficulty, in laboratory tests of same/different learning that involve only two items. Previous results from tests using multi-item arrays suggest that nonhumans compute sameness along a continuous scale of perceptual variability, which would explain the difficulty of making two-item same/different judgments. Here, we provide evidence that rhesus monkeys can learn a two-item same/different discrimination similar to those on which monkeys and pigeons have previously failed. Monkeys' performance transferred to novel stimuli and was not affected by perceptual variations in stimulus size, rotation, view, or luminance. Success without the use of multi-item arrays, and the lack of effect of perceptual variability, suggests a computation of sameness that is more categorical, and perhaps more abstract, than previously thought.
Item analysis of university-wide multiple choice objective examinations: the experience of a Nigerian private university.

PubMed

Odukoya, Jonathan A; Adekeye, Olajide; Igbinoba, Angie O; Afolabi, A

2018-01-01

Teachers and students worldwide often dance to the tune of tests and examinations. Assessments are powerful tools for catalyzing the achievement of educational goals, especially if done rightly. One of the tools for 'doing it rightly' is item analysis. The core objectives of this study, therefore, were to ascertain the item difficulty and distractive indices of the university-wide courses. A range of 112-1956 undergraduate students participated in this study. With the use of secondary data, the ex-post facto design was adopted for this project. In virtually all cases, the majority of the items (ranging between 65% and 97% of the 70 items fielded in each course) did not meet psychometric standards in terms of difficulty and distractive indices and consequently needed to be moderated or deleted. Considering the importance of these courses, the need to apply item analyses when developing these tests was emphasized.
Analysis Test of Understanding of Vectors with the Three-Parameter Logistic Model of Item Response Theory and Item Response Curves Technique

ERIC Educational Resources Information Center

Rakkapao, Suttida; Prasitpong, Singha; Arayathanitkul, Kwan

2016-01-01

This study investigated the multiple-choice test of understanding of vectors (TUV) by applying item response theory (IRT). The difficulty, discrimination, and guessing parameters of the TUV items were fit with the three-parameter logistic model of IRT, using the PARSCALE program. The TUV ability is an ability parameter, here estimated assuming…
A Comparison between Discrimination Indices and Item-Response Theory Using the Rasch Model in a Clinical Course Written Examination of a Medical School.

PubMed

Park, Jong Cook; Kim, Kwang Sig

2012-03-01

The reliability of a test is determined by each item's characteristics. Item analysis is achieved by classical test theory and item response theory. The purpose of the study was to compare the discrimination indices with item response theory using the Rasch model. Thirty-one 4th-year medical school students participated in the clinical course written examination, which included 22 A-type items and 3 R-type items. The point biserial correlation coefficient (C(pbs)) was compared to the method of extreme group (D), the biserial correlation coefficient (C(bs)), the item-total correlation coefficient (C(it)), and the corrected item-total correlation coefficient (C(cit)). The Rasch model was applied to estimate item difficulty and examinee's ability and to calculate item fit statistics using joint maximum likelihood. The explanatory power (r2) of C(pbs) decreased in the following order: C(cit) (1.00), C(it) (0.99), C(bs) (0.94), and D (0.45). The ranges of difficulty logit and standard error and ability logit and standard error were -0.82 to 0.80 and 0.37 to 0.76, -3.69 to 3.19 and 0.45 to 1.03, respectively. Items 9 and 23 have outfit ≥1.3. Students 1, 5, 7, 18, 26, 30, and 32 have fit ≥1.3. C(pbs), C(cit), and C(it) are good discrimination parameters. The Rasch model can estimate the item difficulty parameter and the examinee's ability parameter with standard error. The fit statistics can identify bad items and unpredictable examinee responses.
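As an illustration of the point biserial coefficient named in the abstract above, the following is a minimal sketch using the textbook formula, not the authors' code. The function name `point_biserial` and the toy data are hypothetical; the total score here includes the item itself (the uncorrected form), and the population standard deviation is used.

```python
import math
from typing import Sequence


def point_biserial(item: Sequence[int], total: Sequence[float]) -> float:
    """Point biserial correlation between a 0/1 item score and a total score.

    Textbook form: r = (M1 - M0) / s * sqrt(p * q), where M1 and M0 are the
    mean totals of examinees who got the item right and wrong, s is the
    population standard deviation of the totals, and p is the proportion correct.
    """
    n = len(item)
    n_correct = sum(item)
    if n_correct == 0 or n_correct == n:
        raise ValueError("item has no variance (all correct or all incorrect)")
    p = n_correct / n
    mean_right = sum(t for i, t in zip(item, total) if i == 1) / n_correct
    mean_wrong = sum(t for i, t in zip(item, total) if i == 0) / (n - n_correct)
    mean_all = sum(total) / n
    sd = math.sqrt(sum((t - mean_all) ** 2 for t in total) / n)
    if sd == 0:
        raise ValueError("total scores have no variance")
    return (mean_right - mean_wrong) / sd * math.sqrt(p * (1 - p))


# Hypothetical toy data: four examinees, one item, and their total scores.
r = point_biserial([1, 1, 0, 0], [4, 3, 2, 1])
```

A corrected item-total version would subtract the item score from each total before calling the function.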
Item Estimates under Low-Stakes Conditions: How Should Omits Be Treated?

ERIC Educational Resources Information Center

DeMars, Christine

Using data from a pilot test of science and math from students in 30 high schools, item difficulties were estimated with a one-parameter model (partial-credit model for the multi-point items). Some items were multiple-choice items, and others were constructed-response items (open-ended). Four sets of estimates were obtained: estimates for males…
The Arabic Version of The Depression Anxiety Stress Scale-21: Cumulative scaling and discriminant-validation testing.

PubMed

Ali, Amira Mohammed; Ahmed, Anwar; Sharaf, Amira; Kawakami, Norito; Abdeldayem, Samia M; Green, Joseph

2017-12-01

This study aimed to examine the validity of the Arabic version of the Depression Anxiety Stress Scale-21 (DASS-21) in 149 illicit drug users. We calculated α coefficient, inter-item and item-total correlations, coefficients of reproducibility and scalability (CR and CS), item difficulty and discrimination indices. The DASS-21 had an acceptable reliability; but values of the CR and the CS were less than acceptable. Items varied in difficulty and discrimination; some items are candidates for elimination. The DASS-21 is a probabilistic and not a deterministic measure of distress; it has problematic items and needs further investigations. Copyright © 2017 Elsevier B.V. All rights reserved.

Building an Evaluation Scale using Item Response Theory.

PubMed

Lalor, John P; Wu, Hao; Yu, Hong

2016-11-01

Evaluation of NLP methods requires testing against a previously vetted gold-standard test set and reporting standard metrics (accuracy/precision/recall/F1). The current assumption is that all items in a given test set are equal with regards to difficulty and discriminating power. We propose Item Response Theory (IRT) from psychometrics as an alternative means for gold-standard test-set generation and NLP system evaluation. IRT is able to describe characteristics of individual items - their difficulty and discriminating power - and can account for these characteristics in its estimation of human intelligence or ability for an NLP task. In this paper, we demonstrate IRT by generating a gold-standard test set for Recognizing Textual Entailment. By collecting a large number of human responses and fitting our IRT model, we show that our IRT model compares NLP systems with the performance in a human population and is able to provide more insight into system performance than standard evaluation metrics. We show that a high accuracy score does not always imply a high IRT score, which depends on the item characteristics and the response pattern.
Building an Evaluation Scale using Item Response Theory

PubMed Central

Lalor, John P.; Wu, Hao; Yu, Hong

2016-01-01

Evaluation of NLP methods requires testing against a previously vetted gold-standard test set and reporting standard metrics (accuracy/precision/recall/F1). The current assumption is that all items in a given test set are equal with regards to difficulty and discriminating power. We propose Item Response Theory (IRT) from psychometrics as an alternative means for gold-standard test-set generation and NLP system evaluation. IRT is able to describe characteristics of individual items - their difficulty and discriminating power - and can account for these characteristics in its estimation of human intelligence or ability for an NLP task. In this paper, we demonstrate IRT by generating a gold-standard test set for Recognizing Textual Entailment. By collecting a large number of human responses and fitting our IRT model, we show that our IRT model compares NLP systems with the performance in a human population and is able to provide more insight into system performance than standard evaluation metrics. We show that a high accuracy score does not always imply a high IRT score, which depends on the item characteristics and the response pattern. PMID:28004039
Treatment of Not-Administered Items on Individually Administered Intelligence Tests

ERIC Educational Resources Information Center

He, Wei; Wolfe, Edward W.

2012-01-01

In administration of individually administered intelligence tests, items are commonly presented in a sequence of increasing difficulty, and test administration is terminated after a predetermined number of incorrect answers. This practice produces stochastically censored data, a form of nonignorable missing data. By manipulating four factors…
A Monte Carlo Simulation Investigating the Validity and Reliability of Ability Estimation in Item Response Theory with Speeded Computer Adaptive Tests

ERIC Educational Resources Information Center

Schmitt, T. A.; Sass, D. A.; Sullivan, J. R.; Walker, C. M.

2010-01-01

Imposed time limits on computer adaptive tests (CATs) can result in examinees having difficulty completing all items, thus compromising the validity and reliability of ability estimates. In this study, the effects of speededness were explored in a simulated CAT environment by varying examinee response patterns to end-of-test items. Expectedly,…
Examining the Invariance of Rater and Project Calibrations Using a Multi-facet Rasch Model.

ERIC Educational Resources Information Center

O'Neill, Thomas R.; Lunz, Mary E.

To generalize test results beyond the particular test administration, an examinee's ability estimate must be independent of the particular items attempted, and the item difficulty calibrations must be independent of the particular sample of people attempting the items. This stability is a key concept of the Rasch model, a latent trait model of…
ITEM ANALYSIS OF THREE SPANISH NAMING TESTS: A CROSS-CULTURAL INVESTIGATION

PubMed Central

de la Plata, Carlos Marquez; Arango-Lasprilla, Juan Carlos; Alegret, Montse; Moreno, Alexander; Tárraga, Luis; Lara, Mar; Hewlitt, Margaret; Hynan, Linda; Cullum, C. Munro

2009-01-01

Neuropsychological evaluations conducted in the United States and abroad commonly include the use of tests translated from English to Spanish. The use of translated naming tests for evaluating predominately Spanish-speakers has recently been challenged on the grounds that translating test items may compromise a test’s construct validity. The Texas Spanish Naming Test (TNT) has been developed in Spanish specifically for use with Spanish-speakers; however, it is unlikely patients from diverse Spanish-speaking geographical regions will perform uniformly on a naming test. The present study evaluated and compared the internal consistency and patterns of item-difficulty and -discrimination for the TNT and two commonly used translated naming tests in three countries (i.e., United States, Colombia, Spain). Two hundred fifty two subjects (126 demented, 116 nondemented) across three countries were administered the TNT, Modified Boston Naming Test-Spanish, and the naming subtest from the CERAD. The TNT demonstrated superior internal consistency to its counterparts, a superior item difficulty pattern than the CERAD naming test, and a superior item discrimination pattern than the MBNT-S across countries. Overall, all three Spanish naming tests differentiated nondemented and moderately demented individuals, but the results suggest the items of the TNT are most appropriate to use with Spanish-speakers. Preliminary normative data for the three tests examined in each country are provided. PMID:19208960
Mutual Information Item Selection in Adaptive Classification Testing

ERIC Educational Resources Information Center

Weissman, Alexander

2007-01-01

A general approach for item selection in adaptive multiple-category classification tests is provided. The approach uses mutual information (MI), a special case of the Kullback-Leibler distance, or relative entropy. MI works efficiently with the sequential probability ratio test and alleviates the difficulties encountered with using other local-…
Validation of a clinical critical thinking skills test in nursing.

PubMed

Shin, Sujin; Jung, Dukyoo; Kim, Sungeun

2015-01-27

The purpose of this study was to develop a revised version of the clinical critical thinking skills test (CCTS) and to subsequently validate its performance. This study is a secondary analysis of the CCTS. Data were obtained from a convenience sample of 284 college students in June 2011. Thirty items were analyzed using item response theory and test reliability was assessed. Test-retest reliability was measured using the results of 20 nursing college and graduate school students in July 2013. The content validity of the revised items was analyzed by calculating the degree of agreement between instrument developer intention in item development and the judgments of six experts. To analyze response process validity, qualitative data related to the response processes of nine nursing college students obtained through cognitive interviews were analyzed. Out of the initial 30 items, 11 items were excluded after the analysis of difficulty and discrimination parameters. When the 19 items of the revised version of the CCTS were analyzed, levels of item difficulty were found to be relatively low and levels of discrimination were found to be appropriate or high. The degree of agreement between item developer intention and expert judgments equaled or exceeded 50%. From the above results, evidence of the response process validity was demonstrated, indicating that subjects responded as intended by the test developer. The revised 19-item CCTS was found to have sufficient reliability and validity and will therefore represent a more convenient measurement of critical thinking ability.
Validation of a clinical critical thinking skills test in nursing

PubMed Central

2015-01-01

Purpose: The purpose of this study was to develop a revised version of the clinical critical thinking skills test (CCTS) and to subsequently validate its performance. Methods: This study is a secondary analysis of the CCTS. Data were obtained from a convenience sample of 284 college students in June 2011. Thirty items were analyzed using item response theory and test reliability was assessed. Test-retest reliability was measured using the results of 20 nursing college and graduate school students in July 2013. The content validity of the revised items was analyzed by calculating the degree of agreement between instrument developer intention in item development and the judgments of six experts. To analyze response process validity, qualitative data related to the response processes of nine nursing college students obtained through cognitive interviews were analyzed. Results: Out of the initial 30 items, 11 items were excluded after the analysis of difficulty and discrimination parameters. When the 19 items of the revised version of the CCTS were analyzed, levels of item difficulty were found to be relatively low and levels of discrimination were found to be appropriate or high. The degree of agreement between item developer intention and expert judgments equaled or exceeded 50%. Conclusion: From the above results, evidence of the response process validity was demonstrated, indicating that subjects responded as intended by the test developer. The revised 19-item CCTS was found to have sufficient reliability and validity and will therefore represent a more convenient measurement of critical thinking ability. PMID:25622716
Constructing three emotion knowledge tests from the invariant measurement approach

PubMed Central

Prieto, Gerardo; Burin, Debora I.

2017-01-01

Background: Psychological constructionist models like the Conceptual Act Theory (CAT) postulate that complex states such as emotions are composed of basic psychological ingredients that are more clearly respected by the brain than basic emotions. The objective of this study was the construction and initial validation of Emotion Knowledge (EK) measures from the CAT frame by means of an invariant measurement approach, the Rasch Model (RM). Psychological distance theory was used to inform item generation. Methods: Three EK tests, namely emotion vocabulary (EV), close emotional situations (CES) and far emotional situations (FES), were constructed and tested with the RM in a community sample of 100 females and 100 males (age range: 18–65), both separately and conjointly. Results: It was corroborated that data-RM fit was sufficient. Then, the effect of type of test and emotion on Rasch-modelled item difficulty was tested. Significant effects of emotion on EK item difficulty were found, but the only statistically significant difference was that between “happiness” and the remaining emotions; neither type of test nor interaction effects on EK item difficulty were statistically significant. The testing of gender differences was carried out after corroborating that differential item functioning (DIF) would not be a plausible alternative hypothesis for the results. No statistically significant sex-related differences were found in EV, CES, FES, or total EK. However, the sign of d indicates that female participants were consistently better than male ones, a result that will be of interest for future meta-analyses. Discussion: The three EK tests are ready to be used as components of a higher-level measurement process. PMID:28929013
The Accuracy of Estimated Total Test Statistics. Final Report.

ERIC Educational Resources Information Center

Kleinke, David J.

In a post-mortem study of item sampling, 1,050 examinees were divided into ten groups 50 times. Each time, their papers were scored on four different sets of item samples from a 150-item test of academic aptitude. These samples were selected using (a) unstratified random sampling and stratification on (b) content, (c) difficulty, and (d) both.…
Applying Item Response Theory to the Development of a Screening Adaptation of the Goldman-Fristoe Test of Articulation-Second Edition

ERIC Educational Resources Information Center

Brackenbury, Tim; Zickar, Michael J.; Munson, Benjamin; Storkel, Holly L.

2017-01-01

Purpose: Item response theory (IRT) is a psychometric approach to measurement that uses latent trait abilities (e.g., speech sound production skills) to model performance on individual items that vary by difficulty and discrimination. An IRT analysis was applied to preschoolers' productions of the words on the Goldman-Fristoe Test of…
Exploring Alternative Conceptions from Newtonian Dynamics and Simple DC Circuits: Links between Item Difficulty and Item Confidence

ERIC Educational Resources Information Center

Planinic, Maja; Boone, William J.; Krsnik, Rudolf; Beilfuss, Meredith L.

2006-01-01

Croatian 1st-year and 3rd-year high-school students (N = 170) completed a conceptual physics test. Students were evaluated with regard to two physics topics: Newtonian dynamics and simple DC circuits. Students answered test items and also indicated their confidence in each answer. Rasch analysis facilitated the calculation of three linear…
Comparison of university students' understanding of graphs in different contexts

NASA Astrophysics Data System (ADS)

Planinic, Maja; Ivanjek, Lana; Susac, Ana; Milin-Sipus, Zeljka

2013-12-01

This study investigates university students’ understanding of graphs in three different domains: mathematics, physics (kinematics), and contexts other than physics. Eight sets of parallel mathematics, physics, and other context questions about graphs were developed. A test consisting of these eight sets of questions (24 questions in all) was administered to 385 first year students at University of Zagreb who were either prospective physics or mathematics teachers or prospective physicists or mathematicians. Rasch analysis of data was conducted and linear measures for item difficulties were obtained. Average difficulties of items in three domains (mathematics, physics, and other contexts) and over two concepts (graph slope, area under the graph) were computed and compared. Analysis suggests that the variation of average difficulty among the three domains is much smaller for the concept of graph slope than for the concept of area under the graph. Most of the slope items are very close in difficulty, suggesting that students who have developed sufficient understanding of graph slope in mathematics are generally able to transfer it almost equally successfully to other contexts. A large difference was found between the difficulty of the concept of area under the graph in physics and other contexts on one side and mathematics on the other side. Comparison of average difficulty of the three domains suggests that mathematics without context is the easiest domain for students. Adding either physics or other context to mathematical items generally seems to increase item difficulty. No significant difference was found between the average item difficulty in physics and contexts other than physics, suggesting that physics (kinematics) remains a difficult context for most students despite the received instruction on kinematics in high school.
Comparing Methods for Item Analysis: The Impact of Different Item-Selection Statistics on Test Difficulty

ERIC Educational Resources Information Center

Jones, Andrew T.

2011-01-01

Practitioners often depend on item analysis to select items for exam forms and have a variety of options available to them. These include the point-biserial correlation, the agreement statistic, the B index, and the phi coefficient. Although research has demonstrated that these statistics can be useful for item selection, no research as of yet has…
Modeling the Severity of Drinking Consequences in First-Year College Women: An Item Response Theory Analysis of the Rutgers Alcohol Problem Index

PubMed Central

Cohn, Amy M.; Hagman, Brett T.; Graff, Fiona S.; Noel, Nora E.

2011-01-01

Objective: The present study examined the latent continuum of alcohol-related negative consequences among first-year college women using methods from item response theory and classical test theory. Method: Participants (N = 315) were college women in their freshman year who reported consuming any alcohol in the past 90 days and who completed assessments of alcohol consumption and alcohol-related negative consequences using the Rutgers Alcohol Problem Index. Results: Item response theory analyses showed poor model fit for five items identified in the Rutgers Alcohol Problem Index. Two-parameter item response theory logistic models were applied to the remaining 18 items to examine estimates of item difficulty (i.e., severity) and discrimination parameters. The item difficulty parameters ranged from 0.591 to 2.031, and the discrimination parameters ranged from 0.321 to 2.371. Classical test theory analyses indicated that the omission of the five misfit items did not significantly alter the psychometric properties of the construct. Conclusions: Findings suggest that those consequences that had greater severity and discrimination parameters may be used as screening items to identify female problem drinkers at risk for an alcohol use disorder. PMID:22051212
Item Writer Judgments of Item Difficulty versus Actual Item Difficulty: A Case Study

ERIC Educational Resources Information Center

Sydorenko, Tetyana

2011-01-01

This study investigates how accurate one item writer can be on item difficulty estimates and whether factors affecting item writer judgments correspond to predictors of actual item difficulty. The items were based on conversational dialogs (presented as videos online) that focus on pragmatic functions. Thirty-five 2nd-, 3rd-, and 4th-year learners…
An Evaluation of Different Statistical Targets for Assembling Parallel Forms in Item Response Theory

PubMed Central

Ali, Usama S.; van Rijn, Peter W.

2015-01-01

Assembly of parallel forms is an important step in the test development process. Therefore, choosing a suitable theoretical framework to generate well-defined test specifications is critical. The performance of different statistical targets of test specifications using the test characteristic curve (TCC) and the test information function (TIF) was investigated. Test length, the number of test forms, and content specifications are considered as well. The TCC target results in forms that are parallel in difficulty, but not necessarily in terms of precision. Vice versa, test forms created using a TIF target are parallel in terms of precision, but not necessarily in terms of difficulty. As sometimes the focus is either on TIF or TCC, differences in either difficulty or precision can arise. Differences in difficulty can be mitigated by equating, but differences in precision cannot. In a series of simulations using a real item bank, the two-parameter logistic model, and mixed integer linear programming for automated test assembly, these differences were found to be quite substantial. When both TIF and TCC are combined into one target with manipulation to relative importance, these differences can be made to disappear.
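The distinction the abstract above draws between the TCC and the TIF can be sketched in a few lines. This is a minimal illustration under the standard two-parameter logistic (2PL) formulas without the 1.7 scaling constant, not the authors' assembly code; the function names and the two toy forms are hypothetical.

```python
import math
from typing import List, Tuple

# Each item is a (discrimination a, difficulty b) pair; values are hypothetical.
Item = Tuple[float, float]


def p_2pl(theta: float, a: float, b: float) -> float:
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def tcc(theta: float, items: List[Item]) -> float:
    """Test characteristic curve: expected number-correct score at theta."""
    return sum(p_2pl(theta, a, b) for a, b in items)


def tif(theta: float, items: List[Item]) -> float:
    """Test information function: sum of a^2 * P * (1 - P) over items."""
    total = 0.0
    for a, b in items:
        p = p_2pl(theta, a, b)
        total += a * a * p * (1.0 - p)
    return total


# Two toy forms with the same difficulties but different discriminations.
form_a = [(1.0, 0.0), (1.0, 0.0)]
form_b = [(2.0, 0.0), (2.0, 0.0)]

expected_a, expected_b = tcc(0.0, form_a), tcc(0.0, form_b)
info_a, info_b = tif(0.0, form_a), tif(0.0, form_b)
```

At theta = 0 both toy forms have the same expected score while form_b carries four times the information, which is the kind of mismatch in precision that a TCC-only target would not detect at that ability level.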
Item Difficulty in the Evaluation of Computer-Based Instruction: An Example from Neuroanatomy

ERIC Educational Resources Information Center

Chariker, Julia H.; Naaz, Farah; Pani, John R.

2012-01-01

This article reports large item effects in a study of computer-based learning of neuroanatomy. Outcome measures of the efficiency of learning, transfer of learning, and generalization of knowledge diverged by a wide margin across test items, with certain sets of items emerging as particularly difficult to master. In addition, the outcomes of…
The Impact of Escape Alternative Position Change in Multiple-Choice Test on the Psychometric Properties of a Test and Its Items Parameters

ERIC Educational Resources Information Center

Hamadneh, Iyad Mohammed

2015-01-01

This study aimed at investigating the impact of changing the escape alternative position in a multiple-choice test on the psychometric properties of the test and its item parameters (difficulty, discrimination & guessing), and on the estimation of examinee ability. To achieve the study objectives, a 4-alternative multiple choice type achievement test…

The Handling of Missing Binary Data in Language Research

ERIC Educational Resources Information Center

Pichette, François; Béland, Sébastien; Jolani, Shahab; Lesniewska, Justyna

2015-01-01

Researchers are frequently confronted with unanswered questions or items on their questionnaires and tests, due to factors such as item difficulty, lack of testing time, or participant distraction. This paper first presents results from a poll confirming previous claims (Rietveld & van Hout, 2006; Schafer & Graham, 2002) that data…
Understanding Test-Takers' Perceptions of Difficulty in EAP Vocabulary Tests: The Role of Experiential Factors

ERIC Educational Resources Information Center

Oruç Ertürk, Nesrin; Mumford, Simon E.

2017-01-01

This study, conducted by two researchers who were also multiple-choice question (MCQ) test item writers at a private English-medium university in an English as a foreign language (EFL) context, was designed to shed light on the factors that influence test-takers' perceptions of difficulty in English for academic purposes (EAP) vocabulary, with the…
Estimating the Number of Examinees Who Did Not Reach the Last Item of a Section.

ERIC Educational Resources Information Center

Wainer, Howard

It is important to estimate the number of examinees who reached a test item, because item difficulty is defined by the number who answered correctly divided by the number who reached the item. A new method is presented and compared to the previously used definition of three categories of response to an item: (1) answered; (2) omitted--a…
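The definition of item difficulty stated in the abstract above (number answering correctly divided by number who reached the item) can be written out as a short sketch. The coding of responses is an assumption made here for illustration: 1 for correct, 0 for incorrect or omitted, and None for not reached. The function name `item_difficulty` is hypothetical, and the sketch does not implement the report's new estimation method.

```python
from typing import Optional, Sequence


def item_difficulty(responses: Sequence[Optional[int]]) -> float:
    """Proportion correct among the examinees who reached the item.

    Assumed coding: 1 = correct, 0 = incorrect or omitted,
    None = the examinee did not reach the item.
    """
    reached = [r for r in responses if r is not None]
    if not reached:
        raise ValueError("no examinee reached the item")
    return sum(reached) / len(reached)


# Five hypothetical examinees: two correct, one incorrect, two did not reach.
difficulty = item_difficulty([1, 0, 1, None, None])
```

Treating the two not-reached examinees as incorrect would instead give 2/5, which is why the count of examinees who reached the item matters for the estimate.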
Equating with Miditests Using IRT

ERIC Educational Resources Information Center

Fitzpatrick, Joseph; Skorupski, William P.

2016-01-01

The equating performance of two internal anchor test structures--miditests and minitests--is studied for four IRT equating methods using simulated data. Originally proposed by Sinharay and Holland, miditests are anchors that have the same mean difficulty as the overall test but less variance in item difficulties. Four popular IRT equating methods…
Simple mental addition in children with and without mild mental retardation.

PubMed

Janssen, R; De Boeck, P; Viaene, M; Vallaeys, L

1999-11-01

The speeded performance on simple mental addition problems of 6- and 7-year-old children with and without mild mental retardation is modeled from a person perspective and an item perspective. On the person side, it was found that a single cognitive dimension spanned the performance differences between the two ability groups. However, a discontinuity, or "jump," was observed in the performance of the normal ability group on the easier items. On the item side, the addition problems were almost perfectly ordered in difficulty according to their problem size. Differences in difficulty were explained by factors related to the difficulty of executing nonretrieval strategies. All findings were interpreted within the framework of Siegler's (e.g., R. S. Siegler & C. Shipley, 1995) model of children's strategy choices in arithmetic. Models from item response theory were used to test the hypotheses. Copyright 1999 Academic Press.
A New Clinical Pain Knowledge Test for Nurses: Development and Psychometric Evaluation.

PubMed

Bernhofer, Esther I; St Marie, Barbara; Bena, James F

2017-08-01

All nurses care for patients with pain, and pain management knowledge and attitude surveys for nurses have been around since 1987. However, no validated knowledge test exists to measure postlicensure clinicians' knowledge of the core competencies of pain management in current complex patient populations. To develop and test the psychometric properties of an instrument designed to measure pain management knowledge of postlicensure nurses. Psychometric instrument validation. Four large Midwestern U.S. hospitals. Registered nurses employed full time and part time August 2015 to April 2016, aged M = 43.25 years; time as RN, M = 16.13 years. Prospective survey design using e-mail to invite nurses to take an electronic multiple choice pain knowledge test. Content validity of initial 36-item test "very good" (95.1% agreement). Completed tests that met analysis criteria, N = 747. Mean initial test score, 69.4% correct (range 27.8-97.2). After revision/removal of 13 unacceptable questions, mean test score was 50.4% correct (range 8.7-82.6). Initial test item percent difficulty range was 15.2%-98.1%; discrimination values range, 0.03-0.50; final test item percent difficulty range, 17.6%-91.1%, discrimination values range, -0.04 to 1.04. Split-half reliability final test was 0.66. A high decision consistency reliability was identified, with test cut-score of 75%. The final 23-item Clinical Pain Knowledge Test has acceptable discrimination, difficulty, decision consistency, reliability, and validity in the general clinical inpatient nurse population. This instrument will be useful in assessing pain management knowledge of clinical nurses to determine gaps in education, evaluate knowledge after pain management education, and measure research outcomes. Copyright © 2017 American Society for Pain Management Nursing. Published by Elsevier Inc. All rights reserved.
Assessing the Life Science Knowledge of Students and Teachers Represented by the K–8 National Science Standards

PubMed Central

Sadler, Philip M.; Coyle, Harold; Smith, Nancy Cook; Miller, Jaimie; Mintzes, Joel; Tanner, Kimberly; Murray, John

2013-01-01

We report on the development of an item test bank and associated instruments based on the National Research Council (NRC) K–8 life sciences content standards. Utilizing hundreds of studies in the science education research literature on student misconceptions, we constructed 476 unique multiple-choice items that measure the degree to which test takers hold either a misconception or an accepted scientific view. Tested nationally with 30,594 students, following their study of life science, and their 353 teachers, these items reveal a range of interesting results, particularly student difficulties in mastering the NRC standards. Teachers also answered test items and demonstrated a high level of subject matter knowledge reflecting the standards of the grade level at which they teach, but exhibiting few misconceptions of their own. In addition, teachers predicted the difficulty of each item for their students and which of the wrong answers would be the most popular. Teachers were found to generally overestimate their own students’ performance and to have a high level of awareness of the particular misconceptions that their students hold on the K–4 standards, but a low level of awareness of misconceptions related to the 5–8 standards. PMID:24006402
Working memory capacity and fluid abilities: the more difficult the item, the more more is better.

PubMed

Little, Daniel R; Lewandowsky, Stephan; Craig, Stewart

2014-01-01

The relationship between fluid intelligence and working memory is of fundamental importance to understanding how capacity-limited structures such as working memory interact with inference abilities to determine intelligent behavior. Recent evidence has suggested that the relationship between a fluid abilities test, Raven's Progressive Matrices, and working memory capacity (WMC) may be invariant across difficulty levels of the Raven's items. We show that this invariance can only be observed if the overall correlation between Raven's and WMC is low. Simulations of Raven's performance revealed that as the overall correlation between Raven's and WMC increases, the item-wise point-biserial correlations involving WMC are no longer constant but increase considerably with item difficulty. The simulation results were confirmed by two studies that used a composite measure of WMC, which yielded a higher correlation between WMC and Raven's than reported in previous studies. As expected, with the higher overall correlation, there was a significant positive relationship between Raven's item difficulty and the extent of the item-wise correlation with WMC.
Cognitive Complexity in the Remote Association Test--Chinese Version

ERIC Educational Resources Information Center

Hung, Su-Pin; Huang, Po-Sheng; Chen, Hsueh-Chih

2016-01-01

The remote association test (RAT) has been applied in various fields; however, evidence of construct validity for the original version and subsequent extensions of the RAT remains limited. This study aimed to elucidate the dimensionality and the relationship between item features and item difficulties for the RAT--Chinese Version (RAT-C) using the…
Measuring Ability, Speed, or Both? Challenges, Psychometric Solutions, and What Can Be Gained from Experimental Control

ERIC Educational Resources Information Center

Goldhammer, Frank

2015-01-01

The main challenge of ability tests relates to the difficulty of items, whereas speed tests demand that test takers complete very easy items quickly. This article proposes a conceptual framework to represent how performance depends on both between-person differences in speed and ability and the speed-ability compromise within persons. Related…
Developing a situational judgment test blueprint for assessing the non-cognitive skills of applicants to the University of Utah School of Medicine, the United States

PubMed Central

2015-01-01

Purpose: The situational judgment test (SJT) shows promise for assessing the non-cognitive skills of medical school applicants, but has only been used in Europe. Since the admissions processes and education levels of applicants to medical school are different in the United States and in Europe, it is necessary to obtain validity evidence of the SJT based on a sample of United States applicants. Methods: Ninety SJT items were developed and Kane’s validity framework was used to create a test blueprint. A total of 489 applicants selected for assessment/interview day at the University of Utah School of Medicine during the 2014-2015 admissions cycle completed one of five SJTs, which assessed professionalism, coping with pressure, communication, patient focus, and teamwork. Item difficulty, each item’s discrimination index, internal consistency, and the categorization of items by two experts were used to create the test blueprint. Results: The majority of item scores were within an acceptable range of difficulty, as measured by the difficulty index (0.50-0.85), and had fair to good discrimination. However, internal consistency was low for each domain, and 63% of items appeared to assess multiple domains. The concordance of categorization between the two educational experts ranged from 24% to 76% across the five domains. Conclusion: The results of this study will help medical school admissions departments determine how to begin constructing an SJT. Further testing with a more representative sample is needed to determine if the SJT is a useful assessment tool for measuring the non-cognitive skills of medical school applicants. PMID:26582629
Determining Cloze Item Difficulty from Item and Passage Characteristics across Different Learner Backgrounds

ERIC Educational Resources Information Center

Trace, Jonathan; Brown, James Dean; Janssen, Gerriet; Kozhevnikova, Liudmila

2017-01-01

Cloze tests have been the subject of numerous studies regarding their function and use in both first language and second language contexts (e.g., Jonz & Oller, 1994; Watanabe & Koyama, 2008). From a validity standpoint, one area of investigation has been the extent to which cloze tests measure reading ability beyond the sentence level.…
Language Effects in International Testing: The Case of PISA 2006 Science Items

ERIC Educational Resources Information Center

El Masri, Yasmine H.; Baird, Jo-Anne; Graesser, Art

2016-01-01

We investigate the extent to which language versions (English, French and Arabic) of the same science test are comparable in terms of item difficulty and demands. We argue that language is an inextricable part of the scientific literacy construct, be it intended or not by the examiner. This argument has considerable implications on methodologies…
The Golden Rule Agreement is Psychometrically Defensible.

ERIC Educational Resources Information Center

Gonzalez-Tamayo, Eulogio

The agreement between the Educational Testing Service (ETS) and the Golden Rule Insurance Company of Illinois is interpreted as setting the general principles on which items must be selected to be included in a licensure test. These principles put a limit to the difficulty level of any item, and they also limit the size of the difference in…
[Perceptions on item disclosure for the Korean medical licensing examination].

PubMed

Yang, Eunbae B

2015-09-01

This study analyzed the perceptions of medical students and faculty regarding disclosure of test items on the Korean medical licensing examination. I conducted a survey of medical students from medical colleges and professional medical schools nationwide. Responses were analyzed from 718 participants as well as 69 faculty members who participated in creating the medical licensing examination item sets. Data were analyzed using descriptive statistics and the chi-square test. It is important to maintain test quality and to keep the test items unavailable to the public. There are also concerns among students that disclosure of test items would prompt increasing difficulty of test items (48.3%). Further, few students found it desirable to disclose test items regardless of any considerations (28.5%). The professors, who had experience in designing the test items, also expressed their opposition to test item disclosure (60.9%). It is desirable not to disclose the test items of the Korean medical licensing examination to the public on the condition that students are provided with a sufficient amount of information regarding the examination. This is so that the exam can appropriately identify candidates with the required qualifications.
Sensitivity of Equated Aggregate Scores to the Treatment of Misbehaving Common Items

ERIC Educational Resources Information Center

Michaelides, Michalis P.

2010-01-01

The delta-plot method (Angoff, 1972) is a graphical technique used in the context of test equating for identifying common items with aberrant changes in their item difficulties across administrations or alternate forms. This brief research report explores the effects on equated aggregate scores when delta-plot outliers are either retained in or…
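As a rough illustration of the delta-plot idea named in the abstract above, the sketch below converts proportions correct to the delta scale (mean 13, standard deviation 4 on an inverse-normal transform) and measures how far each common item falls from the principal-axis line through the two administrations' deltas. The function names are hypothetical, and the report's own flagging threshold is not given in the truncated abstract, so none is applied here.

```python
from statistics import NormalDist, mean

def delta(p):
    """Delta-scale difficulty for a proportion correct p (0 < p < 1).

    Harder items (smaller p) receive larger delta values.
    """
    return 13.0 - 4.0 * NormalDist().inv_cdf(p)

def perpendicular_distances(x, y):
    """Signed distances of the points (x_i, y_i) from their principal-axis line.

    x, y: delta values of the same common items on two administrations.
    Assumes the two sets of deltas are correlated (non-zero covariance).
    """
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = (syy - sxx + ((syy - sxx) ** 2 + 4 * sxy ** 2) ** 0.5) / (2 * sxy)
    intercept = my - slope * mx
    norm = (slope ** 2 + 1) ** 0.5
    return [(slope * a + intercept - b) / norm for a, b in zip(x, y)]
```

Items whose distance is large relative to the others would be the candidates for "aberrant" common items; the cutoff used for that judgment is a design choice left open in this sketch.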
Analysis test of understanding of vectors with the three-parameter logistic model of item response theory and item response curves technique

NASA Astrophysics Data System (ADS)

Rakkapao, Suttida; Prasitpong, Singha; Arayathanitkul, Kwan

2016-12-01

This study investigated the multiple-choice test of understanding of vectors (TUV) by applying item response theory (IRT). The difficulty, discrimination, and guessing parameters of the TUV items were fit with the three-parameter logistic model of IRT, using the PARSCALE program. The TUV ability is an ability parameter, here estimated assuming unidimensionality and local independence. Moreover, all distractors of the TUV were analyzed from item response curves (IRC) that represent simplified IRT. Data were gathered on 2392 science and engineering freshmen from three universities in Thailand. The results revealed IRT analysis to be useful in assessing the test since its item parameters are independent of the ability parameters. The IRT framework reveals item-level information and indicates appropriate ability ranges for the test. Moreover, the IRC analysis can be used to assess the effectiveness of the test's distractors. Both IRT and IRC approaches reveal test characteristics beyond those revealed by the classical analysis methods of tests. Test developers can apply these methods to diagnose and evaluate the features of items at various ability levels of test takers.
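The three-parameter logistic model mentioned in the abstract above can be sketched as a single function. This is the textbook form of the item characteristic curve, not the study's fitting procedure; the parameter names a (discrimination), b (difficulty), and c (guessing) are the conventional ones, and no scaling constant (such as D = 1.7) is applied, which is an assumption.

```python
import math

def p_correct_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model.

    theta: examinee ability
    a: item discrimination, b: item difficulty, c: lower asymptote (guessing)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))
```

When ability equals the item difficulty (theta = b), the probability is halfway between the guessing level c and 1, which is one way to read the difficulty parameter.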
Conflict and metacognitive control: the mismatch-monitoring hypothesis of how others' knowledge states affect recall.

PubMed

Fraundorf, Scott H; Benjamin, Aaron S

2016-09-01

Information about others' success in remembering is frequently available. For example, students taking an exam may assess its difficulty by monitoring when others turn in their exams. In two experiments, we investigated how rememberers use this information to guide recall. Participants studied paired associates, some semantically related (and thus easier to retrieve) and some unrelated (and thus harder). During a subsequent cued recall test, participants viewed fictive information about an opponent's accuracy on each item. In Experiment 1, participants responded to each cue once before seeing the opponent's performance and once afterwards. Participants reconsidered their responses least often when the opponent's accuracy matched the item difficulty (easy items the opponent recalled, hard items the opponent forgot) and most often when the opponent's accuracy and the item difficulty mismatched. When participants responded only after seeing the opponent's performance (Experiment 2), the same mismatch conditions that led to reconsideration even produced superior recall. These results suggest that rememberers monitor whether others' knowledge states accord or conflict with their own experience, and that this information shifts how they interrogate their memory and what they recall.
Intervention for children with word-finding difficulties: a parallel group randomised control trial.

PubMed

Best, Wendy; Hughes, Lucy Mari; Masterson, Jackie; Thomas, Michael; Fedor, Anna; Roncoli, Silvia; Fern-Pollak, Liory; Shepherd, Donna-Lynn; Howard, David; Shobbrook, Kate; Kapikian, Anna

2017-07-31

The study investigated the outcome of a word-web intervention for children diagnosed with word-finding difficulties (WFDs). Twenty children age 6-8 years with WFDs confirmed by a discrepancy between comprehension and production on the Test of Word Finding-2, were randomly assigned to intervention (n = 11) and waiting control (n = 9) groups. The intervention group had six sessions of intervention which used word-webs and targeted children's meta-cognitive awareness and word-retrieval. On the treated experimental set (n = 25 items) the intervention group gained on average four times as many items as the waiting control group (d = 2.30). There were also gains on personally chosen items for the intervention group. There was little change on untreated items for either group. The study is the first randomised control trial to demonstrate an effect of word-finding therapy with children with language difficulties in mainstream school. The improvement in word-finding for treated items was obtained following a clinically realistic intervention in terms of approach, intensity and duration.

Identification of technical item flaws leads to improvement of the quality of single best Multiple Choice Questions.

PubMed

Fayyaz Khan, Humaira; Farooq Danish, Khalid; Saeed Awan, Azra; Anwar, Masood

2013-05-01

The purpose of the study was to identify technical item flaws in the multiple choice questions submitted for the final exams for the years 2009, 2010 and 2011. This descriptive analytical study was carried out at Islamic International Medical College (IIMC). The data were collected from the MCQs submitted by the faculty for the final exams for the years 2009, 2010 and 2011. The data were compiled and evaluated by a three-member assessment committee. The data were analyzed for frequencies and percentages, and the categorical data were analyzed by chi-square test. The overall percentage of flawed items was 67% for the year 2009, of which 21% were for testwiseness and 40% were for irrelevant difficulty. In 2010 the total item flaws were 36%, of which 11% were for testwiseness and 22% were for irrelevant difficulty. The 2011 data showed decreased overall flaws of 21%; flaws of testwiseness were 7% and flaws of irrelevant difficulty were 11%. Technical item flaws are frequently encountered during MCQ construction, and the identification of flaws leads to improved quality of the single best MCQs.
An Empirical Bayes Approach to Item Banking. Project Psychometric Aspects of Item Banking No. 6. Research Report 86-6.

ERIC Educational Resources Information Center

van der Linden, Wim J.; Eggen, Theo J. H. M.

A procedure for the sequential optimization of the calibration of an item bank is given. The procedure is based on an empirical Bayes approach to a reformulation of the Rasch model as a model for paired comparisons between the difficulties of test items in which ties are allowed to occur. First, it is indicated how a paired-comparisons design…
Assessment of item-writing flaws in multiple-choice questions.

PubMed

Nedeau-Cayo, Rosemarie; Laughlin, Deborah; Rus, Linda; Hall, John

2013-01-01

This study evaluated the quality of multiple-choice questions used in a hospital's e-learning system. Constructing well-written questions is fraught with difficulty, and item-writing flaws are common. Study results revealed that most items contained flaws and were written at the knowledge/comprehension level. Few items had linked objectives, and no association was found between the presence of objectives and flaws. Recommendations include education for writing test questions.
Examining an Alternative to Score Equating: A Randomly Equivalent Forms Approach. Research Report. ETS RR-08-14

ERIC Educational Resources Information Center

Liao, Chi-Wen; Livingston, Samuel A.

2008-01-01

Randomly equivalent forms (REF) of tests in listening and reading for nonnative speakers of English were created by stratified random assignment of items to forms, stratifying on item content and predicted difficulty. The study included 50 replications of the procedure for each test. Each replication generated 2 REFs. The equivalence of those 2…
Belief-bias reasoning in non-clinical delusion-prone individuals.

PubMed

Anandakumar, T; Connaughton, E; Coltheart, M; Langdon, R

2017-03-01

It has been proposed that people with delusions have difficulty inhibiting beliefs (i.e., "doxastic inhibition") so as to reason about them as if they might not be true. We used a continuity approach to test this proposal in non-clinical adults scoring high and low in psychometrically assessed delusion-proneness. High delusion-prone individuals were expected to show greater difficulty than low delusion-prone individuals on "conflict" items of a "belief-bias" reasoning task (i.e. when required to reason logically about statements that conflicted with reality), but not on "non-conflict" items. Twenty high delusion-prone and twenty low delusion-prone participants (according to the Peters et al. Delusions Inventory) completed a belief-bias reasoning task and tests of IQ, working memory and general inhibition (Excluded Letter Fluency, Stroop and Hayling Sentence Completion). High delusion-prone individuals showed greater difficulty than low delusion-prone individuals on the Stroop and Excluded Letter Fluency tests of inhibition, but no greater difficulty on the conflict versus non-conflict items of the belief-bias task. They did, however, make significantly more errors overall on the belief-bias task, despite controlling for IQ, working memory and general inhibitory control. The study had a relatively small sample size and used non-clinical participants to test a theory of cognitive processing in individuals with clinically diagnosed delusions. Results failed to support a role for doxastic inhibitory failure in non-clinical delusion-prone individuals. These individuals did, however, show difficulty with conditional reasoning about statements that may or may not conflict with reality, independent of any general cognitive or inhibitory deficits. Copyright © 2016 Elsevier Ltd. All rights reserved.
The promise and challenge of including multimedia items in medical licensure examinations: some insights from an empirical trial.

PubMed

Shen, Linjun; Li, Feiming; Wattleworth, Roberta; Filipetto, Frank

2010-10-01

The Comprehensive Osteopathic Medical Licensing Examination conducted a trial of multimedia items in the 2008-2009 Level 3 testing cycle to determine (1) if multimedia items were able to test additional elements of medical knowledge and skills and (2) how to develop effective multimedia items. Forty-four content-matched multimedia and text multiple-choice items were randomly delivered to Level 3 candidates. Logistic regression and paired-samples t tests were used for pairwise and group-level comparisons, respectively. Nine pairs showed significant differences in difficulty and/or discrimination. Content analysis found that, if text narrations were less direct, multimedia materials could make items easier. When textbook terminologies were replaced by multimedia presentations, multimedia items could become more difficult. Moreover, a multimedia item was found not to be uniformly difficult for candidates at different ability levels, possibly because multimedia and text items tested different elements of the same concept. Multimedia items may be capable of measuring some constructs different from what text items can measure. Effective multimedia items with reasonable psychometric properties can be intentionally developed.
Exploring Item Characteristics That Are Related to the Difficulty of TOEFL Dialogue Items. Research Reports. RR-79. RR-04-11

ERIC Educational Resources Information Center

Kostin, Irene

2004-01-01

The purpose of this study is to explore the relationship between a set of item characteristics and the difficulty of TOEFL® dialogue items. Identifying characteristics that are related to item difficulty has the potential to improve the efficiency of the item-writing process. The study employed 365 TOEFL dialogue items, which were coded on 49…
The Effect of Sequential Dependence on the Sampling Distributions of KR-20, KR-21, and Split-Halves Reliabilities.

ERIC Educational Resources Information Center

Sullins, Walter L.

Five-hundred dichotomously scored response patterns were generated with sequentially independent (SI) items and 500 with dependent (SD) items for each of thirty-six combinations of sampling parameters (i.e., three test lengths, three sample sizes, and four item difficulty distributions). KR-20, KR-21, and Split-Half (S-H) reliabilities were…
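The KR-20 and KR-21 coefficients named in the abstract above can be sketched from their standard formulas for dichotomously scored items. This is an illustration only, not the report's simulation code; the function names are hypothetical, and the use of the population variance of total scores is an assumption about convention.

```python
from statistics import mean, pvariance

def kr20(responses):
    """KR-20 for a list of per-examinee lists of 0/1 item scores."""
    k = len(responses[0])
    n = len(responses)
    totals = [sum(r) for r in responses]
    # Sum over items of p * q, where p is the proportion answering correctly.
    sum_pq = 0.0
    for j in range(k):
        p = sum(r[j] for r in responses) / n
        sum_pq += p * (1.0 - p)
    return (k / (k - 1)) * (1.0 - sum_pq / pvariance(totals))

def kr21(responses):
    """KR-21, which uses only the mean and variance of total scores.

    It treats all items as equally difficult, so it does not exceed KR-20.
    """
    k = len(responses[0])
    totals = [sum(r) for r in responses]
    m = mean(totals)
    return (k / (k - 1)) * (1.0 - m * (k - m) / (k * pvariance(totals)))
```

Both functions need at least two items and some variance in total scores; otherwise the division is undefined.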
Can Item Analysis of MCQs Accomplish the Need of a Proper Assessment Strategy for Curriculum Improvement in Medical Education?

ERIC Educational Resources Information Center

Pawade, Yogesh R.; Diwase, Dipti S.

2016-01-01

Item analysis of Multiple Choice Questions (MCQs) is the process of collecting, summarizing and utilizing information from students' responses to evaluate the quality of test items. Difficulty Index (p-value), Discrimination Index (DI) and Distractor Efficiency (DE) are the parameters which help to evaluate the quality of MCQs used in an…
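To make the Difficulty Index and Discrimination Index named in the abstract above concrete, here is a minimal sketch of their classical definitions. The truncated abstract does not give the authors' exact formulas, so the upper/lower 27% grouping and the function names are assumptions; Distractor Efficiency is not covered.

```python
def difficulty_index(item_scores):
    """Proportion of examinees answering the item correctly (the p-value)."""
    return sum(item_scores) / len(item_scores)

def discrimination_index(item_scores, total_scores, fraction=0.27):
    """Upper-minus-lower discrimination index for one item.

    item_scores: 0/1 score on the item for each examinee
    total_scores: total test score for each examinee, in the same order
    fraction: share of examinees in each extreme group (assumed 27%)
    """
    n = len(total_scores)
    group = max(1, round(n * fraction))
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower, upper = order[:group], order[-group:]
    correct_upper = sum(item_scores[i] for i in upper)
    correct_lower = sum(item_scores[i] for i in lower)
    return (correct_upper - correct_lower) / group
```

A positive index means high scorers answered the item correctly more often than low scorers; values near zero or below suggest the item does not separate the groups.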
Modelling Question Difficulty in an A Level Physics Examination

ERIC Educational Resources Information Center

Crisp, Victoria; Grayson, Rebecca

2013-01-01

"Item difficulty modelling" is a technique used for a number of purposes such as to support future item development, to explore validity in relation to the constructs that influence difficulty and to predict the difficulty of items. This research attempted to explore the factors influencing question difficulty in a general qualification…
TEDS-M 2008 User Guide for the International Database. Supplement 4: TEDS-M Released Mathematics and Mathematics Pedagogy Knowledge Assessment Items

ERIC Educational Resources Information Center

Brese, Falk, Ed.

2012-01-01

The goal for selecting the released set of test items was to have approximately 25% of each of the full item sets for mathematics content knowledge (MCK) and mathematics pedagogical content knowledge (MPCK) that would represent the full range of difficulty, content, and item format used in the TEDS-M study. The initial step in the selection was to…
Ten Issues in Criterion-Referenced Testing: A Response to Commonly Heard Criticisms.

ERIC Educational Resources Information Center

Curlette, William L.; Stallings, William M.

1979-01-01

The 10 criticisms of criterion-referenced tests addressed in this paper are: the domains tested; pedagogical influence; difficulty of items; cumbersome reports; reliability; arbitrary criteria; local objectives; labeling; predictive validity; and repeated testing. (SJL)
A Classical Test Theory Analysis of the Light and Spectroscopy Concept Inventory National Study Data Set

ERIC Educational Resources Information Center

Schlingman, Wayne M.; Prather, Edward E.; Wallace, Colin S.; Brissenden, Gina; Rudolph, Alexander L.

2012-01-01

This paper is the first in a series of investigations into the data from the recent national study using the Light and Spectroscopy Concept Inventory (LSCI). In this paper, we use classical test theory to form a framework of results that will be used to evaluate individual item difficulties, item discriminations, and the overall reliability of the…
Evaluation of the Fecal Incontinence Quality of Life Scale (FIQL) using item response theory reveals limitations and suggests revisions.

PubMed

Peterson, Alexander C; Sutherland, Jason M; Liu, Guiping; Crump, R Trafford; Karimuddin, Ahmer A

2018-06-01

The Fecal Incontinence Quality of Life Scale (FIQL) is a commonly used patient-reported outcome measure for fecal incontinence, often used in clinical trials, yet has not been validated in English since its initial development. This study uses modern methods to thoroughly evaluate the psychometric characteristics of the FIQL and its potential for differential functioning by gender. This study analyzed prospectively collected patient-reported outcome data from a sample of patients prior to colorectal surgery. Patients were recruited from 14 general and colorectal surgeons in Vancouver Coastal Health hospitals in Vancouver, Canada. Confirmatory factor analysis was used to assess construct validity. Item response theory was used to evaluate test reliability, describe item-level characteristics, identify local item dependence, and test for differential functioning by gender. 236 patients were included for analysis, with mean age 58 and approximately half female. Factor analysis failed to identify the lifestyle, coping, depression, and embarrassment domains, suggesting lack of construct validity. Items demonstrated low difficulty, indicating that the test has the highest reliability among individuals who have low quality of life. Five items are suggested for removal or replacement. Differential test functioning was minimal. This study has identified specific improvements that can be made to each domain of the Fecal Incontinence Quality of Life Scale and to the instrument overall. Formatting, scoring, and instructions may be simplified, and items with higher difficulty developed. The lifestyle domain can be used as is. The embarrassment domain should be significantly revised before use.
Three controversies over item disclosure in medical licensure examinations.

PubMed

Park, Yoon Soo; Yang, Eunbae B

2015-01-01

In response to views on the public's right to know, there is growing attention to item disclosure (release of items, answer keys, and performance data to the public) in medical licensure examinations and its potential impact on the test's ability to measure competence and select qualified candidates. Recent debates on this issue have sparked legislative action internationally, including in South Korea, with prior discussions among North American countries dating back over three decades. The purpose of this study is to identify and analyze three issues associated with item disclosure in medical licensure examinations (1) fairness and validity, 2) impact on passing levels, and 3) utility of item disclosure) by synthesizing existing literature in relation to standards in testing. Historically, the controversy over item disclosure has centered on fairness and validity. Proponents of item disclosure stress test takers' right to know, while opponents argue from a validity perspective. Item disclosure may bias item characteristics, such as difficulty and discrimination, and has consequences for setting passing levels. To date, there has been limited research on the utility of item disclosure for large-scale testing. These issues require ongoing and careful consideration.
Conflict and metacognitive control: The mismatch-monitoring hypothesis of how others’ knowledge states affect recall

PubMed Central

Fraundorf, Scott H.; Benjamin, Aaron S.

2015-01-01

Information about others’ success in remembering is frequently available. For example, students taking an exam may assess its difficulty by monitoring when others turn in their exams. In two experiments, we investigated how rememberers use this information to guide recall. Participants studied paired associates, some semantically related (and thus easier to retrieve) and some unrelated (and thus harder). During a subsequent cued recall test, participants viewed fictive information about an opponent’s accuracy on each item. In Experiment 1, participants responded to each cue once before seeing the opponent’s performance and once afterwards. Participants reconsidered their responses least often when the opponent’s accuracy matched the item difficulty (easy items the opponent recalled, hard items the opponent forgot) and most often when the opponent’s accuracy and the item difficulty mismatched. When participants responded only after seeing the opponent’s performance (Experiment 2), the same mismatch conditions that led to reconsideration even produced superior recall. These results suggest that rememberers monitor whether others’ knowledge states accord or conflict with their own experience, and that this information shifts how they interrogate their memory and what they recall. PMID:26247369
Cancer Health Literacy Test-30-Spanish (CHLT-30-DKspa), a New Spanish-Language Version of the Cancer Health Literacy Test (CHLT-30) for Spanish-Speaking Latinos.

PubMed

Echeverri, Margarita; Anderson, David; Nápoles, Anna María

2016-01-01

This article describes the adaptation and initial validation of the Cancer Health Literacy Test (CHLT) for Spanish speakers. A cross-sectional field test of the Spanish version of the CHLT (CHLT-30-DKspa) was conducted among healthy Latinos in Louisiana. Diagonally weighted least squares was used to confirm the factor structure. Item response analysis using 2-parameter logistic estimates was used to identify questions that may require modification to avoid bias. Cronbach's alpha coefficients estimated scale internal consistency reliability. Analysis of variance was used to test for significant differences in CHLT-30-DKspa scores by gender, origin, age and education. The mean CHLT-30-DKspa score (N = 400) was 17.13 (range = 0-30, SD = 6.65). Results confirmed a unidimensional structure, χ²(405) = 461.55, p = .027, comparative fit index = .993, Tucker-Lewis index = .992, root mean square error of approximation = .0180. Cronbach's alpha was .88. Items Q1-High Calorie and Q15-Tumor Spread had the lowest item-scale correlations (.148 and .288, respectively) and standardized factor loadings (.152 and .302, respectively). Items Q19-Smoking Risk, Q8-Palliative Care, and Q1-High Calorie had the highest item difficulty parameters (difficulty = 1.12, 1.21, and 2.40, respectively). Results generally support the applicability of the CHLT-30-DKspa for healthy Spanish-speaking populations, with the exception of 4 items that need to be deleted or revised and further studied: Q1, Q8, Q15, and Q19.
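Cronbach's alpha, reported in the abstract above as the internal consistency estimate, can be sketched from its standard formula. This is a generic illustration rather than the authors' analysis; the function name is hypothetical, and the choice of sample variance is an assumption (the ratio is the same as long as one convention is used throughout).

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of per-respondent lists of item scores."""
    k = len(responses[0])
    # Variance of each item across respondents.
    item_variances = [variance([r[j] for r in responses]) for j in range(k)]
    # Variance of respondents' total scores.
    total_variance = variance([sum(r) for r in responses])
    return (k / (k - 1)) * (1.0 - sum(item_variances) / total_variance)
```

For items scored 0/1, this quantity coincides with KR-20, so it can be read as the general form of that coefficient.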
Development and preliminary evaluation of a music-based attention assessment for patients with traumatic brain injury.

PubMed

Jeong, Eunju; Lesiuk, Teresa L

2011-01-01

Impairments in attention are commonly seen in individuals with traumatic brain injury (TBI). While visual attention assessment measurements have been rigorously developed and frequently used in cognitive neurorehabilitation, there is a paucity of auditory attention assessment measurements for patients with TBI. The purpose of this study was to field test a researcher-developed Music-based Attention Assessment (MAA), a melodic contour identification test designed to assess three different types of attention (i.e., sustained attention, selective attention, and divided attention), for patients with TBI. Additionally, this study aimed to evaluate the readability and comprehensibility of the test items and to examine the preliminary psychometric properties of the scale and test items. Fifteen patients diagnosed with TBI completed 3 different series of tasks in which they were required to identify melodic contours. The resulting data showed that (a) test items in each of the 3 subtests had an easy to moderate level of item difficulty and an acceptable to high level of item discrimination, (b) the musical characteristics (i.e., contour, congruence, and pitch interference) were associated with the level of item difficulty, and (c) the internal consistency of the MAA as computed by Cronbach's alpha was .95. Subsequent studies using a larger sample of typical participants, along with individuals with TBI, are needed to confirm construct validity and internal consistency of the MAA. In addition, the authors recommend examination of criterion validity of the MAA as correlated with current neuropsychological attention assessment measurements.
A Test of the Similar Sequence Hypothesis.

ERIC Educational Resources Information Center

Silverstein, A. B.; And Others

1982-01-01

Scales for object permanence and spatial relationships were administered to 98 severely and profoundly mentally retarded children (mean age 13 years) on three occasions, 6 months apart. Differences in the difficulty of the items were quite stable, but their order of difficulty differed appreciably from that for nonretarded infants. (Author/SB)

Rasch Analysis of the Power as Knowing Participation in Change Tool--the Brazilian version.

PubMed

Guedes, Erika de Souza; Orozco-Vargas, Luiz Carlos; Turrini, Ruth Natália Teresa; de Sousa, Regina Márcia Cardoso; dos Santos, Mariana Alvina; da Cruz, Diná de Almeida Lopes Monteiro

2013-01-01

The objective of this study was to evaluate the items contained in the Brazilian version of the Power as Knowing Participation in Change Tool (PKPCT). The psychometric properties of the questionnaire were investigated through Rasch analysis. Data from 952 nursing assistants and 627 baccalaureate nurses were analyzed (average age 44.1 (SD=9.5); 13.0% men). The subscales Choices, Awareness, Freedom and Involvement were tested separately and presented unidimensionality; the response categories of the items were reduced from 7 to 3 levels and the items fit the model well, except for the following/leading item, for which the infit and outfit values were above 1.4; this item also presented Differential Item Functioning (DIF) according to the participant's role. The reliability of the items was 0.99 and the reliability of the participants ranged from 0.80 to 0.84 in the subscales. Items with extremely high levels of difficulty were not identified. The PKPCT should not be viewed as unidimensional; items with extremely high levels of difficulty need to be created for the scale, and the differential functioning of some items has to be further investigated.
Students’ understanding of forces: Force diagrams on horizontal and inclined plane

NASA Astrophysics Data System (ADS)

Sirait, J.; Hamdani; Mursyid, S.

2018-03-01

This study aims to analyse students’ difficulties in understanding force diagrams on horizontal surfaces and inclined planes. Physics education students (pre-service physics teachers) of Tanjungpura University, who had completed a Basic Physics course, took a Force concept test which has six questions covering three concepts: an object at rest, an object moving at constant speed, and an object moving at constant acceleration both on a horizontal surface and on an inclined plane. The test is in a multiple-choice format. It examines the ability of students to select appropriate force diagrams depending on the context. The results show that 44% of students have difficulties in solving the test (these students could solve only one or two items out of six). About 50% of students faced difficulties finding the correct diagram of an object when it has constant speed and acceleration in both contexts. In general, students could correctly identify only 48% of the force diagrams on the test. The most difficult task for the students was identifying the force diagram representing the forces exerted on an object on an inclined plane.
Increased susceptibility to proactive interference in adults with dyslexia?

PubMed

Bogaerts, Louisa; Szmalec, Arnaud; Hachmann, Wibke M; Page, Mike P A; Woumans, Evy; Duyck, Wouter

2015-01-01

Recent findings show that people with dyslexia have an impairment in serial-order memory. Based on these findings, the present study aimed to test the hypothesis that people with dyslexia have difficulties dealing with proactive interference (PI) in recognition memory. A group of 25 adults with dyslexia and a group of matched controls were subjected to a 2-back recognition task, which required participants to indicate whether an item (mis)matched the item that had been presented 2 trials before. PI was elicited using lure trials in which the item matched the item in the 3-back position instead of the targeted 2-back position. Our results demonstrate that the introduction of lure trials affected 2-back recognition performance more severely in the dyslexic group than in the control group, suggesting greater difficulty in resisting PI in dyslexia.
Reading Ability and Print Exposure: Item Response Theory Analysis of the Author Recognition Test

PubMed Central

Moore, Mariah; Gordon, Peter C.

2015-01-01

In the Author Recognition Test (ART) participants are presented with a series of names and foils and are asked to indicate which ones they recognize as authors. The test is a strong predictor of reading skill, with this predictive ability generally explained as occurring because author knowledge is likely acquired through reading or other forms of print exposure. This large-scale study (1012 college student participants) used Item Response Theory (IRT) to analyze item (author) characteristics to facilitate identification of the determinants of item difficulty, provide a basis for further test development, and to optimize scoring of the ART. Factor analysis suggests a potential two factor structure of the ART differentiating between literary vs. popular authors. Effective and ineffective author names were identified so as to facilitate future revisions of the ART. Analyses showed that the ART is a highly significant predictor of time spent encoding words as measured using eye-tracking during reading. The relationship between the ART and time spent reading provided a basis for implementing a higher penalty for selecting foils, rather than the standard method of ART scoring (names selected minus foils selected). The findings provide novel support for the view that the ART is a valid indicator of reading volume. Further, they show that frequency data can be used to select items of appropriate difficulty and that frequency data from corpora based on particular time periods and types of text may allow test adaptation for different populations. PMID:25410405
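The two scoring rules contrasted in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' scoring code: the function name `art_score`, the `foil_penalty` parameter, and the example names are hypothetical, and the abstract does not state what penalty weight was adopted. Any selected name that is not a real author is assumed to be a foil.

```python
def art_score(selected, authors, foil_penalty=1.0):
    """Score one Author Recognition Test response sheet.

    selected     -- names the participant marked as authors
    authors      -- names on the sheet that really are authors
    foil_penalty -- points deducted per selected foil (hypothetical knob);
                    1.0 reproduces the standard rule
                    "names selected minus foils selected"
    """
    selected = set(selected)
    authors = set(authors)
    hits = len(selected & authors)          # real authors recognized
    false_alarms = len(selected - authors)  # foils selected (assumed)
    return hits - foil_penalty * false_alarms


# Illustrative data only: two real authors and one foil selected.
real_authors = {"Author A", "Author B", "Author C"}
marked = {"Author A", "Author B", "Foil X"}
standard = art_score(marked, real_authors)                     # 2 - 1*1
stricter = art_score(marked, real_authors, foil_penalty=2.0)   # 2 - 2*1
```

Raising `foil_penalty` above 1.0 is one way to express a "higher penalty for selecting foils"; the value 2.0 here is arbitrary.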
Reading ability and print exposure: item response theory analysis of the author recognition test.

PubMed

Moore, Mariah; Gordon, Peter C

2015-12-01

In the author recognition test (ART), participants are presented with a series of names and foils and are asked to indicate which ones they recognize as authors. The test is a strong predictor of reading skill, and this predictive ability is generally explained as occurring because author knowledge is likely acquired through reading or other forms of print exposure. In this large-scale study (1,012 college student participants), we used item response theory (IRT) to analyze item (author) characteristics in order to facilitate identification of the determinants of item difficulty, provide a basis for further test development, and optimize scoring of the ART. Factor analysis suggested a potential two-factor structure of the ART, differentiating between literary and popular authors. Effective and ineffective author names were identified so as to facilitate future revisions of the ART. Analyses showed that the ART is a highly significant predictor of the time spent encoding words, as measured using eyetracking during reading. The relationship between the ART and time spent reading provided a basis for implementing a higher penalty for selecting foils, rather than the standard method of ART scoring (names selected minus foils selected). The findings provide novel support for the view that the ART is a valid indicator of reading volume. Furthermore, they show that frequency data can be used to select items of appropriate difficulty, and that frequency data from corpora based on particular time periods and types of texts may allow adaptations of the test for different populations.
Item response theory analysis of the mechanics baseline test

NASA Astrophysics Data System (ADS)

Cardamone, Caroline N.; Abbott, Jonathan E.; Rayyan, Saif; Seaton, Daniel T.; Pawl, Andrew; Pritchard, David E.

2012-02-01

Item response theory is useful in both the development and evaluation of assessments and in computing standardized measures of student performance. In item response theory, individual parameters (difficulty, discrimination) for each item or question are fit by item response models. These parameters provide a means for evaluating a test and offer a better measure of student skill than a raw test score, because each skill calculation considers not only the number of questions answered correctly, but the individual properties of all questions answered. Here, we present the results from an analysis of the Mechanics Baseline Test given at MIT during 2005-2010. Using the item parameters, we identify questions on the Mechanics Baseline Test that are not effective in discriminating between MIT students of different abilities. We show that a limited subset of the highest quality questions on the Mechanics Baseline Test returns accurate measures of student skill. We compare student skills as determined by item response theory to the more traditional measurement of the raw score and show that a comparable measure of learning gain can be computed.
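The point that an IRT skill estimate depends on which questions were answered correctly, not just how many, can be illustrated with a small sketch. It assumes the two-parameter logistic (2PL) model; the abstract names difficulty and discrimination parameters but does not say which item response model was fitted, and the item parameters, function names, and grid-search estimator below are invented for illustration.

```python
import math


def p_correct(theta, a, b):
    """2PL item response function: probability that a student of skill
    theta answers an item with discrimination a and difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def log_likelihood(theta, items, responses):
    """Log-likelihood of a 0/1 response pattern at skill theta."""
    total = 0.0
    for (a, b), correct in zip(items, responses):
        p = p_correct(theta, a, b)
        total += math.log(p if correct else 1.0 - p)
    return total


def estimate_skill(items, responses):
    """Maximum-likelihood skill by brute-force search on a grid in [-4, 4]."""
    grid = [i / 100 for i in range(-400, 401)]
    return max(grid, key=lambda t: log_likelihood(t, items, responses))


# Two hypothetical items of equal difficulty but different discrimination.
items = [(0.5, 0.0), (2.0, 0.0)]  # (a, b) pairs
skill_a = estimate_skill(items, [1, 0])  # only the weakly discriminating item correct
skill_b = estimate_skill(items, [0, 1])  # only the strongly discriminating item correct
```

Both hypothetical students have a raw score of 1 out of 2, yet `skill_b` comes out higher than `skill_a` because the second item carries more information about skill.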
Comparison of Alternate and Original Items on the Montreal Cognitive Assessment.

PubMed

Lebedeva, Elena; Huang, Mei; Koski, Lisa

2016-03-01

The Montreal Cognitive Assessment (MoCA) is a screening tool for mild cognitive impairment (MCI) in elderly individuals. We hypothesized that measurement error when using the new alternate MoCA versions to monitor change over time could be related to the use of items that are not of comparable difficulty to their corresponding originals of similar content. The objective of this study was to compare the difficulty of the alternate MoCA items to the original ones. Five selected items from alternate versions of the MoCA were included with items from the original MoCA administered adaptively to geriatric outpatients (N = 78). Rasch analysis was used to estimate the difficulty level of the items. None of the five items from the alternate versions matched the difficulty level of their corresponding original items. This study demonstrates the potential benefits of a Rasch analysis-based approach for selecting items during the process of development of parallel forms. The results suggest that better match of the items from different MoCA forms by their difficulty would result in higher sensitivity to changes in cognitive function over time.
Item Difficulty in the Evaluation of Computer-Based Instruction: An Example from Neuroanatomy

PubMed Central

Chariker, Julia H.; Naaz, Farah; Pani, John R.

2012-01-01

This article reports large item effects in a study of computer-based learning of neuroanatomy. Outcome measures of the efficiency of learning, transfer of learning, and generalization of knowledge diverged by a wide margin across test items, with certain sets of items emerging as particularly difficult to master. In addition, the outcomes of comparisons between instructional methods changed with the difficulty of the items to be learned. More challenging items better differentiated between instructional methods. This set of results is important for two reasons. First, it suggests that instruction may be more efficient if sets of consistently difficult items are the targets of instructional methods particularly suited to them. Second, there is wide variation in the published literature regarding the outcomes of empirical evaluations of computer-based instruction. As a consequence, many questions arise as to the factors that may affect such evaluations. The present paper demonstrates that the level of challenge in the material that is presented to learners is an important factor to consider in the evaluation of a computer-based instructional system. PMID:22231801
Item difficulty in the evaluation of computer-based instruction: an example from neuroanatomy.

PubMed

Chariker, Julia H; Naaz, Farah; Pani, John R

2012-01-01

This article reports large item effects in a study of computer-based learning of neuroanatomy. Outcome measures of the efficiency of learning, transfer of learning, and generalization of knowledge diverged by a wide margin across test items, with certain sets of items emerging as particularly difficult to master. In addition, the outcomes of comparisons between instructional methods changed with the difficulty of the items to be learned. More challenging items better differentiated between instructional methods. This set of results is important for two reasons. First, it suggests that instruction may be more efficient if sets of consistently difficult items are the targets of instructional methods particularly suited to them. Second, there is wide variation in the published literature regarding the outcomes of empirical evaluations of computer-based instruction. As a consequence, many questions arise as to the factors that may affect such evaluations. The present article demonstrates that the level of challenge in the material that is presented to learners is an important factor to consider in the evaluation of a computer-based instructional system. Copyright © 2011 American Association of Anatomists.
Examination of the item structure of the Alberta infant motor scale.

PubMed

Liao, Pai-Jun M; Campbell, Suzann K

2004-01-01

The Alberta Infant Motor Scale (AIMS) is a screening tool for identifying delayed motor development from birth to 18 months of age. The purpose of this study was to examine the psychometric structure of the AIMS, including the hierarchical scale of items and the precision for measuring infant ability at different ages. Ninety-seven infants with varying degrees of risk of developmental disability were recruited from three hospitals or from the community in the Chicago metropolitan area. Infants were tested on the AIMS at three, six, nine, and 12 months of age. The hierarchical structure and the range and distribution of item difficulty on the AIMS were analyzed using Rasch psychometric analysis. The Rasch analysis confirmed that items for each of the four testing positions (supine, prone, sitting, and standing) were arranged in increasing order of difficulty, but a ceiling effect was present. Gaps exist at six ability levels, indicating low precision of measurement for differentiating among infants after about nine months of age. The AIMS shows a ceiling effect, measures infant ability best from three to nine months of age, and has few items available for discriminating among infants after they pass the controlled lowering through standing item. Clinical impressions should be drawn with caution at ages when the precision of measurement is low.
Probing University Students' Pre-Knowledge in Quantum Physics with QPCS Survey

ERIC Educational Resources Information Center

Asikainen, Mervi A.

2017-01-01

The study investigated the use of the Quantum Physics Conceptual Survey (QPCS) in probing student understanding of quantum physics. Altogether 103 Finnish university students responded to QPCS. The mean scores of the student responses were calculated and the test was evaluated using five common indices: Item difficulty index, Item discrimination…
Identifying Predictors of Physics Item Difficulty: A Linear Regression Approach

ERIC Educational Resources Information Center

Mesic, Vanes; Muratovic, Hasnija

2011-01-01

Large-scale assessments of student achievement in physics are often approached with an intention to discriminate students based on the attained level of their physics competencies. Therefore, for purposes of test design, it is important that items display an acceptable discriminatory behavior. To that end, it is recommended to avoid extraordinary…
Developing and Evaluating a Machine-Scorable, Constrained Constructed-Response Item.

ERIC Educational Resources Information Center

Braun, Henry I.; And Others

The use of constructed response items in large scale standardized testing has been hampered by the costs and difficulties associated with obtaining reliable scores. The advent of expert systems may signal the eventual removal of this impediment. This study investigated the accuracy with which expert systems could score a new, non-multiple choice…
An Application of the Rasch Model.

ERIC Educational Resources Information Center

Veitch, William R.

The one parameter latent trait theory of Georg Rasch has two assumptions: that student abilities can be measured on an equal interval scale, and that the success of a student with a given item is a function of student achievement and item difficulty. The grade four Michigan Educational Assessment Program reading test was designed to measure…
The Effect of Visual-Chunking-Representation Accommodation on Geometry Testing for Students with Math Disabilities

ERIC Educational Resources Information Center

Zhang, Dake; Ding, Yi; Stegall, Joanna; Mo, Lei

2012-01-01

Students who struggle with learning mathematics often have difficulties with geometry problem solving, which requires strong visual imagery skills. These difficulties have been correlated with deficiencies in visual working memory. Cognitive psychology has shown that chunking of visual items accommodates students' working memory deficits. This…
Development of Thermodynamic Conceptual Evaluation

NASA Astrophysics Data System (ADS)

Talaeb, P.; Wattanakasiwich, P.

2010-07-01

This research aims to develop a test for assessing student understanding of fundamental principles in thermodynamics. Misconceptions found in previous physics education research were used to develop the test. Its topics include heat and temperature, the zeroth and the first law of thermodynamics, and thermodynamic processes. The content validity was analyzed by three physics experts. Then the test was administered to freshmen, sophomores and juniors majoring in physics in order to determine the item difficulties and item discrimination of the test. A few items were eliminated from the test. Finally, the test will be administered to students taking the Physics I course in order to evaluate the effectiveness of Interactive Lecture Demonstrations that will be used for the first time at Chiang Mai University.
Development of multiple choice pictorial test for measuring the dimensions of knowledge

NASA Astrophysics Data System (ADS)

Nahadi, Siswaningsih, Wiwi; Erna

2017-05-01

This study aims to develop a multiple-choice pictorial test as a tool to measure the dimensions of knowledge in the subject of chemical equilibrium. The method used was Research and Development, with validation conducted during the preliminary studies and model development. The product is a multiple-choice pictorial test. The test comprised 22 items and was administered to 64 high school students in grade XII. The quality of the test was determined by its validity, reliability, difficulty index, discrimination power, and distractor effectiveness. Validity was determined by CVR calculation using 8 validators (4 university teachers and 4 high school teachers), with an average CVR value of 0.89. The reliability of the test was in the very high category, with a value of 0.87. For discrimination power, 32% of items fell in the very good category, 59% in the good category, and 20% in the sufficient category. The test has a varying level of difficulty: 23% of items are in the difficult category, 50% in the medium category, and 27% in the easy category. For distractor effectiveness, 1% of items fell in the very poor category, 1% in the poor category, 4% in the medium category, 39% in the good category, and 55% in the very good category. The dimensions of knowledge measured consist of factual knowledge, conceptual knowledge, and procedural knowledge. Based on the questionnaire, students responded quite well to the developed test, and most students preferred this kind of multiple-choice pictorial test, which includes pictures as an evaluation tool, to narrative tests dominated by text.
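Two of the quality measures named in the abstract above can be sketched briefly. The sketch assumes that "CVR" means Lawshe's content validity ratio and that the difficulty index is the classical proportion of correct answers; the abstract states neither formula, and the function names and example numbers are illustrative only.

```python
def difficulty_index(item_scores):
    """Classical difficulty index: proportion of test takers who answered
    the item correctly (item_scores is a list of 0/1 values).
    Higher values mean an easier item."""
    return sum(item_scores) / len(item_scores)


def content_validity_ratio(n_essential, n_panelists):
    """Lawshe's content validity ratio (assumed meaning of CVR):
    (n_e - N/2) / (N/2), where n_e panelists out of N rate the item
    as essential. Ranges from -1 to 1."""
    half = n_panelists / 2
    return (n_essential - half) / half


# With a panel of 8 validators, as in the abstract:
cvr_all_agree = content_validity_ratio(8, 8)   # every validator agrees
cvr_one_dissent = content_validity_ratio(7, 8)  # one validator disagrees

# Hypothetical item answered correctly by 16 of 64 students.
p_item = difficulty_index([1] * 16 + [0] * 48)
```

The cut-offs that map these values onto categories such as "difficult" or "easy" are not given in the abstract, so none are encoded here.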
Cognitive testing of tobacco use items for administration to patients with cancer and cancer survivors in clinical research.

PubMed

Land, Stephanie R; Warren, Graham W; Crafts, Jennifer L; Hatsukami, Dorothy K; Ostroff, Jamie S; Willis, Gordon B; Chollette, Veronica Y; Mitchell, Sandra A; Folz, Jasmine N M; Gulley, James L; Szabo, Eva; Brandon, Thomas H; Duffy, Sonia A; Toll, Benjamin A

2016-06-01

To the authors' knowledge, there are currently no standardized measures of tobacco use and secondhand smoke exposure in patients diagnosed with cancer, and this gap hinders the conduct of studies examining the impact of tobacco on cancer treatment outcomes. The objective of the current study was to evaluate and refine questionnaire items proposed by an expert task force to assess tobacco use. Trained interviewers conducted cognitive testing with cancer patients aged ≥21 years with a history of tobacco use and a cancer diagnosis of any stage and organ site who were recruited at the National Institutes of Health Clinical Center in Bethesda, Maryland. Iterative rounds of testing and item modification were conducted to identify and resolve cognitive issues (comprehension, memory retrieval, decision/judgment, and response mapping) and instrument navigation issues until no items warranted further significant modification. Thirty participants (6 current cigarette smokers, 1 current cigar smoker, and 23 former cigarette smokers) were enrolled from September 2014 to February 2015. The majority of items functioned well. However, qualitative testing identified wording ambiguities related to cancer diagnosis and treatment trajectory, such as "treatment" and "surgery"; difficulties with lifetime recall; errors in estimating quantities; and difficulties with instrument navigation. Revisions to item wording, format, order, response options, and instructions resulted in a questionnaire that demonstrated navigational ease as well as good question comprehension and response accuracy. The Cancer Patient Tobacco Use Questionnaire (C-TUQ) can be used as a standardized item set to accelerate the investigation of tobacco use in the cancer setting. Cancer 2016;122:1728-34. © 2016 American Cancer Society.
Three controversies over item disclosure in medical licensure examinations

PubMed Central

Park, Yoon Soo; Yang, Eunbae B.

2015-01-01

In response to views on the public's right to know, there is growing attention to item disclosure – release of items, answer keys, and performance data to the public – in medical licensure examinations and its potential impact on the test's ability to measure competence and select qualified candidates. Recent debates on this issue have sparked legislative action internationally, including in South Korea, with prior discussions among North American countries dating back over three decades. The purpose of this study is to identify and analyze three issues associated with item disclosure in medical licensure examinations – 1) fairness and validity, 2) impact on passing levels, and 3) utility of item disclosure – by synthesizing existing literature in relation to standards in testing. Historically, the controversy over item disclosure has centered on fairness and validity. Proponents of item disclosure stress test takers’ right to know, while opponents argue from a validity perspective. Item disclosure may bias item characteristics, such as difficulty and discrimination, and has consequences for setting passing levels. To date, there has been limited research on the utility of item disclosure for large scale testing. These issues require ongoing and careful consideration. PMID:26374693
Spatial short-term memory in children with nonverbal learning disabilities: impairment in encoding spatial configuration.

PubMed

Narimoto, Tadamasa; Matsuura, Naomi; Takezawa, Tomohiro; Mitsuhashi, Yoshinori; Hiratani, Michio

2013-01-01

The authors investigated whether impaired spatial short-term memory exhibited by children with nonverbal learning disabilities is due to a problem in the encoding process. Children with or without nonverbal learning disabilities performed a simple spatial test that required them to remember 3, 5, or 7 spatial items presented simultaneously in random positions (i.e., spatial configuration) and to decide if a target item was changed or all items including the target were in the same position. The results showed that, even when the spatial positions in the encoding and probe phases were similar, the mean proportion correct of children with nonverbal learning disabilities was 0.58 while that of children without nonverbal learning disabilities was 0.84. The authors argue from these results that children with nonverbal learning disabilities have difficulty encoding relational information between spatial items, and that this difficulty is responsible for their impaired spatial short-term memory.

Comparison of Alternate and Original Items on the Montreal Cognitive Assessment

PubMed Central

Lebedeva, Elena; Huang, Mei; Koski, Lisa

2016-01-01

Background The Montreal Cognitive Assessment (MoCA) is a screening tool for mild cognitive impairment (MCI) in elderly individuals. We hypothesized that measurement error when using the new alternate MoCA versions to monitor change over time could be related to the use of items that are not of comparable difficulty to their corresponding originals of similar content. The objective of this study was to compare the difficulty of the alternate MoCA items to the original ones. Methods Five selected items from alternate versions of the MoCA were included with items from the original MoCA administered adaptively to geriatric outpatients (N = 78). Rasch analysis was used to estimate the difficulty level of the items. Results None of the five items from the alternate versions matched the difficulty level of their corresponding original items. Conclusions This study demonstrates the potential benefits of a Rasch analysis-based approach for selecting items during the process of development of parallel forms. The results suggest that better match of the items from different MoCA forms by their difficulty would result in higher sensitivity to changes in cognitive function over time. PMID:27076861
[Simple and useful evaluation of motor difficulty in childhood (9-12 years old children) by interview score on motor skills and soft neurological signs--aim for the diagnosis of developmental coordination disorder].

PubMed

Kashiwagi, Mitsuru; Suzuki, Shuhei

2009-09-01

Many children with developmental disorders are known to have motor impairment such as clumsiness and poor physical ability; however, the objective evaluation of such difficulties is not easy in routine clinical practice. In this study, we aimed to establish a simple method for evaluating motor difficulty in childhood. This method employs a scored interview and an examination for detecting soft neurological signs (SNSs). After a preliminary survey with 22 normal children, we set the items and the cutoffs for the interview and SNSs. The interview consisted of questions pertaining to 12 items related to a child's motor skills in his/her past and current life, such as skipping, jumping a rope, ball sports, origami, and using chopsticks. The SNS evaluation included 5 tests, namely, standing on one leg with eyes closed, diadochokinesia, associated movements during diadochokinesia, finger opposition test, and laterally fixed gaze. We applied this method to 43 children, including 25 cases of developmental disorders. Children showing significantly high scores in both the interview and SNS were assigned to the "with motor difficulty" group, while those with low scores in both the tests were assigned to the "without motor difficulty" group. The remaining children were assigned to the "with suspicious motor difficulty" group. More than 90% of the children in the "with motor difficulty" group had high impairment scores in the Movement Assessment Battery for Children (M-ABC), a standardized motor test, whereas 82% of the children in the "without motor difficulty" group revealed no motor impairment. Thus, we conclude that our simple method and criteria would be useful for the evaluation of motor difficulty in childhood. Further, we have discussed the diagnostic process for developmental coordination disorder using our evaluation method.
Computerized Adaptive Testing: An Overview and an Example.

ERIC Educational Resources Information Center

McBride, James R.

The advantages of computerized adaptive testing are discussed, and an example illustrates its use in sixth grade mathematics. These tests are administered at a computer terminal, and the test items to be administered are selected according to the difficulty level appropriate to the individual's ability. Tailoring increases the psychometric…
Evaluation of five guidelines for option development in multiple-choice item-writing.

PubMed

Martínez, Rafael J; Moreno, Rafael; Martín, Irene; Trigo, M Eva

2009-05-01

This paper evaluates certain guidelines for writing multiple-choice test items. The analysis of the responses of 5013 subjects to 630 items from 21 university classroom achievement tests suggests that an option should not differ in terms of heterogeneous content because such error has a slight but harmful effect on item discrimination. This also occurs with the "None of the above" option when it is the correct one. In contrast, results do not show the supposedly negative effects of a different-length option, the use of specific determiners, or the use of the "All of the above" option, which not only decreases difficulty but also improves discrimination when it is the correct option.
Pick-N Multiple Choice-Exams: A Comparison of Scoring Algorithms

ERIC Educational Resources Information Center

Bauer, Daniel; Holzer, Matthias; Kopp, Veronika; Fischer, Martin R.

2011-01-01

To compare different scoring algorithms for Pick-N multiple correct answer multiple-choice (MC) exams regarding test reliability, student performance, total item discrimination and item difficulty. Data from six 3rd year medical students' end of term exams in internal medicine from 2005 to 2008 at Munich University were analysed (1,255 students,…
A Mixture Rasch Model with a Covariate: A Simulation Study via Bayesian Markov Chain Monte Carlo Estimation

ERIC Educational Resources Information Center

Dai, Yunyun

2013-01-01

Mixtures of item response theory (IRT) models have been proposed as a technique to explore response patterns in test data related to cognitive strategies, instructional sensitivity, and differential item functioning (DIF). Estimation proves challenging due to difficulties in identification and questions of effect size needed to recover underlying…
Solving Graphics Problems: Student Performance in Junior Grades

ERIC Educational Resources Information Center

Lowrie, Tom; Diezmann, Carmel M.

2007-01-01

The authors investigated the performance of 172 Grade 4 students (9 to 10 years) over 12 months on a 36-item test that comprised items from 6 distinct graphical languages (e.g., maps) commonly used to convey mathematical information. Results revealed (a) difficulties in Grade 4 students' capacity to decode a variety of graphics, (b) significant…
Stereotype Threat in Classroom Settings: The Interactive Effect of Domain Identification, Task Difficulty and Stereotype Threat on Female Students' Maths Performance

ERIC Educational Resources Information Center

Keller, Johannes

2007-01-01

Background: Stereotype threat research revealed that negative stereotypes can disrupt the performance of persons targeted by such stereotypes. This paper contributes to stereotype threat research by providing evidence that domain identification and the difficulty level of test items moderate stereotype threat effects on female students' maths…
Translation and validation of the Malay version of the Stroke Knowledge Test.

PubMed

Sowtali, Siti Noorkhairina; Yusoff, Dariah Mohd; Harith, Sakinah; Mohamed, Monniaty

2016-04-01

To date, there is a lack of published studies on assessment tools to evaluate the effectiveness of stroke education programs. This study developed and validated the Malay language version of the Stroke Knowledge Test research instrument. This study involved translation, validity, and reliability phases. The instrument underwent backward and forward translation of the English version into the Malay language. Nine experts reviewed the content for consistency, clarity, difficulty, and suitability for inclusion. Perceived usefulness and utilization were obtained from experts' opinions. Later, face validity assessment was conducted with 10 stroke patients to determine appropriateness of sentences and grammar used. A pilot study was conducted with 41 stroke patients to determine the item analysis and reliability of the translated instrument using the Kuder Richardson 20 or Cronbach's alpha. The final Malay version Stroke Knowledge Test included 20 items with good content coverage, acceptable item properties, and positive expert review ratings. Psychometric investigations suggest that Malay version Stroke Knowledge Test had moderate reliability with Kuder Richardson 20 or Cronbach's alpha of 0.58. Improvement is required for Stroke Knowledge Test items with unacceptable difficulty indices. Overall, the average rating of perceived usefulness and perceived utility of the instruments were both 72.7%, suggesting that reviewers were likely to use the instruments in their facilities. Malay version Stroke Knowledge Test was a valid and reliable tool to assess educational needs and to evaluate stroke knowledge among participants of group-based stroke education programs in Malaysia.
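For readers unfamiliar with the reliability coefficient reported above, the following is a minimal sketch of the Kuder-Richardson 20 formula for dichotomously scored (0/1) items. The response matrix is invented for illustration and is not data from the study; the population-variance convention used here is an assumption, and some software uses the sample variance instead.

```python
def kr20(responses):
    """KR-20 for a list of examinee rows, each a list of 0/1 item scores."""
    n_persons = len(responses)
    k = len(responses[0])
    # Item difficulty index: proportion of examinees answering each item correctly.
    p = [sum(row[j] for row in responses) / n_persons for j in range(k)]
    sum_pq = sum(pj * (1.0 - pj) for pj in p)
    # Population variance of the total scores (assumed convention).
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n_persons
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_persons
    return (k / (k - 1)) * (1.0 - sum_pq / var_total)

# Hypothetical responses: 4 examinees by 3 items.
example = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
reliability = kr20(example)
```

For 0/1 items, KR-20 coincides with Cronbach's alpha when both use the same variance convention, which may be why the abstract names the two together.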
Measuring change for a multidimensional test using a generalized explanatory longitudinal item response model.

PubMed

Cho, Sun-Joo; Athay, Michele; Preacher, Kristopher J

2013-05-01

Even though many educational and psychological tests are known to be multidimensional, little research has been done to address how to measure individual differences in change within an item response theory framework. In this paper, we suggest a generalized explanatory longitudinal item response model to measure individual differences in change. New longitudinal models for multidimensional tests and existing models for unidimensional tests are presented within this framework and implemented with software developed for generalized linear models. In addition to the measurement of change, the longitudinal models we present can also be used to explain individual differences in change scores for person groups (e.g., learning disabled students versus non-learning disabled students) and to model differences in item difficulties across item groups (e.g., number operation, measurement, and representation item groups in a mathematics test). An empirical example illustrates the use of the various models for measuring individual differences in change when there are person groups and multiple skill domains which lead to multidimensionality at a time point. © 2012 The British Psychological Society.
Analyzing force concept inventory with item response theory

NASA Astrophysics Data System (ADS)

Wang, Jing; Bao, Lei

2010-10-01

Item response theory is a popular assessment method used in education. It rests on the assumption of a probability framework that relates students' innate ability and their performance on test questions. Item response theory transforms students' raw test scores into a scaled proficiency score, which can be used to compare results obtained with different test questions. The scaled score also addresses the issues of ceiling effects and guessing, which commonly exist in quantitative assessment. We used item response theory to analyze the force concept inventory (FCI). Our results show that item response theory can be useful for analyzing physics concept surveys such as the FCI and produces results about the individual questions and student performance that are beyond the capability of classical statistics. The theory yields detailed measurement parameters regarding the difficulty, discrimination features, and probability of correct guess for each of the FCI questions.
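The three parameters named at the end of the abstract above (difficulty, discrimination, and probability of a correct guess) correspond to the standard three-parameter logistic (3PL) model. The sketch below assumes that model; the parameter values are hypothetical and are not estimates from the FCI analysis.

```python
import math

def p_correct_3pl(theta, a, b, c):
    """3PL probability of a correct response.

    theta: examinee proficiency
    a: item discrimination
    b: item difficulty
    c: lower asymptote (probability of a correct guess)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical item: when proficiency equals difficulty, the probability
# lies halfway between the guessing floor c and 1.
p_at_difficulty = p_correct_3pl(theta=0.0, a=1.5, b=0.0, c=0.2)
```

With c = 0.2 the curve never drops below 0.2, which is how the model accounts for guessing on multiple-choice questions.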
The hierarchy of the activities of daily living in the Katz index in residents of skilled nursing facilities.

PubMed

Gerrard, Paul

2013-01-01

Nursing facility patients are a population that has not been well studied with regard to functional status and independence previously. As such, the manner in which activities of daily living (ADL) relate to one another is not well understood in this population. An understanding of ADL difficulty ordering has helped to devise systems of functional independence grading in other populations, which have value in understanding patients' global levels of independence and providing expectations regarding changes in function. This study seeks to examine the hierarchy of ADL in the nursing facility population. Data were analyzed from the 2004 National Nursing Home Survey, a cross-sectional data set of 13 507 skilled nursing facility subjects with functional independence items. The ADL difficulty hierarchy was determined using Rasch analysis. Item fit values for the Rasch model using Mean-Square infit statistics were also determined. The robustness of the hierarchy was tested for each ADL. Two grading systems were devised from the results of the item difficulty ordering. One assigned a grade based on the most difficult item that a subject could perform, and the other assigned a grade based on the least difficult item that a subject could not perform. A total of 13 113 patients were included in this analysis, the majority of whom were female and white. They had an average age of 81 years. An ordered hierarchy of ADL was found with eating being the easiest and bathing the most difficult. All items in the Katz index fit the Rasch model adequately well. The majority of patients able to perform any particular ADL were also able to perform all easier ADL. Cohen's κ for the 2 grading systems was 0.73. This study is the first to show the expected hierarchy of difficulty of the 6 activities of daily living proposed in the Katz index in the nursing facility population. The hierarchy found in this population matches the original hierarchy found in older adults in the community and acute care settings.
It is also similar to the hierarchy found in the inpatient rehabilitation setting. Patients would be expected to lose or gain function based on the order of difficulty, but this remains to be confirmed. Among the 6 activities of daily living tested here, their order from easiest to most difficult is eating, maintaining continence, transferring, toileting, dressing, and bathing. In addition, the index formed by these 6 items has construct validity in the nursing facility population.
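The agreement statistic reported in this abstract, Cohen's κ between the two grading systems, can be computed as in the sketch below. The grades are hypothetical and only illustrate the formula; they are not survey data.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two equal-length lists of categorical ratings."""
    n = len(ratings_a)
    # Observed proportion of exact agreement.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from each system's marginal frequencies.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (observed - expected) / (1.0 - expected)

# Hypothetical grades assigned to four patients by two grading systems.
grades_system_1 = [1, 1, 2, 2]
grades_system_2 = [1, 1, 2, 1]
kappa = cohens_kappa(grades_system_1, grades_system_2)
```

The function divides by zero when chance agreement is exactly 1 (both systems always give the same single grade), so a real analysis would need to guard against that case.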
Differential Item Functioning in Primary Healthcare Evaluation Instruments by French/English Version, Educational Level and Urban/Rural Location

PubMed Central

Haggerty, Jeannie L.; Bouharaoui, Fatima; Santor, Darcy A.

2011-01-01

Evaluating the extent to which groups or subgroups of individuals differ with respect to primary healthcare experience depends on first ruling out the possibility of bias. Objective: To determine whether item or subscale performance differs systematically between French/English, high/low education subgroups and urban/rural residency. Method: A sample of 645 adult users balanced by French/English language (in Quebec and Nova Scotia, respectively), high/low education and urban/rural residency responded to six validated instruments: the Primary Care Assessment Survey (PCAS); the Primary Care Assessment Tool – Short Form (PCAT-S); the Components of Primary Care Index (CPCI); the first version of the EUROPEP (EUROPEP-I); the Interpersonal Processes of Care Survey, version II (IPC-II); and part of the Veterans Affairs National Outpatient Customer Satisfaction Survey (VANOCSS). We normalized subscale scores to a 0-to-10 scale and tested for between-group differences using ANOVA tests. We used a parametric item response model to test for differences between subgroups in item discriminability and item difficulty. We re-examined group differences after removing items with differential item functioning. Results: Experience of care was assessed more positively in the English-speaking (Nova Scotia) than in the French-speaking (Quebec) respondents. We found differential English/French item functioning in 48% of the 153 items: discriminability in 20% and differential difficulty in 28%. English items were more discriminating generally than the French. Removing problematic items did not change the differences in French/English assessments. Differential item functioning by high/low education status affected 27% of items, with items being generally more discriminating in high-education groups. Between-group comparisons were unchanged. In contrast, only 9% of items showed differential item functioning by geography, affecting principally the accessibility attribute. 
Removing problematic items reversed a previously non-significant finding, revealing poorer first-contact access in rural than in urban areas. Conclusion: Differential item functioning does not bias or invalidate French/English comparisons on subscales, but additional development is required to make French and English items equivalent. These instruments are relatively robust by educational status and geography, but results suggest potential differences in the underlying construct in low-education and rural respondents. PMID:23205035
Adapting a Developmental Screening Measure: Exploring the Effects of Language and Culture on a Parent-Completed Social-Emotional Screening Test

ERIC Educational Resources Information Center

Chen, Chieh-Yu; Chen, Ching-I; Squires, Jane; Bian, Xiaoyan; Heo, Kay H.; Filgueiras, Alberto; Kalinina, Svetlana; Samarina, Larissa; Ermolaeva, Evgeniya; Xie, Huichao; Yu, Ting-Ying; Wu, Pei-Fang; Landeira-Fernandez, Jesus

2017-01-01

Ages & Stages Questionnaires: Social-Emotional (ASQ:SE) is a widely used screening instrument for detecting social-emotional difficulties in infants and young children. To use a screening instrument across cultures and countries, it is necessary to identify potential item-level biases and ensure item equivalence. This study investigated the…
Psychometric characteristics of Clinical Reasoning Problems (CRPs) and its correlation with routine multiple choice question (MCQ) in Cardiology department.

PubMed

Derakhshandeh, Zahra; Amini, Mitra; Kojuri, Javad; Dehbozorgian, Marziyeh

2018-01-01

Clinical reasoning is one of the most important skills in the process of training a medical student to become an efficient physician. Assessment of the reasoning skills in a medical school program is important to direct students' learning. One of the tests for measuring the clinical reasoning ability is Clinical Reasoning Problems (CRPs). The major aim of this study is to measure the psychometric qualities of CRPs and determine the correlation between this test and the routine MCQ in the cardiology department of Shiraz medical school. This was a descriptive study conducted on all cardiology residents of Shiraz Medical School. The study population consisted of 40 residents in 2014. The routine CRPs and the MCQ tests were designed based on similar objectives and were carried out simultaneously. Reliability, item difficulty, item discrimination, and correlation between each item and the total score of CRPs were all measured using Excel and SPSS software to check the psychometric properties of the CRPs test. Furthermore, we calculated the correlation between the CRPs test and the MCQ test. The mean differences in CRPs test score between residents' academic years [second, third and fourth year] were also evaluated by analysis of variance (one-way ANOVA) using SPSS software (version 20) (α=0.05). The mean and standard deviation of the score in CRPs was 10.19±3.39 out of 20; in MCQ, it was 13.15±3.81 out of 20. Item difficulty was in the range of 0.27-0.72; item discrimination was 0.30-0.75, with question No. 3 being the exception (0.24). The correlation between each item and the total score of CRPs was 0.26-0.87; the correlation between the CRPs test and the MCQ test was 0.68 (p<0.001). The reliability of the CRPs was 0.72 as calculated using Cronbach's alpha. The mean score of CRPs differed among residents based on their academic year, and this difference was statistically significant (p<0.001).
The results of the present investigation revealed that CRPs could be a reliable test for measuring clinical reasoning in residents. It can be included in cardiology residency assessment programs.
Using the Nudge and Shove Methods to Adjust Item Difficulty Values.

PubMed

Royal, Kenneth D

2015-01-01

In any examination, it is important that a sufficient mix of items with varying degrees of difficulty be present to produce desirable psychometric properties and increase instructors' ability to make appropriate and accurate inferences about what a student knows and/or can do. The purpose of this "teaching tip" is to demonstrate how examination items can be affected by the quality of distractors, and to present a simple method for adjusting items to meet difficulty specifications.
Development and initial psychometric evaluation of an item bank created to measure upper extremity function in persons with stroke.

PubMed

Higgins, Johanne; Finch, Lois E; Kopec, Jacek; Mayo, Nancy E

2010-02-01

To create and illustrate the development of a method to parsimoniously and hierarchically assess upper extremity function in persons after stroke. Data were analyzed using Rasch analysis. Re-analysis of data from 8 studies involving persons after stroke. Over 4000 patients with stroke who participated in various studies in Montreal and elsewhere in Canada. Data comprised 17 tests or indices of upper extremity function and health-related quality of life, for a total of 99 items related to upper extremity function. Tests and indices included, among others, the Box and Block Test, the Nine-Hole Peg Test and the Stroke Impact Scale. Data were collected at various times post-stroke from 3 days to 1 year. Once the data fit the model, a bank of items measuring upper extremity function with persons and items organized hierarchically by difficulty and ability in log units was produced. This bank forms the basis for eventual computer adaptive testing. The calibration of the items should be tested further psychometrically, as should the interpretation of the metric arising from using the item calibration to measure the upper extremity of individuals.
Short-term memory in autism spectrum disorder.

PubMed

Poirier, Marie; Martin, Jonathan S; Gaigg, Sebastian B; Bowler, Dermot M

2011-02-01

Three experiments examined verbal short-term memory in comparison and autism spectrum disorder (ASD) participants. Experiment 1 involved forward and backward digit recall. Experiment 2 used a standard immediate serial recall task where, contrary to the digit-span task, items (words) were not repeated from list to list. Hence, this task called more heavily on item memory. Experiment 3 tested short-term order memory with an order recognition test: Each word list was repeated with or without the position of 2 adjacent items swapped. The ASD group showed poorer performance in all 3 experiments. Experiments 1 and 2 showed that group differences were due to memory for the order of the items, not to memory for the items themselves. Confirming these findings, the results of Experiment 3 showed that the ASD group had more difficulty detecting a change in the temporal sequence of the items. (c) 2010 APA, all rights reserved.
Automated Item Generation with Recurrent Neural Networks.

PubMed

von Davier, Matthias

2018-03-12

Utilizing technology for automated item generation is not a new idea. However, test items used in commercial testing programs or in research are still predominantly written by humans, in most cases by content experts or professional item writers. Human experts are a limited resource and testing agencies incur high costs in the process of continuous renewal of item banks to sustain testing programs. Using algorithms instead holds the promise of providing unlimited resources for this crucial part of assessment development. The approach presented here deviates in several ways from previous attempts to solve this problem. In the past, automatic item generation relied either on generating clones of narrowly defined item types such as those found in language-free intelligence tests (e.g., Raven's progressive matrices) or on an extensive analysis of task components and derivation of schemata to produce items with pre-specified variability that are hoped to have predictable levels of difficulty. It is somewhat unlikely that researchers utilizing these previous approaches would look at the proposed approach with favor; however, recent applications of machine learning show success in solving tasks that seemed impossible for machines not too long ago. The proposed approach uses deep learning to implement probabilistic language models, not unlike what Google Brain and Amazon Alexa use for language processing and generation.
The second version of the L. V. Prasad-functional vision questionnaire.

PubMed

Gothwal, Vijaya K; Sumalini, Rebecca; Bharani, Seelam; Reddy, Shailaja P; Bagga, Deepak K

2012-11-01

The L. V. Prasad-Functional Vision Questionnaire (LVP-FVQ) was developed using Rasch analysis to assess self-reported difficulties in performing daily tasks in school children with visual impairment (VI) in India. However, the LVP-FVQ has psychometric problems of inadequate measurement precision and lack of detailed assessment of dimensionality. Furthermore, items pertaining to use of technology are lacking. The aim of this study was to present the development and validation of the second version of LVP-FVQ (LVP-FVQ II). Development of LVP-FVQ II involved extracting items from other similar questionnaires (albeit developed for Western populations) and focus group discussions of children with VI and their parents that resulted in a 32-item pilot questionnaire. Overall, six items from the LVP-FVQ were retained. The questionnaire underwent pilot testing in 25 such children, following which a 27-item LVP-FVQ II emerged, and this was administered to 150 children with VI. Response to each item was rated on a three-category scale. Rasch analysis was used to validate the LVP-FVQ II. The rating scale was used by participants as intended. Four mobility-related items required deletion, as these did not contribute toward measurement of a single construct, indicating a secondary dimension. Deletion of the four items resulted in the 23-item unidimensional LVP-FVQ II, with good measurement precision, effective targeting of item difficulty to participant ability, and lack of notable differential item functioning. The LVP-FVQ II has high reliability, indicating that it is effectively able to discriminate between levels of visual disability of school children in India, and is valid across age, gender, duration of VI, and location of residence. Given the superior measurement properties and the interval-level scores, the LVP-FVQ II appears to offer advantages over LVP-FVQ in assessment of difficulties in performing daily tasks in this population.
It can be adapted for use in other developing countries.

Rasch model based analysis of the Force Concept Inventory

NASA Astrophysics Data System (ADS)

Planinic, Maja; Ivanjek, Lana; Susac, Ana

2010-06-01

The Force Concept Inventory (FCI) is an important diagnostic instrument which is widely used in the field of physics education research. It is therefore very important to evaluate and monitor its functioning using different tools for statistical analysis. One such tool is the stochastic Rasch model, which enables construction of linear measures for persons and items from raw test scores and which can provide important insight into the structure and functioning of the test (how item difficulties are distributed within the test, how well the items fit the model, and how well the items work together to define the underlying construct). The data for the Rasch analysis come from the large-scale research conducted in 2006-07, which investigated Croatian high school students’ conceptual understanding of mechanics on a representative sample of 1676 students (age 17-18 years). The instrument used in the research was the FCI. The average FCI score for the whole sample was found to be (27.7±0.4)%, indicating that most of the students were still non-Newtonians at the end of high school, despite the fact that physics is a compulsory subject in Croatian schools. The large set of obtained data was analyzed with the Rasch measurement computer software WINSTEPS 3.66. Since the FCI is routinely used as pretest and post-test on two very different types of population (non-Newtonian and predominantly Newtonian), an additional predominantly Newtonian sample (N=141, average FCI score of 64.5%) of first year students enrolled in an introductory physics course at the University of Zagreb was also analyzed. The Rasch model based analysis suggests that the FCI has succeeded in defining a sufficiently unidimensional construct for each population. The analysis of fit of data to the model found no grossly misfitting items which would degrade measurement. Some items with larger misfit and items with significantly different difficulties in the two samples of students do require further examination.
The analysis revealed some problems with item distribution in the FCI and suggested that the FCI may function differently in non-Newtonian and predominantly Newtonian population. Some possible improvements of the test are suggested.
Skilled but Unaware of It: CAT Undermines a Test Taker's Metacognitive Competence

ERIC Educational Resources Information Center

Ortner, Tuulia M.; Weisskopf, Eva; Gerstenberg, Friederike X. R.

2013-01-01

We investigated students' metacognitive experiences with regard to feelings of difficulty (FD), feelings of satisfaction (FS), and estimate of effort (EE), employing either computerized adaptive testing (CAT) or computerized fixed item testing (FIT). In an experimental approach, 174 students in grades 10 to 13 were tested either with a CAT or a…
Impact of Accumulated Error on Item Response Theory Pre-Equating with Mixed Format Tests

ERIC Educational Resources Information Center

Keller, Lisa A.; Keller, Robert; Cook, Robert J.; Colvin, Kimberly F.

2016-01-01

The equating of tests is an essential process in high-stakes, large-scale testing conducted over multiple forms or administrations. By adjusting for differences in difficulty and placing scores from different administrations of a test on a common scale, equating allows scores from these different forms and administrations to be directly compared…
Effects of Differentially Time-Consuming Tests on Computer-Adaptive Test Scores

ERIC Educational Resources Information Center

Bridgeman, Brent; Cline, Frederick

2004-01-01

Time limits on some computer-adaptive tests (CATs) are such that many examinees have difficulty finishing, and some examinees may be administered tests with more time-consuming items than others. Results from over 100,000 examinees suggested that about half of the examinees must guess on the final six questions of the analytical section of the…
Everyday technology use among people with mental retardation: relevance, perceived difficulty, and influencing factors.

PubMed

Hällgren, Monica; Nygård, Louise; Kottorp, Anders

2014-05-01

While the development and possibilities of technology today are commonly regarded to be unlimited, knowledge regarding the technological needs of people with mental retardation is fairly limited. The aim of this study was to enhance knowledge of perceived relevance and difficulty in using everyday technology (ET) such as stoves, cell phones, and elevators in adults with mental retardation. 120 participants with different levels of mental retardation were interviewed with the Everyday Technology Use Questionnaire (ETUQ) about their use of such technologies in their everyday life. Analyses of variance, post hoc tests, and regression analyses were used to explore the data. Participants with moderate and severe mental retardation differed in mean perceived difficulty from those with mild mental retardation, suggesting that increased perceived difficulty in ET use is related to the level of mental retardation. Differences between groups were also found in the proportion of items that were relevant for each person. The variables Level of Mental Retardation, Additional Disabilities, and Proportional Relevance of ET Items could together predict 67.2% of the variation in perceived difficulty in technology use. The findings also indicate that age, housing, gender, and geographical district do not covary with perceived difficulty in ET use.
The perceptual learning of time-compressed speech: A comparison of training protocols with different levels of difficulty

PubMed Central

Gabay, Yafit; Karni, Avi; Banai, Karen

2017-01-01

Speech perception can improve substantially with practice (perceptual learning) even in adults. Here we compared the effects of four training protocols that differed in whether and how task difficulty was changed during a training session, in terms of the gains attained and the ability to apply (transfer) these gains to previously un-encountered items (tokens) and to different talkers. Participants trained in judging the semantic plausibility of sentences presented as time-compressed speech and were tested on their ability to reproduce, in writing, the target sentences; trial-by-trial feedback was afforded in all training conditions. In two conditions task difficulty (low or high compression) was kept constant throughout the training session, whereas in the other two conditions task difficulty was changed in an adaptive manner (incrementally from easy to difficult, or using a staircase procedure). Compared to a control group (no training), all four protocols resulted in significant post-training improvement in the ability to reproduce the trained sentences accurately. However, training in the constant-high-compression protocol elicited the smallest gains in deciphering and reproducing trained items and in reproducing novel, untrained, items after training. Overall, these results suggest that training procedures that start off with relatively little signal distortion (“easy” items, not far removed from standard speech) may be advantageous compared to conditions wherein severe distortions are presented to participants from the very beginning of the training session. PMID:28545039
Ability evaluation by binary tests: Problems, challenges & recent advances

NASA Astrophysics Data System (ADS)

Bashkansky, E.; Turetsky, V.

2016-11-01

Binary tests designed to measure abilities of objects under test (OUTs) are widely used in different fields of measurement theory and practice. The number of test items in such tests is usually very limited. The response to each test item provides only one bit of information per OUT. The problem of correct ability assessment is even more complicated, when the levels of difficulty of the test items are unknown beforehand. This fact makes the search for effective ways of planning and processing the results of such tests highly relevant. In recent years, there has been some progress in this direction, generated by both the development of computational tools and the emergence of new ideas. The latter are associated with the use of so-called “scale invariant item response models”. Together with maximum likelihood estimation (MLE) approach, they helped to solve some problems of engineering and proficiency testing. However, several issues related to the assessment of uncertainties, replications scheduling, the use of placebo, as well as evaluation of multidimensional abilities still present a challenge for researchers. The authors attempt to outline the ways to solve the above problems.
Outcome-based self-assessment on a team-teaching subject in the medical school

PubMed Central

Cho, Sa Sun

2014-01-01

We investigated why students received worse grades in gross anatomy and how the teaching method could be improved, since there were gaps between teaching and learning under the recently changed integration curriculum. General characteristics of students and exploratory factors used to test validity were compared between 2011 and 2012. Students were asked to complete a short survey with a Likert scale. The results were as follows: although the percentage of acceptable items was similar between professors, professor C preferred questions with adequate item discrimination and inappropriate item difficulty, whereas professor Y preferred questions with adequate item discrimination and appropriate item difficulty, a statistically significant difference (P<0.01). The survey revealed that 26.5% of all students gave up the exam on gross anatomy of professor Y irrespective of year. These results suggested that students' academic achievement was affected by the corrected item difficulty rather than by item discrimination. Therefore, professors in a team-teaching subject should reach a consensus on item difficulty with proper teaching methods. PMID:25548724
On the Issue of Item Selection in Computerized Adaptive Testing with Response Times

ERIC Educational Resources Information Center

Veldkamp, Bernard P.

2016-01-01

Many standardized tests are now administered via computer rather than paper-and-pencil format. The computer-based delivery mode brings with it certain advantages. One advantage is the ability to adapt the difficulty level of the test to the ability level of the test taker in what has been termed computerized adaptive testing (CAT). A second…
Relationship between item difficulty and discrimination indices in true/false-type multiple choice questions of a para-clinical multidisciplinary paper.

PubMed

Sim, Si-Mui; Rasiah, Raja Isaiah

2006-02-01

This paper reports the relationship between the difficulty level and the discrimination power of true/false-type multiple-choice questions (MCQs) in a multidisciplinary paper for the para-clinical year of an undergraduate medical programme. MCQ items in papers taken from Year II Parts A, B and C examinations for Sessions 2001/02, and Part B examinations for 2002/03 and 2003/04, were analysed to obtain their difficulty indices and discrimination indices. Each paper consisted of 250 true/false items (50 questions of 5 items each) on topics drawn from different disciplines. The questions were first constructed and vetted by the individual departments before being submitted to a central committee, where the final selection of the MCQs was made, based purely on the academic judgement of the committee. There was a wide distribution of item difficulty indices in all the MCQ papers analysed. Furthermore, the relationship between the difficulty index (P) and discrimination index (D) of the MCQ items in a paper was not linear, but more dome-shaped. Maximal discrimination (D = 51% to 71%) occurred with moderately easy/difficult items (P = 40% to 74%). On average, about 38% of the MCQ items in each paper were "very easy" (P ≥ 75%), while about 9% were "very difficult" (P < 25%). About two-thirds of these very easy/difficult items had "very poor" or even negative discrimination (D ≤ 20%). MCQ items that demonstrate good discriminating potential tend to be moderately difficult items, and the moderately-to-very difficult items are more likely to show negative discrimination. There is a need to evaluate the effectiveness of our MCQ items.
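The two indices discussed above can be sketched as follows. This is a minimal classical-test-theory example, assuming the common upper/lower-group definition of D with 27% groups; the abstract does not state which grouping the authors used, and the function name and data are hypothetical. Values are returned as fractions rather than percentages.

```python
def item_indices(responses, group_fraction=0.27):
    """Return a (P, D) pair per item.

    responses: one list of 0/1 item scores per examinee.
    P is the proportion of all examinees answering the item correctly.
    D is the proportion correct in the top-scoring group minus the
    proportion correct in the bottom-scoring group (assumed definition).
    """
    n = len(responses)
    n_items = len(responses[0])
    # Rank examinees by total score, highest first.
    ranked = sorted(responses, key=sum, reverse=True)
    k = max(1, round(n * group_fraction))
    upper, lower = ranked[:k], ranked[-k:]
    results = []
    for j in range(n_items):
        p = sum(r[j] for r in responses) / n
        d = (sum(r[j] for r in upper) - sum(r[j] for r in lower)) / k
        results.append((p, d))
    return results

# Tiny made-up data set: 4 examinees, 2 items, groups of 50% for readability.
scores = [[1, 1], [1, 0], [1, 0], [0, 0]]
indices = item_indices(scores, group_fraction=0.5)
```

An item that nearly everyone answers correctly (P close to 1) leaves little room for the upper and lower groups to differ, which is one way to see why D tends to fall off at both extremes of P.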
An Ethical Issue Scale for Community Pharmacy Setting (EISP): Development and Validation.

PubMed

Crnjanski, Tatjana; Krajnovic, Dusanka; Tadic, Ivana; Stojkov, Svetlana; Savic, Mirko

2016-04-01

Many problems that arise when providing pharmacy services may contain some ethical components, and the aims of this study were to develop and validate a scale that could assess the difficulty of ethical issues, as well as the frequency of their occurrence in the everyday practice of community pharmacists. Development and validation of the scale was conducted in three phases: (1) generating items for the initial survey instrument after qualitative analysis; (2) defining the design and format of the instrument; (3) validation of the instrument. The constructed Ethical Issue scale for community pharmacy setting has two parts containing the same 16 items for assessing the difficulty and frequency thereof. The results of the 171 fully completed scales were analyzed (response rate 74.89%). The Cronbach's α value of the part of the instrument that examines difficulties of the ethical situations was 0.83 and for the part of the instrument that examined frequency of the ethical situations was 0.84. Test-retest reliability for both parts of the instrument was satisfactory, with all intraclass correlation coefficient (ICC) values above 0.6 (for the part that examines severity ICC = 0.809, for the part that examines frequency ICC = 0.929). The 16-item scale, as a self-assessment tool, demonstrated a high degree of content, criterion, and construct validity and test-retest reliability. The results support its use as a research tool to assess the difficulty and frequency of ethical issues in the community pharmacy setting. The validated scale needs to be further employed on a larger sample of pharmacists.
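For readers unfamiliar with the reliability coefficient reported above, here is a minimal sketch of Cronbach's α using its standard formula, α = k/(k-1) × (1 − Σ item variances / variance of total scores). The function name and the sample ratings are hypothetical, not data from the study.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents' item-score lists."""
    k = len(scores[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of the respondents' total scores.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical ratings: 3 respondents, 2 items.
alpha = cronbach_alpha([[1, 2], [2, 1], [3, 3]])
```

When every item ranks respondents identically, the total-score variance is as large as it can be relative to the item variances and α reaches 1; unrelated items push it toward 0.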
An instrument to measure nurses' knowledge in palliative care: Validation of the Spanish version of Palliative Care Quiz for Nurses

PubMed Central

2017-01-01

Background Palliative care is nowadays essential in nursing care, due to the increasing number of patients who require attention in final stages of their life. Nurses need to acquire specific knowledge and abilities to provide quality palliative care. Palliative Care Quiz for Nurses is a questionnaire that evaluates their basic knowledge about palliative care. The Palliative Care Quiz for Nurses (PCQN) is useful to evaluate basic knowledge about palliative care, but its adaptation into the Spanish language and the analysis of its effectiveness and utility for Spanish culture is lacking. Purpose To report the adaptation into the Spanish language and the psychometric analysis of the Palliative Care Quiz for Nurses. Method The Palliative Care Quiz for Nurses-Spanish Version (PCQN-SV) was obtained from a process including translation, back-translation, comparison with versions in other languages, revision by experts, and pilot study. Content validity and reliability of the questionnaire were analyzed. Difficulty and discrimination indexes of each item were also calculated according to Item Response Theory (IRT). Findings Adequate internal consistency was found (S-CVI = 0.83); a Cronbach's alpha coefficient of 0.67 and a KR-20 test result of 0.72 reflected the reliability of the PCQN-SV. The questionnaire had a global difficulty index of 0.55, with six items which could be considered difficult or very difficult, and five items which could be considered easy or very easy. The discrimination indexes of the 20 items show that eight items are good or very good, while six items discriminate poorly between good and bad respondents.
Discussion Although it shows internal consistency, reliability and difficulty indexes similar to those obtained by versions of the PCQN in other languages, reformulating the items with the lowest content validity or discrimination indexes, and those that were difficult to comprehend, should be considered in order to improve the PCQN-SV. Conclusion The PCQN-SV is a useful Spanish language instrument for measuring Spanish nurses’ knowledge in palliative care and it is adequate to establish international comparisons. PMID:28545037
An instrument to measure nurses' knowledge in palliative care: Validation of the Spanish version of Palliative Care Quiz for Nurses.

PubMed

Chover-Sierra, Elena; Martínez-Sabater, Antonio; Lapeña-Moñux, Yolanda Raquel

2017-01-01

Palliative care is nowadays essential in nursing care, due to the increasing number of patients who require attention in final stages of their life. Nurses need to acquire specific knowledge and abilities to provide quality palliative care. Palliative Care Quiz for Nurses is a questionnaire that evaluates their basic knowledge about palliative care. The Palliative Care Quiz for Nurses (PCQN) is useful to evaluate basic knowledge about palliative care, but its adaptation into the Spanish language and the analysis of its effectiveness and utility for Spanish culture is lacking. To report the adaptation into the Spanish language and the psychometric analysis of the Palliative Care Quiz for Nurses. The Palliative Care Quiz for Nurses-Spanish Version (PCQN-SV) was obtained from a process including translation, back-translation, comparison with versions in other languages, revision by experts, and pilot study. Content validity and reliability of the questionnaire were analyzed. Difficulty and discrimination indexes of each item were also calculated according to Item Response Theory (IRT). Adequate internal consistency was found (S-CVI = 0.83); a Cronbach's alpha coefficient of 0.67 and a KR-20 test result of 0.72 reflected the reliability of the PCQN-SV. The questionnaire had a global difficulty index of 0.55, with six items which could be considered difficult or very difficult, and five items which could be considered easy or very easy. The discrimination indexes of the 20 items show that eight items are good or very good, while six items discriminate poorly between good and bad respondents. Although it shows internal consistency, reliability and difficulty indexes similar to those obtained by versions of the PCQN in other languages, reformulating the items with the lowest content validity or discrimination indexes, and those that were difficult to comprehend, should be considered in order to improve the PCQN-SV.
The PCQN-SV is a useful Spanish language instrument for measuring Spanish nurses' knowledge in palliative care and it is adequate to establish international comparisons.
The Relationship between Older Adults’ Risk for a Future Fall and Difficulty Performing Activities of Daily Living

PubMed Central

Mamikonian-Zarpas, Ani; Laganá, Luciana

2016-01-01

Functional status is often defined by cumulative scores across indices of independence in performing basic and instrumental activities of daily living (ADL/IADL), but little is known about the unique relationship of each daily activity item with the fall outcome. The purpose of this retrospective study was to examine the level of relative risk for a future fall associated with difficulty with performing various tasks of normal daily functioning among older adults who had fallen at least once in the past 12 months. The sample was comprised of community-dwelling individuals 70 years and older from the 1984–1990 Longitudinal Study of Aging by Kovar, Fitti, and Chyba (1992). Risk analysis was performed on individual items quantifying 6 ADLs and 7 IADLs, as well as 10 items related to mobility limitations. Within a subsample of 1,675 older adults with a history of at least one fall within the past year, the responses of individuals who reported multiple falls were compared to the responses of participants who had a single fall and reported 1) difficulty with walking and/or balance (FRAIL group, n = 413) vs. 2) no difficulty with walking or dizziness (NDW+ND group, n = 415). The items that had the strongest relationships and highest risk ratios for the FRAIL group (which had the highest probabilities for a future fall) included difficulty with: eating (73%); managing money (70%); biting or chewing food (66%); walking a quarter of a mile (65%); using fingers to grasp (65%); and dressing without help (65%). For the NDW+ND group, the most noteworthy items included difficulty with: bathing or showering (79%); managing money (77%); shopping for personal items (75%); walking up 10 steps without rest (72%); difficulty with walking a quarter of a mile (72%); and stooping/crouching/kneeling (70%). 
These findings suggest that individual items quantifying specific ADLs and IADLs have substantive relationships with the fall outcome among older adults who have difficulty with walking and balance, as well as among older individuals without dizziness or difficulty with walking. Furthermore, the examination of the relationships between items that are related to more challenging activities and the fall outcome revealed that higher functioning older adults who reported difficulty with the 6 items that yielded the highest risk ratios may also be at elevated risk for a fall. PMID:27200366
The Dysexecutive Questionnaire advanced: item and test score characteristics, 4-factor solution, and severity classification.

PubMed

Bodenburg, Sebastian; Dopslaff, Nina

2008-01-01

The Dysexecutive Questionnaire (DEX; Behavioral Assessment of the Dysexecutive Syndrome, 1996) is a standardized instrument to measure possible behavioral changes as a result of the dysexecutive syndrome. Although initially intended only as a qualitative instrument, the DEX has also been used increasingly to address quantitative problems. Until now there have not been more fundamental statistical analyses of the questionnaire's testing quality. The present study is based on an unselected sample of 191 patients with acquired brain injury and reports on the data relating to the quality of the items, the reliability and the factorial structure of the DEX. Item 3 displayed too great an item difficulty, whereas item 11 was not sufficiently discriminating. The DEX's reliability in self-rating is r = 0.85. In addition to presenting the statistical values of the tests, a clinical severity classification of the overall scores of the 4 found factors and of the questionnaire as a whole is carried out on the basis of quartile standards.
AN INVESTIGATION OF ITEM BIAS.

ERIC Educational Resources Information Center

CLEARY, T. ANNE; HILTON, THOMAS L.

THE PURPOSE OF THIS INVESTIGATION WAS TO DETERMINE WHETHER THE PRELIMINARY SCHOLASTIC APTITUDE TEST PRESENTED A DIFFERENTIAL DIFFICULTY FOR RACIAL AND SOCIOECONOMIC GROUPS. THE SUBJECTS WERE TWO GROUPS TOTALING 1,410 NEGRO AND WHITE HIGH SCHOOL SENIORS IN AN INTEGRATED HIGH SCHOOL WHO HAD TAKEN THE TEST. THEY WERE DIVIDED INTO THREE SOCIOECONOMIC…
Math: Figure and Object Characteristics. Measurement and Geometry. Grades K-9. Revised Edition.

ERIC Educational Resources Information Center

Instructional Objectives Exchange, Los Angeles, CA.

To help classroom teachers construct mathematics tests, thirty-seven general objectives, corresponding sub-objectives, sample test items, and answers are presented. In general, sub-objectives are arranged in increasing order of difficulty. The objectives were written to comprehensively cover two categories: measurement and geometry. Measurement…
Divided attention: an undesirable difficulty in memory retention.

PubMed

Gaspelin, Nicholas; Ruthruff, Eric; Pashler, Harold

2013-10-01

How can we improve memory retention? A large body of research has suggested that difficulty encountered during learning, such as when practice sessions are distributed rather than massed, can enhance later memory performance (see R. A. Bjork & E. L. Bjork, 1992). Here, we investigated whether divided attention during retrieval practice can also constitute a desirable difficulty. Following two initial study phases and one test phase with Swahili-English word pairs (e.g., vuvi-snake), we manipulated whether items were tested again under full or divided attention. Two days later, participants were brought back for a final cued-recall test (e.g., vuvi-?). Across three experiments (combined N = 122), we found no evidence that dividing attention while practicing retrieval enhances memory retention. This finding raises the question of why many types of difficulty during practice do improve long-term retention, but dividing attention does not.
Psychometric Properties of Difficulties of Working with Patients with Personality Disorders and Attitudes Towards Patients with Personality Disorders Scales.

PubMed

Eren, Nurhan

2014-12-01

In this study, we aimed to develop two reliable and valid assessment instruments for investigating the level of difficulties mental health workers experience while working with patients with personality disorders and the attitudes they develop toward the patients. The research was carried out based on the general screening model. The study sample consisted of 332 mental health workers in several mental health clinics of Turkey, with a certain amount of experience in working with personality disorders, who were selected with a random assignment method. In order to collect data, the Personal Information Questionnaire, the Difficulty of Working with Personality Disorders Scale (PD-DWS), and the Attitudes Towards Patients with Personality Disorders Scale (PD-APS), which are being examined for reliability and validity, were applied. To determine construct validity, the Adjective Check List, Maslach Burnout Inventory, and State and Trait Anxiety Inventory were used. Exploratory factor analysis was used for investigating the structural validity, and Cronbach alpha, Spearman-Brown, and Guttman Split-Half reliability analyses were utilized to examine the reliability. Also, item reliability and validity computations were carried out by investigating the corrected item-total correlations and discriminative indexes of the items in the scales. For the PD-DWS, the KMO test value was .946; also, a significant difference was found for the Bartlett sphericity test (p<.001). The computed test-retest reliability coefficient was .702; the Cronbach alpha value of the total test score was .952. For the PD-APS, the KMO value was .925; a significant difference was found in the Bartlett sphericity test (p<.001); the computed reliability coefficient based on continuity was .806; and the Cronbach alpha value of the total test score was .913. Analyses on both scales were based on total scores.
It was found that PD-DWS and PD-APS have good psychometric properties, measuring the structure that is being investigated, are compatible with other scales, have high levels of internal reliability between their items, and are consistent across time. Therefore, it was concluded that both scales are valid and reliable instruments.
Improved Classification of Mammograms Following Idealized Training

PubMed Central

Hornsby, Adam N.; Love, Bradley C.

2014-01-01

People often make decisions by stochastically retrieving a small set of relevant memories. This limited retrieval implies that human performance can be improved by training on idealized category distributions (Giguère & Love, 2013). Here, we evaluate whether the benefits of idealized training extend to categorization of real-world stimuli, namely classifying mammograms as normal or tumorous. Participants in the idealized condition were trained exclusively on items that, according to a norming study, were relatively unambiguous. Participants in the actual condition were trained on a representative range of items. Despite being exclusively trained on easy items, idealized-condition participants were more accurate than those in the actual condition when tested on a range of item types. However, idealized participants experienced difficulties when test items were very dissimilar from training cases. The benefits of idealization, attributable to reducing noise arising from cognitive limitations in memory retrieval, suggest ways to improve real-world decision making. PMID:24955325

Improved Classification of Mammograms Following Idealized Training.

PubMed

Hornsby, Adam N; Love, Bradley C

2014-06-01

People often make decisions by stochastically retrieving a small set of relevant memories. This limited retrieval implies that human performance can be improved by training on idealized category distributions (Giguère & Love, 2013). Here, we evaluate whether the benefits of idealized training extend to categorization of real-world stimuli, namely classifying mammograms as normal or tumorous. Participants in the idealized condition were trained exclusively on items that, according to a norming study, were relatively unambiguous. Participants in the actual condition were trained on a representative range of items. Despite being exclusively trained on easy items, idealized-condition participants were more accurate than those in the actual condition when tested on a range of item types. However, idealized participants experienced difficulties when test items were very dissimilar from training cases. The benefits of idealization, attributable to reducing noise arising from cognitive limitations in memory retrieval, suggest ways to improve real-world decision making.
Informed choice: understanding knowledge in the context of screening uptake.

PubMed

Michie, Susan; Dormandy, Elizabeth; Marteau, Theresa M

2003-07-01

This study evaluates a scale measuring knowledge about a screening test and investigates the association between knowledge, uptake and attitudes towards screening. One thousand four hundred ninety-nine pregnant women completed the knowledge scale of the multidimensional measure of informed choice (MMIC). Three hundred forty-five of these women and 152 professionals providing antenatal care also rated the importance of the knowledge items. Item characteristic curves show that, with one exception, the knowledge items reflect a spread of difficulty and are able to discriminate between people. All items were seen as essential or helpful by both women and health professionals, with two items seen as particularly important and one as unimportant. There were some differences between health professionals, women with low risk results and women with high risk results. Knowledge was not associated with uptake, attitude, or the extent to which uptake was consistent with women's attitudes towards undergoing the test.
The Potential Use of the Discouraging Random Guessing (DRG) Approach in Multiple-Choice Exams in Medical Education.

ERIC Educational Resources Information Center

Friedman, Miriam; And Others

1987-01-01

Test performances of sophomore medical students on a pretest and final exam (under guessing and no-guessing instructions) were compared. Discouraging random guessing produced test information with improved test reliability and less distortion of item difficulty. More able examinees were less compliant than less able examinees. (Author/RH)
Validation of a Computerized Cognitive Assessment System for Persons with Stroke: A Pilot Study

ERIC Educational Resources Information Center

Yip, Chi Kwong; Man, David W. K.

2009-01-01

This study investigates the validity of a newly developed computerized cognitive assessment system (CCAS) that is equipped with rich multimedia to generate simulated testing situations and considers both test item difficulty and the test taker's ability. It is also hypothesized that better predictive validity of the CCAS in self-care of persons…
Response pattern of depressive symptoms among college students: What lies behind items of the Beck Depression Inventory-II?

PubMed

de Sá Junior, Antonio Reis; de Andrade, Arthur Guerra; Andrade, Laura Helena; Gorenstein, Clarice; Wang, Yuan-Pang

2018-07-01

This study examines the response pattern of depressive symptoms in a nationwide student sample, through item analyses of a rating scale by both classical test theory (CTT) and item response theory (IRT). The 21-item Beck Depression Inventory-II (BDI-II) was administered to 12,711 college students. First, the psychometric properties of the scale were described. Thereafter, the endorsement probability of depressive symptom in each scale item was analyzed through CTT and IRT. Graphical plots depicted the endorsement probability of scale items and intensity of depression. Three items of different difficulty level were compared through CTT and IRT approach. Four in five students reported the presence of depressive symptoms. The BDI-II items presented good reliability and were distributed along the symptomatic continuum of depression. Similarly, in both CTT and IRT approaches, the item 'changes in sleep' was easily endorsed, 'loss of interest' moderately and 'suicidal thoughts' hardly. Graphical representation of BDI-II of both methods showed much equivalence in terms of item discrimination and item difficulty. The item characteristic curve of the IRT method provided informative evaluation of item performance. The inventory was applied only in college students. Depressive symptoms were frequent psychopathological manifestations among college students. The performance of the BDI-II items indicated convergent results from both methods of analysis. While the CTT was easy to understand and to apply, the IRT was more complex to understand and to implement. Comprehensive assessment of the functioning of each BDI-II item might be helpful in efficient detection of depressive conditions in college students.
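To make the idea of an item characteristic curve concrete, the sketch below evaluates a two-parameter logistic (2PL) curve for three items at different difficulty levels. This is a dichotomous simplification: the abstract does not say which IRT model was fitted to the BDI-II ratings, and the discrimination and difficulty values attached to the three item names are made up for illustration.

```python
import math

def icc(theta, a, b):
    """2PL item characteristic curve: endorsement probability at severity theta.

    a is the item discrimination (slope); b is the item difficulty (location).
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical parameters (a, b) echoing the ordering reported in the abstract.
items = {
    "changes in sleep": (1.0, -1.0),   # easily endorsed
    "loss of interest": (1.0, 0.0),    # moderately endorsed
    "suicidal thoughts": (1.0, 2.0),   # hardly endorsed
}

# Endorsement probabilities for a respondent at average severity (theta = 0).
probs = {name: icc(0.0, a, b) for name, (a, b) in items.items()}
```

At any fixed severity level, an item with a lower difficulty parameter has a higher endorsement probability, which mirrors the CTT view that an "easy" item is one endorsed by a larger share of respondents.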
[Development of critical thinking skill evaluation scale for nursing students].

PubMed

You, So Young; Kim, Nam Cho

2014-04-01

To develop a Critical Thinking Skill Test for Nursing Students. The construct concepts were drawn from a literature review and in-depth interviews with hospital nurses and surveys were conducted among students (n=607) from nursing colleges. The data were collected from September 13 to November 23, 2012 and analyzed using the SAS program, 9.2 version. The KR 20 coefficient for reliability, difficulty index, discrimination index, item-total correlation and known group technique for validity were performed. Four domains and 27 skills were identified and 35 multiple choice items were developed. Thirty multiple choice items which had scores higher than .80 on the content validity index were selected for the pre test. From the analysis of the pre test data, a modified 30 items were selected for the main test. In the main test, the KR 20 coefficient was .70 and Corrected Item-Total Correlations range was .11-.38. There was a statistically significant difference between two academic systems (p=.001). The developed instrument is the first critical thinking skill test reflecting nursing perspectives in hospital settings and is expected to be utilized as a tool which contributes to improvement of the critical thinking ability of nursing students.
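The KR 20 coefficient mentioned above applies to dichotomously scored (right/wrong) items. A minimal sketch of the standard formula, KR-20 = k/(k-1) × (1 − Σ p·q / variance of total scores), follows; the function name and the response data are hypothetical, and population variance is used for the total scores, which is one common convention.

```python
from statistics import pvariance

def kr20(responses):
    """Kuder-Richardson 20 for a list of examinees' 0/1 item-score lists."""
    n = len(responses)
    k = len(responses[0])
    # p is the proportion answering each item correctly; q = 1 - p.
    pq_sum = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n
        pq_sum += p * (1.0 - p)
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1.0 - pq_sum / total_var)

# Hypothetical responses: 4 examinees, 3 items.
reliability = kr20([[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]])  # 0.75
```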
A Method of Q-Matrix Validation for the Linear Logistic Test Model

PubMed Central

Baghaei, Purya; Hohensinn, Christine

2017-01-01

The linear logistic test model (LLTM) is a well-recognized psychometric model for examining the components of difficulty in cognitive tests and validating construct theories. The plausibility of the construct model, summarized in a matrix of weights, known as the Q-matrix or weight matrix, is tested by (1) comparing the fit of LLTM with the fit of the Rasch model (RM) using the likelihood ratio (LR) test and (2) by examining the correlation between the Rasch model item parameters and LLTM reconstructed item parameters. The problem with the LR test is that it is almost always significant and, consequently, LLTM is rejected. The drawback of examining the correlation coefficient is that there is no cut-off value or lower bound for the magnitude of the correlation coefficient. In this article we suggest a simulation method to set a minimum benchmark for the correlation between item parameters from the Rasch model and those reconstructed by the LLTM. If the cognitive model is valid then the correlation coefficient between the RM-based item parameters and the LLTM-reconstructed item parameters derived from the theoretical weight matrix should be greater than those derived from the simulated matrices. PMID:28611721
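The benchmarking idea can be sketched in simplified form. In the LLTM, each item difficulty is reconstructed as a weighted sum of basic parameters, β_i = Σ_k q_ik η_k, and the reconstructed values are correlated with the Rasch difficulties. The sketch below compares that correlation against correlations obtained from randomly rearranged weight matrices. It is an illustration only: the basic parameters η are held fixed instead of being re-estimated for each simulated matrix, random matrices are produced by shuffling rows, and all names and numbers are hypothetical rather than the authors' procedure.

```python
import random

def reconstruct(q_matrix, eta):
    """LLTM-style reconstruction: beta_i = sum_k q_ik * eta_k."""
    return [sum(q * e for q, e in zip(row, eta)) for row in q_matrix]

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def simulated_correlations(rasch_b, q_matrix, eta, n_sim=200, seed=0):
    """Correlations obtained when Q-matrix rows are assigned to items at random."""
    rng = random.Random(seed)
    sims = []
    for _ in range(n_sim):
        shuffled = q_matrix[:]
        rng.shuffle(shuffled)
        sims.append(pearson(rasch_b, reconstruct(shuffled, eta)))
    return sorted(sims)

# Hypothetical inputs: 4 items, 2 cognitive operations.
q = [[1, 0], [0, 1], [1, 1], [2, 1]]
eta = [0.5, 1.0]
rasch_b = [0.4, 1.1, 1.6, 1.9]
observed = pearson(rasch_b, reconstruct(q, eta))
benchmark = simulated_correlations(rasch_b, q, eta)
```

If the theoretical weight matrix is valid, `observed` should sit above the bulk of the values in `benchmark`; a full replication would refit the LLTM for every simulated matrix.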
Analysis instrument test on mathematical power the material geometry of space flat side for grade 8

NASA Astrophysics Data System (ADS)

Kusmaryono, Imam; Suyitno, Hardi; Dwijanto, Karomah, Nur

2017-08-01

The main aim of this research was to determine the quality of test items on flat-sided solid geometry used to assess students' mathematical power. The method used was quantitative descriptive. The subjects were 20 grade 8 students. The object of research was the quality of the test items in terms of mathematical power: validity, reliability, level of difficulty, and discriminating power. The mathematical power instruments tested included written tests and questionnaires about the disposition of mathematical power. Data were obtained from the field in the form of test results on flat-sided solid geometry and questionnaire responses. The results showed that the reliability of the test items is influenced by many factors: the number of items, the homogeneity of the test questions, the time required, the uniformity of conditions for the test takers, the homogeneity of the group, the variability of the problems, and the motivation of the individual taking the test. Overall, the evaluation concluded that the test instrument can be used as a tool to measure students' mathematical power.
The development and validation of a test of science critical thinking for fifth graders.

PubMed

Mapeala, Ruslan; Siew, Nyet Moi

2015-01-01

The paper described the development and validation of the Test of Science Critical Thinking (TSCT) to measure the three critical thinking skill constructs: comparing and contrasting, sequencing, and identifying cause and effect. The initial TSCT consisted of 55 multiple choice test items, each of which required participants to select a correct response and a correct choice of critical thinking used for their response. Data were obtained from a purposive sampling of 30 fifth graders in a pilot study carried out in a primary school in Sabah, Malaysia. Students underwent the sessions of teaching and learning activities for 9 weeks using the Thinking Maps-aided Problem-Based Learning Module before they answered the TSCT test. Analyses were conducted to check on difficulty index (p) and discrimination index (d), internal consistency reliability, content validity, and face validity. Analysis of the test-retest reliability data was conducted separately for a group of fifth graders with similar ability. Findings of the pilot study showed that out of initial 55 administered items, only 30 items with relatively good difficulty index (p) ranged from 0.40 to 0.60 and with good discrimination index (d) ranged within 0.20-1.00 were selected. The Kuder-Richardson reliability value was found to be appropriate and relatively high with 0.70, 0.73 and 0.92 for identifying cause and effect, sequencing, and comparing and contrasting respectively. The content validity index obtained from three expert judgments equalled or exceeded 0.95. In addition, test-retest reliability showed good, statistically significant correlations ([Formula: see text]). From the above results, the selected 30-item TSCT was found to have sufficient reliability and validity and would therefore represent a useful tool for measuring critical thinking ability among fifth graders in primary science.
What Can We Learn about Auditory Processing from Adult Hearing Questionnaires?

PubMed

Bamiou, Doris-Eva; Iliadou, Vasiliki Vivian; Zanchetta, Sthella; Spyridakou, Chrysa

2015-01-01

Questionnaires addressing auditory disability may identify and quantify specific symptoms in adult patients with listening difficulties. (1) To assess validity of the Speech, Spatial, and Qualities of Hearing Scale (SSQ), the (Modified) Amsterdam Inventory for Auditory Disability (mAIAD), and the Hyperacusis Questionnaire (HYP) in adult patients experiencing listening difficulties in the presence of a normal audiogram. (2) To examine which individual questionnaire items give the worse scores in clinical participants with an auditory processing disorder (APD). A prospective correlational analysis study. Clinical participants (N = 58) referred for assessment because of listening difficulties in the presence of normal audiometric thresholds to audiology/ear, nose, and throat or audiovestibular medicine clinics. Normal control participants (N = 30). The mAIAD, HYP, and the SSQ were administered to a clinical population of nonneurological adults who were referred for auditory processing (AP) assessment because of hearing complaints, in the presence of normal audiogram and cochlear function, and to a sample of age-matched normal-hearing controls, before the AP testing. Clinical participants with abnormal results in at least one ear and in at least two tests of AP (and at least one of these tests to be nonspeech) were classified as clinical APD (N = 39), and the remaining (16 of whom had a single test abnormality) as clinical non-APD (N = 19). The SSQ correlated strongly with the mAIAD and the HYP, and correlation was similar within the clinical group and the normal controls. All questionnaire total scores and subscores (except sound distinction of mAIAD) were significantly worse in the clinical APD versus the normal group, while questionnaire total scores and most subscores indicated greater listening difficulties for the clinical non-APD versus the normal subgroups. 
Overall, the clinical non-APD group tended to give better scores than the APD in all questionnaires administered. Correlation was strong for the worse-ear gaps-in-noise threshold with the SSQ, mAIAD, and HYP; strong to moderate for the speech in babble and left-ear dichotic digit test scores (at p < 0.01); and weak to moderate for the remaining AP tests except the frequency pattern test that did not correlate. The worse-scored items in all three questionnaires concerned speech-in-noise questions. This is similar to worse-scored items by hearing-impaired participants as reported in the literature. Worse-scored items of the clinical group also included quality aspects of listening questions from the SSQ, which most likely pertain to cognitive aspects of listening, such as ability to ignore other sounds and listening effort. Hearing questionnaires may help assess symptoms of adults with APD. The listening difficulties and needs of adults with APD to some extent overlap with those of hearing-impaired listeners, but there are significant differences. The correlation of the gaps-in-noise and duration pattern (but not frequency pattern) tests with the questionnaire scores indicates that temporal processing deficits may play an important role in clinical presentation. American Academy of Audiology.
Meta-Analysis of Fluid Intelligence Tests of Children from the Chinese Mainland with Learning Difficulties

PubMed Central

Tong, Fang; Fu, Tong

2013-01-01

Objective: To evaluate the differences in fluid intelligence tests between normal children and children with learning difficulties in China. Method: PubMed, MD Consult, and other Chinese journal databases were searched from their establishment to November 2012. After finding comparative studies of Raven measurements of normal children and children with learning difficulties, full Intelligence Quotient (FIQ) values and the original values of the sub-measurement were extracted. The corresponding effect model was selected based on the results of heterogeneity, and parallel sub-group analysis was performed. Results: Twelve documents were included in the meta-analysis, and the studies were all performed in mainland China. Among these, two studies were performed at child health clinics; the other ten sites were schools, and control children were schoolmates or classmates. FIQ was evaluated using a random-effects model. WMD was −13.18 (95% CI: −16.50 to −9.85). Children with learning difficulties showed significantly lower FIQ scores than controls (P<0.00001). Type of learning difficulty and gender differences were evaluated using a fixed-effects model (I² = 0%). The sites and purposes of the studies evaluated here were taken into account, but the sources of heterogeneity could not be eliminated. The sum IQ of all the subgroups showed considerable heterogeneity (I² = 76.5%). The sub-measurement score of document A showed moderate heterogeneity among all documents, and AB, B, and E showed considerable heterogeneity, which was used in a random-effects model. Individuals with learning difficulties showed heterogeneity as well. There was a moderate delay in the first three items (−0.5 to −0.9), and a much more pronounced delay in the latter three items (−1.4 to −1.6). Conclusion: In the Chinese mainland, the level of fluid intelligence of children with learning difficulties was lower than that of normal children. Delayed development in sub-items C, D, and E was more obvious.
PMID:24236016
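The heterogeneity figures quoted in the abstract above (I² = 0%, I² = 76.5%) follow from a standard calculation, which can be sketched as follows. This is a minimal illustration that assumes inverse-variance weights and the usual Cochran's Q definition; the function name and the input numbers are hypothetical and are not taken from the study.

```python
from typing import List, Tuple


def pooled_effect_and_i2(effects: List[float],
                         std_errors: List[float]) -> Tuple[float, float, float]:
    """Return (fixed-effect pooled estimate, Cochran's Q, I-squared in percent).

    effects    -- per-study mean differences (hypothetical inputs)
    std_errors -- per-study standard errors of those differences
    """
    # Inverse-variance weights: more precise studies count for more.
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I-squared: share of total variation attributed to heterogeneity,
    # truncated at zero when Q does not exceed its degrees of freedom.
    i2 = 0.0 if q <= df else 100.0 * (q - df) / q
    return pooled, q, i2


# Made-up example values, for illustration only.
pooled, q, i2 = pooled_effect_and_i2([-10.0, -15.0], [1.0, 1.0])
print(pooled, q, i2)
```

A high I² is the usual reason for switching from a fixed-effects to a random-effects model, which is the choice the abstract describes.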
Constructing objective tests

NASA Astrophysics Data System (ADS)

Aubrecht, Gordon J.; Aubrecht, Judith D.

1983-07-01

True-false or multiple-choice tests can be useful instruments for evaluating student progress. We examine strategies for planning objective tests which serve to test the material covered in science (physics) courses. We also examine strategies for writing questions for tests within a test blueprint. The statistical basis for judging the quality of test items is discussed. Reliability, difficulty, and discrimination indices are defined and examples presented. Our recommendations are rather easily put into practice.
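The difficulty and discrimination indices mentioned in this abstract can be illustrated with a short sketch. It assumes the common classical definitions (difficulty as the proportion answering correctly, discrimination as the difference between upper and lower scoring groups); the abstract does not give the authors' exact formulas, and the function names, the 27% default group size, and the data are assumptions for illustration.

```python
from typing import List


def item_difficulty(responses: List[List[int]]) -> List[float]:
    """Proportion of examinees answering each item correctly.

    responses -- one row per examinee, one 0/1 entry per item.
    """
    n = len(responses)
    n_items = len(responses[0])
    return [sum(row[j] for row in responses) / n for j in range(n_items)]


def discrimination_index(responses: List[List[int]],
                         fraction: float = 0.27) -> List[float]:
    """Upper-minus-lower group proportion correct for each item.

    fraction -- share of examinees in each extreme group (assumed default).
    """
    k = max(1, round(len(responses) * fraction))
    # Rank examinees by total score, highest first.
    ranked = sorted(responses, key=sum, reverse=True)
    upper, lower = ranked[:k], ranked[-k:]
    n_items = len(responses[0])
    return [(sum(r[j] for r in upper) - sum(r[j] for r in lower)) / k
            for j in range(n_items)]


# Hypothetical 4-examinee, 3-item score matrix.
scores = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(item_difficulty(scores))
print(discrimination_index(scores, fraction=0.5))
```

Note that under this definition a higher difficulty value means an easier item, which is why some authors call it a facility index.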
Psychometric characteristics of Clinical Reasoning Problems (CRPs) and its correlation with routine multiple choice question (MCQ) in Cardiology department

PubMed Central

DERAKHSHANDEH, ZAHRA; AMINI, MITRA; KOJURI, JAVAD; DEHBOZORGIAN, MARZIYEH

2018-01-01

Introduction: Clinical reasoning is one of the most important skills in the process of training a medical student to become an efficient physician. Assessment of the reasoning skills in a medical school program is important to direct students’ learning. One of the tests for measuring the clinical reasoning ability is Clinical Reasoning Problems (CRPs). The major aim of this study is to measure psychometric qualities of CRPs and define the correlation between this test and the routine MCQ in the cardiology department of Shiraz medical school. Methods: This was a descriptive study conducted on all cardiology residents of Shiraz Medical School. The study population consisted of 40 residents in 2014. The routine CRPs and the MCQ tests were designed based on similar objectives and were carried out simultaneously. Reliability, item difficulty, item discrimination, and correlation between each item and the total score of CRPs were all measured with Excel and SPSS software to check the psychometric properties of the CRPs test. Furthermore, we calculated the correlation between the CRPs test and the MCQ test. The mean differences of CRPs test scores between residents’ academic years [second, third and fourth year] were also evaluated by one-way analysis of variance (ANOVA) using SPSS software (version 20) (α=0.05). Results: The mean and standard deviation of the score in CRPs was 10.19 ±3.39 out of 20; in MCQ, it was 13.15±3.81 out of 20. Item difficulty was in the range of 0.27-0.72; item discrimination was 0.30-0.75, with question No. 3 being the exception (0.24). The correlation between each item and the total score of CRPs was 0.26-0.87; the correlation between the CRPs test and the MCQ test was 0.68 (p<0.001). The reliability of the CRPs was 0.72 as calculated using Cronbach's alpha. The mean score of CRPs differed among residents based on their academic year, and this difference was statistically significant (p<0.001).
Conclusion: The results of the present investigation revealed that CRPs could be a reliable test for measuring clinical reasoning in residents. It can be included in cardiology residency assessment programs. PMID:29344528
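The reliability coefficient reported in this abstract, Cronbach's alpha, can be computed from an examinee-by-item score matrix. The sketch below uses the standard formula with sample variances; the function name and the data are hypothetical, and the study itself used Excel and SPSS rather than this code.

```python
import statistics
from typing import List


def cronbach_alpha(scores: List[List[float]]) -> float:
    """Cronbach's alpha for a matrix with one row per examinee.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(scores[0])  # number of items
    # Sample variance of each item across examinees.
    item_vars = [statistics.variance([row[j] for row in scores])
                 for j in range(k)]
    # Sample variance of each examinee's total score.
    total_var = statistics.variance([sum(row) for row in scores])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)


# Hypothetical scores: two items that rank examinees identically.
print(cronbach_alpha([[1, 1], [2, 2], [3, 3]]))
```

The function needs at least two items and at least two examinees with different total scores; otherwise the variance terms are undefined or zero.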
Item response theory analysis of the Utrecht Work Engagement Scale for Students (UWES-S) using a sample of Japanese university and college students majoring medical science, nursing, and natural science.

PubMed

Tsubakita, Takashi; Shimazaki, Kazuyo; Ito, Hiroshi; Kawazoe, Nobuo

2017-10-30

The Utrecht Work Engagement Scale for Students has been used internationally to assess students' academic engagement, but it has not been analyzed via item response theory. The purpose of this study was to conduct an item response theory analysis of the Japanese version of the Utrecht Work Engagement Scale for Students translated by the authors. Using a two-parameter model and Samejima's graded response model, difficulty and discrimination parameters were estimated after confirming the factor structure of the scale. The 14 items on the scale were analyzed with a sample of 3214 university and college students majoring in medical science, nursing, or natural science in Japan. The preliminary parameter estimation was conducted with the two-parameter model and indicated that three items should be removed because they had outlier parameters. Final parameter estimation was conducted using the remaining 11 items and indicated that all difficulty and discrimination parameters were acceptable. The test information curve suggested that the scale better assesses higher engagement than average engagement. The estimated parameters provide a basis for future comparative studies. The results also suggested that a 7-point Likert scale is too broad; thus, the scale should be modified to use fewer response categories.
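Samejima's graded response model, named in this abstract, assigns each response category a probability derived from cumulative logistic curves. The sketch below shows only that probability calculation under the usual logistic parameterization; the function name and parameter values are hypothetical, and nothing here reproduces the study's estimation procedure.

```python
import math
from typing import List


def grm_category_probabilities(theta: float, a: float,
                               thresholds: List[float]) -> List[float]:
    """Category probabilities under the graded response model.

    theta      -- respondent's latent trait level
    a          -- item discrimination
    thresholds -- increasing category boundaries b_1 < b_2 < ...
    """
    # P(X >= k) for k = 0..m+1, with the conventions P(X >= 0) = 1
    # and P(X >= m+1) = 0.
    cumulative = ([1.0]
                  + [1.0 / (1.0 + math.exp(-a * (theta - b)))
                     for b in thresholds]
                  + [0.0])
    # Each category probability is a difference of adjacent cumulatives.
    return [cumulative[k] - cumulative[k + 1]
            for k in range(len(thresholds) + 1)]


# Hypothetical 3-category item.
print(grm_category_probabilities(theta=0.0, a=1.0, thresholds=[-1.0, 1.0]))
```

With seven response categories, six thresholds are needed per item; thresholds that sit very close together are one sign that adjacent categories are not being distinguished by respondents.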
Playing the Recording Once or Twice: Effects on Listening Test Performances

ERIC Educational Resources Information Center

Ruhm, Richard; Leitner-Jones, Claire; Kulmhofer, Andrea; Kiefer, Thomas; Mlakar, Heike; Itzlinger-Bruneforth, Ursula

2016-01-01

Much debate surrounds the issue of whether allowing candidates to listen to recordings twice is more desirable in language tests than offering just one opportunity. Using regression models, this study investigates, analyses and interconnects both item difficulty and stimulus length in relation to the frequency of stimulus presentation and its…
KIDMAP--A Diagnostic Tool for Teachers.

ERIC Educational Resources Information Center

Lee, Yew Jin; Linacre, John M.; Yeoh, Oon Chye

While assessment is the bread and butter of the teaching profession, its practitioners usually do not extend analysis of test responses beyond simple measures such as facility or discrimination indices in classical test theory. Item response theory (IRT) has much to offer but its nonintuitive content and difficulty make it a formidable obstacle in…
Content Validity and Psychometric Characteristics of the "Knowledge about Older Patients Quiz" for Nurses Using Item Response Theory.

PubMed

Dikken, Jeroen; Hoogerduijn, Jita G; Kruitwagen, Cas; Schuurmans, Marieke J

2016-11-01

To assess the content validity and psychometric characteristics of the Knowledge about Older Patients Quiz (KOP-Q), which measures nurses' knowledge regarding older hospitalized adults and their certainty regarding this knowledge. Cross-sectional. Content validity: general hospitals. Psychometric characteristics: nursing school and general hospitals in the Netherlands. Content validity: 12 nurse specialists in geriatrics. Psychometric characteristics: 107 first-year and 78 final-year bachelor of nursing students, 148 registered nurses, and 20 nurse specialists in geriatrics. Content validity: The nurse specialists rated each item of the initial KOP-Q (52 items) on relevance. Ratings were used to calculate Item-Content Validity Index and average Scale-Content Validity Index (S-CVI/ave) scores. Items with insufficient content validity were removed. Psychometric characteristics: Ratings of students, nurses, and nurse specialists were used to test for differential item functioning (DIF) and unidimensionality before item characteristics (discrimination and difficulty) were examined using Item Response Theory. Finally, norm references were calculated and nomological validity was assessed. Content validity: Forty-three items remained after assessing content validity (S-CVI/ave = 0.90). Psychometric characteristics: Of the 43 items, two demonstrating ceiling effects and 11 distorting ability estimates (DIF) were subsequently excluded. Item characteristics were assessed for the remaining 30 items, all of which demonstrated good discrimination and difficulty parameters. Knowledge was positively correlated with certainty about this knowledge. The final 30-item KOP-Q is a valid, psychometrically sound, comprehensive instrument that can be used to assess the knowledge of nursing students, hospital nurses, and nurse specialists in geriatrics regarding older hospitalized adults. 
It can identify knowledge and certainty deficits for research purposes or serve as a tool in educational or quality improvement programs. © 2016, Copyright the Authors Journal compilation © 2016, The American Geriatrics Society.
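The content validity indices named in this abstract can be sketched in a few lines. The example assumes a 4-point relevance scale on which ratings of 3 or 4 count as "relevant", a common convention that the abstract does not confirm; the function names and the ratings are hypothetical.

```python
from typing import List


def item_cvi(ratings: List[int], relevant_from: int = 3) -> float:
    """Item-level CVI: share of experts rating the item as relevant.

    relevant_from -- lowest rating counted as relevant (assumed to be 3
                     on a 4-point scale).
    """
    return sum(1 for r in ratings if r >= relevant_from) / len(ratings)


def scale_cvi_ave(all_ratings: List[List[int]]) -> float:
    """S-CVI/Ave: mean of the item-level CVIs across all items."""
    cvis = [item_cvi(item) for item in all_ratings]
    return sum(cvis) / len(cvis)


# Hypothetical ratings: two items, four experts each.
ratings = [[4, 3, 4, 2], [4, 4, 4, 4]]
print([item_cvi(item) for item in ratings])
print(scale_cvi_ave(ratings))
```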
Facilitating the Interpretation of English Language Proficiency Scores: Combining Scale Anchoring and Test Score Mapping Methodologies

ERIC Educational Resources Information Center

Powers, Donald; Schedl, Mary; Papageorgiou, Spiros

2017-01-01

The aim of this study was to develop, for the benefit of both test takers and test score users, enhanced "TOEFL ITP"® test score reports that go beyond the simple numerical scores that are currently reported. To do so, we applied traditional scale anchoring (proficiency scaling) to item difficulty data in order to develop performance…
Effects of Test Level Discrimination and Difficulty on Answer-Copying Indices

ERIC Educational Resources Information Center

Sunbul, Onder; Yormaz, Seha

2018-01-01

In this study Type I Error and the power rates of omega (ω) and GBT (generalized binomial test) indices were investigated for several nominal alpha levels and for 40 and 80-item test lengths with 10,000-examinee sample size under several test level restrictions. As a result, Type I error rates of both indices were found to be below the acceptable…
Item-focussed Trees for the Identification of Items in Differential Item Functioning.

PubMed

Tutz, Gerhard; Berger, Moritz

2016-09-01

A novel method for the identification of differential item functioning (DIF) by means of recursive partitioning techniques is proposed. We assume an extension of the Rasch model that allows for DIF being induced by an arbitrary number of covariates for each item. Recursive partitioning on the item level results in one tree for each item and leads to simultaneous selection of items and variables that induce DIF. For each item, it is possible to detect groups of subjects with different item difficulties, defined by combinations of characteristics that are not pre-specified. The way a DIF item is determined by covariates is visualized in a small tree and therefore easily accessible. An algorithm is proposed that is based on permutation tests. Various simulation studies, including the comparison with traditional approaches to identify items with DIF, show the applicability and the competitive performance of the method. Two applications illustrate the usefulness and the advantages of the new method.
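To make concrete what "groups of subjects with different item difficulties" means under the Rasch model, the sketch below computes the probability of a correct response when one group faces a shifted difficulty. It does not implement the recursive partitioning method the abstract proposes; the function names, the two-group setup, and the shift value are all hypothetical.

```python
import math


def rasch_probability(theta: float, difficulty: float) -> float:
    """Rasch model: P(correct) = 1 / (1 + exp(-(theta - difficulty)))."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def rasch_probability_with_dif(theta: float, difficulty: float,
                               in_focal_group: bool,
                               dif_shift: float) -> float:
    """Rasch probability when the item is harder or easier for one group.

    dif_shift -- amount added to the item difficulty for the focal group
                 (a hypothetical, fixed shift for illustration).
    """
    shifted = difficulty + (dif_shift if in_focal_group else 0.0)
    return rasch_probability(theta, shifted)


# Two examinees of equal ability; the item is 0.8 logits harder for one group.
print(rasch_probability_with_dif(0.0, 0.0, False, 0.8))
print(rasch_probability_with_dif(0.0, 0.0, True, 0.8))
```

An item shows DIF when examinees of equal ability have different success probabilities depending on group membership, as the two printed values illustrate.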

Readability and Item Difficulty of the Texas Assessment of Knowledge and Skills Fifth-Grade Science Tests

ERIC Educational Resources Information Center

Thomas, Conn; Carpenter, Clint

2008-01-01

The development of the Texas Assessment of Knowledge and Skills test involves input from educators across the state. The development process attempts to create an assessment that reflects the skills and content understanding of students at the tested grade level. This study attempts to determine other factors that can affect student performance on…
Ubiquitous testing using tablets: its impact on medical student perceptions of and engagement in learning.

PubMed

Kim, Kyong-Jee; Hwang, Jee-Young

2016-03-01

Ubiquitous testing has the potential to affect medical education by enhancing the authenticity of the assessment using multimedia items. This study explored medical students' experience with ubiquitous testing and its impact on student learning. A cohort (n=48) of third-year students at a medical school in South Korea participated in this study. The students were divided into two groups and were given different versions of 10 content-matched items: one in text version (the text group) and the other in multimedia version (the multimedia group). Multimedia items were delivered using tablets. Item response analyses were performed to compare item characteristics between the two versions. Additionally, focus group interviews were held to investigate the students' experiences of ubiquitous testing. The mean test score was significantly higher in the text group. Item difficulty and discrimination did not differ between text and multimedia items. The participants generally showed positive responses on ubiquitous testing. Still, they felt that the lectures that they had taken in preclinical years did not prepare them enough for this type of assessment and clinical encounters during clerkships were more helpful. To be better prepared, the participants felt that they needed to engage more actively in learning in clinical clerkships and have more access to multimedia learning resources. Ubiquitous testing can positively affect student learning by reinforcing the importance of being able to understand and apply knowledge in clinical contexts, which drives students to engage more actively in learning in clinical settings.
Is It Working? Distractor Analysis Results from the Test Of Astronomy STandards (TOAST) Assessment Instrument

NASA Astrophysics Data System (ADS)

Slater, Stephanie

2009-05-01

The Test Of Astronomy STandards (TOAST) assessment instrument is a multiple-choice survey tightly aligned to the consensus learning goals stated by the American Astronomical Society - Chair's Conference on ASTRO 101, the American Association for the Advancement of Science's Project 2061 Benchmarks, and the National Research Council's National Science Education Standards. Researchers from the Cognition in Astronomy, Physics and Earth sciences Research (CAPER) Team at the University of Wyoming's Science and Math Teaching Center (UWYO SMTC) have been conducting a question-by-question distractor analysis procedure to determine the sensitivity and effectiveness of each item. In brief, the frequency of each possible answer choice, known as a foil or distractor on a multiple-choice test, is determined and compared to the existing literature on the teaching and learning of astronomy. In addition to having statistical difficulty and discrimination values, a well-functioning assessment item will show students selecting distractors in the relative proportions to how we expect them to respond based on known misconceptions and reasoning difficulties. In all cases, our distractor analysis suggests that all items are functioning as expected. These results add weight to the validity of the Test Of Astronomy STandards (TOAST) assessment instrument, which is designed to help instructors and researchers measure the impact of course-length duration instructional strategies for undergraduate science survey courses with learning goals tightly aligned to the consensus goals of the astronomy education community.
Accuracy of a Classical Test Theory-Based Procedure for Estimating the Reliability of a Multistage Test. Research Report. ETS RR-17-02

ERIC Educational Resources Information Center

Kim, Sooyeon; Livingston, Samuel A.

2017-01-01

The purpose of this simulation study was to assess the accuracy of a classical test theory (CTT)-based procedure for estimating the alternate-forms reliability of scores on a multistage test (MST) having 3 stages. We generated item difficulty and discrimination parameters for 10 parallel, nonoverlapping forms of the complete 3-stage test and…
Item response theory and the measurement of motor behavior.

PubMed

Safrit, M J; Cohen, A S; Costa, M G

1989-12-01

Item response theory (IRT) has been the focus of intense research and development activity in educational and psychological measurement during the past decade. Because this theory can provide more precise information about test items than other theories usually used in measuring motor behavior, the application of IRT in physical education and exercise science merits investigation. In IRT, the difficulty level of each item (e.g., trial or task) can be estimated and placed on the same scale as the ability of the examinee. Using this information, the test developer can determine the ability levels at which the test functions best. Equating the scores of individuals on two or more items or tests can be handled efficiently by applying IRT. The precision of the identification of performance standards in a mastery test context can be enhanced, as can adaptive testing procedures. In this tutorial, several potential benefits of applying IRT to the measurement of motor behavior were described. An example is provided using bowling data and applying the graded-response form of the Rasch IRT model. The data were calibrated and the goodness of fit was examined. This analysis is described in a step-by-step approach. Limitations to using an IRT model with a test consisting of repeated measures were noted.
Validation of a new measure of availability and accommodation of health care that is valid for rural and urban contexts.

PubMed

Haggerty, Jeannie L; Levesque, Jean-Frédéric

2017-04-01

Patients are the most valid source for evaluating the accessibility of services, but a previous study observed differential psychometric performance of instruments in rural and urban respondents. To validate a measure of organizational accessibility free of differential rural-urban performance that predicts consequences of difficult access for patient-initiated care. Sequential qualitative-quantitative study. Qualitative findings used to adapt or develop evaluative and reporting items. Quantitative validation study. Primary data by telephone from 750 urban, rural and remote respondents in Quebec, Canada; follow-up mailed questionnaire to a subset of 316. Items were developed for barriers along the care trajectory. We used common factor and confirmatory factor analysis to identify constructs and compare models. We used item response theory analysis to test for differential rural-urban performance; examine individual item performance; adjust response options; and exclude redundant or non-discriminatory items. We used logistic regression to examine predictive validity of the subscale on access difficulty (outcome). Initial factor resolution suggested geographic and organizational dimensions, plus consequences of access difficulty. After second administration, organizational accommodation and geographic indicators were integrated into a 6-item subscale of Effective Availability and Accommodation, which demonstrates good variability and internal consistency (α = 0.84) and no differential functioning by geographic area. Each unit increase predicts decreased likelihood of consequences of access difficulties (unmet need and problem aggravation). The new subscale is a practical, valid and reliable measure for patients to evaluate first-contact health services accessibility, yielding valid comparisons between urban and rural contexts. © 2016 The Authors. Health Expectations published by John Wiley & Sons Ltd.
Accommodations for Multiple Choice Tests

ERIC Educational Resources Information Center

Trammell, Jack

2011-01-01

Students with learning or learning-related disabilities frequently struggle with multiple choice assessments due to difficulty discriminating between items, filtering out distracters, and framing a mental best answer. This Practice Brief suggests accommodations and strategies that disability service providers can utilize in conjunction with…
The Definition of Difficulty and Discrimination for Multidimensional Item Response Theory Models.

ERIC Educational Resources Information Center

Reckase, Mark D.; McKinley, Robert L.

A study was undertaken to develop guidelines for the interpretation of the parameters of three multidimensional item response theory models and to determine the relationship between the parameters and traditional concepts of item difficulty and discrimination. The three models considered were multidimensional extensions of the one-, two-, and…
An opportunity in difficulty: Japan-Korea-Taiwan expert Delphi consensus on surgical difficulty during laparoscopic cholecystectomy.

PubMed

Iwashita, Yukio; Hibi, Taizo; Ohyama, Tetsuji; Honda, Goro; Yoshida, Masahiro; Miura, Fumihiko; Takada, Tadahiro; Han, Ho-Seong; Hwang, Tsann-Long; Shinya, Satoshi; Suzuki, Kenji; Umezawa, Akiko; Yoon, Yoo-Seok; Choi, In-Seok; Huang, Wayne Shih-Wei; Chen, Kuo-Hsin; Watanabe, Manabu; Abe, Yuta; Misawa, Takeyuki; Nagakawa, Yuichi; Yoon, Dong-Sup; Jang, Jin-Young; Yu, Hee Chul; Ahn, Keun Soo; Kim, Song Cheol; Song, In Sang; Kim, Ji Hoon; Yun, Sung Su; Choi, Seong Ho; Jan, Yi-Yin; Shan, Yan-Shen; Ker, Chen-Guo; Chan, De-Chuan; Wu, Cheng-Chung; Lee, King-Teh; Toyota, Naoyuki; Higuchi, Ryota; Nakamura, Yoshiharu; Mizuguchi, Yoshiaki; Takeda, Yutaka; Ito, Masahiro; Norimizu, Shinji; Yamada, Shigetoshi; Matsumura, Naoki; Shindoh, Junichi; Sunagawa, Hiroki; Gocho, Takeshi; Hasegawa, Hiroshi; Rikiyama, Toshiki; Sata, Naohiro; Kano, Nobuyasu; Kitano, Seigo; Tokumura, Hiromi; Yamashita, Yuichi; Watanabe, Goro; Nakagawa, Kunitoshi; Kimura, Taizo; Yamakawa, Tatsuo; Wakabayashi, Go; Mori, Rintaro; Endo, Itaru; Miyazaki, Masaru; Yamamoto, Masakazu

2017-04-01

We previously identified 25 intraoperative findings during laparoscopic cholecystectomy (LC) as potential indicators of surgical difficulty per nominal group technique. This study aimed to build a consensus among expert LC surgeons on the impact of each item on surgical difficulty. Surgeons from Japan, Korea, and Taiwan (n = 554) participated in a Delphi process and graded the 25 items on a seven-stage scale (range, 0-6). Consensus was defined as (1) the interquartile range (IQR) of overall responses ≤2 and (2) ≥66% of the responses concentrated within a median ± 1 after stratification by workplace and LC experience level. Response rates for the first and the second-round Delphi were 92.6% and 90.3%, respectively. Final consensus was reached for all the 25 items. 'Diffuse scarring in the Calot's triangle area' in the 'Factors related to inflammation of the gallbladder' category had the strongest impact on surgical difficulty (median, 5; IQR, 1). Surgeons agreed that the surgical difficulty increases as more fibrotic change and scarring develop. The median point for each item was set as the difficulty score. A Delphi consensus was reached among expert LC surgeons on the impact of intraoperative findings on surgical difficulty. © 2017 Japanese Society of Hepato-Biliary-Pancreatic Surgery.
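The two-part consensus rule stated in this abstract (IQR ≤ 2, and at least 66% of responses within the median ± 1) can be checked mechanically. The sketch below applies the rule to a single list of ratings and ignores the stratification by workplace and experience level that the study describes; the function name, the quartile method, and the ratings are assumptions for illustration.

```python
import statistics
from typing import List


def has_consensus(ratings: List[int], max_iqr: float = 2.0,
                  min_share: float = 0.66) -> bool:
    """Apply the abstract's two consensus criteria to one item's ratings."""
    # Quartiles via the inclusive method (an assumption; the abstract
    # does not say how quartiles were computed).
    q1, _, q3 = statistics.quantiles(ratings, n=4, method="inclusive")
    median = statistics.median(ratings)
    # Share of responses falling within one point of the median.
    share = sum(1 for r in ratings if abs(r - median) <= 1) / len(ratings)
    return (q3 - q1) <= max_iqr and share >= min_share


# Hypothetical ratings on the 0-6 scale.
print(has_consensus([5, 5, 5, 6, 4, 5, 5, 6, 4, 5]))
print(has_consensus([0, 0, 0, 3, 3, 6, 6, 6]))
```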
The Effects of Item Format and Cognitive Domain on Students' Science Performance in TIMSS 2011

NASA Astrophysics Data System (ADS)

Liou, Pey-Yan; Bulut, Okan

2017-12-01

The purpose of this study was to examine eighth-grade students' science performance in terms of two test design components, item format, and cognitive domain. The portion of Taiwanese data came from the 2011 administration of the Trends in International Mathematics and Science Study (TIMSS), one of the major international large-scale assessments in science. The item difficulty analysis was initially applied to show the proportion of correct items. A regression-based cumulative link mixed modeling (CLMM) approach was further utilized to estimate the impact of item format, cognitive domain, and their interaction on the students' science scores. The results of the proportion-correct statistics showed that constructed-response items were more difficult than multiple-choice items, and that the reasoning cognitive domain items were more difficult compared to the items in the applying and knowing domains. In terms of the CLMM results, students tended to obtain higher scores when answering constructed-response items as well as items in the applying cognitive domain. When the two predictors and the interaction term were included together, the directions and magnitudes of the predictors on student science performance changed substantially. Plausible explanations for the complex nature of the effects of the two test-design predictors on student science performance are discussed. The results provide practical, empirical-based evidence for test developers, teachers, and stakeholders to be aware of the differential function of item format, cognitive domain, and their interaction in students' science performance.
The development of a computer assisted instruction and assessment system in pharmacology.

PubMed

Madsen, B W; Bell, R C

1977-01-01

We describe the construction of a computer-based system for instruction and assessment in pharmacology, utilizing a large bank of multiple choice questions. Items were collected from many sources, edited, and coded for student suitability, topic, taxonomy, difficulty, and text references. Students reserve a time during the day and specify the type of test desired, and questions are presented randomly from the subset satisfying their criteria. Answers are scored after each question and a summary given at the end of every test; details on item performance are recorded automatically. The biggest hurdle in implementation was the assembly, review, classification and editing of items, while the programming was relatively straightforward. A number of modifications had to be made to the initial plans and changes will undoubtedly continue with further experience. When fully operational the system will possess a number of advantages including: elimination of test preparation, editing and marking; facilitated item review opportunities; increased objectivity, feedback, and flexibility; and decreased anxiety in students.
Application of Rasch Measurement to a Measure of Musical Performance.

ERIC Educational Resources Information Center

Haley, Kathleen A.

1999-01-01

Describes the Rasch calibration of a portion of the Watkins Farnum Performance Scale (J. Watkins and S. Farnum, 1954), a test of instructional music performance, for 218 sixth graders. Results show how Rasch scaling allows item difficulties to be estimated, the test to be administered more efficiently, and diagnostic information to be obtained.…
Language Games and Meaning as Used in Student Encounters with Scientific Literacy Test Items

ERIC Educational Resources Information Center

Serder, Margareta; Jakobsson, Anders

2016-01-01

Previous research in science education has suggested that difficulties among students learning science relate to challenges in framing its discourse. This article examines the role that language plays in a scientific literacy test for which everyday life is an augmented aspect. Video-recorded data was collected in four ninth-grade science classes…
Reexamining Elicited Imitation as a Measure of Implicit Grammatical Knowledge and Beyond…?

ERIC Educational Resources Information Center

Sarandi, Hedayat

2015-01-01

This study examines elicited imitation (EI) both as a measure of implicit grammatical knowledge and more global semantic and syntactic knowledge. It also examines whether length affects the difficulty of EI tests when they contain both grammatical and ungrammatical items. Fifty language learners took an EI test and an oral narrative task. The data…
Math: Data Relationships. Graphs, Ratios and Proportions, Statistics and Probability. Grades K-9. Revised Edition.

ERIC Educational Resources Information Center

Instructional Objectives Exchange, Los Angeles, CA.

To help classroom teachers in grades K-9 construct mathematics tests, fifteen general objectives, corresponding sub-objectives, sample test items, and answers are presented. In general, sub-objectives are arranged in increasing order of difficulty. The objectives were written to comprehensively cover three categories. The first, graphs, covers the…
A Comparison of IRT Proficiency Estimation Methods under Adaptive Multistage Testing

ERIC Educational Resources Information Center

Kim, Sooyeon; Moses, Tim; Yoo, Hanwook

2015-01-01

This inquiry is an investigation of item response theory (IRT) proficiency estimators' accuracy under multistage testing (MST). We chose a two-stage MST design that includes four modules (one at Stage 1, three at Stage 2) and three difficulty paths (low, middle, high). We assembled various two-stage MST panels (i.e., forms) by manipulating two…
Detecting Different Types of Reading Difficulties: A Comparison of Tests

ERIC Educational Resources Information Center

Moore, Danielle M.; Porter, Melanie A.; Kohnen, Saskia; Castles, Anne

2012-01-01

The focus of this paper is on the assessment of the two main processes that children must acquire at the single word reading level: word recognition (lexical) and decoding (nonlexical) skills. Guided by the framework of the dual route model, this study aimed to (1) investigate the impact of item characteristics on test performance, and (2)…
Psychometrics of Multiple Choice Questions with Non-Functioning Distracters: Implications to Medical Education.

PubMed

Deepak, Kishore K; Al-Umran, Khalid Umran; Al-Sheikh, Mona H; Dkoli, B V; Al-Rubaish, Abdullah

2015-01-01

The functionality of distracters in a multiple choice question plays a very important role. We examined the frequency and impact of functioning and non-functioning distracters on psychometric properties of 5-option items in clinical disciplines. We analyzed item statistics of 1115 multiple choice questions from 15 summative assessments of undergraduate medical students and classified the items into five groups by their number of non-functioning distracters. We analyzed the effect of varying degrees of non-functionality, ranging from 0 to 4, on test reliability, difficulty index, discrimination index and point biserial correlation. The non-functionality of distracters inversely affected the test reliability and quality of items in a predictable manner. The non-functioning distracters made the items easier and lowered the discrimination index significantly. Three non-functional distracters in a 5-option MCQ significantly affected all psychometric properties (p < 0.5). The corrected point biserial correlation revealed that the items with 3 functional options were psychometrically as effective as 5-option items. Our study reveals that a multiple choice question with 3 functional options is the lower limit of an item format that has adequate psychometric properties. Tests containing items with fewer functioning options have significantly lower reliability. Distracter function analysis and revision of non-functioning distracters can serve as important methods to improve the psychometrics and reliability of assessment.
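Classifying an item by its number of non-functioning distracters, as this abstract describes, requires a rule for when a distracter counts as non-functioning. The sketch below assumes the widely used convention that a distracter chosen by fewer than 5% of examinees is non-functioning; the abstract does not state the study's criterion, and the function name and response data are hypothetical.

```python
from collections import Counter
from typing import List


def non_functioning_distractors(choices: List[str], key: str,
                                options: List[str],
                                threshold: float = 0.05) -> List[str]:
    """Return the distracters selected by fewer than `threshold` of examinees.

    choices   -- the option each examinee selected
    key       -- the correct option, which is never counted as a distracter
    threshold -- assumed 5% cut-off, not taken from the abstract
    """
    counts = Counter(choices)  # options nobody chose count as zero
    n = len(choices)
    return [opt for opt in options
            if opt != key and counts[opt] / n < threshold]


# Hypothetical responses of 100 examinees to one 5-option item keyed "A".
responses = ["A"] * 50 + ["B"] * 30 + ["C"] * 17 + ["D"] * 3
print(non_functioning_distractors(responses, "A", ["A", "B", "C", "D", "E"]))
```

The length of the returned list (0 to 4 for a 5-option item) gives the grouping variable the abstract uses.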
Test Score Equating Using a Mini-Version Anchor and a Midi Anchor: A Case Study Using SAT[R] Data

ERIC Educational Resources Information Center

Liu, Jinghua; Sinharay, Sandip; Holland, Paul W.; Curley, Edward; Feigenbaum, Miriam

2011-01-01

This study explores an anchor that is different from the traditional miniature anchor in test score equating. In contrast to a traditional "mini" anchor that has the same spread of item difficulties as the tests to be equated, the studied anchor, referred to as a "midi" anchor (Sinharay & Holland), has a smaller spread of…
Development and Validation of the Numeracy Understanding in Medicine Instrument Short Form

PubMed Central

Schapira, Marilyn M.; Walker, Cindy M.; Miller, Tamara; Fletcher, Kathlyn A; Ganschow, Pamela G.; Jacobs, Elizabeth A; Imbert, Diana; O'Connell, Maria; Neuner, Joan M.

2014-01-01

Background: Health numeracy can be defined as the ability to understand and use numeric information and quantitative concepts in the context of health. We previously reported the development of the Numeracy Understanding in Medicine Instrument (NUMi), a 20-item test developed using item response theory. We now report the development and validation of a short form of the NUMi (S-NUMi). Methods: Item statistics were used to identify a subset of 8 items representing a range of difficulty and content areas. Internal reliability was evaluated with Cronbach's alpha. Divergent and convergent validity was assessed by comparing scores of the S-NUMi with existing measures of education, print and numeric health literacy, mathematic achievement, cognitive reasoning, and the original NUMi. Results: The 8-item scale had adequate reliability (Cronbach's alpha: 0.72) and was strongly correlated to the 20-item NUMi (0.92). The S-NUMi scores were strongly correlated with the Lipkus numeracy test (0.62), Wide Range of Achievement Test-Mathematics (WRAT-M) (0.72), and Wonderlic cognitive reasoning test (0.76). Moderate correlation was found with education level (0.58) and print literacy as measured by the TOFHLA (0.49). Conclusion: The short Numeracy Understanding in Medicine Instrument is a reliable and valid measure of health numeracy feasible for use in clinical and research settings. PMID:25315596
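Cronbach's alpha, the internal reliability statistic reported in this abstract, can be computed from a respondents-by-items score matrix. The sketch below is a minimal illustration; `cronbach_alpha` is a hypothetical helper written for this example and is not code from the NUMi study.

```python
def _sample_variance(values):
    """Unbiased sample variance (divides by n - 1)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(scores[0])
    item_variances = [_sample_variance([row[j] for row in scores]) for j in range(k)]
    total_variance = _sample_variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```

Each inner list holds one respondent's item scores, so an 8-item form with N respondents would be passed as N lists of length 8.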

Psychometric properties of the Global Operative Assessment of Laparoscopic Skills (GOALS) using item response theory.

PubMed

Watanabe, Yusuke; Madani, Amin; Ito, Yoichi M; Bilgic, Elif; McKendy, Katherine M; Feldman, Liane S; Fried, Gerald M; Vassiliou, Melina C

2017-02-01

The extent to which each item assessed using the Global Operative Assessment of Laparoscopic Skills (GOALS) contributes to the total score remains unknown. The purpose of this study was to evaluate the level of difficulty and discriminative ability of each of the 5 GOALS items using item response theory (IRT). A total of 396 GOALS assessments for a variety of laparoscopic procedures over a 12-year time period were included. Threshold parameters of item difficulty and discrimination power were estimated for each item using IRT. The higher slope parameters seen with "bimanual dexterity" and "efficiency" are indicative of greater discriminative ability than "depth perception", "tissue handling", and "autonomy". IRT psychometric analysis indicates that the 5 GOALS items do not demonstrate uniform difficulty and discriminative power, suggesting that they should not be scored equally. "Bimanual dexterity" and "efficiency" seem to have stronger discrimination. Weighted scores based on these findings could improve the accuracy of assessing individual laparoscopic skills. Copyright © 2016 Elsevier Inc. All rights reserved.
The Caregiver Contribution to Heart Failure Self-Care (CACHS): Further Psychometric Testing of a Novel Instrument.

PubMed

Buck, Harleah G; Harkness, Karen; Ali, Muhammad Usman; Carroll, Sandra L; Kryworuchko, Jennifer; McGillion, Michael

2017-04-01

Caregivers (CGs) contribute important assistance with heart failure (HF) self-care, including daily maintenance, symptom monitoring, and management. Until CGs' contributions to self-care can be quantified, it is impossible to characterize it, account for its impact on patient outcomes, or perform meaningful cost analyses. The purpose of this study was to conduct psychometric testing and item reduction on the recently developed 34-item Caregiver Contribution to Heart Failure Self-care (CACHS) instrument using classical and item response theory methods. Fifty CGs (mean age 63 years ±12.84; 70% female) recruited from a HF clinic completed the CACHS in 2014 and results evaluated using classical test theory and item response theory. Items would be deleted for low (<.05) or high (>.95) endorsement, low (<.3) or high (>.7) corrected item-total correlations, significant pairwise correlation coefficients, floor or ceiling effects, relatively low latent trait and item information function levels (<1.5 and p > .5), and differential item functioning. After analysis, 14 items were excluded, resulting in a 20-item instrument (self-care maintenance eight items; monitoring seven items; and management five items). Most items demonstrated moderate to high discrimination (median 2.13, minimum .77, maximum 5.05), and appropriate item difficulty (-2.7 to 1.4). Internal consistency reliability was excellent (Cronbach α = .94, average inter-item correlation = .41) with no ceiling effects. The newly developed 20-item version of the CACHS is supported by rigorous instrument development and represents a novel instrument to measure CGs' contribution to HF self-care. © 2016 Wiley Periodicals, Inc.
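Two of the classical deletion rules listed in this abstract (endorsement below .05 or above .95, and corrected item-total correlation below .3 or above .7) are simple threshold checks. The sketch below shows one way such screening could be expressed; the function name and the reason labels are hypothetical, and the other criteria in the abstract (pairwise correlations, information functions, differential item functioning) are not covered.

```python
def screen_item(endorsement, item_total_r,
                low_p=0.05, high_p=0.95, low_r=0.3, high_r=0.7):
    """Return the reasons an item would be flagged under the two threshold rules.

    endorsement  -- proportion of respondents endorsing the item
    item_total_r -- corrected item-total correlation
    An empty list means the item passes both checks.
    """
    reasons = []
    if endorsement < low_p:
        reasons.append("low endorsement")
    elif endorsement > high_p:
        reasons.append("high endorsement")
    if item_total_r < low_r:
        reasons.append("low item-total correlation")
    elif item_total_r > high_r:
        reasons.append("high item-total correlation")
    return reasons
```

Keeping the cutoffs as parameters lets the same check be reused with other published thresholds.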
FIM-Minimum Data Set Motor Item Bank: Short Forms Development and Precision Comparison in Veterans.

PubMed

Li, Chih-Ying; Romero, Sergio; Simpson, Annie N; Bonilha, Heather S; Simpson, Kit N; Hong, Ickpyo; Velozo, Craig A

2018-03-01

To improve the practical use of the short forms (SFs) developed from the item bank, we compared the measurement precision of the 4- and 8-item SFs generated from a motor item bank composed of the FIM and the Minimum Data Set (MDS). The FIM-MDS motor item bank allowed scores generated from different instruments to be co-calibrated. The 4- and 8-item SFs were developed based on Rasch analysis procedures. This article compared person strata, ceiling/floor effects, and test SE plots for each administration form and examined 95% confidence interval error bands of anchored person measures with the corresponding SFs. We used 0.3 SE as a criterion to reflect a reliability level of .90. Veterans' inpatient rehabilitation facilities and community living centers. Veterans (N=2500) who had both FIM and the MDS data within 6 days during 2008 through 2010. Not applicable. Four- and 8-item SFs of FIM, MDS, and FIM-MDS motor item bank. Six SFs were generated with 4 and 8 items across a range of difficulty levels from the FIM-MDS motor item bank. The three 8-item SFs all had higher correlations with the item bank (r=.82-.95), higher person strata, and less test error than the corresponding 4-item SFs (r=.80-.90). The three 4-item SFs did not meet the criteria of SE <0.3 for any theta values. Eight-item SFs could improve clinical use of the item bank composed of existing instruments across the continuum of care in veterans. We also found that the number of items, not test specificity, determines the precision of the instrument. Copyright © 2017 American Congress of Rehabilitation Medicine. All rights reserved.
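The abstract pairs a standard error of 0.3 with a reliability level of .90 without showing the arithmetic. One common approximation, assuming the person measures are expressed on a scale with unit variance (an assumption made here, not stated in the abstract), is:

```latex
\[
\text{reliability} \approx 1 - \frac{SE^{2}}{\sigma_{\theta}^{2}}
= 1 - \frac{0.3^{2}}{1}
= 0.91 \approx .90
\]
```

Here $\sigma_{\theta}^{2}$ denotes the variance of the person measures; with a different variance the same SE would correspond to a different reliability.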
Factor and Rasch analysis of the Fonseca anamnestic index for the diagnosis of myogenous temporomandibular disorder.

PubMed

Rodrigues-Bigaton, Delaine; de Castro, Ester M; Pires, Paulo F

Rasch analysis has been used in recent studies to test the psychometric properties of a questionnaire. The conditions for use of the Rasch model are one-dimensionality (assessed via prior factor analysis) and local independence (the probability of getting a particular item right or wrong should not be conditioned upon success or failure in another). To evaluate the dimensionality and the psychometric properties of the Fonseca anamnestic index (FAI), such as the fit of the data to the model, the degree of difficulty of the items, and the ability to respond in patients with myogenous temporomandibular disorder (TMD). The sample consisted of 94 women with myogenous TMD, diagnosed by the Research Diagnostic Criteria for Temporomandibular Disorders (RDC/TMD), who answered the FAI. For the factor analysis, we applied the Kaiser-Meyer-Olkin test, Bartlett's sphericity, Spearman's correlation, and the determinant of the correlation matrix. For extraction of the factors/dimensions, an eigenvalue >1.0 was used, followed by oblique oblimin rotation. The Rasch analysis was conducted on the dimension that showed the highest proportion of variance explained. Adequate sample "n" and FAI multidimensionality were observed. Dimension 1 (primary) consisted of items 1, 2, 3, 6, and 7. All items of dimension 1 showed adequate fit to the model, being observed according to the degree of difficulty (from most difficult to easiest), respectively, items 2, 1, 3, 6, and 7. The FAI presented multidimensionality with its main dimension consisting of five reliable items with adequate fit to the composition of its structure. Copyright © 2017 Associação Brasileira de Pesquisa e Pós-Graduação em Fisioterapia. Publicado por Elsevier Editora Ltda. All rights reserved.
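For reference, the dichotomous Rasch model and the local independence condition described in this abstract can be written as follows. The symbols ($\theta_n$ for the person measure, $b_i$ for the item difficulty, $X_{ni}$ for the scored response) are notation chosen for this note, and the abstract does not say which Rasch variant was fitted to the FAI.

```latex
\[
P(X_{ni} = 1 \mid \theta_n, b_i) = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)}
\]
\[
P(X_{ni} = x_i,\; X_{nj} = x_j \mid \theta_n)
= P(X_{ni} = x_i \mid \theta_n)\, P(X_{nj} = x_j \mid \theta_n)
\quad (i \neq j)
\]
```

The second line states local independence: once the person measure is fixed, the response to one item carries no further information about the response to another.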
Development of the Consumer Refrigerator Safety Questionnaire: A Measure of Consumer Perceptions and Practices.

PubMed

Cairnduff, Victoria; Dean, Moira; Koidis, Anastasios

2016-09-01

Food preparation and storage behaviors in the home deviating from the "best practice" food safety recommendations may result in foodborne illnesses. Currently, there are limited tools available to fully evaluate the consumer knowledge, perceptions, and behavior in the area of refrigerator safety. The current study aimed to develop a valid and reliable tool in the form of a questionnaire, the Consumer Refrigerator Safety Questionnaire (CRSQ), for assessing systematically all these aspects. Items relating to refrigerator safety knowledge (n =17), perceptions (n =46), and reported behavior (n =30) were developed and pilot tested by an expert reference group and various consumer groups to assess face and content validity (n =20), item difficulty and consistency (n =55), and construct validity (n =23). The findings showed that the CRSQ has acceptable face and content validity with acceptable levels of item difficulty. Item consistency was observed for 12 of 15 in refrigerator safety knowledge. Further, all 5 of the subscales of consumer perceptions of refrigerator safety practices relating to risk of developing foodborne disease showed acceptable internal consistency (Cronbach's α value > 0.8). Construct validity of the CRSQ was shown to be very good (P = 0.022). The CRSQ exhibited acceptable test-retest reliability at 14 days with the majority of knowledge items (93.3%) and reported behavior items (96.4%) having correlation coefficients of greater than 0.70. Overall, the CRSQ was deemed valid and reliable in assessing refrigerator safety knowledge and behavior; therefore, it has the potential for future use in identifying groups of individuals at increased risk of deviating from recommended refrigerator safety practices, as well as the assessment of refrigerator safety knowledge and behavior for use before and after an intervention.
Incorporation of core competency questions into an annual national self-assessment examination for residents in physical medicine and rehabilitation: results and implications.

PubMed

Webster, Joseph B

2009-03-01

To determine the performance and change over time when incorporating questions in the core competency domains of practice-based learning and improvement (PBLI), systems-based practice (SBP), and professionalism (PROF) into the national PM&R Self-Assessment Examination for Residents (SAER). Prospective, longitudinal analysis. The national Self-Assessment Examination for Residents (SAER) in Physical Medicine and Rehabilitation, which is administered annually. Approximately 1100 PM&R residents who take the examination annually. Inclusion of progressively more challenging questions in the core competency domains of PBLI, SBP, and PROF. Individual test item level of difficulty (P value) and discrimination (point biserial index). Compared with the overall test, questions in the subtopic areas of PBLI, SBP, and PROF were relatively easier and less discriminating (correlation of resident performance on these domains compared with that on the total test). These differences became smaller during the 3-year time period. The difficulty level of the questions in each of the subtopic domains was raised during the 3 year period to a level close to the overall exam. Discrimination of the test items improved or remained stable. This study demonstrates that, with careful item writing and review, multiple-choice items in the PBLI, SBP, and PROF domains can be successfully incorporated into an annual, national self-assessment examination for residents. The addition of these questions had value in assessing competency while not compromising the overall validity and reliability of the exam. It is yet to be determined if resident performance on these questions corresponds to performance on other measures of competency in the areas of PBLI, SBP, and PROF.
An assessment of functioning and non-functioning distractors in multiple-choice questions: a descriptive analysis.

PubMed

Tarrant, Marie; Ware, James; Mohammed, Ahmed M

2009-07-07

Four- or five-option multiple choice questions (MCQs) are the standard in health-science disciplines, both on certification-level examinations and on in-house developed tests. Previous research has shown, however, that few MCQs have three or four functioning distractors. The purpose of this study was to investigate non-functioning distractors in teacher-developed tests in one nursing program in an English-language university in Hong Kong. Using item-analysis data, we assessed the proportion of non-functioning distractors on a sample of seven test papers administered to undergraduate nursing students. A total of 514 items were reviewed, including 2056 options (1542 distractors and 514 correct responses). Non-functioning options were defined as ones that were chosen by fewer than 5% of examinees and those with a positive option discrimination statistic. The proportion of items containing 0, 1, 2, and 3 functioning distractors was 12.3%, 34.8%, 39.1%, and 13.8% respectively. Overall, items contained an average of 1.54 (SD = 0.88) functioning distractors. Only 52.2% (n = 805) of all distractors were functioning effectively and 10.2% (n = 158) had a choice frequency of 0. Items with more functioning distractors were more difficult and more discriminating. The low frequency of items with three functioning distractors in the four-option items in this study suggests that teachers have difficulty developing plausible distractors for most MCQs. Test items should consist of as many options as is feasible given the item content and the number of plausible distractors; in most cases this would be three. Item analysis results can be used to identify and remove non-functioning distractors from MCQs that have been used in previous tests.
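The definition of a functioning distractor and the summary figures in this abstract can be checked with a few lines of Python. This is a sketch under stated assumptions: the helper names are hypothetical, and a distractor is treated as functioning when it is chosen by at least 5% of examinees and its option discrimination is not positive, which is the complement of the non-functioning definition given above.

```python
def is_functioning(choice_proportion, option_discrimination, min_proportion=0.05):
    """True if the distractor is chosen often enough and does not discriminate positively."""
    return choice_proportion >= min_proportion and option_discrimination <= 0


def count_functioning(distractors, min_proportion=0.05):
    """Count functioning distractors in a list of (choice_proportion, discrimination) pairs."""
    return sum(is_functioning(p, d, min_proportion) for p, d in distractors)


def mean_functioning(distribution):
    """Mean number of functioning distractors from a {count: share of items} mapping."""
    return sum(count * share for count, share in distribution.items())


# Shares of items with 0, 1, 2 and 3 functioning distractors, as reported in the abstract.
REPORTED_DISTRIBUTION = {0: 0.123, 1: 0.348, 2: 0.391, 3: 0.138}
```

Applying `mean_functioning` to the reported shares gives about 1.54, which agrees with the stated average, and 805 of 1542 distractors is 52.2%, as reported.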
Ubiquitous testing using tablets: its impact on medical student perceptions of and engagement in learning

PubMed Central

Kim, Kyong-Jee; Hwang, Jee-Young

2016-01-01

Purpose: Ubiquitous testing has the potential to affect medical education by enhancing the authenticity of the assessment using multimedia items. This study explored medical students’ experience with ubiquitous testing and its impact on student learning. Methods: A cohort (n=48) of third-year students at a medical school in South Korea participated in this study. The students were divided into two groups and were given different versions of 10 content-matched items: one in text version (the text group) and the other in multimedia version (the multimedia group). Multimedia items were delivered using tablets. Item response analyses were performed to compare item characteristics between the two versions. Additionally, focus group interviews were held to investigate the students’ experiences of ubiquitous testing. Results: The mean test score was significantly higher in the text group. Item difficulty and discrimination did not differ between text and multimedia items. The participants generally showed positive responses on ubiquitous testing. Still, they felt that the lectures that they had taken in preclinical years did not prepare them enough for this type of assessment and clinical encounters during clerkships were more helpful. To be better prepared, the participants felt that they needed to engage more actively in learning in clinical clerkships and have more access to multimedia learning resources. Conclusion: Ubiquitous testing can positively affect student learning by reinforcing the importance of being able to understand and apply knowledge in clinical contexts, which drives students to engage more actively in learning in clinical settings. PMID:26838569
Linguistic Determinants of the Difficulty of True-False Test Items

ERIC Educational Resources Information Center

Peterson, Candida C.; Peterson, James L.

1976-01-01

Adults read a prose passage and responded to passages based on it which were either true or false and were phrased either affirmatively or negatively. True negatives yielded most errors, followed in order by false negatives, true affirmatives, and false affirmatives. (Author/RC)
Physical performance testing in mucopolysaccharidosis I: a pilot study.

PubMed

Dumas, Helene M; Fragala, Maria A; Haley, Stephen M; Skrinar, Alison M; Wraith, James E; Cox, Gerald F

2004-01-01

To develop and field-test a physical performance measure (MPS-PPM) for individuals with Mucopolysaccharidosis I (MPS I), a rare genetic disorder. Motor performance and endurance items were developed based on literature review, clinician feedback, feasibility, and equipment and training needs. A standardized testing protocol and scoring rules were created. The MPS-PPM includes: Arm Function (7 items), Leg Function (5 items), and Endurance (2 items). Pilot data were collected for 10 subjects (ages 5-29 years). We calculated Spearman's rho correlations between age, severity and summary z-scores on the MPS-PPM. Subjects had variable presentations, as correlations among the three sub-test scores were not significant. Increasing age was related to greater severity in physical performance (r = 0.72, p<0.05) and lower scores on the Leg Function (r = -0.67, p<0.05) and Endurance (r = -0.65, p<0.05) sub-tests. The MPS-PPM was sensitive to detecting physical performance deficits, as six subjects could not complete the full battery of Arm Function items and eight subjects were unable to complete all Leg Function items. Subjects walked more slowly and expended more energy than typically developing peers. Individuals with MPS I have difficulty with arm and leg function and reduced endurance. The MPS-PPM is a clinically feasible measure that detects limitations in physical performance and may have potential to quantify changes in function following intervention. Copyright 2004 Taylor and Francis Ltd.
Reliability of self-rated tinnitus distress and association with psychological symptom patterns.

PubMed

Hiller, W; Goebel, G; Rief, W

1994-05-01

Psychological complaints were investigated in two samples of 60 and 138 in-patients suffering from chronic tinnitus. We administered the Tinnitus Questionnaire (TQ), a 52-item self-rating scale which differentiates between dimensions of emotional and cognitive distress, intrusiveness, auditory perceptual difficulties, sleep disturbances and somatic complaints. The test-retest reliability was .94 for the TQ global score and between .86 and .93 for subscales. Three independent analyses were conducted to estimate the split-half reliability (internal consistency) which was only slightly lower than the test-retest values for scales with a relatively small number of items. Reliability was sufficient also on the level of single items. Low correlation between the TQ and the Hopkins Symptom Checklist (SCL-90-R) indicate a distinct quality of tinnitus-related and general psychological disturbances.
Using Classical Test Theory and Item Response Theory to Evaluate the LSCI

NASA Astrophysics Data System (ADS)

Schlingman, Wayne M.; Prather, E. E.; Collaboration of Astronomy Teaching Scholars CATS

2011-01-01

Analyzing the data from the recent national study using the Light and Spectroscopy Concept Inventory (LSCI), this project uses both Classical Test Theory (CTT) and Item Response Theory (IRT) to investigate the LSCI itself in order to better understand what it is actually measuring. We use Classical Test Theory to form a framework of results that can be used to evaluate the effectiveness of individual questions at measuring differences in student understanding and provide further insight into the prior results presented from this data set. In the second phase of this research, we use Item Response Theory to form a theoretical model that generates parameters accounting for a student's ability, a question's difficulty, and estimate the level of guessing. The combined results from our investigations using both CTT and IRT are used to better understand the learning that is taking place in classrooms across the country. The analysis will also allow us to evaluate the effectiveness of individual questions and determine whether the item difficulties are appropriately matched to the abilities of the students in our data set. These results may require that some questions be revised, motivating the need for further development of the LSCI. This material is based upon work supported by the National Science Foundation under Grant No. 0715517, a CCLI Phase III Grant for the Collaboration of Astronomy Teaching Scholars (CATS). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
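The abstract describes an IRT model with parameters for student ability, question difficulty, and guessing, but does not name it. A three-parameter logistic (3PL) model is one common model with these parameters, and the sketch below assumes it for illustration; the added discrimination parameter `a` and the function name are part of that assumption, not details from the LSCI study.

```python
import math


def prob_correct_3pl(theta, a, b, c):
    """Probability of a correct response under a 3PL model (assumed here).

    theta -- student ability
    a     -- item discrimination (slope)
    b     -- item difficulty
    c     -- guessing level (lower asymptote)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))
```

When ability equals difficulty, the probability sits halfway between the guessing level and 1, which is why comparing estimated difficulties with the ability distribution shows whether items are well matched to the students.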
Development and validity of a questionnaire to test the knowledge of primary care personnel regarding nutrition in obese adolescents.

PubMed

de Pinho, Lucinéia; Moura, Paulo Henrique Tolentino; Silveira, Marise Fagundes; de Botelho, Ana Cristina Carvalho; Caldeira, Antônio Prates

2013-07-18

In light of its epidemic proportions in developed and developing countries, obesity is considered a serious public health issue. In order to increase knowledge concerning the ability of health care professionals in caring for obese adolescents and adopt more efficient preventive and control measures, a questionnaire was developed and validated to assess non-dietitian health professionals regarding their Knowledge of Nutrition in Obese Adolescents (KNOA). The development and evaluation of a questionnaire to assess the knowledge of primary care practitioners with respect to nutrition in obese adolescents was carried out in five phases, as follows: 1) definition of study dimensions; 2) development of 42 questions and preliminary evaluation of the questionnaire by a panel of experts; 3) characterization and selection of primary care practitioners (35 dietitians and 265 non-dietitians) and measurement of questionnaire criteria by contrasting the responses of dietitians and non-dietitians; 4) reliability assessment by question exclusion based on item difficulty (too easy and too difficult for non-dietitian practitioners), item discrimination, internal consistency and reproducibility index determination; and 5) scoring the completed questionnaires. Dietitians obtained higher scores than non-dietitians (Mann-Whitney U test, P < 0.05), confirming the validity of the questionnaire criteria. Items were discriminated by correlating the score for each item with the total score, using a minimum of 0.2 as a correlation coefficient cutoff value. Item difficulty was controlled by excluding questions answered correctly by more than 90% of the non-dietitian subjects (too easy) or by less than 10% of them (too difficult). The final questionnaire contained 26 of the original 42 questions, increasing Cronbach's α value from 0.788 to 0.807. Test-retest agreement between respondents was classified as good to very good (Kappa test, >0.60).
The KNOA questionnaire developed for primary care practitioners is a valid, consistent and suitable instrument that can be applied over time, making it a promising tool for developing and guiding public health policies.
Why Are the Mathematics National Examination Items Difficult and What Is Teachers' Strategy to Overcome It?

ERIC Educational Resources Information Center

Retnawati, Heri; Kartowagiran, Badrun; Arlinwibowo, Janu; Sulistyaningsih, Eny

2017-01-01

The quality of national examination items plays an enormous role in identifying students' competencies mastery and their difficulties. This study aims to identify the difficult items in the Junior High School Mathematics National Examination, to find the factors that cause students' difficulty and to reveal the strategies that the teachers and the…
Psychometric properties of the NEPSY-II affect recognition subtest in a preschool sample: a Rasch modeling approach.

PubMed

Yao, Shih-Ying; Bull, Rebecca; Khng, Kiat Hui; Rahim, Anisa

2018-01-01

Understanding a child's ability to decode emotion expressions is important to allow early interventions for potential difficulties in social and emotional functioning. This study applied the Rasch model to investigate the psychometric properties of the NEPSY-II Affect Recognition subtest, a U.S. normed measure for 3-16 year olds which assesses the ability to recognize facial expressions of emotion. Data were collected from 1222 children attending preschools in Singapore. We first performed the Rasch analysis with the raw item data, and examined the technical qualities and difficulty pattern of the studied items. We subsequently investigated the relation of the estimated affect recognition ability from the Rasch analysis to a teacher-reported measure of a child's behaviors, emotions, and relationships. Potential gender differences were also examined. The Rasch model fits our data well. Also, the NEPSY-II Affect Recognition subtest was found to have reasonable technical qualities, expected item difficulty pattern, and desired association with the external measure of children's behaviors, emotions, and relationships for both boys and girls. Overall, findings from this study suggest that the NEPSY-II Affect Recognition subtest is a promising measure of young children's affect recognition ability. Suggestions for future test improvement and research were discussed.
Normative Performance on the Brief Smell Identification Test (BSIT) in a Multi-Ethnic Bilingual Cohort: A Project FRONTIER Study

PubMed Central

Menon, Chloe; Westervelt, Holly James; Jahn, Danielle R.; Dressel, Jeffrey A.; O’Bryant, Sid E.

2013-01-01

The Brief Smell Identification Test (BSIT) is a commonly used measure of olfactory functioning in elderly populations. Few studies have provided normative data for this measure, and minimal data are available regarding the impact of sociodemographic factors on test scores. This study presents normative data for the BSIT in a sample of English- and Spanish-speaking Hispanic and non-Hispanic Whites. A Rasch analysis was also conducted to identify the items that best discriminated between varying levels of olfactory functioning, as measured by the BSIT. The total sample included 302 older adults seen as part of an ongoing study of rural cognitive aging, Project FRONTIER. Hierarchical regression analyses revealed that BSIT scores require adjustment by age and gender, but years of education, ethnicity, and language did not significantly influence BSIT performance. Four items best discriminated between varying levels of smell identification, accounting for 59.44% of total information provided by the measure. However, items did not represent a continuum of difficulty on the BSIT. The results of this study indicate that the BSIT appears to be well-suited for assessing odor identification deficits in older adults of diverse backgrounds, but that fine-tuning of this instrument may be recommended in light of its items’ difficulty and discrimination parameters. Clinical and empirical implications are discussed. PMID:23634698
Validation of the Malay Version of the Parental Bonding Instrument among Malaysian Youths Using Exploratory Factor Analysis.

PubMed

Muhammad, Noor Azimah; Shamsuddin, Khadijah; Omar, Khairani; Shah, Shamsul Azhar; Mohd Amin, Rahmah

2014-01-01

Parenting behaviour is culturally sensitive. The aims of this study were (1) to translate the Parental Bonding Instrument into Malay (PBI-M) and (2) to determine its factorial structure and validity among the Malaysian population. The PBI-M was generated from a standard translation process and comprehension testing. The validation study of the PBI-M was administered to 248 college students aged 18 to 22 years. Participants in the comprehension testing had difficulty understanding negative items. Five translated double negative items were replaced with five positive items with similar meanings. Exploratory factor analysis showed a three-factor model for the PBI-M with acceptable reliability. Four negative items (items 3, 4, 8, and 16) and item 19 were omitted from the final PBI-M list because of incorrect placement or low factor loading (< 0.32). Out of the final 20 items of the PBI-M, there were 10 items for the care factor, five items for the autonomy factor and five items for the overprotection factor. All the items loaded positively on their respective factors. The Malaysian population favoured positive items in answering questions. The PBI-M confirmed the three-factor model that consisted of care, autonomy and overprotection. The PBI-M is a valid and reliable instrument to assess the Malaysian parenting style. Confirmatory factor analysis may further support this finding. Keywords: Malaysia, parenting, questionnaire, validity.
Development of the PROMIS coping expectancies of smoking item banks.

PubMed

Shadel, William G; Edelen, Maria Orlando; Tucker, Joan S; Stucky, Brian D; Hansen, Mark; Cai, Li

2014-09-01

Smoking is a coping strategy for many smokers who then have difficulty finding new ways to cope with negative affect when they quit. This paper describes analyses conducted to develop and evaluate item banks for assessing the coping expectancies of smoking for daily and nondaily smokers. Using data from a large sample of daily (N = 4,201) and nondaily (N = 1,183) smokers, we conducted a series of item factor analyses, item response theory analyses, and differential item functioning (DIF) analyses (according to gender, age, and ethnicity) to arrive at a unidimensional set of items for daily and nondaily smokers. We also evaluated performance of short forms (SFs) and computer adaptive tests (CATs) for assessing coping expectancies of smoking. For both daily and nondaily smokers, the unidimensional Coping Expectancies item banks (21 items) are relatively DIF free and are highly reliable (0.96 and 0.97, respectively). A common 4-item SF for daily and nondaily smokers also showed good reliability (0.85). Adaptive tests required an average of 4.3 and 3.7 items for simulated daily and nondaily respondents, respectively, and achieved reliabilities of 0.91 for both when the maximum test length was 10 items. This research provides a new set of items that can be used to reliably assess coping expectancies of smoking, through a SF, CAT, or a tailored set selected for a specific research purpose. © The Author 2014. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco. All rights reserved.
Sentence comprehension in specific language impairment: a task designed to distinguish between cognitive capacity and syntactic complexity.

PubMed

Leonard, Laurence B; Deevy, Patricia; Fey, Marc E; Bredin-Oja, Shelley L

2013-04-01

This study examined sentence comprehension in children with specific language impairment (SLI) in a manner designed to separate the contribution of cognitive capacity from the effects of syntactic structure. Nineteen children with SLI, 19 typically developing children matched for age (TD-A), and 19 younger typically developing children (TD-Y) matched according to sentence comprehension test scores responded to sentence comprehension items that varied in either length or their demands on cognitive capacity, based on the nature of the foils competing with the target picture. The TD-A children were accurate across all item types. The SLI and TD-Y groups were less accurate than the TD-A group on items with greater length and, especially, on items with the greatest demands on cognitive capacity. The types of errors were consistent with failure to retain details of the sentence apart from syntactic structure. The difficulty in the more demanding conditions seemed attributable to interference. Specifically, the children with SLI and the TD-Y children appeared to have difficulty retaining details of the target sentence when the information reflected in the foils closely resembled the information in the target sentence.
Utilizing the Zero-One Linear Programming Constraints to Draw Multiple Sets of Matched Samples from a Non-Treatment Population as Control Groups for the Quasi-Experimental Design

ERIC Educational Resources Information Center

Li, Yuan H.; Yang, Yu N.; Tompkins, Leroy J.; Modarresi, Shahpar

2005-01-01

The statistical technique, "Zero-One Linear Programming," that has successfully been used to create multiple tests with similar characteristics (e.g., item difficulties, test information and test specifications) in the area of educational measurement, was deemed to be a suitable method for creating multiple sets of matched samples to be…

Calculator Use on the "GRE"® Revised General Test Quantitative Reasoning Measure. ETS GRE® Board Research Report. ETS GRE®-14-02. ETS Research Report. RR-14-25

ERIC Educational Resources Information Center

Attali, Yigal

2014-01-01

Previous research on calculator use in standardized assessments of quantitative ability focused on the effect of calculator availability on item difficulty and on whether test developers can predict these effects. With the introduction of an on-screen calculator on the Quantitative Reasoning measure of the "GRE"® revised General Test, it…
Assessing the Unidimensionality of the School and College Ability Test (SCAT, Spanish Version) Using Non-Parametric Methods Based on Item Response Theory

ERIC Educational Resources Information Center

Touron, Javier; Lizasoain, Luis; Joaristi, Luis

2012-01-01

The aim of this work is to analyze the dimensional structure of the Spanish version of the School and College Ability Test, employed in the process for the identification of students with high intellectual abilities. This test measures verbal and mathematical (or quantitative) abilities at three levels of difficulty: elementary (3rd, 4th, and 5th…
Sources of Interactional Problems in a Survey of Racial/Ethnic Discrimination

PubMed Central

Johnson, Timothy P.; Shariff-Marco, Salma; Willis, Gordon; Cho, Young Ik; Breen, Nancy; Gee, Gilbert C.; Krieger, Nancy; Grant, David; Alegria, Margarita; Mays, Vickie M.; Williams, David R.; Landrine, Hope; Liu, Benmei; Reeve, Bryce B.; Takeuchi, David; Ponce, Ninez A.

2014-01-01

Cross-cultural variability in respondent processing of survey questions may bias results from multiethnic samples. We analyzed behavior codes, which identify difficulties in the interactions of respondents and interviewers, from a discrimination module contained within a field test of the 2007 California Health Interview Survey. In all, 553 (English) telephone interviews yielded 13,999 interactions involving 22 items. Multilevel logistic regression modeling revealed that respondent age and several item characteristics (response format, customized questions, length, and first item with new response format), but not race/ethnicity, were associated with interactional problems. These findings suggest that item function within a multi-cultural, albeit English language, survey may be largely influenced by question features, as opposed to respondent characteristics such as race/ethnicity. PMID:26166949
Detecting unexpected variables in the MMPI 2 Social Introversion scale.

PubMed

Chang, C H; Wright, B D

2001-01-01

The standard scoring structure of the revised Minnesota Multiphasic Personality Inventory (MMPI-2) Social Introversion (Si) scale was reexamined with Rasch Measurement. The 69-item Si scale split into two distinct dimensions when their standardized residuals were factor analyzed. Items keyed "true" to Si defined one dimension and items keyed "false" defined another. Relationships between Lexile values (an index of reading difficulty and comprehension) and item difficulties were also explored. The article shows how to use Rasch Measurement to understand and improve personality assessment.
Who was that masked man? Conjoint representations of intrinsic motions with actor appearance.

PubMed

Kersten, Alan W; Earles, Julie L; Negri, Leehe

2018-09-01

Motion plays an important role in recognising animate creatures. This research supports a distinction between intrinsic and extrinsic motions in their relationship to identifying information about the characters performing the motions. Participants viewed events involving costumed human characters. Intrinsic motions involved relative movements of a character's body parts, whereas extrinsic motions involved movements with respect to external landmarks. Participants were later tested for recognition of the motions and who had performed them. The critical test items involved familiar characters performing motions that had previously been performed by other characters. Participants falsely recognised extrinsic conjunction items, in which characters followed the paths of other characters, more often than intrinsic conjunction items, in which characters moved in the manner of other characters. In contrast, participants falsely recognised new extrinsic motions less often than new intrinsic motions, suggesting that they remembered extrinsic motions but had difficulty remembering who had performed them. Modelling of receiver operating characteristics indicated that participants discriminated old items from intrinsic conjunction items via familiarity, consistent with conjoint representations of intrinsic motion and identity information. In contrast, participants used recollection to distinguish old items from extrinsic conjunction items, consistent with separate but associated representations of extrinsic motion and identity information.
Parietal cortex and episodic memory retrieval in schizophrenia.

PubMed

Lepage, Martin; Pelletier, Marc; Achim, Amélie; Montoya, Alonso; Menear, Matthew; Lal, Sam

2010-06-30

People with schizophrenia consistently show memory impairment on varying tasks including item recognition memory. Relative to the correct rejection of distracter items, the correct recognition of studied items consistently produces an effect termed the old/new effect that is characterized by increased activity in parietal and frontal cortical regions. This effect has received only scant attention in schizophrenia. We examined the old/new effect in 15 people with schizophrenia and 18 controls during an item recognition test, and neural activity was examined with event-related functional magnetic resonance imaging. Both groups performed equally well during the recognition test and showed increased activity in a left dorsolateral prefrontal region and in the precuneus bilaterally during the successful recognition of old items relative to the correct rejection of new items. The control group also exhibited increased activity in the dorsal left parietal cortex. This region has been implicated in the top-down modulation of memory which involves control processes that support memory-retrieval search, monitoring and verification. Although these processes may not be of paramount importance in item recognition memory performance, the present findings suggest that people with schizophrenia may have difficulty with such top-down modulation, a finding consistent with many other studies in information processing.
A new item response theory model to adjust data allowing examinee choice

PubMed Central

Costa, Marcelo Azevedo; Braga Oliveira, Rivert Paulo

2018-01-01

In a typical questionnaire testing situation, examinees are not allowed to choose which items they answer because of a technical issue in obtaining satisfactory statistical estimates of examinee ability and item difficulty. This paper introduces a new item response theory (IRT) model that incorporates information from a novel representation of questionnaire data using network analysis. Three scenarios in which examinees select a subset of items were simulated. In the first scenario, the assumptions required to apply the standard Rasch model are met, thus establishing a reference for parameter accuracy. The second and third scenarios include five increasing levels of violating those assumptions. The results show substantial improvements over the standard model in item parameter recovery. Furthermore, the accuracy was closer to the reference in almost every evaluated scenario. To the best of our knowledge, this is the first proposal to obtain satisfactory IRT statistical estimates in the last two scenarios. PMID:29389996
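As background for the reference scenario above, the standard dichotomous Rasch model gives the probability of a correct response as a logistic function of the difference between examinee ability and item difficulty. The sketch below is a minimal illustration of that standard model only; it is not the network-based model proposed in the paper, and the function name and sample values are hypothetical.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct response under the dichotomous Rasch model.

    Both arguments are on the same logit scale (illustrative values only).
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# When ability equals difficulty the model gives an even chance of success.
p_equal = rasch_probability(0.0, 0.0)
# An able examinee on an easy item, and a weak examinee on a hard item.
p_easy = rasch_probability(1.0, -1.0)
p_hard = rasch_probability(-1.0, 1.0)
```

Because only the difference between ability and difficulty enters the formula, shifting both by the same constant leaves every probability unchanged, which is why Rasch scales need an anchoring convention.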
Development of the Serenity Scale.

PubMed

Roberts, K T; Aspy, C B

1993-01-01

Serenity is a sustained inner peace. Nurses can use knowledge about serenity to help clients cope with harsh circumstances. The Serenity Scale is a 40-item self-report, summated scale that evaluates clients' serenity status. Critical attributes, identified by serenity experts, served as the theoretical framework. Sixty-five items were given to 542 male and female subjects age 20 to 95 (73% Caucasians and 27% minority) from varying income and educational levels yielding an alpha of .93. Forty items (SS.V2) were extracted for further analysis. The alpha coefficient was .92 with item-to-total correlations ranging from .25 to .67. Item means ranged from 2.6-3.7 (grand mean = 3.4). A principal components factor analysis with varimax rotation revealed nine factors explaining 58.2% of the variance. Limitations are that SS.V2 has not been tested with an independent sample and subjects with low educational levels had difficulty with some items.
Study on the Automatic Detection Method and System of Multifunctional Hydrocephalus Shunt

NASA Astrophysics Data System (ADS)

Sun, Xuan; Wang, Guangzhen; Dong, Quancheng; Li, Yuzhong

2017-07-01

To address the difficulty of micro-pressure detection and micro-flow control in the testing of hydrocephalus shunts, the principles of several shunt performance tests were analyzed. An advanced micro-pressure sensor and a micro-flow peristaltic pump were used to overcome the micro-pressure detection and micro-flow control problems. The study also integrated many common experimental items and successfully developed an automatic detection system for shunt performance, achieving testing with high precision, high efficiency, and automation.
Psychometrics of the self-report safe driving behavior measure for older adults.

PubMed

Classen, Sherrilene; Wen, Pey-Shan; Velozo, Craig A; Bédard, Michel; Winter, Sandra M; Brumback, Babette; Lanford, Desiree N

2012-01-01

We investigated the psychometric properties of the 68-item Safe Driving Behavior Measure (SDBM) with 80 older drivers, 80 caregivers, and 2 evaluators from two sites. Using Rasch analysis, we examined unidimensionality and local dependence; rating scale; item- and person-level psychometrics; and item hierarchy of older drivers, caregivers, and driving evaluators who had completed the SDBM. The evidence suggested the SDBM is unidimensional, but pairs of items showed local dependency. Across the three rater groups, the data showed good person (≥3.4) and item (≥3.6) separation as well as good person (≥.93) and item reliability (≥.92). Cronbach's α was ≥.96, and few items were misfitting. Some of the items did not follow the hypothesized order of item difficulty. The SDBM classified the older drivers into six ability levels, but to fully calibrate the instrument it must be refined in terms of its items (e.g., item exclusion) and then tested among participants of lesser ability. Copyright © 2012 by the American Occupational Therapy Association, Inc.
Overview of classical test theory and item response theory for the quantitative assessment of items in developing patient-reported outcomes measures.

PubMed

Cappelleri, Joseph C; Jason Lundy, J; Hays, Ron D

2014-05-01

The US Food and Drug Administration's guidance for industry document on patient-reported outcomes (PRO) defines content validity as "the extent to which the instrument measures the concept of interest" (FDA, 2009, p. 12). According to Strauss and Smith (2009), construct validity "is now generally viewed as a unifying form of validity for psychological measurements, subsuming both content and criterion validity" (p. 7). Hence, both qualitative and quantitative information are essential in evaluating the validity of measures. We review classical test theory and item response theory (IRT) approaches to evaluating PRO measures, including frequency of responses to each category of the items in a multi-item scale, the distribution of scale scores, floor and ceiling effects, the relationship between item response options and the total score, and the extent to which hypothesized "difficulty" (severity) order of items is represented by observed responses. If a researcher has few qualitative data and wants to get preliminary information about the content validity of the instrument, then descriptive assessments using classical test theory should be the first step. As the sample size grows during subsequent stages of instrument development, confidence in the numerical estimates from Rasch and other IRT models (as well as those of classical test theory) would also grow. Classical test theory and IRT can be useful in providing a quantitative assessment of items and scales during the content-validity phase of PRO-measure development. Depending on the particular type of measure and the specific circumstances, the classical test theory and/or the IRT should be considered to help maximize the content validity of PRO measures. Copyright © 2014 Elsevier HS Journals, Inc. All rights reserved.
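One of the descriptive classical test theory checks listed above, floor and ceiling effects, can be sketched as the share of respondents at the lowest and highest possible scale scores. This is a minimal illustration with made-up scores and a hypothetical 0 to 10 scale; the abstract does not specify a threshold for what counts as a problematic share.

```python
def floor_ceiling(scores, min_score, max_score):
    """Return the proportions of respondents at the scale minimum and maximum."""
    n = len(scores)
    floor = sum(1 for s in scores if s == min_score) / n
    ceiling = sum(1 for s in scores if s == max_score) / n
    return floor, ceiling


# Hypothetical scale scores on a 0-10 instrument.
scores = [0, 0, 3, 5, 7, 10, 10, 10, 4, 6]
floor_share, ceiling_share = floor_ceiling(scores, 0, 10)
```

A large share at either extreme suggests the instrument cannot distinguish among respondents at that end of the range.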
Comparison of self-reported pain intensity, sleeping difficulty, and treatment outcomes of patients with myofascial temporomandibular disorders by age group: a prospective outcome study.

PubMed

Karibe, Hiroyuki; Goddard, Greg; Shimazu, Kisaki; Kato, Yuichi; Warita-Naoi, Sachie; Kawakami, Tomomi

2014-12-11

Subjective symptoms of temporomandibular disorders (TMDs) have rarely been studied by age group. We aimed to compare self-reported pain intensity, sleeping difficulty, and treatment outcomes of patients with myofascial TMDs among three age groups. The study population included 179 consecutive patients (151 women and 28 men) who underwent comprehensive clinical examinations at a university-based orofacial pain center. They were classified into myofascial pain subgroups based on the Research Diagnostic Criteria for Temporomandibular Disorders. They were stratified by age group: M1, under 20 years; M2, 20-39 years; and M3, 40 years and older. The patients scored their pretreatment symptoms (first visit) and post-treatment symptoms (last visit) on a form composed of three items that assessed pain intensity and one item that assessed sleeping difficulty. Their treatment options (i.e., pharmacotherapy, physical therapy, and orthopedic appliances) and duration were recorded. All variables were compared between sexes in each group and between the age groups by using the Kruskal-Wallis test, the Mann-Whitney U test, the chi-square test, and analysis of variance (p < 0.05). No significant sex differences were found in any age group. Only sleeping difficulty was significantly different before treatment (p = 0.009). No significant differences were observed in the treatment options or treatment duration. After treatment, the intensity of jaw/face pain and headache and sleeping difficulty was significantly reduced in groups M2 and M3, but only the intensity of jaw/face pain was significantly decreased in group M1. The changes in the scores of pain intensity and sleeping difficulty were not different between the groups. Pain intensity does not differ by age group, but older patients with myofascial TMDs had greater sleeping difficulties. However, there were no differences between the age groups in the treatment outcomes. 
Clinicians should carefully consider the age-related characteristics of patients with myofascial TMDs when developing appropriate management strategies.
Analysis of Bilingual Children’s Performance on the English and Spanish Versions of the Woodcock-Muñoz Language Survey-R (WMLS-R)

PubMed Central

Sandilos, Lia E.; Lewis, Kandia; Komaroff, Eugene; Hammer, Carol Scheffner; Scarpino, Shelley E.; Lopez, Lisa; Rodriguez, Barbara; Goldstein, Brian

2015-01-01

The purpose of this study was to investigate the way in which items on the Woodcock-Muñoz Language Survey Revised (WMLS-R) Spanish and English versions function for bilingual children from different ethnic subgroups who speak different dialects of Spanish. Using data from a sample of 324 bilingual Hispanic families and their children living on the United States mainland, differential item functioning (DIF) was conducted to determine if test items in English and Spanish functioned differently for Mexican, Cuban, and Puerto Rican bilingual children. Data on child and parent language characteristics and children’s scores on Picture Vocabulary and Story Recall subtests in English and Spanish were collected. DIF was not detected for items on the Spanish subtests. Results revealed that some items on English subtests displayed statistically and practically significant DIF. The findings indicate that there are differences in the difficulty level of WMLS-R English-form test items depending on the examinees’ ethnic subgroup membership. This outcome suggests that test developers need to be mindful of potential differences in performance based on ethnic subgroup and dialect when developing standardized language assessments that may be administered to bilingual students. PMID:26705400
Which dimensions of disability does the HIV Disability Questionnaire (HDQ) measure? A factor analysis.

PubMed

O'Brien, Kelly K; Bayoumi, Ahmed M; Stratford, Paul; Solomon, Patricia

2015-01-01

To assess the dimensions of disability measured by the HIV Disability Questionnaire (HDQ), a newly developed 72-item self-administered questionnaire that describes the presence, severity and episodic nature of disability experienced by people living with HIV. We recruited adults living with HIV from hospital clinics, AIDS service organizations and a specialty hospital and administered the HDQ followed by a demographic questionnaire. We conducted an exploratory factor analysis using disability severity scores to determine the domains of disability in the HDQ. We used the following steps: (a) ensured correlations between items were >0.30 and <0.80; (b) conducted a principal components analysis to extract factors; (c) used the Scree Test and eigenvalue threshold >1.5 to determine the number of factors to retain; and d) used oblique rotation to simplify the factor loading matrix. We assigned items to factors based on factor loadings of >0.30. Of the 361 participants, 80% were men and 77% reported living with at least two concurrent health conditions in addition to HIV. The exploratory factor analysis suggested retaining six factors. Items related to symptoms and impairments loaded on three factors (physical [20 items], cognitive [3 items], and mental and emotional health [11 items]) and items related to worrying about the future, daily activities, and personal relationships loaded on three additional factors (uncertainty [14 items], difficulties with day-to-day activities [9 items], social inclusion [12 items]). The HDQ has six domains: physical symptoms and impairments; cognitive symptoms and impairments; mental and emotional health symptoms and impairments; uncertainty; difficulties with day-to-day activities and challenges to social inclusion. These domains establish the scoring structure for the dimensions of disability measured by the HDQ. 
Implications for Rehabilitation: As individuals live longer and age with HIV, they may be living with the health-related consequences of HIV and concurrent health conditions, a concept that may be termed disability. Measuring disability is important to understand the impact of HIV and its comorbidities. The HIV Disability Questionnaire (HDQ) is a self-administered questionnaire developed to describe the presence, severity and episodic nature of disability experienced by people living with HIV. The HDQ is comprised of six domains of disability including: physical symptoms and impairments (20 items); cognitive symptoms and impairments (3 items); mental and emotional health symptoms and impairments (11 items); uncertainty (14 items); difficulties with day-to-day activities (9 items) and challenges to social inclusion (12 items). These domains represent the dimensions of disability measured by the HDQ. The HDQ is the first known HIV-specific disability measure for adults living with HIV. The HDQ may be used by clinicians and researchers to assess disability experienced by adults living with HIV.
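Two of the decision rules described in the abstract, retaining factors with eigenvalues above 1.5 and assigning items to factors by loadings above 0.30, can be sketched as simple thresholding. The eigenvalues, loadings, and function names below are hypothetical; the sketch does not perform the principal components extraction, the Scree Test, or the oblique rotation, and the tie-breaking rule (largest loading wins) is an assumption not stated in the source.

```python
def retain_factors(eigenvalues, threshold=1.5):
    """Return the indices of factors whose eigenvalue exceeds the threshold."""
    return [i for i, value in enumerate(eigenvalues) if value > threshold]


def assign_items(loadings, cutoff=0.30):
    """Map each item to the factor with its largest loading, if above cutoff.

    Items with no loading above the cutoff are mapped to None.
    """
    assignment = {}
    for item, row in loadings.items():
        best = max(range(len(row)), key=lambda f: row[f])
        assignment[item] = best if row[best] > cutoff else None
    return assignment


# Hypothetical eigenvalues and rotated loadings (two factors shown).
eigenvalues = [9.8, 4.1, 2.7, 1.9, 1.2, 0.8]
loadings = {
    "item_a": [0.62, 0.10],
    "item_b": [0.25, 0.48],
    "item_c": [0.12, 0.21],
}
retained = retain_factors(eigenvalues)
assignment = assign_items(loadings)
```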
Controlling Guessing Bias in the Dichotomous Rasch Model Applied to a Large-Scale, Vertically Scaled Testing Program

ERIC Educational Resources Information Center

Andrich, David; Marais, Ida; Humphry, Stephen Mark

2016-01-01

Recent research has shown how the statistical bias in Rasch model difficulty estimates induced by guessing in multiple-choice items can be eliminated. Using vertical scaling of a high-profile national reading test, it is shown that the dominant effect of removing such bias is a nonlinear change in the unit of scale across the continuum. The…
Assessing Validity of Measurement in Learning Disabilities Using Hierarchical Generalized Linear Modeling: The Roles of Anxiety and Motivation

ERIC Educational Resources Information Center

Sideridis, Georgios D.

2016-01-01

The purpose of the present studies was to test the hypothesis that the psychometric characteristics of ability scales may be significantly distorted if one accounts for emotional factors during test taking. Specifically, the present studies evaluate the effects of anxiety and motivation on the item difficulties of the Rasch model. In Study 1, the…
Assessing Children's Mathematical Knowledge: Social Class, Sex and Problem-Solving.

ERIC Educational Resources Information Center

Cooper, Barry; Dunne, Mairead

This book draws on the analysis of national curriculum test data from more than 600 children of 10-11 and 13-14 years of age, as well as in-depth interviews with 250 of these students, as they attempt to solve test problems, in order to explore the nature of the difficulties children experience with realistic items. It is shown, by comparing test…
The Effects of Different Types of Anchor Tests on Observed Score Equating. Research Report. ETS RR-09-41

ERIC Educational Resources Information Center

Liu, Jinghua; Sinharay, Sandip; Holland, Paul W.; Feigenbaum, Miriam; Curley, Edward

2009-01-01

This study explores the use of a different type of anchor, a "midi anchor", that has a smaller spread of item difficulties than the tests to be equated, and then contrasts its use with the use of a "mini anchor". The impact of different anchors on observed score equating were evaluated and compared with respect to systematic…
Rasch analysis for psychometric improvement of science attitude rating scales

NASA Astrophysics Data System (ADS)

Oon, Pey-Tee; Fan, Xitao

2017-04-01

Students' attitude towards science (SAS) is often a subject of investigation in science education research. Rating-scale surveys are commonly used in the study of SAS. The present study illustrates how Rasch analysis can be used to provide psychometric information on SAS rating scales. The analyses were conducted on a 20-item SAS scale used in an existing dataset of the Trends in International Mathematics and Science Study (TIMSS) (2011). Data of all the eighth-grade participants from Hong Kong and Singapore (N = 9942) were retrieved for analyses. Additional insights from Rasch analysis that are not commonly available from conventional test and item analyses were discussed, such as invariance measurement of SAS, unidimensionality of the SAS construct, optimum utilization of SAS rating categories, and item difficulty hierarchy in the SAS scale. Recommendations on how TIMSS items on the measurement of SAS can be better designed were discussed. The study also highlights the importance of using Rasch estimates for statistical parametric tests (e.g. ANOVA, t-test) that are common in science education research for group comparisons.
Development and assessment of floor and ceiling items for the PROMIS physical function item bank

PubMed Central

2013-01-01

Introduction Disability and Physical Function (PF) outcome assessment has had limited ability to measure functional status at the floor (very poor functional abilities) or the ceiling (very high functional abilities). We sought to identify, develop and evaluate new floor and ceiling items to enable broader and more precise assessment of PF outcomes for the NIH Patient-Reported-Outcomes Measurement Information System (PROMIS). Methods We conducted two cross-sectional studies using NIH PROMIS item improvement protocols with expert review, participant survey and focus group methods. In Study 1, respondents with low PF abilities evaluated new floor items, and those with high PF abilities evaluated new ceiling items for clarity, importance and relevance. In Study 2, we compared difficulty ratings of new floor items by low functioning respondents and ceiling items by high functioning respondents to reference PROMIS PF-10 items. We used frequencies, percentages, means and standard deviations to analyze the data. Results In Study 1, low (n = 84) and high (n = 90) functioning respondents were mostly White, women, 70 years old, with some college, and disability scores of 0.62 and 0.30. More than 90% of the 31 new floor and 31 new ceiling items were rated as clear, important and relevant, leaving 26 ceiling and 30 floor items for Study 2. Low (n = 246) and high (n = 637) functioning Study 2 respondents were mostly White, women, 70 years old, with some college, and Health Assessment Questionnaire (HAQ) scores of 1.62 and 0.003. Compared to difficulty ratings of reference items, ceiling items were rated to be 10% more to greater than 40% more difficult to do, and floor items were rated to be about 12% to nearly 90% less difficult to do. Conclusions These new floor and ceiling items considerably extend the measurable range of physical function at either extreme. 
They will help improve instrument performance in populations with broad functional ranges and those concentrated at one or the other extreme ends of functioning. Optimal use of these new items will be assisted by computerized adaptive testing (CAT), reducing questionnaire burden and ensuring item administration to appropriate individuals. PMID:24286166

A signal detection-item response theory model for evaluating neuropsychological measures.

PubMed

Thomas, Michael L; Brown, Gregory G; Gur, Ruben C; Moore, Tyler M; Patt, Virginie M; Risbrough, Victoria B; Baker, Dewleen G

2018-02-05

Models from signal detection theory are commonly used to score neuropsychological test data, especially tests of recognition memory. Here we show that certain item response theory models can be formulated as signal detection theory models, thus linking two complementary but distinct methodologies. We then use the approach to evaluate the validity (construct representation) of commonly used research measures, demonstrate the impact of conditional error on neuropsychological outcomes, and evaluate measurement bias. Signal detection-item response theory (SD-IRT) models were fitted to recognition memory data for words, faces, and objects. The sample consisted of U.S. Infantry Marines and Navy Corpsmen participating in the Marine Resiliency Study. Data comprised item responses to the Penn Face Memory Test (PFMT; N = 1,338), Penn Word Memory Test (PWMT; N = 1,331), and Visual Object Learning Test (VOLT; N = 1,249), and self-report of past head injury with loss of consciousness. SD-IRT models adequately fitted recognition memory item data across all modalities. Error varied systematically with ability estimates, and distributions of residuals from the regression of memory discrimination onto self-report of past head injury were positively skewed towards regions of larger measurement error. Analyses of differential item functioning revealed little evidence of systematic bias by level of education. SD-IRT models benefit from the measurement rigor of item response theory-which permits the modeling of item difficulty and examinee ability-and from signal detection theory-which provides an interpretive framework encompassing the experimentally validated constructs of memory discrimination and response bias. We used this approach to validate the construct representation of commonly used research measures and to demonstrate how nonoptimized item parameters can lead to erroneous conclusions when interpreting neuropsychological test data. 
Future work might include the development of computerized adaptive tests and integration with mixture and random-effects models.
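The two signal detection constructs named above, memory discrimination and response bias, are classically summarized by d' and the criterion c computed from hit and false-alarm rates. The sketch below shows only that textbook equal-variance calculation, not the SD-IRT model fitted in the study; the function name and the rates are hypothetical, and rates must lie strictly between 0 and 1.

```python
from statistics import NormalDist


def dprime_and_criterion(hit_rate: float, false_alarm_rate: float):
    """Equal-variance signal detection indices from two response rates.

    d' measures discrimination; c measures response bias
    (positive c means a conservative tendency to answer "new").
    """
    z = NormalDist().inv_cdf  # inverse standard normal CDF
    z_hit, z_fa = z(hit_rate), z(false_alarm_rate)
    return z_hit - z_fa, -0.5 * (z_hit + z_fa)


# Hypothetical recognition memory performance.
d_unbiased, c_unbiased = dprime_and_criterion(0.8, 0.2)
d_chance, c_chance = dprime_and_criterion(0.5, 0.5)
```

Unlike the item-level models in the abstract, this calculation pools all items and so cannot account for differences in item difficulty.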
Fostering a student's skill for analyzing test items through an authentic task

NASA Astrophysics Data System (ADS)

Setiawan, Beni; Sabtiawan, Wahyu Budi

2017-08-01

Analyzing test items is a skill that prospective teachers must master in order to determine the quality of the test questions they have written. The main aim of this research was to describe the effectiveness of an authentic task in fostering students' skill in analyzing test items, covering validity, reliability, item discrimination index, level of difficulty, and distractor functioning. The participants were students of the science education study program, science and mathematics faculty, Universitas Negeri Surabaya, enrolled in the assessment course. The research design was a one-group posttest design. In the treatment, the students were given an authentic task in which they developed test items and then analyzed the items like a professional assessor using Microsoft Excel and Anates software. The data were analyzed descriptively: the data on students' skill were presented and then related to theories or previous empirical studies. The research showed that the task helped the students acquire the skills. Thirty-one students got a perfect score for the analysis, five students achieved 97% mastery, two students had 92% mastery, and another two students got 89% and 79% mastery. The implication of the finding is that students who are given authentic tasks that require them to perform like a professional are more likely to achieve professional skills by the end of learning.
Development and psychometric evaluation of a cardiovascular risk and disease management knowledge assessment tool.

PubMed

Rosneck, James S; Hughes, Joel; Gunstad, John; Josephson, Richard; Noe, Donald A; Waechter, Donna

2014-01-01

This article describes the systematic construction and psychometric analysis of a knowledge assessment instrument for phase II cardiac rehabilitation (CR) patients measuring risk modification disease management knowledge and behavioral outcomes derived from national standards relevant to secondary prevention and management of cardiovascular disease. First, using adult curriculum based on disease-specific learning outcomes and competencies, a systematic test item development process was completed by clinical staff. Second, a panel of educational and clinical experts used an iterative process to identify test content domain and arrive at consensus in selecting items meeting criteria. Third, the resulting 31-question instrument, the Cardiac Knowledge Assessment Tool (CKAT), was piloted in CR patients to ensure use of application. Validity and reliability analyses were performed on 3638 adults before test administrations with additional focused analyses on 1999 individuals completing both pretreatment and posttreatment administrations within 6 months. Evidence of CKAT content validity was substantiated, with 85% agreement among content experts. Evidence of construct validity was demonstrated via factor analysis identifying key underlying factors. Estimates of internal consistency, for example, Cronbach's α = .852 and Spearman-Brown split-half reliability = 0.817 on pretesting, support test reliability. Item analysis, using point biserial correlation, measured relationships between performance on single items and total score (P < .01). Analyses using item difficulty and item discrimination indices further verified item stability and validity of the CKAT. A knowledge instrument specifically designed for an adult CR population was systematically developed and tested in a large representative patient population, satisfying psychometric parameters, including validity and reliability.
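The two internal-consistency statistics quoted in this abstract can be sketched as follows. This is a generic textbook computation, not the authors' analysis code; the function names and the tiny example matrix are assumptions made for illustration.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items matrix (list of lists)."""
    k = len(scores[0])
    # Variance of each item across respondents.
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    # Variance of the total (sum) score across respondents.
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def spearman_brown(r_half):
    """Step a half-test correlation up to full-length reliability."""
    return 2 * r_half / (1 + r_half)


# Hypothetical data: three respondents, two perfectly parallel items.
alpha = cronbach_alpha([[0, 0], [1, 1], [2, 2]])
full_length = spearman_brown(0.5)
```

Population variance is used for both the item and total scores; because alpha depends only on the ratio of the two, using sample variance throughout would give the same value.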
Do item-writing flaws reduce examinations psychometric quality?

PubMed

Pais, João; Silva, Artur; Guimarães, Bruno; Povo, Ana; Coelho, Elisabete; Silva-Pereira, Fernanda; Lourinho, Isabel; Ferreira, Maria Amélia; Severo, Milton

2016-08-11

The psychometric characteristics of multiple-choice questions (MCQ) changed when taking into account their anatomical sites and the presence of item-writing flaws (IWF). The aim is to understand the impact of the anatomical sites and the presence of IWF on the psychometric qualities of the MCQ. 800 Clinical Anatomy MCQ from eight examinations were classified as standard or flawed items and according to one of the eight anatomical sites. An item was classified as flawed if it violated at least one of the principles of item writing. The difficulty and discrimination indices of each item were obtained. 55.8% of the MCQ were flawed items. The anatomical site of the items explained 6.2% and 3.2% of the difficulty and discrimination parameters, and the IWF explained 2.8% and 0.8%, respectively. The impact of the IWF was heterogeneous: the Writing the Stem and Writing the Choices categories had a negative impact (higher difficulty and lower discrimination), while the other categories did not have any impact. The anatomical site effect was higher than the IWF effect on the psychometric characteristics of the examination. When constructing MCQ, the focus should be first on the topic/area of the items and only afterwards on the presence of IWF.
Validation of the Malay Version of the Parental Bonding Instrument among Malaysian Youths Using Exploratory Factor Analysis

PubMed Central

MUHAMMAD, Noor Azimah; SHAMSUDDIN, Khadijah; OMAR, Khairani; SHAH, Shamsul Azhar; MOHD AMIN, Rahmah

2014-01-01

Background: Parenting behaviour is culturally sensitive. The aims of this study were (1) to translate the Parental Bonding Instrument into Malay (PBI-M) and (2) to determine its factorial structure and validity among the Malaysian population. Methods: The PBI-M was generated from a standard translation process and comprehension testing. The validation study of the PBI-M was administered to 248 college students aged 18 to 22 years. Results: Participants in the comprehension testing had difficulty understanding negative items. Five translated double negative items were replaced with five positive items with similar meanings. Exploratory factor analysis showed a three-factor model for the PBI-M with acceptable reliability. Four negative items (items 3, 4, 8, and 16) and item 19 were omitted from the final PBI-M list because of incorrect placement or low factor loading (< 0.32). Out of the final 20 items of the PBI-M, there were 10 items for the care factor, five items for the autonomy factor and five items for the overprotection factor. All the items loaded positively on their respective factors. Conclusion: The Malaysian population favoured positive items in answering questions. The PBI-M confirmed the three-factor model that consisted of care, autonomy and overprotection. The PBI-M is a valid and reliable instrument to assess the Malaysian parenting style. Confirmatory factor analysis may further support this finding. Keywords: Malaysia, parenting, questionnaire, validity PMID:25977634
Validation of the Dutch version of the health education impact questionnaire (HEIQ) and comparison of the Dutch translation with the English, German and French HEIQ.

PubMed

Ammerlaan, Judy W; van Os-Medendorp, Harmieke; Sont, Jacob K; Elsworth, Gerald R; Osborne, Richard H

2017-01-31

The Health Education Impact Questionnaire (heiQ) evaluates the effectiveness of health education and self-management programs provided to people dealing with a wide range of conditions. The aim of this study was to translate, culturally adapt and validate the Dutch translation of the heiQ and to compare the results with the English, German and French translations. A systematic translation process was undertaken. Psychometric properties were studied among patients with arthritis, atopic dermatitis, food allergy and asthma (n = 286). Factorial validity (using confirmatory factor analysis), item difficulty (D), item remainder correlation and composite reliability were assessed. Stability was tested using the intra-class correlation coefficient (ICC). Items were well understood and only minor language adjustments were required. Confirmatory fit indices were >0.95 and item difficulty was D ≥ 0.65 for all items in scales showing acceptable fit indices, except for the reversed Emotional distress scale. Composite reliability ranged between 0.67 and 0.85. Test-retest reliability (n = 93) ICC varied between 0.61 and 0.84. Comparisons with other translations showed comparable fit indices. A lower ICC on the Self-monitoring and insight scale was observed. The Dutch translation of the heiQ was found to be well understood and user friendly by patients with rheumatoid arthritis, atopic dermatitis, food allergy and asthma and to have robust psychometric properties for evaluating the impact of health education and self-management programs. Given the wide applications of the heiQ and the comparability of the Dutch results with the English, German and French versions, the heiQ is a practical and useful questionnaire to evaluate the impact of self-management support programs in different countries and populations with different diseases.
Clinical vs. Self-report Versions of the Quick Inventory of Depressive Symptomatology in a Public Sector Sample

PubMed Central

Bernstein, Ira H.; Rush, A. John; Carmody, Thomas J.; Woo, Ada; Trivedi, Madhukar H.

2007-01-01

Objectives Recent work using classical test theory (CTT) and item response theory (IRT) has found that the self-report (QIDS-SR16) and clinician-rated (QIDS-C16) versions of the 16-item Quick Inventory of Depressive Symptomatology were generally comparable in outpatients with nonpsychotic major depressive disorder (MDD). This report extends this comparison to a less well-educated, more treatment-resistant sample that included more ethnic/racial minorities using IRT and selected classical test analyses. Methods The QIDS-SR16 and QIDS-C16 were obtained in a sample of 441 outpatients with nonpsychotic MDD seen in the public sector in the Texas Medication Algorithm Project (TMAP). The Samejima graded response IRT model was used to compare the QIDS-SR16 and QIDS-C16. Results The nine symptom domains in the QIDS-SR16 and QIDS-C16 related well to overall depression. The slopes of the item response functions (a), which index the strength of relationship between overall depression and each symptom, were extremely similar with the two measures. Likewise, the CTT and IRT indices of symptom frequency (item means and locations of the item response functions, bi) were also similar with these two measures. For example, sad mood and difficulty with concentration/decision making were highly related to the overall depression severity with both the QIDS-C16 and QIDS-SR16. Likewise, sleeping difficulties were commonly reported, even though they were not as strongly related to overall magnitude of depression. Conclusion In this less educated, socially disadvantaged sample, differences between the QIDS-C16 and QIDS-SR16 were minor. The QIDS-SR16 is a satisfactory substitute for the more time-consuming QIDS-C16 in a broad range of adult, nonpsychotic, depressed outpatients. PMID:16716351
Clinical vs. self-report versions of the quick inventory of depressive symptomatology in a public sector sample.

PubMed

Bernstein, Ira H; Rush, A John; Carmody, Thomas J; Woo, Ada; Trivedi, Madhukar H

2007-01-01

Recent work using classical test theory (CTT) and item response theory (IRT) has found that the self-report (QIDS-SR(16)) and clinician-rated (QIDS-C(16)) versions of the 16-item quick inventory of depressive symptomatology were generally comparable in outpatients with nonpsychotic major depressive disorder (MDD). This report extends this comparison to a less well-educated, more treatment-resistant sample that included more ethnic/racial minorities using IRT and selected classical test analyses. The QIDS-SR(16) and QIDS-C(16) were obtained in a sample of 441 outpatients with nonpsychotic MDD seen in the public sector in the Texas Medication Algorithm Project (TMAP). The Samejima graded response IRT model was used to compare the QIDS-SR(16) and QIDS-C(16). The nine symptom domains in the QIDS-SR(16) and QIDS-C(16) related well to overall depression. The slopes of the item response functions, a, which index the strength of relationship between overall depression and each symptom, were extremely similar with the two measures. Likewise, the CTT and IRT indices of symptom frequency (item means and locations of the item response functions, b(i)) were also similar with these two measures. For example, sad mood and difficulty with concentration/decision making were highly related to the overall depression severity with both the QIDS-C(16) and QIDS-SR(16). Likewise, sleeping difficulties were commonly reported, even though they were not as strongly related to overall magnitude of depression. In this less educated, socially disadvantaged sample, differences between the QIDS-C(16) and QIDS-SR(16) were minor. The QIDS-SR(16) is a satisfactory substitute for the more time-consuming QIDS-C(16) in a broad range of adult, nonpsychotic, depressed outpatients.
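The slope (a) and location (b) parameters mentioned in the two QIDS records above enter the Samejima graded response model as sketched below. This uses the common logistic form of the model; the function name and the example parameter values are hypothetical and are not estimates from the study.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities under a logistic graded response model.

    theta: latent trait (e.g. overall depression severity)
    a: item slope (discrimination)
    thresholds: increasing locations b_1 < ... < b_(K-1) for K categories
    """
    # Cumulative probabilities P(X >= k); P(X >= 0) = 1 and P(X >= K) = 0.
    cumulative = [1.0]
    cumulative += [1 / (1 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    # Each category probability is a difference of adjacent cumulatives.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]


# Hypothetical item scored 0-3, as QIDS symptom domains are.
probs = grm_category_probs(theta=0.5, a=1.8, thresholds=[-1.0, 0.3, 1.6])
```

A larger slope makes the category probabilities change more sharply with the latent trait, which is why the abstract describes the slope as indexing the strength of relationship between a symptom and overall depression.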
Defining and validating a short form Montreal Cognitive Assessment (s-MoCA) for use in neurodegenerative disease

PubMed Central

Roalf, David R; Moore, Tyler M; Wolk, David A; Arnold, Steven E; Mechanic-Hamilton, Dawn; Rick, Jacqueline; Kabadi, Sushila; Ruparel, Kosha; Chen-Plotkin, Alice S; Chahine, Lama M; Dahodwala, Nabila A; Duda, John E; Weintraub, Daniel A; Moberg, Paul J

2016-01-01

Introduction Screening for cognitive deficits is essential in neurodegenerative disease. Screening tests, such as the Montreal Cognitive Assessment (MoCA), are easily administered, correlate with neuropsychological performance and demonstrate diagnostic utility. Yet, administration time is too long for many clinical settings. Methods Item response theory and computerised adaptive testing simulation were employed to establish an abbreviated MoCA in 1850 well-characterised community-dwelling individuals with and without neurodegenerative disease. Results 8 MoCA items with high item discrimination and appropriate difficulty were identified for use in a short form (s-MoCA). The s-MoCA was highly correlated with the original MoCA and showed robust diagnostic classification, and cross-validation procedures substantiated these items. Discussion Early detection of cognitive impairment is an important clinical and public health concern, but administration of screening measures is limited by time constraints in demanding clinical settings. Here, we provide an s-MoCA that is valid across neurological disorders and can be administered in approximately 5 min. PMID:27071646
Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns.

PubMed

Wolfe, Edward W; McGill, Michael T

2011-01-01

This article summarizes a simulation study of the performance of five item quality indicators (the weighted and unweighted versions of the mean square and standardized mean square fit indices and the point-measure correlation) under conditions of relatively high and low amounts of missing data under both random and conditional patterns of missing data for testing contexts such as those encountered in operational administrations of a computerized adaptive certification or licensure examination. The results suggest that weighted fit indices, particularly the standardized mean square index, and the point-measure correlation provide the most consistent information between random and conditional missing data patterns and that these indices perform more comparably for items near the passing score than for items with extreme difficulty values.
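In Rasch terminology the weighted and unweighted mean square fit indices are usually called infit and outfit. The sketch below shows how both can be computed for one dichotomous item when some examinees were never administered it, as happens in adaptive testing. It assumes person abilities and the item difficulty are already estimated; the function name and the `None` convention for missing responses are illustrative assumptions, and the standardized versions and point-measure correlation studied in the article are not shown.

```python
import math


def item_fit(responses, abilities, difficulty):
    """Return (infit, outfit) mean squares for one dichotomous Rasch item.

    responses: 0/1 per person, or None where the item was not administered
    abilities: person ability estimates on the logit scale
    difficulty: item difficulty estimate on the same scale
    """
    squared_residuals, variances, standardized = [], [], []
    for x, theta in zip(responses, abilities):
        if x is None:
            continue  # skip missing cells of the sparse data matrix
        p = 1 / (1 + math.exp(-(theta - difficulty)))  # Rasch success probability
        w = p * (1 - p)                                # model variance
        squared_residuals.append((x - p) ** 2)
        variances.append(w)
        standardized.append((x - p) ** 2 / w)
    outfit = sum(standardized) / len(standardized)     # unweighted mean square
    infit = sum(squared_residuals) / sum(variances)    # information-weighted
    return infit, outfit


# Hypothetical sparse response vector for four examinees.
infit, outfit = item_fit([1, 0, None, 1], [0.0, 0.0, 0.0, 0.0], 0.0)
```

Values near 1 indicate responses about as variable as the model predicts. Outfit gives every response equal weight, so it is more sensitive to surprising responses from examinees far from the item's difficulty.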
Item Difficulty Modeling of Paragraph Comprehension Items

ERIC Educational Resources Information Center

Gorin, Joanna S.; Embretson, Susan E.

2006-01-01

Recent assessment research joining cognitive psychology and psychometric theory has introduced a new technology, item generation. In algorithmic item generation, items are systematically created based on specific combinations of features that underlie the processing required to correctly solve a problem. Reading comprehension items have been more…
Using Rasch Analysis to Evaluate the Reliability and Validity of the Swallowing Quality of Life Questionnaire: An Item Response Theory Approach.

PubMed

Cordier, Reinie; Speyer, Renée; Schindler, Antonio; Michou, Emilia; Heijnen, Bas Joris; Baijens, Laura; Karaduman, Ayşe; Swan, Katina; Clavé, Pere; Joosten, Annette Veronica

2018-02-01

The Swallowing Quality of Life questionnaire (SWAL-QOL) is widely used clinically and in research to evaluate quality of life related to swallowing difficulties. It has been described as a valid and reliable tool, but was developed and tested using classic test theory. This study describes the reliability and validity of the SWAL-QOL using item response theory (IRT; Rasch analysis). SWAL-QOL data were gathered from 507 participants at risk of oropharyngeal dysphagia (OD) across four European countries. OD was confirmed in 75.7% of participants via videofluoroscopy and/or fiberoptic endoscopic evaluation, or a clinical diagnosis based on meeting selected criteria. Patients with esophageal dysphagia were excluded. Data were analysed using Rasch analysis. Item and person reliability was good for all the items combined. However, person reliability was poor for 8 subscales and item reliability was poor for one subscale. Eight subscales exhibited poor person separation and two exhibited poor item separation. Overall item and person fit statistics were acceptable. However, at an individual item fit level results indicated unpredictable item responses for 28 items, and item redundancy for 10 items. The item-person dimensionality map confirmed these findings. Results from the overall Rasch model fit and Principal Component Analysis were suggestive of a second dimension. For all the items combined, none of the item categories were 'category', 'threshold' or 'step' disordered; however, all subscales demonstrated category disordered functioning. Findings suggest an urgent need to further investigate the underlying structure of the SWAL-QOL and its psychometric characteristics using IRT.
Exploratory Item Classification Via Spectral Graph Clustering

PubMed Central

Chen, Yunxiao; Li, Xiaoou; Liu, Jingchen; Xu, Gongjun; Ying, Zhiliang

2017-01-01

Large-scale assessments are supported by a large item pool. An important task in test development is to assign items into scales that measure different characteristics of individuals, and a popular approach is cluster analysis of items. Classical methods in cluster analysis, such as the hierarchical clustering, K-means method, and latent-class analysis, often induce a high computational overhead and have difficulty handling missing data, especially in the presence of high-dimensional responses. In this article, the authors propose a spectral clustering algorithm for exploratory item cluster analysis. The method is computationally efficient, effective for data with missing or incomplete responses, easy to implement, and often outperforms traditional clustering algorithms in the context of high dimensionality. The spectral clustering algorithm is based on graph theory, a branch of mathematics that studies the properties of graphs. The algorithm first constructs a graph of items, characterizing the similarity structure among items. It then extracts item clusters based on the graphical structure, grouping similar items together. The proposed method is evaluated through simulations and an application to the revised Eysenck Personality Questionnaire. PMID:29033476
North American Veterinary Licensing Examination pacing study.

PubMed

Subhiyah, Raja G; Boyce, John R

2010-01-01

The National Board of Veterinary Medical Examiners was interested in the possible effects of word count on the outcomes of the North American Veterinary Licensing Examination. In this study, the authors investigated the effects of increasing word count on the pacing of examinees during each section of the examination and on the performance of examinees on the items. Specifically, the authors analyzed the effect of item word count on the average time spent on each item within a section of the examination, the average number of items omitted at the end of a section, and the average difficulty of items as a function of presentation order. The average word count per item increased from 2001 to 2008. As expected, there was a relationship between word count and time spent on the item. No significant relationship was found between word count and item difficulty, and an analysis of omitted items and pacing patterns showed no indication of overall pacing problems.
Item selection via Bayesian IRT models.

PubMed

Arima, Serena

2015-02-10

With reference to a questionnaire that aimed to assess the quality of life for dysarthric speakers, we investigate the usefulness of a model-based procedure for reducing the number of items. We propose a mixed cumulative logit model, which is known in the psychometrics literature as the graded response model: responses to different items are modelled as a function of individual latent traits and as a function of item characteristics, such as their difficulty and their discrimination power. We jointly model the discrimination and the difficulty parameters by using a k-component mixture of normal distributions. Mixture components correspond to disjoint groups of items. Items that belong to the same groups can be considered equivalent in terms of both difficulty and discrimination power. According to decision criteria, we select a subset of items such that the reduced questionnaire is able to provide the same information that the complete questionnaire provides. The model is estimated by using a Bayesian approach, and the choice of the number of mixture components is justified according to information criteria. We illustrate the proposed approach on the basis of data that are collected for 104 dysarthric patients by local health authorities in Lecce and in Milan. Copyright © 2014 John Wiley & Sons, Ltd.
Effects of Enhanced Anchored Instruction on Skills Aligned to Common Core Math Standards

ERIC Educational Resources Information Center

Bottge, Brian A.; Cho, Sun-Joo

2013-01-01

This study compared how students with learning difficulties in math (MLD) who were randomly assigned to two instructional conditions answered items on problem solving tests aligned to the Common Core State Standards Initiative for Mathematics. Posttest scores showed improvement in the math performance of students receiving Enhanced Anchored…
The Effects of Test Characteristics on the Hierarchical Order of Reading Skills

ERIC Educational Resources Information Center

Badrasawi, Kamal J. I.; Abu Kassim, Noor Lide; Daud, Nuraihan Mat

2017-01-01

Purpose: The study sought to determine the hierarchical nature of reading skills. Whether reading is a "unitary" or "multi-divisible" skill is still a contentious issue. So is the hierarchical order of reading skills. Determining the hierarchy of reading skills is challenging as item difficulty is greatly influenced by factors…
The development of a knowledge test of depression and its treatment for patients suffering from non-psychotic depression: a psychometric assessment

PubMed Central

Gabriel, Adel; Violato, Claudio

2009-01-01

Background To develop and psychometrically assess a multiple choice question (MCQ) instrument to test knowledge of depression and its treatments in patients suffering from depression. Methods A total of 63 depressed patients and twelve psychiatric experts participated. Based on empirical evidence from an extensive review, theoretical knowledge and consultations with experts, a 27-item MCQ test of knowledge of depression and its treatment was constructed. Data collected from the psychiatry experts were used to assess evidence of content validity for the instrument. Results Cronbach's alpha of the instrument was 0.68, and there was an overall 87.8% agreement (items are highly relevant) between experts about the relevance of the MCQs to test patient knowledge on depression and its treatments. There was an overall satisfactory patients' performance on the MCQs with 78.7% correct answers. Results of an item analysis indicated that most items had adequate difficulties and discriminations. Conclusion There was adequate reliability and evidence for content and convergent validity for the instrument. Future research should employ a larger and more heterogeneous sample from both psychiatrist and community samples than did the present study. Meanwhile, the present study has resulted in psychometrically tested instruments for measuring knowledge of depression and its treatment of depressed patients. PMID:19754944
Mokken scaling of the Myocardial Infarction Dimensional Assessment Scale (MIDAS).

PubMed

Thompson, David R; Watson, Roger

2011-02-01

The purpose of this study was to examine the hierarchical and cumulative nature of the 35 items of the Myocardial Infarction Dimensional Assessment Scale (MIDAS), a disease-specific health-related quality of life measure. Data from 668 participants who completed the MIDAS were analysed using the Mokken Scaling Procedure, which is a computer program that searches polychotomous data for hierarchical and cumulative scales on the basis of a range of diagnostic criteria. Fourteen MIDAS items were retained in a Mokken scale and these items included physical activity, insecurity, emotional reaction and dependency items but excluded items related to diet, medication or side-effects. Item difficulty, in item response theory terms, ran from physical activity items (low difficulty) to insecurity, suggesting that the most severe quality of life effect of myocardial infarction is loneliness and isolation. Items from the MIDAS form a strong and reliable Mokken scale, which provides new insight into the relationship between items in the MIDAS and the measurement of quality of life after myocardial infarction. © 2010 Blackwell Publishing Ltd.
Evaluation of questionnaire-based information on previous physical work loads. Stockholm MUSIC 1 Study Group. Musculoskeletal Intervention Center.

PubMed

Torgén, M; Winkel, J; Alfredsson, L; Kilbom, A

1999-06-01

The principal aim of the present study was to evaluate questionnaire-based information on past physical work loads (6-year recall). Effects of memory difficulties on reproducibility were evaluated for 82 subjects by comparing previously reported results on current work loads (test-retest procedure) with the same items recalled 6 years later. Validity was assessed by comparing self-reports in 1995, regarding work loads in 1989, with worksite measurements performed in 1989. Six-year reproducibility, calculated as weighted kappa coefficients (k(w)), varied between 0.36 and 0.86, with the highest values for proportion of the workday spent sitting and for perceived general exertion and the lowest values for trunk and neck flexion. The six-year reproducibility results were similar to previously reported test-retest results for these items; this finding indicates that memory difficulties were a minor problem. The validity of the questionnaire responses, expressed as rank correlations (r(s)) between the questionnaire responses and workplace measurements, varied between -0.16 and 0.78. The highest values were obtained for the items sitting and repetitive work, and the lowest and "unacceptable" values were for head rotation and neck flexion. Misclassification of exposure did not appear to be differential with regard to musculoskeletal symptom status, as judged by the calculated risk estimates. The validity of some of these self-administered questionnaire items appears sufficient for a crude assessment of physical work loads in the past in epidemiologic studies of the general population with predominantly low levels of exposure.
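The weighted kappa coefficient used for reproducibility in this abstract can be sketched as below. The abstract does not say which weighting scheme was used, so linear disagreement weights are an assumption here, as are the function name and the example ratings.

```python
def weighted_kappa(first, second, n_categories):
    """Linearly weighted kappa for two ratings of the same subjects.

    first, second: category indices 0 .. n_categories-1, one pair per subject
    (for example, the original report and the 6-year recall).
    """
    n = len(first)
    # Observed joint proportions.
    observed = [[0.0] * n_categories for _ in range(n_categories)]
    for a, b in zip(first, second):
        observed[a][b] += 1 / n
    row = [sum(observed[i]) for i in range(n_categories)]
    col = [sum(observed[i][j] for i in range(n_categories))
           for j in range(n_categories)]
    weighted_observed = weighted_expected = 0.0
    for i in range(n_categories):
        for j in range(n_categories):
            # Linear disagreement weight: 0 on the diagonal, 1 at the extremes.
            w = abs(i - j) / (n_categories - 1)
            weighted_observed += w * observed[i][j]
            weighted_expected += w * row[i] * col[j]  # chance agreement
    return 1 - weighted_observed / weighted_expected


# Hypothetical 3-category ratings for five subjects.
kappa = weighted_kappa([0, 1, 2, 1, 0], [0, 1, 2, 2, 0], 3)
```

Unlike unweighted kappa, near-misses on an ordinal scale are penalized less than disagreements between distant categories.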

[The French translation and cultural adaptation of the SRI questionnaire. A questionnaire to assess health-related quality of life in patients with chronic respiratory failure and domiciliary ventilation].

PubMed

Cuvelier, A; Lamia, B; Molano, L-C; Muir, J-F; Windisch, W

2012-05-01

We performed the French translation and cross-cultural adaptation of the Severe Respiratory Insufficiency (SRI) questionnaire. Written and validated in German, this questionnaire evaluates health-related quality of life in patients treated with domiciliary ventilation for chronic respiratory failure. Four bilingual German-French translators and a linguist were recruited to produce translations and back-translations of the questionnaire constituted of 49 items in seven domains. Two successive versions were generated and compared to the original questionnaire. The difficulty of the translation and the naturalness were quantified for each item using a 1-10 scale and their equivalence to their original counterpart was graded from A to C. The translated questionnaire was finally tested in a pilot study, which included 15 representative patients. The difficulty of the first translation and the first back-translation was respectively quantified as 2.5 (range 1-5.5) and 1.5 (range 1-6) on the 10-point scale (P=0.0014). The naturalness and the equivalence of 8/49 items were considered as insufficient, which led to the production of a second translation and a second back-translation. The meanings of two items needed clarification during the pilot study. The French translation of the SRI questionnaire represents a new instrument for clinical research in patients treated with domiciliary ventilation for chronic respiratory failure. Its validity needs to be tested in a multicenter study. Copyright © 2012 SPLF. Published by Elsevier Masson SAS. All rights reserved.
Dutch-Flemish translation of 17 item banks from the patient-reported outcomes measurement information system (PROMIS).

PubMed

Terwee, C B; Roorda, L D; de Vet, H C W; Dekker, J; Westhovens, R; van Leeuwen, J; Cella, D; Correia, H; Arnold, B; Perez, B; Boers, M

2014-08-01

The Patient-Reported Outcomes Measurement Information System (PROMIS®) is a new, state-of-the-art assessment system for measuring patient-reported health and well-being of adults and children that has the potential to be more valid, reliable and responsive than existing PROMs. The PROMIS items can be administered in short forms or, more efficiently, through computerized adaptive testing. This paper describes the translation of 563 items from 17 PROMIS item banks (domains) for adults from the English source into Dutch-Flemish. The translation was performed by FACITtrans using standardized methodology and approved by the PROMIS Statistical Center. The translation included four forward translations, two back-translations, three to five independent reviews (at least two Dutch, one Flemish) and pre-testing in 70 adults (age range 20-77) from the Netherlands and Flanders. A small number of items required separate translations for Dutch and Flemish: physical function (five items), pain behaviour (two items), pain interference (one item), social isolation (one item) and global health (one item). Challenges faced in the translation process included: scarcity or overabundance of possible translations, unclear item descriptions, constructs broader/smaller in the target language, difficulties in rank ordering items, differences in unit of measurement, irrelevant items or differences in performance of activities. By addressing these challenges, acceptable translations were obtained for all items. The methodology used and experience gained in this study can be used as an example for researchers in other countries interested in translating PROMIS. The Dutch-Flemish PROMIS items are linguistically equivalent. Short forms will soon be available for use and entire item banks are ready for cross-cultural validation in the Netherlands and Flanders.
Analysis of Validity and Reliability of the Health Literacy Index for Female Marriage Immigrants (HLI-FMI).

PubMed

Yang, Sook Ja; Chee, Yeon Kyung; An, Jisook; Park, Min Hee; Jung, Sunok

2016-05-01

The purpose of this study was to obtain an independent evaluation of the factor structure of the 12-item Health Literacy Index for Female Marriage Immigrants (HLI-FMI), the first measure for assessing health literacy for FMIs in Korea. Participants were 250 Asian women who migrated from China, Vietnam, and the Philippines to marry. The HLI-FMI was originally developed and administered in Korean, and other questionnaires were translated into participants' native languages. The HLI-FMI consisted of 2 factors: (1) Access-Understand Health Literacy (7 items) and (2) Appraise-Apply Health Literacy (5 items); Cronbach's α = .73. Confirmatory factor analysis indicated adequate fit for the 2-factor model. HLI-FMI scores were positively associated with time since immigration and Korean proficiency. Based on classical test theory and item response theory, strong support was provided for item discrimination and item difficulty. Findings suggested that the HLI-FMI is an easily administered, reliable, and valid scale. © 2016 APJPH.
Development and validity of a questionnaire to test the knowledge of primary care personnel regarding nutrition in obese adolescents

PubMed Central

2013-01-01

Background In light of its epidemic proportions in developed and developing countries, obesity is considered a serious public health issue. In order to increase knowledge concerning the ability of health care professionals in caring for obese adolescents and adopt more efficient preventive and control measures, a questionnaire was developed and validated to assess non-dietitian health professionals regarding their Knowledge of Nutrition in Obese Adolescents (KNOA). Methods The development and evaluation of a questionnaire to assess the knowledge of primary care practitioners with respect to nutrition in obese adolescents was carried out in five phases, as follows: 1) definition of study dimensions 2) development of 42 questions and preliminary evaluation of the questionnaire by a panel of experts; 3) characterization and selection of primary care practitioners (35 dietitians and 265 non-dietitians) and measurement of questionnaire criteria by contrasting the responses of dietitians and non-dietitians; 4) reliability assessment by question exclusion based on item difficulty (too easy and too difficult for non-dietitian practitioners), item discrimination, internal consistency and reproducibility index determination; and 5) scoring the completed questionnaires. Results Dietitians obtained higher scores than non-dietitians (Mann–Whitney U test, P < 0.05), confirming the validity of the questionnaire criteria. Items were discriminated by correlating the score for each item with the total score, using a minimum of 0.2 as a correlation coefficient cutoff value. Item difficulty was controlled by excluding questions answered correctly by more than 90% of the non-dietitian subjects (too easy) or by less than 10% of them (too difficult). The final questionnaire contained 26 of the original 42 questions, increasing Cronbach’s α value from 0.788 to 0.807. Test-retest agreement between respondents was classified as good to very good (Kappa test, >0.60). 
Conclusion The KNOA questionnaire developed for primary care practitioners is a valid, consistent and suitable instrument that can be applied over time, making it a promising tool for developing and guiding public health policies. PMID:23865564
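The item-screening rules described in this abstract (difficulty between 10% and 90% correct, item-total correlation of at least 0.2, and Cronbach's α as the internal-consistency check) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' procedure: the function names and the toy 0/1 response matrix are assumptions made for the example, and the correlation used here is the simple uncorrected item-total correlation.

```python
from statistics import mean, pvariance

def item_difficulty(responses):
    """Proportion of respondents answering each 0/1-scored item correctly."""
    n_items = len(responses[0])
    return [mean([row[j] for row in responses]) for j in range(n_items)]

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either variable has no variance."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5

def item_total_correlation(responses):
    """Correlation of each item score with the (uncorrected) total score."""
    totals = [sum(row) for row in responses]
    n_items = len(responses[0])
    return [pearson([row[j] for row in responses], totals) for j in range(n_items)]

def cronbach_alpha(responses):
    """Cronbach's alpha from item variances and total-score variance."""
    k = len(responses[0])
    item_vars = [pvariance([row[j] for row in responses]) for j in range(k)]
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def select_items(responses, p_min=0.10, p_max=0.90, r_min=0.2):
    """Indices of items that pass both the difficulty and discrimination rules."""
    p = item_difficulty(responses)
    r = item_total_correlation(responses)
    return [j for j in range(len(p)) if p_min <= p[j] <= p_max and r[j] >= r_min]

# Hypothetical data: 5 respondents (rows) by 4 items (columns).
responses = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 0, 0, 0],
]
kept = select_items(responses)      # item 0 is dropped: everyone answered it correctly
alpha = cronbach_alpha(responses)
```

In practice the thresholds would be applied to the non-dietitian subsample only, as the abstract states, and α would be recomputed on the retained items.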
A Psychometric Analysis of the Italian Version of the eHealth Literacy Scale Using Item Response and Classical Test Theory Methods

PubMed Central

Dima, Alexandra Lelia; Schulz, Peter Johannes

2017-01-01

Background The eHealth Literacy Scale (eHEALS) is a tool to assess consumers’ comfort and skills in using information technologies for health. Although evidence exists of reliability and construct validity of the scale, less agreement exists on structural validity. Objective The aim of this study was to validate the Italian version of the eHealth Literacy Scale (I-eHEALS) in a community sample with a focus on its structural validity, by applying psychometric techniques that account for item difficulty. Methods Two Web-based surveys were conducted among a total of 296 people living in the Italian-speaking region of Switzerland (Ticino). After examining the latent variables underlying the observed variables of the Italian scale via principal component analysis (PCA), fit indices for two alternative models were calculated using confirmatory factor analysis (CFA). The scale structure was examined via parametric and nonparametric item response theory (IRT) analyses accounting for differences between items regarding the proportion of answers indicating high ability. Convergent validity was assessed by correlations with theoretically related constructs. Results CFA showed a suboptimal model fit for both models. IRT analyses confirmed all items measure a single dimension as intended. Reliability and construct validity of the final scale were also confirmed. The contrasting results of factor analysis (FA) and IRT analyses highlight the importance of considering differences in item difficulty when examining health literacy scales. Conclusions The findings support the reliability and validity of the translated scale and its use for assessing Italian-speaking consumers’ eHealth literacy. PMID:28400356
Assisting Australians with mental health problems and financial difficulties: a Delphi study to develop guidelines for financial counsellors, financial institution staff, mental health professionals and carers.

PubMed

Bond, Kathy S; Chalmers, Kathryn J; Jorm, Anthony F; Kitchener, Betty A; Reavley, Nicola J

2015-06-03

There is a strong association between mental health problems and financial difficulties. Therefore, people who work with those who have financial difficulties (financial counsellors and financial institution staff) need to have knowledge and helping skills relevant to mental health problems. Conversely, people who support those with mental health problems (mental health professionals and carers) may need to have knowledge and helping skills relevant to financial difficulties. The Delphi expert consensus method was used to develop guidelines for people who work with or support those with mental health problems and financial difficulties. A systematic review of websites, books and journal articles was conducted to develop a questionnaire containing items about the knowledge, skills and actions relevant to working with or supporting someone with mental health problems and financial difficulties. These items were rated over three rounds by five Australian expert panels comprising financial counsellors (n = 33), financial institution staff (n = 54), mental health professionals (n = 31), consumers (n = 20) and carers (n = 24). A total of 897 items were rated, with 462 items endorsed by at least 80% of members of each of the expert panels. These endorsed statements were used to develop a set of guidelines for financial counsellors, financial institution staff, mental health professionals and carers about how to assist someone with mental health problems and financial difficulties. A diverse group of expert panel members were able to reach substantial consensus on the knowledge, skills and actions needed to work with and support people with mental health problems and financial difficulties. These guidelines can be used to inform policy and practice in the financial and mental health sectors.
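The endorsement rule in this abstract (an item is kept only if at least 80% of the members of every panel endorse it) can be illustrated with a short Python sketch. The function name, the data layout, and the endorsement counts are hypothetical; only the panel sizes and the 80% threshold come from the abstract.

```python
def endorsed_by_all_panels(ratings, threshold_pct=80):
    """Return items endorsed by at least threshold_pct percent of every panel.

    ratings maps item -> {panel: (n_endorsing, n_members)}.
    Integer arithmetic avoids floating-point edge cases at exactly 80%.
    """
    return [
        item
        for item, panels in ratings.items()
        if all(k * 100 >= threshold_pct * n for k, n in panels.values())
    ]

# Hypothetical endorsement counts; panel sizes follow the abstract.
ratings = {
    "item_A": {"counsellors": (30, 33), "institution_staff": (45, 54),
               "mh_professionals": (28, 31), "consumers": (16, 20), "carers": (20, 24)},
    "item_B": {"counsellors": (33, 33), "institution_staff": (54, 54),
               "mh_professionals": (31, 31), "consumers": (15, 20), "carers": (24, 24)},
}
endorsed = endorsed_by_all_panels(ratings)  # item_B fails: 15/20 consumers is 75%
```

A full Delphi process also re-rates borderline items in later rounds; that step is not modelled here.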
Developmental growth in students' concept of energy: Analysis of selected items from the TIMSS database

NASA Astrophysics Data System (ADS)

Liu, Xiufeng; McKeough, Anne

2005-05-01

The aim of this study was to develop a model of students' energy concept development. Applying Case's (1985, 1992) structural theory of cognitive development, we hypothesized that students' concept of energy undergoes a series of transitions, corresponding to systematic increases in working memory capacity. The US national sample from the Third International Mathematics and Science Study (TIMSS) database was used to test our hypothesis. Items relevant to the energy concept in the TIMSS test booklets for three populations were identified. Item difficulty from Rasch modeling was used to test the hypothesized developmental sequence, and percentage of students' correct responses was used to test the correspondence between students' age/grade level and level of the energy concepts. The analysis supported our hypothesized sequence of energy concept development and suggested mixed effects of maturation and schooling on energy concept development. Further, the results suggest that curriculum and instruction design take into consideration the developmental progression of students' concept of energy.
Object-location memory in adults with autism spectrum disorder.

PubMed

Ring, Melanie; Gaigg, Sebastian B; Bowler, Dermot M

2015-10-01

This study tested implicit and explicit spatial relational memory in Autism Spectrum Disorder (ASD). Participants were asked to study pictures of rooms and pictures of daily objects for which locations were highlighted in the rooms. Participants were later tested for their memory of the object locations either by being asked to place objects back into their original locations or into new locations. Proportions of times when participants chose the previously studied locations for the objects irrespective of the instruction were used to derive indices of explicit and implicit memory [process-dissociation procedure, Jacoby, 1991, 1998]. In addition, participants performed object and location recognition and source memory tasks where they were asked about which locations belonged to the objects and which objects to the locations. The data revealed difficulty for ASD individuals in actively retrieving object locations (explicit memory) but not in subconsciously remembering them (implicit memory). These difficulties cannot be explained by difficulties in memory for objects or locations per se (i.e., the difficulty pertains to object-location relations). Together these observations lend further support to the idea that ASD is characterised by relatively circumscribed difficulties in relational rather than item-specific memory processes and show that these difficulties extend to the domain of spatial information. They also lend further support to the idea that memory difficulties in ASD can be reduced when support is provided at test. © 2015 International Society for Autism Research, Wiley Periodicals, Inc.
A New Functional Health Literacy Scale for Japanese Young Adults Based on Item Response Theory.

PubMed

Tsubakita, Takashi; Kawazoe, Nobuo; Kasano, Eri

2017-03-01

Health literacy predicts health outcomes. Despite concerns surrounding the health of Japanese young adults, to date there has been no objective assessment of health literacy in this population. This study aimed to develop a Functional Health Literacy Scale for Young Adults (funHLS-YA) based on item response theory. Each item in the scale requires participants to choose the most relevant term from 3 choices in relation to a target item, thus assessing objective rather than perceived health literacy. The 20-item scale was administered to 1816 university students and 1751 responded. Cronbach's α coefficient was .73. Difficulty and discrimination parameters of each item were estimated, resulting in the exclusion of 1 item. Some items showed different difficulty parameters for male and female participants, reflecting that some aspects of health literacy may differ by gender. The current 19-item version of funHLS-YA can reliably assess the objective health literacy of Japanese young adults.
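To make the roles of the difficulty and discrimination parameters concrete, the following Python sketch evaluates a two-parameter logistic (2PL) item response function. The abstract does not state which IRT model was fitted, so the 2PL form is an assumption here, and all parameter values are hypothetical.

```python
import math

def p_correct_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model.

    theta: person ability, a: item discrimination, b: item difficulty.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical item with different difficulty estimates in two groups.
theta = 0.0                                          # same ability in both groups
p_group_1 = p_correct_2pl(theta, a=1.2, b=-0.3)      # item is easier for group 1
p_group_2 = p_correct_2pl(theta, a=1.2, b=0.4)       # item is harder for group 2
```

When ability equals the item difficulty the probability is exactly 0.5, and a larger discrimination makes the curve steeper around that point. Two groups with equal ability but different difficulty parameters, as in the gender comparison mentioned above, therefore get different success probabilities on the same item.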
Exploring Differential Effects across Two Decoding Treatments on Item-Level Transfer in Children with Significant Word Reading Difficulties: A New Approach for Testing Intervention Elements

ERIC Educational Resources Information Center

Steacy, Laura M.; Elleman, Amy M.; Lovett, Maureen W.; Compton, Donald L.

2016-01-01

In English, gains in decoding skill do not map directly onto increases in word reading. However, beyond the Self-Teaching Hypothesis, little is known about the transfer of decoding skills to word reading. In this study, we offer a new approach to testing specific decoding elements on transfer to word reading. To illustrate, we modeled word-reading…
Measuring disability across cultures — the psychometric properties of the WHODAS II in older people from seven low- and middle-income countries. The 10/66 Dementia Research Group population-based survey

PubMed Central

Sousa, Renata M; Dewey, Michael E; Acosta, Daisy; Jotheeswaran, AT; Castro-Costa, Erico; Ferri, Cleusa P; Guerra, Mariella; Huang, Yueqin; Jacob, KS; Pichardo, Juana Guillermina Rodriguez; Ramírez, Nayeli Garcia; Rodriguez, Juan Llibre; Rodriguez, Marina Calvo; Salas, Aquiles; Sosa, Ana Luisa; Williams, Joseph; Prince, Martin J

2010-01-01

We evaluated the psychometric properties of the 12-item interviewer-administered screener version of the World Health Organization Disability Assessment Schedule – version II (WHODAS II) among older people living in seven low- and middle-income countries. Principal component analysis (PCA), confirmatory factor analysis (CFA) and Mokken analyses were carried out to test for unidimensionality, hierarchical structure, and measurement invariance across 10/66 Dementia Research Group sites. PCA generated a one-factor solution in most sites. In CFA, the two-factor solution generated in Dominican Republic fitted better for all sites other than rural China. The two factors were not easily interpretable, and may have been an artefact of differing item difficulties. Strong internal consistency and high factor loadings for the one-factor solution supported unidimensionality. Furthermore, the WHODAS II was found to be a ‘strong’ Mokken scale. Measurement invariance was supported by the similarity of factor loadings across sites, and by the high between-site correlations in item difficulties. The Mokken results strongly support that the WHODAS II 12-item screener is a unidimensional and hierarchical scale conforming to item response theory (IRT) principles, at least at the monotone homogeneity model level. More work is needed to assess the generalizability of our findings to different populations. Copyright © 2010 John Wiley & Sons, Ltd. PMID:20104493
Reliability and Validity of the Math Essential Skill Screener Elementary Version (MESS-E).

ERIC Educational Resources Information Center

Erford, Bradley T.; Bagley, Donna L.; Hopper, James A.; Lee, Ramona M.; Panagopulos, Kathleen A.; Preller, Denise B.

1998-01-01

The Math Essential Skill Screener Elementary Version (MESS-E) is a screener devised to identify primary grade students at risk for math difficulties. Item analysis, interitem consistency, test-retest reliability, decision efficiency, and construct validity of the MESS-E were studied using four independent samples of boys and girls grades 1-3. The…
Sensitivity of Cross-State Assessment Item Difficulty to Differences in State Curricular Content Standards

ERIC Educational Resources Information Center

Traynor, Anne

2017-01-01

It has long been argued that U.S. states' differential performance on nationwide assessments may reflect differences in students' opportunity to learn the tested content that is primarily due to variation in curricular content standards, rather than in instructional quality or educational investment. To quantify the effect of differences in…
The SAT Gender Gap: Identifying the Causes.

ERIC Educational Resources Information Center

Rosser, Phyllis

Questions on the Scholastic Aptitude Test (SAT) with the largest score differences between women and men of all racial and ethnic groups were identified. Patterns of difficulty that would explain the SAT's continuing underprediction of female first-year college performance were studied. An item analysis of one form of the June 1986 SAT for 1,112…
Prior Experience Shapes Metacognitive Judgments at the Category Level: The Role of Testing and Category Difficulty

ERIC Educational Resources Information Center

Thomas, Ruthann C.; Finn, Bridgid; Jacoby, Larry L.

2016-01-01

Most metacognition research has focused on aggregate judgments of overall performance or item-level judgments about performance on particular questions. However, metacognitive judgments at the category level, which have not been as extensively explored, also play a role in students' study strategies, for example, when students determine what…
Using Student Ability and Item Difficulty for Making Defensible Pass/Fail Decisions for Borderline Grades

ERIC Educational Resources Information Center

Shulruf, Boaz; Jones, Phil; Turner, Rolf

2015-01-01

The determination of Pass/Fail decisions over Borderline grades, (i.e., grades which do not clearly distinguish between the competent and incompetent examinees) has been an ongoing challenge for academic institutions. This study utilises the Objective Borderline Method (OBM) to determine examinee ability and item difficulty, and from that…
Psychometric Properties of the Chinese Version of the Beck Depression Inventory-II Using the Rasch Model

ERIC Educational Resources Information Center

Wu, Pei-Chen; Chang, Lily

2008-01-01

The authors investigated the Chinese version of the Beck Depression Inventory-II (BDI-II-C; Chinese Behavioral Science Corporation, 2000) within the Rasch framework in terms of dimensionality, item difficulty, and category functioning. Two underlying scale dimensions, relatively high item difficulties, and a need for collapsing 2 response…
Effects of Anchor Item Methods on the Detection of Differential Item Functioning within the Family of Rasch Models

ERIC Educational Resources Information Center

Wang, Wen-Chung

2004-01-01

Scale indeterminacy in analysis of differential item functioning (DIF) within the framework of item response theory can be resolved by imposing 3 anchor item methods: the equal-mean-difficulty method, the all-other anchor item method, and the constant anchor item method. In this article, applicability and limitations of these 3 methods are…
Using Explanatory Item Response Models to Evaluate Complex Scientific Tasks Designed for the Next Generation Science Standards

NASA Astrophysics Data System (ADS)

Chiu, Tina

This dissertation includes three studies that analyze a new set of assessment tasks developed by the Learning Progressions in Middle School Science (LPS) Project. These assessment tasks were designed to measure science content knowledge on the structure of matter domain and scientific argumentation, while following the goals from the Next Generation Science Standards (NGSS). The three studies focus on the evidence available for the success of this design and its implementation, generally labelled as "validity" evidence. I use explanatory item response models (EIRMs) as the overarching framework to investigate these assessment tasks. These models can be useful when gathering validity evidence for assessments as they can help explain student learning and group differences. In the first study, I explore the dimensionality of the LPS assessment by comparing the fit of unidimensional, between-item multidimensional, and Rasch testlet models to see which is most appropriate for this data. By applying multidimensional item response models, multiple relationships can be investigated, and in turn, allow for a more substantive look into the assessment tasks. The second study focuses on person predictors through latent regression and differential item functioning (DIF) models. Latent regression models show the influence of certain person characteristics on item responses, while DIF models test whether one group is differentially affected by specific assessment items, after conditioning on latent ability. Finally, the last study applies the linear logistic test model (LLTM) to investigate whether item features can help explain differences in item difficulties.
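The idea behind the linear logistic test model mentioned in the third study is that each item difficulty is modelled as a weighted sum of item features, which then enters an ordinary Rasch response function. The Python sketch below illustrates this decomposition only; it does not estimate anything, and the feature matrix, weights, and function names are hypothetical rather than taken from the dissertation.

```python
import math

def lltm_difficulties(q_matrix, eta):
    """Item difficulties implied by the LLTM: b_i = sum_k q_ik * eta_k.

    q_matrix[i][k] says how strongly item i carries feature k;
    eta[k] is the difficulty contribution of feature k.
    """
    return [sum(q * e for q, e in zip(row, eta)) for row in q_matrix]

def p_correct_rasch(theta, b):
    """Rasch probability of a correct response for ability theta, difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# Hypothetical features: column 0 = "requires argumentation", column 1 = "open-ended".
q_matrix = [
    [1, 0],   # item 0: argumentation only
    [1, 1],   # item 1: both features
    [0, 1],   # item 2: open-ended only
]
eta = [0.5, 1.0]                                   # assumed feature weights
difficulties = lltm_difficulties(q_matrix, eta)    # [0.5, 1.5, 1.0]
probs = [p_correct_rasch(0.0, b) for b in difficulties]
```

Comparing the fit of such a constrained model against a Rasch model with freely estimated difficulties is one way to judge how much of the difficulty variation the features explain.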
Classical Item Analysis Using Latent Variable Modeling: A Note on a Direct Evaluation Procedure

ERIC Educational Resources Information Center

Raykov, Tenko; Marcoulides, George A.

2011-01-01

A directly applicable latent variable modeling procedure for classical item analysis is outlined. The method allows one to point and interval estimate item difficulty, item correlations, and item-total correlations for composites consisting of categorical items. The approach is readily employed in empirical research and as a by-product permits…

Rasch Mixture Models for DIF Detection

PubMed Central

Strobl, Carolin; Zeileis, Achim

2014-01-01

Rasch mixture models can be a useful tool when checking the assumption of measurement invariance for a single Rasch model. They provide advantages compared to manifest differential item functioning (DIF) tests when the DIF groups are only weakly correlated with the manifest covariates available. Unlike in single Rasch models, estimation of Rasch mixture models is sensitive to the specification of the ability distribution even when the conditional maximum likelihood approach is used. It is demonstrated in a simulation study how differences in ability can influence the latent classes of a Rasch mixture model. If the aim is only DIF detection, it is not of interest to uncover such ability differences as one is only interested in a latent group structure regarding the item difficulties. To avoid any confounding effect of ability differences (or impact), a new score distribution for the Rasch mixture model is introduced here. It ensures the estimation of the Rasch mixture model to be independent of the ability distribution and thus restricts the mixture to be sensitive to latent structure in the item difficulties only. Its usefulness is demonstrated in a simulation study, and its application is illustrated in a study of verbal aggression. PMID:29795819
Development and Psychometric Properties of the Instrumental Activities of Daily Living: Compensation Scale

PubMed Central

Schmitter-Edgecombe, Maureen; Parsey, Carolyn; Lamb, Richard

2014-01-01

The Instrumental Activities of Daily Living – Compensation (IADL-C) scale was developed to capture early functional difficulties and to quantify compensatory strategy use that may mitigate functional decline in the aging population. The IADL-C was validated in a sample of cognitively healthy older adults (N=184) and individuals with mild cognitive impairment (MCI; N=92) and dementia (N=24). Factor analysis and Rasch item analysis led to the 27-item IADL-C informant questionnaire with four functional domain subscales (money and self-management, home daily living, travel and event memory, and social skills). The subscales demonstrated good internal consistency (Rasch reliability 0.80 to 0.93) and test-retest reliability (Spearman coefficients 0.70 to 0.91). The IADL-C total score and subscales showed convergent validity with other IADL measures, discriminant validity with psychosocial measures, and the ability to discriminate between diagnostic groups. The money and self management subscale showed notable difficulties for individuals with MCI, whereas difficulties with home daily living became more prominent for dementia participants. Compensatory strategy use increased in the MCI group and decreased in the dementia group. PMID:25344901
The Standardization of the Clock Drawing Test (CDT) for People with Stroke Using Rasch Analysis

PubMed Central

Yoo, Doo Han; Hong, Deok Gi; Lee, Jae Shin

2014-01-01

[Purpose] The aim of this study was to standardize the clock drawing test (CDT) for people with stroke using Rasch analysis. [Subjects and Methods] Seventeen items of the CDT identified through a literature review were performed by 159 stroke patients. The data were analyzed with Winsteps version 3.57 using the Rasch model to examine the unidimensionality of the items’ fit, the distribution of the items’ difficulty, and the reliability and appropriateness of the rating scale. [Result] Ten of the 159 participants (6.2%) were considered misfit subjects, and one item of the CDT was determined to be a misfit item based on Rasch analysis. The rating scales were judged as suitable because the observed average showed an array of vertical orders and MNSQ values < 2. The separation index and reliability of the subject (1.98, 0.80) and item (6.45, 0.97) showed relatively high values. [Conclusion] This study is the first to examine the CDT scale in stroke patients by Rasch analysis. The CDT is expected to be useful for screening stroke patients with cognitive problems. PMID:24409026
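The paired values reported above (separation index, reliability) are linked by a standard identity in Rasch measurement: reliability R = G² / (1 + G²), where G is the separation index. The short Python sketch below applies that identity to the reported separation values; the function names are illustrative, and the calculation is a consistency check rather than a reproduction of the authors' software output.

```python
def reliability_from_separation(g):
    """Rasch reliability implied by a separation index G: R = G^2 / (1 + G^2)."""
    return g * g / (1.0 + g * g)

def separation_from_reliability(r):
    """Inverse relation: G = sqrt(R / (1 - R)), defined for 0 <= R < 1."""
    return (r / (1.0 - r)) ** 0.5

person_reliability = reliability_from_separation(1.98)  # reported person separation
item_reliability = reliability_from_separation(6.45)    # reported item separation
```

With these inputs the identity gives roughly 0.80 for persons and roughly 0.98 for items, close to the reported 0.80 and 0.97; a small gap of this size can arise from rounding in the published figures.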
Pilot-testing the French version of a provisional European organisation for research and treatment of cancer (EORTC) measure of spiritual well-being for people receiving palliative care for cancer.

PubMed

Lucette, A; Brédart, A; Vivat, B; Young, T

2014-03-01

Spiritual well-being is increasingly recognised as an important aspect of patients' quality of life when living with a potentially life-limiting illness such as cancer. The European Organisation for Research and Treatment of Cancer (EORTC) Quality of Life Group is developing a measure for assessing spiritual well-being cross-culturally for people receiving palliative care for cancer. The pilot-testing phase of the study explored potential problems related to the content and administration of a provisional version of this measure. The French version was pilot-tested with 12 patients in a palliative and supportive day care unit in Paris. Participants were asked to complete the measure and the EORTC QLQ-C15-PAL before being interviewed about their responses. The administration of the measure enabled participants to express the difficulties and existential concerns they experienced. The items were not considered intrusive, despite the sensitive topic of the measure. This article considers difficulties with items pertaining to 'religion' and 'spirituality' in the context of French culture. Overall, this measure appears to enhance holistic care, by providing caregivers with a means of broaching spirituality issues, a topic otherwise difficult to discuss in the context of palliative care. © 2013 John Wiley & Sons Ltd.
Adaptable Learning Assistant for Item Bank Management

ERIC Educational Resources Information Center

Nuntiyagul, Atorn; Naruedomkul, Kanlaya; Cercone, Nick; Wongsawang, Damras

2008-01-01

We present PKIP, an adaptable learning assistant tool for managing question items in item banks. PKIP is not only able to automatically assist educational users to categorize the question items into predefined categories by their contents but also to correctly retrieve the items by specifying the category and/or the difficulty level. PKIP adapts…
Development and evaluation of the Korean Health Literacy Instrument.

PubMed

Kang, Soo Jin; Lee, Tae Wha; Paasche-Orlow, Michael K; Kim, Gwang Suk; Won, Hee Kwan

2014-01-01

The purpose of this study is to develop and validate the Korean Health Literacy Instrument, which measures the capacity to understand and use health-related information and make informed health decisions in Korean adults. In Phase 1, 33 initial items were generated to measure functional, interactive, and critical health literacy with prose, document, and numeracy tasks. These items included content from health promotion, disease management, and health navigation contexts. Content validity assessment was conducted by an expert panel, and 11 items were excluded. In Phase 2, the 22 remaining items were administered to a convenience sample of 292 adults from community and clinical settings. Exploratory factor and item difficulty and discrimination analyses were conducted and four items with low discrimination were deleted. In Phase 3, the remaining 18 items were administered to a convenience sample of 315 adults 40-64 years of age from community and clinical settings. A confirmatory factor analysis was performed to test the construct validity of the instrument. The Korean Health Literacy Instrument has a range of 0 to 18. The mean score in our validation study was 11.98. The instrument exhibited an internal consistency reliability coefficient of 0.82, and a test-retest reliability of 0.89. The instrument is suitable for screening individuals who have limited health literacy skills. Future studies are needed to further define the psychometric properties and predictive validity of the Korean Health Literacy Instrument.
Development and initial evaluation of the SCI-FI/AT

PubMed Central

Jette, Alan M.; Slavin, Mary D.; Ni, Pengsheng; Kisala, Pamela A.; Tulsky, David S.; Heinemann, Allen W.; Charlifue, Susie; Tate, Denise G.; Fyffe, Denise; Morse, Leslie; Marino, Ralph; Smith, Ian; Williams, Steve

2015-01-01

Objectives To describe the domain structure and calibration of the Spinal Cord Injury Functional Index for samples using Assistive Technology (SCI-FI/AT) and report the initial psychometric properties of each domain. Design Cross sectional survey followed by computerized adaptive test (CAT) simulations. Setting Inpatient and community settings. Participants A sample of 460 adults with traumatic spinal cord injury (SCI) stratified by level of injury, completeness of injury, and time since injury. Interventions None. Main outcome measure SCI-FI/AT. Results Confirmatory factor analysis (CFA) and Item response theory (IRT) analyses identified 4 unidimensional SCI-FI/AT domains: Basic Mobility (41 items), Self-care (71 items), Fine Motor Function (35 items), and Ambulation (29 items). High correlations of full item banks with 10-item simulated CATs indicated high accuracy of each CAT in estimating a person's function, and there was high measurement reliability for the simulated CAT scales compared with the full item bank. SCI-FI/AT item difficulties in the domains of Self-care, Fine Motor Function, and Ambulation were less difficult than the same items in the original SCI-FI item banks. Conclusion With the development of the SCI-FI/AT, clinicians and investigators have available multidimensional assessment scales that evaluate function for users of AT to complement the scales available in the original SCI-FI. PMID:26010975
Development and initial evaluation of the SCI-FI/AT.

PubMed

Jette, Alan M; Slavin, Mary D; Ni, Pengsheng; Kisala, Pamela A; Tulsky, David S; Heinemann, Allen W; Charlifue, Susie; Tate, Denise G; Fyffe, Denise; Morse, Leslie; Marino, Ralph; Smith, Ian; Williams, Steve

2015-05-01

To describe the domain structure and calibration of the Spinal Cord Injury Functional Index for samples using Assistive Technology (SCI-FI/AT) and report the initial psychometric properties of each domain. Cross sectional survey followed by computerized adaptive test (CAT) simulations. Inpatient and community settings. A sample of 460 adults with traumatic spinal cord injury (SCI) stratified by level of injury, completeness of injury, and time since injury. Interventions: none. Main outcome measure: SCI-FI/AT. Results: Confirmatory factor analysis (CFA) and Item response theory (IRT) analyses identified 4 unidimensional SCI-FI/AT domains: Basic Mobility (41 items), Self-care (71 items), Fine Motor Function (35 items), and Ambulation (29 items). High correlations of full item banks with 10-item simulated CATs indicated high accuracy of each CAT in estimating a person's function, and there was high measurement reliability for the simulated CAT scales compared with the full item bank. SCI-FI/AT item difficulties in the domains of Self-care, Fine Motor Function, and Ambulation were less difficult than the same items in the original SCI-FI item banks. With the development of the SCI-FI/AT, clinicians and investigators have available multidimensional assessment scales that evaluate function for users of AT to complement the scales available in the original SCI-FI.
Validating Translation Test Items via the Many-Facet Rasch Model.

PubMed

Tseng, Wen-Ta; Su, Tzi-Ying; Nix, John-Michael L

2018-01-01

This study applied the many-facet Rasch model to assess learners' translation ability in an English as a foreign language context. Few attempts have been made in extant research to detect and calibrate rater severity in the domain of translation testing. To fill the research gap, this study documented the process of validating a test of Chinese-to-English sentence translation and modeled raters' scoring propensity defined by harshness or leniency, expert/novice effects on severity, and concomitant effects on item difficulty. Two hundred twenty-five third-year senior high school Taiwanese students and six educators from tertiary and secondary educational institutions served as participants. The students' mean age was 17.80 years (SD = 1.20, range 17-19). The exam consisted of 10 translation items adapted from two entrance exam tests. The results showed that this subjectively scored performance assessment exhibited robust unidimensionality, thus reliably measuring translation ability free from unmodeled disturbances. Furthermore, discrepancies in ratings between novice and expert raters were also identified and modeled by the many-facet Rasch model. The implications for applying the many-facet Rasch model in translation tests at the tertiary level were discussed.
The relationship between brain reaction and English reading tests for non-native English speakers.

PubMed

Cheng, Pei-Wen; Tian, Yu-Jie; Kuo, Ting-Hua; Sun, Koun-Tem

2016-07-01

This research analyzed the brain activity of non-native English speakers while engaged in English reading tests. The brain wave event-related potentials (ERPs) of participants were used to analyze the difference between making correct and incorrect choices on English reading test items. Three English reading tests of differing levels were designed and 20 participants, 10 males and 10 females whose ages ranged from 20 to 24, voluntarily participated in the experiment. Experimental results were analyzed by performing independent t-tests on the ERPs of participants for gender, difficulty level, and correct versus wrong options. Participants who chose incorrect options elicited a larger N600, verifying results found in the literature. Another interesting result was found: for incorrectly answered items, the brain areas showing a significant difference in ERPs between the chosen and non-chosen options differed by gender; for males, this area was located in the right hemisphere, whereas for females it was located in the left. Experimental results imply that non-native English speaking males and females employ different areas of the brain to comprehend the meaning of difficult items. Copyright © 2016 Elsevier B.V. All rights reserved.
Modifying the test of understanding graphs in kinematics

NASA Astrophysics Data System (ADS)

Zavala, Genaro; Tejeda, Santa; Barniol, Pablo; Beichner, Robert J.

2017-12-01

In this article, we present several modifications to the Test of Understanding Graphs in Kinematics. The most significant changes are (i) the addition and removal of items to achieve parallelism in the objectives (dimensions) of the test, thus allowing comparisons of students' performance that were not possible with the original version, and (ii) changes to the distractors of some of the original items that represent the most frequent alternative conceptions. The final modified version (after an iterative process involving four administrations of test variations over two years) was administered to 471 students of an introductory university physics course at a large private university in Mexico. Analysis of the final modified version of the test showed that the added items satisfied the statistical tests of difficulty, discriminatory power, and reliability; that the great majority of the modified distractors were effective in terms of their selection frequency and discriminatory power; and that the final modified version of the test satisfied the reliability and discriminatory power criteria as well as the original test did. Here, we also show the use of the new version of the test, presenting a new analysis of students' understanding that was not possible with the original version, specifically regarding the objectives and items that are parallel in the new version. Finally, in the PhysPort project (physport.org), we present the final modified version of the test. It can be used by teachers and researchers to assess students' understanding of graphs in kinematics, as well as their learning about them.
Designing a Clinical Dashboard to Fill Information Gaps in the Emergency Department

PubMed Central

Swartz, Jordan L.; Cimino, James J.; Fred, Matthew R.; Green, Robert A.; Vawdrey, David K.

2014-01-01

Data fragmentation within electronic health records causes gaps in the information readily available to clinicians. We investigated the information needs of emergency medicine clinicians in order to design an electronic dashboard to fill information gaps in the emergency department. An online survey was distributed to all emergency medicine physicians at a large, urban academic medical center. The survey response rate was 48% (52/109). The clinical information items reported to be most helpful while caring for patients in the emergency department were vital signs, electrocardiogram (ECG) reports, previous discharge summaries, and previous lab results. Brief structured interviews were also conducted with 18 clinicians during their shifts in the emergency department. From the interviews, three themes emerged: 1) difficulty accessing vital signs, 2) difficulty accessing point-of-care tests, and 3) difficulty comparing the current ECG with the previous ECG. An emergency medicine clinical dashboard was developed to address these difficulties. PMID:25954420
Cancer Health Literacy Test-30-Spanish (CHLT-30-DKspa), a new Spanish-language version of the Cancer Health Literacy Test (CHLT-30) for Spanish-speaking Latinos

PubMed Central

Echeverri, Margarita; Anderson, David; Nápoles, Anna María

2016-01-01

Objective: Describe adaptation and initial validation of the Cancer Health Literacy Test (CHLT) for Spanish-speakers. Methods: Cross-sectional field test of the CHLT Spanish version (CHLT-30-DKspa) among healthy Latinos in Louisiana. Diagonally Weighted Least Squares was used to confirm the factor structure. Item-response analysis using 2-parameter logistic estimates was used to identify questions that may require modification to avoid bias. Cronbach's alpha coefficients estimated scale internal consistency reliability. Analysis of variance was used to test for significant differences in CHLT-30-DKspa scores by gender, origin, age and education. Results: Mean CHLT-30-DKspa score (N=400) was 17.13 (range 0 to 30; SD 6.65). Results confirmed a unidimensional structure (χ²[405]=461.55, p=.027, CFI=.993; TLI=.992, RMSEA=.0180). Cronbach's alpha was 0.88. Items Q1-High calorie and Q15-Tumor spread had the lowest item-scale correlations (.148 and .288) and standardized factor loadings (.152 and .302). Items Q1-High Calories, Q8-Palliative Care, and Q19-Smoking Risk had the highest item-difficulty parameters (diff=1.12, 1.21, and 2.40). Conclusions: Results generally supported the applicability of the CHLT-30-DKspa for Spanish-speaking healthy populations, with the exception of four items that need to be deleted or revised and further studied (Q1, Q8, Q15, and Q19). Practical Implications: The CHLT-30-DKspa can be used to assess cancer health literacy among Spanish-speaking populations to advance research on cancer health literacy and outcomes. PMID:27043760
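To make the reported item-difficulty parameters more concrete, the following minimal sketch evaluates the standard 2-parameter logistic item response function at the three difficulty values quoted in the abstract. It assumes the difficulties are on the usual logit ability scale; the discrimination value a = 1.0 is a hypothetical placeholder, since the abstract does not report slopes.

```python
import math

def p_correct_2pl(theta, a, b):
    """Probability of a correct response under the 2-parameter logistic model.

    theta: respondent ability; a: item discrimination; b: item difficulty.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Difficulty values quoted in the abstract; a = 1.0 is an assumed slope.
reported_difficulty = {"Q1": 1.12, "Q8": 1.21, "Q19": 2.40}
for item, b in reported_difficulty.items():
    # Chance that a respondent of average ability (theta = 0) answers correctly.
    print(item, round(p_correct_2pl(0.0, 1.0, b), 3))
```

Under this model a respondent whose ability equals the item difficulty has a 50% chance of answering correctly, so larger difficulty values imply a lower success probability at any fixed ability.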
The Development of the Post-Divorce Parental Conflict Scale.

ERIC Educational Resources Information Center

Sonnenblick, Renee; Schwarz, J. Conrad

One difficulty in studying the long-term impact of divorce on children has been the lack of a reliable and valid measure of parental conflict for divorced parents. Items for a post-divorce conflict scale were written and tested using 32 male and 63 female college students from divorced families for Study 1 and 60 male and 75 female students from…
Modeling the Psychometric Properties of Complex Performance Assessment Tasks Using Confirmatory Factor Analysis: A Multistage Model for Calibrating Tasks

ERIC Educational Resources Information Center

Kahraman, Nilufer; De Champlain, Andre; Raymond, Mark

2012-01-01

Item-level information, such as difficulty and discrimination, is invaluable to test assembly, equating, and scoring practices. Estimating these parameters within the context of large-scale performance assessments is often hindered by the use of unbalanced designs for assigning examinees to tasks and raters because such designs result in very…
Effects of Related and Unrelated Context on Recall and Recognition by Adults with High-Functioning Autism Spectrum Disorder

ERIC Educational Resources Information Center

Bowler, Dermot M.; Gaigg, Sebastian B.; Gardiner, John M.

2008-01-01

Memory in autism spectrum disorder (ASD) is characterised by greater difficulties with recall rather than recognition and with a diminished use of semantic or associative relatedness in the aid of recall. Two experiments are reported that test the effects of item-context relatedness on recall and recognition in adults with high-functioning ASD…
A Psychometric Analysis of the Italian Version of the eHealth Literacy Scale Using Item Response and Classical Test Theory Methods.

PubMed

Diviani, Nicola; Dima, Alexandra Lelia; Schulz, Peter Johannes

2017-04-11

The eHealth Literacy Scale (eHEALS) is a tool to assess consumers' comfort and skills in using information technologies for health. Although evidence exists of reliability and construct validity of the scale, less agreement exists on structural validity. The aim of this study was to validate the Italian version of the eHealth Literacy Scale (I-eHEALS) in a community sample with a focus on its structural validity, by applying psychometric techniques that account for item difficulty. Two Web-based surveys were conducted among a total of 296 people living in the Italian-speaking region of Switzerland (Ticino). After examining the latent variables underlying the observed variables of the Italian scale via principal component analysis (PCA), fit indices for two alternative models were calculated using confirmatory factor analysis (CFA). The scale structure was examined via parametric and nonparametric item response theory (IRT) analyses accounting for differences between items regarding the proportion of answers indicating high ability. Convergent validity was assessed by correlations with theoretically related constructs. CFA showed a suboptimal model fit for both models. IRT analyses confirmed all items measure a single dimension as intended. Reliability and construct validity of the final scale were also confirmed. The contrasting results of factor analysis (FA) and IRT analyses highlight the importance of considering differences in item difficulty when examining health literacy scales. The findings support the reliability and validity of the translated scale and its use for assessing Italian-speaking consumers' eHealth literacy. ©Nicola Diviani, Alexandra Lelia Dima, Peter Johannes Schulz. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 11.04.2017.
Dual processing theory and experts' reasoning: exploring thinking on national multiple-choice questions.

PubMed

Durning, Steven J; Dong, Ting; Artino, Anthony R; van der Vleuten, Cees; Holmboe, Eric; Schuwirth, Lambert

2015-08-01

An ongoing debate exists in the medical education literature regarding the potential benefits of pattern recognition (non-analytic reasoning), actively comparing and contrasting diagnostic options (analytic reasoning) or using a combination approach. Studies have not, however, explicitly explored faculty's thought processes while tackling clinical problems through the lens of dual process theory to inform this debate. Further, these thought processes have not been studied in relation to the difficulty of the task or other potential mediating influences such as personal factors and fatigue, which could also be influenced by personal factors such as sleep deprivation. We therefore sought to determine which reasoning process(es) were used when answering clinically oriented multiple-choice questions (MCQs) and if these processes differed based on the dual process theory characteristics: accuracy, reading time and answering time as well as psychometrically determined item difficulty and sleep deprivation. We performed a think-aloud procedure to explore faculty's thought processes while taking these MCQs, coding think-aloud data based on reasoning process (analytic, nonanalytic, guessing or combination of processes) as well as word count, number of stated concepts, reading time, answering time, and accuracy. We also included questions regarding amount of work in the recent past. We then conducted statistical analyses to examine the associations between these measures such as correlations between frequencies of reasoning processes and item accuracy and difficulty. We also observed the total frequencies of different reasoning processes in the situations of getting answers correctly and incorrectly. Regardless of whether the questions were classified as 'hard' or 'easy', non-analytical reasoning led to the correct answer more often than to an incorrect answer. Significant correlations were found between self-reported recent number of hours worked and both think-aloud word count and number of concepts used in the reasoning, but not item accuracy. When all MCQs were included, 19% of the variance of correctness could be explained by the frequency of expression of these three think-aloud processes (analytic, nonanalytic, or combined). We found evidence to support the notion that the difficulty of an item in a test is not a systematic feature of the item itself but is always a result of the interaction between the item and the candidate. Use of analytic reasoning did not appear to improve accuracy. Our data suggest that individuals do not apply either System 1 or System 2 but instead fall along a continuum with some individuals falling at one end of the spectrum.
Psychometrics of the preschool behavioral and emotional rating scale with children from early childhood special education settings.

PubMed

Lambert, Matthew C; Cress, Cynthia J; Epstein, Michael H

2015-01-01

In a previous study with a nationally representative sample, researchers found that the items of the Preschool Behavioral and Emotional Rating Scale can best be described by a four-factor structure model (Emotional Regulation, School Readiness, Social Confidence, and Family Involvement). The findings of this investigation replicate and extend these previous results with a national sample of children (N = 1,075) with disabilities enrolled in early childhood special education programs. Data were analyzed using classical test theory, Rasch modeling, and confirmatory factor analysis. Results confirmed that for the most part, individual items were internally consistent within a four-factor model and showed consistent item difficulty, discrimination, and fit relative to their respective subscale scores. © 2015 Michigan Association for Infant Mental Health.

Automatic Item Generation of Probability Word Problems

ERIC Educational Resources Information Center

Holling, Heinz; Bertling, Jonas P.; Zeuch, Nina

2009-01-01

Mathematical word problems represent a common item format for assessing student competencies. Automatic item generation (AIG) is an effective way of constructing many items with predictable difficulties, based on a set of predefined task parameters. The current study presents a framework for the automatic generation of probability word problems…
Development and Validation of a Multimedia-based Assessment of Scientific Inquiry Abilities

NASA Astrophysics Data System (ADS)

Kuo, Che-Yu; Wu, Hsin-Kai; Jen, Tsung-Hau; Hsu, Ying-Shao

2015-09-01

The potential of computer-based assessments for capturing complex learning outcomes has been discussed; however, relatively little is understood about how to leverage such potential for summative and accountability purposes. The aim of this study is to develop and validate a multimedia-based assessment of scientific inquiry abilities (MASIA) to cover a more comprehensive construct of inquiry abilities and target secondary school students in different grades while this potential is leveraged. We implemented five steps derived from the construct modeling approach to design MASIA. During the implementation, multiple sources of evidence were collected in the steps of pilot testing and Rasch modeling to support the validity of MASIA. Particularly, through the participation of 1,066 8th and 11th graders, MASIA showed satisfactory psychometric properties to discriminate students with different levels of inquiry abilities in 101 items in 29 tasks when Rasch models were applied. Additionally, the Wright map indicated that MASIA offered accurate information about students' inquiry abilities because of the comparability of the distributions of student abilities and item difficulties. The analysis results also suggested that MASIA offered precise measures of inquiry abilities when the components (questioning, experimenting, analyzing, and explaining) were regarded as a coherent construct. Finally, the increased mean difficulty thresholds of item responses along with three performance levels across all sub-abilities supported the alignment between our scoring rubrics and our inquiry framework. Together with other sources of validity in the pilot testing, the results offered evidence to support the validity of MASIA.
Greater loss of object than spatial mnemonic discrimination in aged adults.

PubMed

Reagh, Zachariah M; Ho, Huy D; Leal, Stephanie L; Noche, Jessica A; Chun, Amanda; Murray, Elizabeth A; Yassa, Michael A

2016-04-01

Previous studies across species have established that the aging process adversely affects certain memory-related brain regions earlier than others. Behavioral tasks targeted at the function of vulnerable regions can provide noninvasive methods for assessing the integrity of particular components of memory throughout the lifespan. The present study modified a previous task designed to separately but concurrently test detailed memory for object identity and spatial location. Memory for objects or items is thought to rely on perirhinal and lateral entorhinal cortices, among the first targets of Alzheimer's related neurodegeneration. In line with prior work, we split an aged adult sample into "impaired" and "unimpaired" groups on the basis of a standardized word-learning task. The "impaired" group showed widespread difficulty with memory discrimination, whereas the "unimpaired" group showed difficulty with object, but not spatial memory discrimination. These findings support the hypothesized greater age-related impacts on memory for objects or items in older adults, perhaps even with healthy aging. © 2015 Wiley Periodicals, Inc.
A developmental study of proverb comprehension.

PubMed

Resnick, D A

1982-09-01

Growth in proverb comprehension was hypothesized to result from the gradual emergence of cognitive abilities reflected in a sequence of increasingly complex abilities: story matching, transfer of relations, desymbolization, proverb matching, and paraphrase. Items for these abilities for each of 10 proverbs of two structural types were administered in three test sessions to 438 students in grades three to seven. An analogy subtest was used to measure general intelligence. ANOVA yielded significant main effects for grade, tasks, and proverbs (all p's less than .01). A significant task x proverb interaction (p less than .01) revealed the difficulty of precise control over the language of the items. Proverb structure had no measurable impact on difficulty. Analogy score was a significant factor in performance (p less than .01) but not as potent as age (p less than .01). The sequential order of abilities received only weak confirmation, though tasks did correlate among themselves with medium strength (r's = .50-.70). Individual interviews added a qualitative dimension to the findings. The suitability of cognitive hierarchical models for proverb comprehension was questioned.
The Strengths and Difficulties Questionnaire (SDQ) Revisited in a French-Speaking Population: Proposition of a Reduced Version of the Parent SDQ

ERIC Educational Resources Information Center

Chauvin, Bruno; Leonova, Tamara

2016-01-01

Key concerns about the psychometric properties of the 25-item version of the Strengths and Difficulties Questionnaire (SDQ) have consistently been raised in the literature. The present study aimed at examining the meaningfulness of an alternative model to the SDQ in which 7 problematic items are excluded. French-speaking parents of 262 boys and…
Establishing Reliability and Validity of the Criterion Referenced Exam of GeoloGy Standards EGGS

NASA Astrophysics Data System (ADS)

Guffey, S. K.; Slater, S. J.; Slater, T. F.; Schleigh, S.; Burrows, A. C.

2016-12-01

Discipline-based geoscience education researchers have considerable need for a criterion-referenced, easy-to-administer and easy-to-score conceptual diagnostic survey for undergraduates taking introductory science survey courses, so that faculty can better monitor the learning impacts of various interactive teaching approaches. To support ongoing education research across the geosciences, we are continuing to work rigorously and systematically to firmly establish the reliability and validity of the recently released Exam of GeoloGy Standards, EGGS. In educational testing, reliability refers to the consistency or stability of test scores, whereas validity refers to the accuracy of the inferences or interpretations one makes from test scores. Several types of reliability measures are being applied to the iterative refinement of the EGGS survey, including test-retest, alternate form, split-half, internal consistency, and interrater reliability measures. EGGS rates strongly on most measures of reliability. For one, Cronbach's alpha, which reflects inter-item correlations, provides a quantitative index of the extent to which students are answering items consistently throughout the test. Traditional item analysis methods, including item difficulty and item discrimination, further quantify the degree to which a particular item is reliably assessing students. Validity, on the other hand, is perhaps best described by the word accuracy. For example, content validity is the extent to which a measurement reflects the specific intended domain of the content, stemming from judgments of people who are either experts in the testing of that particular content area or are content experts. Perhaps more importantly, face validity is a judgement of how representative an instrument is of the science "at face value" and refers to the extent to which a test appears to measure the targeted scientific domain as viewed by laypersons, examinees, test users, the public, and other invested stakeholders.
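Because this abstract leans on Cronbach's alpha as its index of internal consistency, a short sketch of the standard formula may help: alpha = k/(k-1) × (1 − sum of item variances / variance of total scores). The score matrix below is hypothetical and is not EGGS data.

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Cronbach's alpha for a matrix with one row per examinee, one column per item."""
    k = len(scores[0])  # number of items
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical 0/1 item scores for five examinees on four items.
scores = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
]
print(round(cronbach_alpha(scores), 3))
```

The same variance estimator has to be used for items and totals; population variance is used for both here, and the ratio is unchanged if sample variance is used for both instead.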
Measuring emotion socialization in families affected by pediatric cancer: Refinement and reduction of the Parents' Beliefs about Children's Emotions questionnaire.

PubMed

Beitra, Danette; El-Behadli, Ana F; Faith, Melissa A

2018-01-01

The aim of this study is to conduct a multimethod psychometric reduction in the Parents' Beliefs about Children's Emotions (PBCE) questionnaire using an item response theory framework with a pediatric oncology sample. Participants were 216 pediatric oncology caregivers who completed the PBCE. The PBCE contains 105 items (11 subscales) rated on a 6-point Likert-type scale. We evaluated the PBCE subscale performance by applying a partial credit model in WINSTEPS. Sixty-six statistically weak items were removed, creating a 44-item PBCE questionnaire with 10 subscales and 3 response options per item. The refined scale displayed good psychometric properties and correlated .910 with the original PBCE. Additional analyses examined dimensionality, item-level (e.g. difficulty), and person-level (e.g. ethnicity) characteristics. The refined PBCE questionnaire provides better test information, improves instrument reliability, and reduces burden on families, providers, and researchers. With this improved measure, providers can more easily identify families who may benefit from psychosocial interventions targeting emotion socialization. The results of the multistep approach presented should be considered preliminary, given the limited sample size.
Study protocol of psychometric properties of the Spanish translation of a competence test in evidence based practice: the Fresno test.

PubMed

Argimon-Pallàs, Josep M; Flores-Mateo, Gemma; Jiménez-Villa, Josep; Pujol-Ribera, Enriqueta; Foz, Gonçal; Bundó-Vidiella, Magda; Juncosa, Sebastià; Fuentes-Bellido, Cruz M; Pérez-Rodríguez, Belén; Margalef-Pallarès, Francesc; Villafafila-Ferrero, Rosa; Forès-Garcia, Dolors; Roman-Martínez, Josep; Vilert-Garroga, Esther

2009-02-24

There are few high-quality instruments for evaluating the effectiveness of Evidence-Based Practice (EBP) curricula with objective outcome measures. The Fresno test is an instrument that evaluates most EBP steps with high reliability and validity in the original English version. The present study aims to translate the Fresno questionnaire into Spanish and subsequently validate it to ensure the equivalence of the Spanish version with the English original. The questionnaire will be translated with the back translation technique and tested in Primary Care Teaching Units in Catalonia (PCTU). Participants will be: (a) tutors of Family Medicine residents (expert group); (b) Family Medicine residents in their second year of the Family Medicine training program (novice group), and (c) Family Medicine physicians (intermediate group). The questionnaire will be administered before and after an educational intervention. The educational intervention will consist of four interactive half-day sessions designed to develop the knowledge and skills required for EBP. Responsiveness statistics used in the analysis will be the effect size, the standardised response mean and Guyatt's method. For internal consistency reliability, two measures will be used: corrected item-total correlations and Cronbach's alpha. Inter-rater reliability will be tested using the Kappa coefficient for qualitative items and the intra-class correlation coefficient for quantitative items and the overall score. Construct validity, item difficulty, item discrimination and feasibility will be determined. The validation of the Fresno questionnaire in different languages will enable wider use of the questionnaire, as well as allowing comparison between countries and the evaluation of different teaching models.
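Two of the responsiveness statistics named in this protocol can be sketched briefly. The code assumes the common definitions (effect size = mean change divided by the standard deviation of baseline scores; standardised response mean = mean change divided by the standard deviation of the change scores); the pre/post scores are hypothetical and not taken from the study.

```python
from statistics import mean, stdev

def effect_size(pre, post):
    """Mean change divided by the SD of the baseline (pre-intervention) scores."""
    change = [b - a for a, b in zip(pre, post)]
    return mean(change) / stdev(pre)

def standardised_response_mean(pre, post):
    """Mean change divided by the SD of the individual change scores."""
    change = [b - a for a, b in zip(pre, post)]
    return mean(change) / stdev(change)

# Hypothetical test scores before and after an educational intervention.
pre = [100, 110, 120, 130]
post = [120, 135, 135, 160]
print(effect_size(pre, post), standardised_response_mean(pre, post))
```

The two statistics share a numerator and differ only in the denominator, so they diverge when participants change by very similar or very different amounts.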
Self-reported walking ability predicts functional mobility performance in frail older adults.

PubMed

Alexander, N B; Guire, K E; Thelen, D G; Ashton-Miller, J A; Schultz, A B; Grunawalt, J C; Giordani, B

2000-11-01

To determine how self-reported physical function relates to performance in each of three mobility domains: walking, stance maintenance, and rising from chairs. Cross-sectional analysis of older adults. University-based laboratory and community-based congregate housing facilities. Two hundred twenty-one older adults (mean age, 79.9 years; range, 60-102 years) without clinical evidence of dementia (mean Folstein Mini-Mental State score, 28; range, 24-30). We compared the responses of these older adults on a questionnaire battery used by the Established Populations for the Epidemiologic Study of the Elderly (EPESE) project, to performance on mobility tasks of graded difficulty. Responses to the EPESE battery included: (1) whether assistance was required to perform seven Katz activities of daily living (ADL) items, specifically with walking and transferring; (2) three Rosow-Breslau items, including the ability to walk up stairs and walk a half mile; and (3) five Nagi items, including difficulty stooping, reaching, and lifting objects. The performance measures included the ability to perform, and time taken to perform, tasks in three summary score domains: (1) walking ("Walking," seven tasks, including walking with an assistive device, turning, stair climbing, tandem walking); (2) stance maintenance ("Stance," six tasks, including unipedal, bipedal, tandem, and maximum lean); and (3) chair rise ("Chair Rise," six tasks, including rising from a variety of seat heights with and without the use of hands for assistance). A total score combines scores in each Walking, Stance, and Chair Rise domain. We also analyzed how cognitive/ behavioral factors such as depression and self-efficacy related to the residuals from the self-report and performance-based ANOVA models. Rosow-Breslau items have the strongest relationship with the three performance domains, Walking, Stance, and Chair Rise (eta-squared ranging from 0.21 to 0.44). 
These three performance domains are as strongly related to one Katz ADL item, walking (eta-squared ranging from 0.15 to 0.33) as all of the Katz ADL items combined (eta-squared ranging from 0.21 to 0.35). Tests of problem solving and psychomotor speed, the Trails A and Trails B tests, are significantly correlated with the residuals from the self-report and performance-based ANOVA models. Compared with the rest of the EPESE self-report items, self-report items related to walking (such as Katz walking and Rosow-Breslau items) are better predictors of functional mobility performance on tasks involving walking, stance maintenance, and rising from chairs. Compared with other self-report items, self-reported walking ability may be the best predictor of overall functional mobility.
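The eta-squared values quoted in this abstract are proportions of variance explained. A minimal sketch of the usual one-way ANOVA definition (between-group sum of squares divided by total sum of squares) follows; the grouping and scores are hypothetical and are not the study's data.

```python
from statistics import mean

def eta_squared(groups):
    """Eta-squared for a one-way layout: SS_between / SS_total."""
    all_values = [x for g in groups for x in g]
    grand = mean(all_values)
    ss_total = sum((x - grand) ** 2 for x in all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    return ss_between / ss_total

# Hypothetical performance scores grouped by a self-report response
# (for example "no difficulty", "some difficulty", "unable").
groups = [[9, 8, 10, 9], [7, 6, 8, 7], [4, 5, 3, 4]]
print(round(eta_squared(groups), 3))
```

A value of 0 means the self-report grouping explains none of the variation in performance, and a value of 1 means it explains all of it.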
The Testing Methods and Gender Differences in Multiple-Choice Assessment

NASA Astrophysics Data System (ADS)

Ng, Annie W. Y.; Chan, Alan H. S.

2009-10-01

This paper provides a comprehensive review of the multiple-choice assessment in the past two decades for facilitating people to conduct effective testing in various subject areas. It was revealed that a variety of multiple-choice test methods viz. conventional multiple-choice, liberal multiple-choice, elimination testing, confidence marking, probability testing, and order-of-preference scheme are available for use in assessing subjects' knowledge and decision ability. However, the best multiple-choice test method for use has not yet been identified. The review also indicated that the existence of gender differences in multiple-choice task performance might be due to the test area, instruction/scoring condition, and item difficulty.
Validation of an instrument for assessing teacher knowledge of basic language constructs of literacy.

PubMed

Binks-Cantrell, Emily; Joshi, R Malatesha; Washburn, Erin K

2012-10-01

Recent national reports have stressed the importance of teacher knowledge in teaching reading. However, in the past, teachers' knowledge of language and literacy constructs has typically been assessed with instruments that are not fully tested for validity. In the present study, an instrument was developed; and its reliability, item difficulty, and item discrimination were computed and examined to identify model fit by applying exploratory factor analysis. Such analyses showed that the instrument demonstrated adequate estimates of reliability in assessing teachers' knowledge of language constructs. The implications for professional development of in-service teachers as well as preservice teacher education are also discussed.
Measuring Student Learning with Item Response Theory

ERIC Educational Resources Information Center

Lee, Young-Jin; Palazzo, David J.; Warnakulasooriya, Rasil; Pritchard, David E.

2008-01-01

We investigate short-term learning from hints and feedback in a Web-based physics tutoring system. Both the skill of students and the difficulty and discrimination of items were determined by applying item response theory (IRT) to the first answers of students who are working on for-credit homework items in an introductory Newtonian physics…
Combining the Best of Two Standard Setting Methods: The Ordered Item Booklet Angoff

ERIC Educational Resources Information Center

Smith, Russell W.; Davis-Becker, Susan L.; O'Leary, Lisa S.

2014-01-01

This article describes a hybrid standard setting method that combines characteristics of the Angoff (1971) and Bookmark (Mitzel, Lewis, Patz & Green, 2001) methods. The proposed approach utilizes strengths of each method while addressing weaknesses. An ordered item booklet, with items sorted based on item difficulty, is used in combination…
Development of the Assessment of Belief Conflict in Relationship-14 (ABCR-14).

PubMed

Kyougoku, Makoto; Teraoka, Mutsumi; Masuda, Noriko; Ooura, Mariko; Abe, Yasushi

2015-01-01

Nurses and other healthcare workers frequently experience belief conflict, one of the most important, new stress-related problems in both academic and clinical fields. In this study, using a sample of 1,683 nursing practitioners, we developed The Assessment of Belief Conflict in Relationship-14 (ABCR-14), a new scale that assesses belief conflict in the healthcare field. Standard psychometric procedures were used to develop and test the scale, including a qualitative framework concept and item-pool development, item reduction, and scale development. We analyzed the psychometric properties of ABCR-14 according to entropy, polyserial correlation coefficient, exploratory factor analysis, confirmatory factor analysis, average variance extracted, Cronbach's alpha, Pearson product-moment correlation coefficient, and multidimensional item response theory (MIRT). The results of the analysis supported a three-factor model consisting of 14 items. The validity and reliability of ABCR-14 was suggested by evidence from high construct validity, structural validity, hypothesis testing, internal consistency reliability, and concurrent validity. The result of the MIRT offered strong support for good item response of item slope parameters and difficulty parameters. However, the ABCR-14 Likert scale might need to be explored from the MIRT point of view. Yet, as mentioned above, there is sufficient evidence to support that ABCR-14 has high validity and reliability. The ABCR-14 demonstrates good psychometric properties for nursing belief conflict. Further studies are recommended to confirm its application in clinical practice.
Comparative Racial Analysis of Enlisted Advancement Exams: Item-Difficulty.

DTIC Science & Technology

1975-07-01

Item analysis; Promotion; Racial comparison; Equal opportunity... improving equal opportunity in career growth for minority groups. The study of exam item-difficulty levels is the first of a series of technical reports... under Exploratory Development Task Area PF55.521.032 (Contemporary Social Issues). J. J. CLARKIN, Commanding Officer. SUMMARY. Purpose: A number of…
What Aspect of Dependence Does the Fagerström Test for Nicotine Dependence Measure?

PubMed Central

DiFranza, Joseph R.; Wellman, Robert J.; Savageau, Judith A.; Beccia, Ariel; Ursprung, W. W. Sanouri A.; McMillen, Robert

2013-01-01

Although the Fagerström Test for Nicotine Dependence (FTND) and the Heaviness of Smoking Index (HSI) are widely used, there is uncertainty regarding what is measured by these scales. We examined associations between these instruments and items assessing different aspects of dependence. Adult current smokers (n = 422, mean age 33.3 years, 61.9% female) completed a web-based survey comprising items related to demographics and smoking behavior plus (1) the FTND and HSI; (2) the Autonomy over Tobacco Scale (AUTOS) with subscales measuring Withdrawal, Psychological Dependence, and Cue-Induced Cravings; (3) 6 questions tapping smokers' wanting, craving, or needing experiences in response to withdrawal and the latency to each experience during abstinence; (4) 3 items concerning how smokers prepare to cope with periods of abstinence. In regression analyses the Withdrawal subscale of the AUTOS was the strongest predictor of FTND and HSI scores, followed by taking precautions not to run out of cigarettes or smoking extra to prepare for abstinence. The FTND and its six items, including the HSI, consistently showed the strongest correlations with withdrawal, suggesting that the behaviors described by the items of the FTND are primarily indicative of a difficulty maintaining abstinence because of withdrawal symptoms. PMID:25969829
The Utrecht questionnaire (U-CEP) measuring knowledge on clinical epidemiology proved to be valid.

PubMed

Kortekaas, Marlous F; Bartelink, Marie-Louise E L; de Groot, Esther; Korving, Helen; de Wit, Niek J; Grobbee, Diederick E; Hoes, Arno W

2017-02-01

Knowledge on clinical epidemiology is crucial to practice evidence-based medicine. We describe the development and validation of the Utrecht questionnaire on knowledge on Clinical epidemiology for Evidence-based Practice (U-CEP); an assessment tool to be used in the training of clinicians. The U-CEP was developed in two formats: two sets of 25 questions and a combined set of 50. The validation was performed among postgraduate general practice (GP) trainees, hospital trainees, GP supervisors, and experts. Internal consistency, internal reliability (item-total correlation), item discrimination index, item difficulty, content validity, construct validity, responsiveness, test-retest reliability, and feasibility were assessed. The questionnaire was externally validated. Internal consistency was good with a Cronbach alpha of 0.8. The median item-total correlation and mean item discrimination index were satisfactory. Both sets were perceived as relevant to clinical practice. Construct validity was good. Both sets were responsive but failed on test-retest reliability. One set took 24 minutes and the other 33 minutes to complete, on average. External GP trainees had comparable results. The U-CEP is a valid questionnaire to assess knowledge on clinical epidemiology, which is a prerequisite for practicing evidence-based medicine in daily clinical practice. Copyright © 2016 Elsevier Inc. All rights reserved.
An algorithm for calculating exam quality as a basis for performance-based allocation of funds at medical schools.

PubMed

Kirschstein, Timo; Wolters, Alexander; Lenz, Jan-Hendrik; Fröhlich, Susanne; Hakenberg, Oliver; Kundt, Günther; Darmüntzel, Martin; Hecker, Michael; Altiner, Attila; Müller-Hilke, Brigitte

2016-01-01

The amendment of the Medical Licensing Act (ÄAppO) in Germany in 2002 led to the introduction of graded assessments in the clinical part of medical studies. This, in turn, lent new weight to the importance of written tests, even though the minimum requirements for exam quality are sometimes difficult to reach. Introducing exam quality as a criterion for the award of performance-based allocation of funds is expected to steer the attention of faculty members towards more quality and perpetuate higher standards. However, at present there is a lack of suitable algorithms for calculating exam quality. In the spring of 2014, the students' dean commissioned the "core group" for curricular improvement at the University Medical Center in Rostock to revise the criteria for the allocation of performance-based funds for teaching. In a first approach, we developed an algorithm that was based on the results of the most common type of exam in medical education, multiple choice tests. It included item difficulty and discrimination, reliability as well as the distribution of grades achieved. This algorithm quantitatively describes exam quality of multiple choice exams. However, it can also be applied to exams involving short essay questions and the OSCE. It thus allows for the quantitation of exam quality in the various subjects and, in analogy to impact factors and third party grants, a ranking among faculty. Our algorithm can be applied to all test formats in which item difficulty, the discriminatory power of the individual items, reliability of the exam and the distribution of grades are measured. Even though the content validity of an exam is not considered here, we believe that our algorithm is suitable as a general basis for performance-based allocation of funds.
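The abstract above names item difficulty, item discrimination, and reliability as inputs to its exam-quality algorithm but does not give the algorithm itself. As a rough illustration only, the three ingredients can be computed from a matrix of 0/1 item scores as in the following minimal Python sketch. The function names, the 27% upper/lower grouping, and the small `responses` matrix are assumptions made for this example; they are not taken from the study, and the sketch does not combine the statistics into any quality score.

```python
from statistics import mean, pvariance

def item_difficulty(responses):
    """Classical difficulty index: proportion of examinees answering each item correctly."""
    n_items = len(responses[0])
    return [mean(row[j] for row in responses) for j in range(n_items)]

def discrimination_index(responses, fraction=0.27):
    """Upper-lower discrimination: p(correct | top scorers) - p(correct | bottom scorers)."""
    ranked = sorted(responses, key=sum)           # examinees ordered by total score
    k = max(1, round(len(ranked) * fraction))     # size of each extreme group (assumed 27%)
    lower, upper = ranked[:k], ranked[-k:]
    n_items = len(responses[0])
    return [mean(r[j] for r in upper) - mean(r[j] for r in lower)
            for j in range(n_items)]

def cronbach_alpha(responses):
    """Cronbach's alpha from item variances and the variance of total scores."""
    n_items = len(responses[0])
    item_vars = [pvariance([row[j] for row in responses]) for j in range(n_items)]
    total_var = pvariance([sum(row) for row in responses])
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical data: 5 examinees (rows) x 4 items (columns), 1 = correct.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

print("difficulty:", item_difficulty(responses))
print("discrimination:", discrimination_index(responses))
print("alpha:", cronbach_alpha(responses))
```

Population variances are used consistently for items and totals, so the ratio inside alpha is unaffected by the choice between population and sample variance.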
Validating the Assessment for Measuring Indonesian Secondary School Students Performance in Ecology

NASA Astrophysics Data System (ADS)

Rachmatullah, A.; Roshayanti, F.; Ha, M.

2017-09-01

This study aims to validate the American Association for the Advancement of Science (AAAS) Ecology assessment and to examine the performance of Indonesian secondary school students on it. A total of 611 Indonesian secondary school students (218 middle school students and 393 high school students) participated in the study. Forty-five items of the AAAS assessment on the topic of Interdependence in Ecosystems were divided into two versions, with 21 similar items in each version. A linking-item method was used to combine the two versions of the assessment, and Rasch analyses were then used to validate the instrument. An independent-samples t-test was also run to compare the performance of Indonesian and American students based on the mean item difficulty. Of the 45 items, three were identified as misfitting. Both Indonesian middle and high school students performed significantly lower than American students, with very large and medium effect sizes. We discuss our findings with regard to validation issues and their connection to Indonesian students' science literacy.

Improving Person-Job Congruence during the Classification Process: Item Development and Initial Testing of a Pictorial Interest Instrument

DTIC Science & Technology

2006-09-01

classification by making it applicant-centric while improving job satisfaction and performance, reducing attrition, and increasing continuation...produce greater job satisfaction, increase performance, and lengthen tenure. The difficulty the Navy faces is that enlisted applicants have limited work...P-J) fit. Empirically, job performance, employee satisfaction, and retention are contingent upon appropriately matching personnel with their desired
Predictive value of health-related fitness tests for self-reported mobility difficulties among high-functioning elderly men and women.

PubMed

Hämäläinen, H Pauliina; Suni, Jaana H; Pasanen, Matti E; Malmberg, Jarmo J; Miilunpalo, Seppo I

2006-06-01

The functional independence of elderly populations deteriorates with age. Several tests of physical performance have been developed for screening elderly persons who are at risk of losing their functional independence. The purpose of the present study was to investigate whether several components of health-related fitness (HRF) are valid in predicting the occurrence of self-reported mobility difficulties (MD) among high-functioning older adults. Subjects were community-dwelling men and women, born 1917-1941, who participated in the assessment of HRF [6.1-m (20-ft) walk, one-leg stand, backwards walk, trunk side-bending, dynamic back extension, one-leg squat, 1-km walk] and who were free of MD in 1996 (no difficulties in walking 2 km, n=788; no difficulties in climbing stairs, n=647). Postal questionnaires were used to assess the prevalence of MD in 1996 and the occurrence of new MD in 2002. Logistic regression analysis was used as the statistical method. Both inability to perform the backwards walk and a poorer result in it were associated with risk of walking difficulties in the logistic model, with all the statistically significant single test items included. Results of 1-km walk time and one-leg squat strength test were also associated with risk, although the squat was statistically significant only in two older birth cohorts. Regarding stair-climbing difficulties, poorer results in the 1-km walk, dynamic back extension and one-leg squat tests were associated with increased risk of MD. The backwards walk, one-leg squat, dynamic back extension and 1-km walk tests were the best predictors of MD. These tests are recommended for use in screening high-functioning older people at risk of MD, as well as to target physical activity counseling to those components of HRF that are important for functional independence.
Validation of an instrument to assess visual ability in children with visual impairment in China.

PubMed

Huang, Jinhai; Khadka, Jyoti; Gao, Rongrong; Zhang, Sifang; Dong, Wenpeng; Bao, Fangjun; Chen, Haisi; Wang, Qinmei; Chen, Hao; Pesudovs, Konrad

2017-04-01

To validate a visual ability instrument for school-aged children with visual impairment in China by translating, culturally adapting and Rasch scaling the Cardiff Visual Ability Questionnaire for Children (CVAQC). The 25-item CVAQC was translated into Mandarin using a standard protocol. The translated version (CVAQC-CN) was subjected to cognitive testing to ensure a proper cultural adaptation of its content. Then, the CVAQC-CN was interviewer-administered to 114 school-aged children and young people with visual impairment. Rasch analysis was carried out to assess its psychometric properties. The correlations between the CVAQC-CN visual ability scores and clinical measures of vision (visual acuity, VA, and contrast sensitivity, CS) were assessed using Spearman's r. Based on the cultural adaptation exercise, cognitive testing, missing data and Rasch metrics-based iterative item removal, three items were removed from the original 25. The 22-item CVAQC-CN demonstrated excellent measurement precision (person separation index, 3.08), content validity (item separation, 10.09) and item reliability (0.99). Moreover, the CVAQC-CN was unidimensional and had no item bias. The person-item map indicated good targeting of item difficulty to person ability. The CVAQC-CN had moderate correlations with CS (-0.53, p<0.00001) and VA (0.726, p<0.00001), indicating its validity. The 22-item CVAQC-CN is a psychometrically robust and valid instrument to measure visual ability in children with visual impairment in China. The instrument can be used as a clinical and research outcome measure to assess the change in visual ability after low vision rehabilitation intervention. Published by the BMJ Publishing Group Limited.
Understanding Orgasmic Difficulty in Women.

PubMed

Rowland, David L; Kolba, Tiffany N

2016-08-01

Women's primary issue with the orgasmic phase is usually difficulty reaching orgasm. To identify predictors of orgasmic difficulty in women within the context of a partnered sexual experience; to assess the relation between orgasmic difficulty and self-reported levels of sexual desire or interest and arousal in women; and to assess the interrelations among three dimensions of orgasmic response during partnered sex: self-reported time to reach orgasm, general difficulty or ease of reaching orgasm, and level of distress or concern. Drawing from a community-based sample using the Internet, 866 women were queried on a 26-item survey regarding their difficulty reaching orgasm during partnered sex. Four hundred sixteen women who indicated difficulty also responded to items assessing arousal and desire difficulties, level of distress about their condition, and their estimated time to reach orgasm. Answers to a 26-item survey on surveyed women's difficulty reaching orgasm during partnered sex. Age, arousal difficulty, and lubrication difficulty predicted difficulty reaching orgasm in the overall sample. In the subsample of women reporting difficulty, approximately half reported issues with arousal. Women with arousal problems reported greater difficulty reaching orgasm but did not differ from those without arousal problems on measurements of orgasm latency or levels of distress. Slightly more than half the women experiencing difficulty reaching orgasm were distressed by their condition; distressed women reported greater difficulty reaching orgasm and longer latencies to orgasm than non-distressed counterparts. They also reported lower satisfaction with their sexual relationship. This study indicates the importance of assessing multiple parameters when investigating orgasmic problems in women, including arousal issues, levels of distress, and latency to orgasm. 
Results also clarify that women with arousal problems do not differ substantially from those without arousal problems; in contrast, women distressed by their condition differ from non-distressed women along some critical dimensions. Although orgasmic problems decreased with age, the overall relation of this variable to distress, arousal, and latency to orgasm was essentially unchanged across age groups. Copyright © 2016 International Society for Sexual Medicine. Published by Elsevier Inc. All rights reserved.
Psychometric Evaluation of a Cultural Competency Assessment Instrument for Health Professionals

PubMed Central

Haywood, Sonja H.; Goode, Tawara; Gao, Yong; Smith, Kristyn; Bronheim, Suzanne; Flocke, Susan A; Zyzanski, Steve

2012-01-01

Background Few valid and reliable measures exist for health care professionals interested in determining their levels of cultural and linguistic competence. Objective To evaluate the measurement properties of the Cultural Competence Health Practitioner Assessment (CCHPA-129). Methods The CCHPA-129 is a 129-item web-based instrument, developed by the National Center for Cultural Competence (NCCC). Responses on the CCHPA-129 were examined using factor analysis; Rasch modeling; and Differential Item Functioning (DIF) across race, ethnicity, gender, and profession. Subjects 2504 practitioners, including 1864 nurses (RN/LPN/BSN); 341 clinicians (PA/NP); and 299 physicians (MD/DO), who completed the CCHPA-129 online between 2005 and 2008. Results Three factors representing domains of knowledge, adapting practice, and promoting health for culturally and linguistically diverse populations accounted for 46% of the variance. Among Knowledge factor items, 53% (23/43) fit the Rasch model, item difficulties ranged from −1.01 logits (least difficult) to +1.11 logits (most difficult), separation index (SI) 13.82, and Cronbach’s α 0.92. Forty-seven percent (21/44) of Adapting Practice factor items fit the model, item difficulties −0.07 to +1.11 logits, SI 11.59, Cronbach’s α 0.88; and 58% (23/39) of Promoting Health factor items fit the model, item difficulties −1.01 to +1.38 logits, SI 22.64, Cronbach’s α 0.92. Early evidence of validity was established by known groups having statistically different scores. Conclusion The 67-item CCHPA-67 is psychometrically sound. This shortened instrument can be used to establish associations between practitioners’ cultural and linguistic competence and health outcomes as well as to evaluate interventions to increase practitioners’ cultural and linguistic competence. PMID:22437625
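For readers unfamiliar with difficulties reported in logits, the following minimal Python sketch shows how a logit difficulty translates into a response probability under the dichotomous Rasch model. This is an illustration under stated assumptions: the abstract does not say which Rasch variant was fitted (a polytomous model is plausible for rating-scale items), the function name `rasch_probability` is hypothetical, and a person location of 0 logits is chosen purely for the example.

```python
import math

def rasch_probability(theta, b):
    """Dichotomous Rasch model: probability of a positive response.

    theta: person location in logits; b: item difficulty in logits.
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# Illustration with the extreme Knowledge-factor difficulties quoted in the abstract,
# evaluated for a hypothetical person located at 0 logits.
theta = 0.0
for label, b in [("least difficult", -1.01), ("most difficult", 1.11)]:
    print(f"{label} item (b = {b:+.2f}): P = {rasch_probability(theta, b):.3f}")
```

When person location equals item difficulty the probability is exactly 0.5, which is why a person-item map is commonly read as a check on how well item difficulties are targeted to the sample.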
Some factors underlying individual differences in speech recognition on PRESTO: a first report.

PubMed

Tamati, Terrin N; Gilbert, Jaimie L; Pisoni, David B

2013-01-01

Previous studies investigating speech recognition in adverse listening conditions have found extensive variability among individual listeners. However, little is currently known about the core underlying factors that influence speech recognition abilities. To investigate sensory, perceptual, and neurocognitive differences between good and poor listeners on the Perceptually Robust English Sentence Test Open-set (PRESTO), a new high-variability sentence recognition test under adverse listening conditions. Participants who fell in the upper quartile (HiPRESTO listeners) or lower quartile (LoPRESTO listeners) on key word recognition on sentences from PRESTO in multitalker babble completed a battery of behavioral tasks and self-report questionnaires designed to investigate real-world hearing difficulties, indexical processing skills, and neurocognitive abilities. Young, normal-hearing adults (N = 40) from the Indiana University community participated in the current study. Participants' assessment of their own real-world hearing difficulties was measured with a self-report questionnaire on situational hearing and hearing health history. Indexical processing skills were assessed using a talker discrimination task, a gender discrimination task, and a forced-choice regional dialect categorization task. Neurocognitive abilities were measured with the Auditory Digit Span Forward (verbal short-term memory) and Digit Span Backward (verbal working memory) tests, the Stroop Color and Word Test (attention/inhibition), the WordFam word familiarity test (vocabulary size), the Behavioral Rating Inventory of Executive Function-Adult Version (BRIEF-A) self-report questionnaire on executive function, and two performance subtests of the Wechsler Abbreviated Scale of Intelligence (WASI) Performance Intelligence Quotient (IQ; nonverbal intelligence). Scores on self-report questionnaires and behavioral tasks were tallied and analyzed by listener group (HiPRESTO and LoPRESTO). 
The extreme groups did not differ overall on self-reported hearing difficulties in real-world listening environments. However, an item-by-item analysis of questions revealed that LoPRESTO listeners reported significantly greater difficulty understanding speakers in a public place. HiPRESTO listeners were significantly more accurate than LoPRESTO listeners at gender discrimination and regional dialect categorization, but they did not differ on talker discrimination accuracy or response time, or gender discrimination response time. HiPRESTO listeners also had longer forward and backward digit spans, higher word familiarity ratings on the WordFam test, and lower (better) scores for three individual items on the BRIEF-A questionnaire related to cognitive load. The two groups did not differ on the Stroop Color and Word Test or either of the WASI performance IQ subtests. HiPRESTO listeners and LoPRESTO listeners differed in indexical processing abilities, short-term and working memory capacity, vocabulary size, and some domains of executive functioning. These findings suggest that individual differences in the ability to encode and maintain highly detailed episodic information in speech may underlie the variability observed in speech recognition performance in adverse listening conditions using high-variability PRESTO sentences in multitalker babble. American Academy of Audiology.
Validation of a General and Sport Nutrition Knowledge Questionnaire in Adolescents and Young Adults: GeSNK.

PubMed

Calella, Patrizia; Iacullo, Vittorio Maria; Valerio, Giuliana

2017-04-29

Good knowledge of nutrition is widely thought to be important for maintaining a balanced and healthy diet. The aim of this study was to develop and validate a new reliable tool to measure general and sport nutrition knowledge (GeSNK) in people who practice sports at different levels. The development of the GeSNK was carried out in six phases as follows: (1) item development and selection by a panel of experts; (2) pilot study in order to assess item difficulty and item discrimination; (3) measurement of the internal consistency; (4) reliability assessment with a 2-week test-retest analysis; (5) concurrent validity testing by administering the questionnaire along with two other similar tools; (6) construct validity testing by administering the questionnaire to three groups of young adults with different general nutrition and sport nutrition knowledge. The final questionnaire consisted of 62 items from the original 183 questions. It is a consistent, valid, and suitable instrument that can be applied over time, making it a promising tool to look at the relationship between nutrition knowledge, demographic characteristics, and dietary behavior in adolescents and young adults.
The optimal sequence and selection of screening test items to predict fall risk in older disabled women: the Women's Health and Aging Study.

PubMed

Lamb, Sarah E; McCabe, Chris; Becker, Clemens; Fried, Linda P; Guralnik, Jack M

2008-10-01

Falls are a major cause of disability, dependence, and death in older people. Brief screening algorithms may be helpful in identifying risk and leading to more detailed assessment. Our aim was to determine the most effective sequence of falls screening test items from a wide selection of recommended items including self-report and performance tests, and to compare performance with other published guidelines. Data were from a prospective, age-stratified, cohort study. Participants were 1002 community-dwelling women aged 65 years old or older, experiencing at least some mild disability. Assessments of fall risk factors were conducted in participants' homes. Fall outcomes were collected at 6 monthly intervals. Algorithms were built for prediction of any fall over a 12-month period using tree classification with cross-set validation. Algorithms using performance tests provided the best prediction of fall events, and achieved moderate to strong performance when compared to commonly accepted benchmarks. The items selected by the best performing algorithm were the number of falls in the last year and, in selected subpopulations, frequency of difficulty balancing while walking, a 4 m walking speed test, body mass index, and a test of knee extensor strength. The algorithm performed better than that from the American Geriatric Society/British Geriatric Society/American Academy of Orthopaedic Surgeons and other guidance, although these findings should be treated with caution. Suggestions are made on the type, number, and sequence of tests that could be used to maximize estimation of the probability of falling in older disabled women.
Task-based learning versus problem-oriented lecture in neurology continuing medical education.

PubMed

Vakani, Farhan; Jafri, Wasim; Ahmad, Amina; Sonawalla, Aziz; Sheerani, Mughis

2014-01-01

To determine whether general practitioners learned better with task-based learning or problem-oriented lecture in a Continuing Medical Education (CME) set-up. Quasi-experimental study. The Aga Khan University, Karachi campus, from April to June 2012. Fifty-nine physicians were given a choice to opt for either Task-based Learning (TBL) or Problem Oriented Lecture (PBL) in a continuing medical education set-up about headaches. The TBL group had 30 participants divided into 10 small groups, and were assigned case-based tasks. The lecture group had 29 participants. Both groups were given a pre- and a post-test. Pre/post assessment was done using one-best MCQs. The reliability coefficient of scores for both the groups was estimated through Cronbach's alpha. An item analysis for difficulty and discriminatory indices was calculated for both the groups. Paired t-test was used to determine the difference between pre- and post-test scores of both groups. Independent t-test was used to compare the impact of the two teaching methods in terms of learning through scores produced by MCQ test. Cronbach's alpha was 0.672 for the lecture group and 0.881 for TBL group. Item analysis for difficulty (p) and discriminatory indexes (d) was obtained for both groups. The results for the lecture group showed pre-test (p) = 42% vs. post-test (p) = 43%; pre-test (d) = 0.60 vs. post-test (d) = 0.40. The TBL group showed pre-test (p) = 48% vs. post-test (p) = 70%; pre-test (d) = 0.69 vs. post-test (d) = 0.73. Lecture group pre-/post-test mean scores were (8.52 ± 2.95 vs. 12.41 ± 2.65; p < 0.001), where TBL group showed (9.70 ± 3.65 vs. 14 ± 3.99; p < 0.001). Independent t-test exhibited an insignificant difference at baseline (lecture 8.52 ± 2.95 vs. TBL 9.70 ± 3.65; p = 0.177). The post-scores were not statistically different (lecture 12.41 ± 2.65 vs. TBL 14 ± 3.99; p = 0.07). Both delivery methods were found to be equally effective, showing statistically insignificant differences. 
However, the TBL group's higher post-test mean scores and radical increase in the post-test difficulty index demonstrated improved learning through TBL delivery and call for further exploration through longitudinal studies in the context of CME.
A Comparison of Alternate-Choice and True-False Item Forms Used in Classroom Examinations.

ERIC Educational Resources Information Center

Maihoff, N. A.; Mehrens, Wm. A.

A comparison is presented of alternate-choice and true-false item forms used in an undergraduate natural science course. The alternate-choice item is a modified two-choice multiple-choice item in which the two responses are included within the question stem. This study (1) compared the difficulty level, discrimination level, reliability, and…
Explaining and Controlling for the Psychometric Properties of Computer-Generated Figural Matrix Items

ERIC Educational Resources Information Center

Freund, Philipp Alexander; Hofer, Stefan; Holling, Heinz

2008-01-01

Figural matrix items are a popular task type for assessing general intelligence (Spearman's g). Items of this kind can be constructed rationally, allowing the implementation of computerized generation algorithms. In this study, the influence of different task parameters on the degree of difficulty in matrix items was investigated. A sample of N =…
Estimation of Item Response Theory Parameters in the Presence of Missing Data

ERIC Educational Resources Information Center

Finch, Holmes

2008-01-01

Missing data are a common problem in a variety of measurement settings, including responses to items on both cognitive and affective assessments. Researchers have shown that such missing data may create problems in the estimation of item difficulty parameters in the Item Response Theory (IRT) context, particularly if they are ignored. At the same…
The consequences of language proficiency and difficulty of lexical access for translation performance and priming.

PubMed

Francis, Wendy S; Tokowicz, Natasha; Kroll, Judith F

2014-01-01

Repetition priming was used to assess how proficiency and the ease or difficulty of lexical access influence bilingual translation. Two experiments, conducted at different universities with different Spanish-English bilingual populations and materials, showed repetition priming in word translation for same-direction and different-direction repetitions. Experiment 1, conducted in an English-dominant environment, revealed an effect of translation direction but not of direction match, whereas Experiment 2, conducted in a more balanced bilingual environment, showed an effect of direction match but not of translation direction. A combined analysis on the items common to both studies revealed that bilingual proficiency was negatively associated with response time (RT), priming, and the degree of translation asymmetry in RTs and priming. An item analysis showed that item difficulty was positively associated with RTs, priming, and the benefit of same-direction over different-direction repetition. Thus, although both participant accuracy and item accuracy are indices of learning, they have distinct effects on translation RTs and on the learning that is captured by the repetition-priming paradigm.
The Quality of Working Life Questionnaire for Cancer Survivors (QWLQ-CS): a Pre-test Study.

PubMed

de Jong, Merel; Tamminga, Sietske J; de Boer, Angela G E M; Frings-Dresen, Monique H W

2016-06-02

Returning to and continuing work is important to many cancer survivors, but also represents a challenge. We know little about subjective work outcomes and how cancer survivors perceive being returned to work. Therefore, we developed the Quality of Working Life Questionnaire for Cancer Survivors (QWLQ-CS). Our aim was to pre-test the items of the initial QWLQ-CS on acceptability and comprehensiveness. In addition, item retention was performed by pre-assessing the relevance scores and response distributions of the items in the QWLQ-CS. Semi-structured interviews were conducted after cancer survivors, who had returned to work, filled in the 102 items of the QWLQ-CS. To improve acceptability and comprehensiveness, the semi-structured interview inquired about items that were annoying, difficult, confusing, twofold or redundant. If cancer survivors had difficulty explaining their opinion or emotion about an item, the interviewer used a verbal probing technique to investigate the cancer survivor's underlying thoughts. The cancer survivors' comments on the items were analysed, and items were revised accordingly. Decisions on item retention regarding the relevance of items and the response distributions were made by means of pre-set decision rules. The 19 cancer survivors (53% male) had a mean age of 51 ± 11 years. They were diagnosed between 2009 and 2013 with lymphoma, leukaemia, prostate cancer, breast cancer, or colon cancer. Acceptability of the QWLQ-CS was good (none of the items were annoying), but 73 items were considered difficult, confusing, twofold or redundant. To improve acceptability, for instance, the authors replaced the phrase 'disease' with 'health situation' in several items. Consequently, comprehensiveness was improved by the authors rephrasing and adjusting items by adding clarifying words, such as 'in the work situation'. 
The pre-assessment of the relevance scores resulted in a sufficient number of cancer survivors indicating the items as relevant to their quality of working life, and no evident indication for uneven response distributions. Therefore, all items were retained. The 104 items of the preliminary QWLQ-CS were found relevant, acceptable and comprehensible by cancer survivors who have returned to work. The QWLQ-CS is now suitable for larger sample sizes of cancer survivors, which is necessary to test the psychometric properties of this questionnaire.
Team-based learning on a third-year pediatric clerkship improves NBME subject exam blood disorder scores.

PubMed

Saudek, Kris; Treat, Robert

2015-01-01

Purpose At our institution, speculation amongst medical students and faculty exists as to whether team-based learning (TBL) can improve scores on high-stakes examinations over traditional didactic lectures. Faculty with experience using TBL developed and piloted a required TBL blood disorders (BD) module for third-year medical students on their pediatric clerkship. The purpose of this study is to analyze the BD scores from the NBME subject exams before and after the introduction of the module. Methods We analyzed institutional and national item difficulties for BD items from the NBME pediatrics content area item analysis reports from 2011 to 2014 before (pre) and after (post) the pilot (October 2012). Total scores of 590 NBME subject examination students from examinee performance profiles were analyzed pre/post. t-Tests and Cohen's d effect sizes were used to analyze item difficulties for institutional versus national scores and pre/post comparisons of item difficulties and total scores. Results BD scores for our institution were 0.65 (±0.19) compared to 0.62 (±0.15) nationally (P=0.346; Cohen's d=0.15). The average of post-consecutive BD scores for our students was 0.70 (±0.21) compared to examinees nationally [0.64 (±0.15)] with a significant mean difference (P=0.031; Cohen's d=0.43). The difference in our institution's pre [0.65 (±0.19)] and post [0.70 (±0.21)] BD scores trended higher (P=0.391; Cohen's d=0.27). Institutional BD scores were higher than national BD scores for both pre and post, with an effect size that tripled from pre to post scores. Institutional BD scores increased after the use of the TBL module, while overall exam scores remained steadily above national norms. Conclusions Institutional BD scores were higher than national BD scores for both pre and post, with an effect size that tripled from pre to post scores. Institutional BD scores increased after the use of the TBL module, while overall exam scores remained steadily above national norms.
Dutch-Flemish translation of nine pediatric item banks from the Patient-Reported Outcomes Measurement Information System (PROMIS)®.

PubMed

Haverman, Lotte; Grootenhuis, Martha A; Raat, Hein; van Rossum, Marion A J; van Dulmen-den Broeder, Eline; Hoppenbrouwers, Karel; Correia, Helena; Cella, David; Roorda, Leo D; Terwee, Caroline B

2016-03-01

The Patient-Reported Outcomes Measurement Information System (PROMIS(®)) is a new, state-of-the-art assessment system for measuring patient-reported health and well-being of adults and children. It has the potential to be more valid, reliable, and responsive than existing PROMs. The item banks are designed to be self-reported and completed by children aged 8-18 years. The PROMIS items can be administered in short forms or through computerized adaptive testing. This paper describes the translation and cultural adaptation of nine PROMIS item banks (151 items) for children in Dutch-Flemish. The translation was performed by FACITtrans using standardized PROMIS methodology and approved by the PROMIS Statistical Center. The translation included four forward translations, two back-translations, three independent reviews (at least two Dutch, one Flemish), and pretesting in 24 children from the Netherlands and Flanders. For some items, it was necessary to have separate translations for Dutch and Flemish: physical function-mobility (three items), anger (one item), pain interference (two items), and asthma impact (one item). Challenges faced in the translation process included scarcity or overabundance of possible translations, unclear item descriptions, constructs broader/smaller in the target language, difficulties in rank ordering items, differences in unit of measurement, irrelevant items, or differences in performance of activities. By addressing these challenges, acceptable translations were obtained for all items. The Dutch-Flemish PROMIS items are linguistically equivalent to the original USA version. Short forms are now available for use, and entire item banks are ready for cross-cultural validation in the Netherlands and Flanders.
An objective measure of physical function of elderly outpatients. The Physical Performance Test.

PubMed

Reuben, D B; Siu, A L

1990-10-01

Direct observation of physical function has the advantage of providing an objective, quantifiable measure of functional capabilities. We have developed the Physical Performance Test (PPT), which assesses multiple domains of physical function using observed performance of tasks that simulate activities of daily living of various degrees of difficulty. Two versions are presented: a nine-item scale that includes writing a sentence, simulated eating, turning 360 degrees, putting on and removing a jacket, lifting a book and putting it on a shelf, picking up a penny from the floor, a 50-foot walk test, and climbing stairs (scored as two items); and a seven-item scale that does not include stairs. The PPT can be completed in less than 10 minutes and requires only a few simple props. We then tested the validity of PPT using 183 subjects (mean age, 79 years) in six settings including four clinical practices (one of Parkinson's disease patients), a board-and-care home, and a senior citizens' apartment. The PPT was reliable (Cronbach's alpha = 0.87 and 0.79, interrater reliability = 0.99 and 0.93 for the nine-item and seven-item tests, respectively) and demonstrated concurrent validity with self-reported measures of physical function. Scores on the PPT for both scales were highly correlated (.50 to .80) with modified Rosow-Breslau, Instrumental and Basic Activities of Daily Living scales, and Tinetti gait score. Scores on the PPT were more moderately correlated with self-reported health status, cognitive status, and mental health (.24 to .47), and negatively with age (-.24 and -.18). Thus, the PPT also demonstrated construct validity. The PPT is a promising objective measurement of physical function, but its clinical and research value for screening, monitoring, and prediction will have to be determined.
[Development of a scale to measure the self concept of cesarean section mothers].

PubMed

Lee, M L; Cho, J H

1990-08-01

Recently, the rate of cesarean section in Korea has been increasing. The results of several previous studies in foreign countries on the emotional responses of cesarean section mothers showed that they might experience difficulties in the mother-infant interaction due to fatigue, lack of early mother-infant interaction, disappointments, anger, feelings of loss of control, and other factors. Human behavior is said to be determined by one's self concept, and self concept is influenced by both internal and external environmental factors. A scale to measure the self concept of cesarean section mothers was needed in order to identify those who might have difficulties in the mother-infant interactions in future. The purposes of this study were to develop a measuring scale, and to test its reliability and validity. The process of this study was as follows. A structured interview was done with 50 cesarean section and vaginal delivery mothers to find their state of emotional reaction after giving birth to their babies. Based on the results of the interviews, a 50-item Likert scale was developed. The self concept of 268 cesarean section and vaginal delivery mothers who were hospitalized at six hospitals in Seoul was measured during the period between Feb. 1 and April 30. Reviewing the discriminating power of each item by means of crosstabulation, ten items were selected for the final scale. The reliability and validity of this ten-item scale were tested by Cronbach's alpha and t-test, using the SPSS/PC+ package. The results of this study and recommendations are as follows. 1. The ten selected items were as follows. I feel pains in my breast. (-) I have a good appetite now. (+) I feel pains in my flank. (-) I feel fine now. (+) My body seems to have returned to its prepregnant state. (+) Thinking of the delivery process, I feel sorry. (-) I want to hold my baby in my arms. (+) I want to keep my own life, even if I became a mother. (-) I want to delegate the care of the baby to my mother/mother in law. (-) I think baby is my alter ego. (+) 2. The reliability of this scale was tested by Cronbach's alpha, and the coefficient of this scale was .8066. 3. The construct validity of this scale was tested by means of known group methods. The value of self concept for cesarean section mothers was significantly lower than for vaginal delivery mothers (t = -5.51, df = 266, p = 0.007).(ABSTRACT TRUNCATED AT 400 WORDS)
Improving Inpatient Surveys: Web-Based Computer Adaptive Testing Accessed via Mobile Phone QR Codes

PubMed Central

2016-01-01

Background The National Health Service (NHS) 70-item inpatient questionnaire surveys inpatients on their perceptions of their hospitalization experience. However, it imposes more burden on the patient than other similar surveys. The literature shows that computerized adaptive testing (CAT) based on item response theory can help shorten the item length of a questionnaire without compromising its precision. Objective Our aim was to investigate whether CAT can be (1) efficient with item reduction and (2) used with quick response (QR) codes scanned by mobile phones. Methods After downloading the 2008 inpatient survey data from the Picker Institute Europe website and analyzing the difficulties of this 70-item questionnaire, we used an author-made Excel program using the Rasch partial credit model to simulate 1000 patients’ true scores following a standard normal distribution. The CAT was compared to two other scenarios, answering all items (AAI) and the randomized selection method (RSM), as we investigated item length (efficiency) and measurement accuracy. The author-made Web-based CAT program for gathering patient feedback was effectively accessed from mobile phones by scanning the QR code. Results We found that the CAT can be more efficient for patients answering questions (ie, fewer items to respond to) than either AAI or RSM without compromising its measurement accuracy. A Web-based CAT inpatient survey accessed by scanning a QR code on a mobile phone was viable for gathering inpatient satisfaction responses. Conclusions With advances in technology, patients can now be offered alternatives for providing feedback about hospitalization satisfaction. This Web-based CAT is a possible option in health care settings for reducing the number of survey items, as well as offering an innovative QR code access. PMID:26935793
Improving Inpatient Surveys: Web-Based Computer Adaptive Testing Accessed via Mobile Phone QR Codes.

PubMed

Chien, Tsair-Wei; Lin, Weir-Sen

2016-03-02

The National Health Service (NHS) 70-item inpatient questionnaire surveys inpatients on their perceptions of their hospitalization experience. However, it imposes more burden on the patient than other similar surveys. The literature shows that computerized adaptive testing (CAT) based on item response theory can help shorten the item length of a questionnaire without compromising its precision. Our aim was to investigate whether CAT can be (1) efficient with item reduction and (2) used with quick response (QR) codes scanned by mobile phones. After downloading the 2008 inpatient survey data from the Picker Institute Europe website and analyzing the difficulties of this 70-item questionnaire, we used an author-made Excel program using the Rasch partial credit model to simulate 1000 patients' true scores following a standard normal distribution. The CAT was compared to two other scenarios, answering all items (AAI) and the randomized selection method (RSM), as we investigated item length (efficiency) and measurement accuracy. The author-made Web-based CAT program for gathering patient feedback was effectively accessed from mobile phones by scanning the QR code. We found that the CAT can be more efficient for patients answering questions (ie, fewer items to respond to) than either AAI or RSM without compromising its measurement accuracy. A Web-based CAT inpatient survey accessed by scanning a QR code on a mobile phone was viable for gathering inpatient satisfaction responses. With advances in technology, patients can now be offered alternatives for providing feedback about hospitalization satisfaction. This Web-based CAT is a possible option in health care settings for reducing the number of survey items, as well as offering an innovative QR code access.
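
The item-reduction idea in the abstract above can be illustrated with a minimal sketch of one CAT step. It is not the authors' Excel or Web program: it assumes a dichotomous Rasch model instead of the partial credit model the study used, and the function name `select_next_item` and the five-item bank are hypothetical. Under the dichotomous Rasch model, the most informative unadministered item is the one whose difficulty lies closest to the current ability estimate, so the sketch selects on that distance.

```python
from typing import Sequence, Set


def select_next_item(theta_hat: float,
                     difficulties: Sequence[float],
                     administered: Set[int]) -> int:
    """Return the index of the unadministered item closest in difficulty to theta_hat.

    Assumption: dichotomous Rasch items, where information peaks at theta == b,
    so the nearest difficulty is also the most informative item.
    """
    candidates = [i for i in range(len(difficulties)) if i not in administered]
    if not candidates:
        raise ValueError("all items have been administered")
    return min(candidates, key=lambda i: abs(theta_hat - difficulties[i]))


# Hypothetical item bank (difficulties in logits), for illustration only.
bank = [-2.0, -1.0, 0.0, 1.0, 2.0]

first = select_next_item(0.3, bank, administered=set())
second = select_next_item(0.3, bank, administered={first})
print(first, second)
```

A full CAT would also re-estimate the ability after each response and stop once the standard error falls below a chosen threshold; both steps are omitted here.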

Improving Patients’ Understanding of Terms and Phrases Commonly Used in Self-Reported Measures of Sexual Function

PubMed Central

Alexander, Angel M.; Flynn, Kathryn E.; Hahn, Elizabeth A.; Jeffery, Diana D.; Keefe, Francis J.; Reeve, Bryce B.; Schultz, Wesley; Reese, Jennifer Barsky; Shelby, Rebecca A.; Weinfurt, Kevin P.

2014-01-01

Introduction There is a significant gap in research regarding the readability and comprehension of existing sexual function measures. Patient-reported outcome measures may use terms not well understood by respondents with low literacy. Aim To test comprehension of words and phrases typically used in sexual function measures to improve validity for all individuals, including those with low literacy. Methods We recruited 20 men and 28 women for cognitive interviews on version 2.0 of the PROMIS Sexual Function and Satisfaction measures. We assessed participants’ reading level using the word reading subtest of the Wide Range Achievement Test (WRAT). Sixteen participants were classified as having low literacy. Main Outcome Measures In the first round of cognitive interviews, each survey item was reviewed by 5 or more people, at least 2 of whom had lower than a ninth-grade reading level (low literacy). Patient feedback was incorporated into a revised version of the items. In the second round of interviews, an additional 3 or more people (at least 1 with low literacy) reviewed each revised item. Results Participants with low literacy had difficulty comprehending terms such as aroused, orgasm, erection, ejaculation, incontinence, and vaginal penetration. Women across a range of literacy levels had difficulty with clinical terms like labia and clitoris. We modified unclear terms to include parenthetical descriptors or slang equivalents, which generally improved comprehension. Conclusions Common words and phrases used across measures of self-reported sexual function are not universally understood. Researchers should appreciate these misunderstandings as a potential source of error in studies using self-reported measures of sexual function. PMID:24902984
Realizing a Rasch measurement through instructionally- sequenced domains of test items.

NASA Astrophysics Data System (ADS)

Schulz, E. Matthew

2016-11-01

This paper presents results from a project in which instructionally-sequenced domains were defined for purposes of constructing measures that conform to an ideal in Guttman scaling and Rasch measurement. A fundamental idea in these measurement systems is that every person higher on the measurement scale can do everything that lower-level persons can do, plus at least one more thing. This idea has had limited application in educational measurement due to the stochastic nature of item response data and the sheer number of items needed to obtain reliable measures. However, it has been shown by Schulz, Lee, and Mullen [1] that this ideal can be realized at a higher level of abstraction - when items within a content strand are aggregated into a small number of domains that are ordered in instructional timing and difficulty. The present paper shows how this was done, and the results, in an achievement level setting project for the 2007 Grade 12 NAEP Economics Assessment.
Systemic factors of errors in the case identification process of the national routine health information system: A case study of Modified Field Health Services Information System in the Philippines

PubMed Central

2011-01-01

Background The quality of data in national health information systems has been questionable in most developing countries. However, the mechanisms of errors in the case identification process are not fully understood. This study aimed to investigate the mechanisms of errors in the case identification process in the existing routine health information system (RHIS) in the Philippines by measuring the risk of committing errors for health program indicators used in the Field Health Services Information System (FHSIS 1996), and characterizing those indicators accordingly. Methods A structured questionnaire on the definitions of 12 selected indicators in the FHSIS was administered to 132 health workers in 14 selected municipalities in the province of Palawan. A proportion of correct answers (difficulty index) and a disparity of two proportions of correct answers between higher and lower scored groups (discrimination index) were calculated, and the patterns of wrong answers for each of the 12 items were abstracted from 113 valid responses. Results None of the 12 items reached a difficulty index of 1.00. The average difficulty index of the 12 items was 0.266 and the discrimination index that showed a significant difference was 0.216 and above. Compared with these two cut-offs, six items showed non-discrimination against lower difficulty indices of 0.035 (4/113) to 0.195 (22/113), two items showed a positive discrimination against lower difficulty indices of 0.142 (16/113) and 0.248 (28/113), and four items showed a positive discrimination against higher difficulty indices of 0.469 (53/113) to 0.673 (76/113). 
Conclusions The results suggest three characteristics of definitions of indicators such as those that are (1) unsupported by the current conditions in the health system, i.e., (a) data are required from a facility that cannot directly generate the data and, (b) definitions of indicators are not consistent with its corresponding program; (2) incomplete or ambiguous, which allow several interpretations; and (3) complete yet easily misunderstood by health workers. Taking systemic factors into account, the case identification step needs to be reviewed and designed to generate intended data in health information systems. PMID:21995369
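
The two classical indices defined in the Methods above can be sketched in a few lines. The difficulty index is the proportion of correct answers, and the discrimination index is the difference in that proportion between higher- and lower-scoring groups. The function names are hypothetical, and because the abstract does not say how the two groups were formed, the group data in the usage lines are invented for illustration. The counts out of 113 are the ones quoted in the Results.

```python
from typing import Sequence


def difficulty_index(n_correct: int, n_respondents: int) -> float:
    """Proportion of respondents who answered the item correctly."""
    if n_respondents <= 0:
        raise ValueError("n_respondents must be positive")
    return n_correct / n_respondents


def discrimination_index(upper: Sequence[int], lower: Sequence[int]) -> float:
    """Proportion correct in the higher-scoring group minus that in the lower-scoring group.

    `upper` and `lower` hold 0/1 scores on one item; how respondents are split
    into the two groups is an assumption left to the caller.
    """
    if not upper or not lower:
        raise ValueError("both groups must be non-empty")
    return sum(upper) / len(upper) - sum(lower) / len(lower)


# Counts of correct answers quoted in the abstract, out of 113 valid responses.
reported_counts = [4, 22, 16, 28, 53, 76]
indices = [round(difficulty_index(c, 113), 3) for c in reported_counts]
print(indices)  # [0.035, 0.195, 0.142, 0.248, 0.469, 0.673]

# Invented 0/1 scores for one item in two groups of four respondents.
print(discrimination_index([1, 1, 1, 0], [1, 0, 0, 0]))  # 0.5
```

The rounded values reproduce the difficulty indices reported in the abstract, which confirms that each index is simply the count of correct answers divided by 113.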
[Instrument to measure adherence in hypertensive patients: contribution of Item Response Theory].

PubMed

Rodrigues, Malvina Thaís Pacheco; Moreira, Thereza Maria Magalhaes; Vasconcelos, Alexandre Meira de; Andrade, Dalton Francisco de; Silva, Daniele Braz da; Barbetta, Pedro Alberto

2013-06-01

To analyze, by means of "Item Response Theory", an instrument to measure adherence to treatment for hypertension. Analytical study with 406 hypertensive patients with associated complications seen in primary care in Fortaleza, CE, Northeastern Brazil, 2011 using "Item Response Theory". The stages were: dimensionality test, calibrating the items, processing data and creating a scale, analyzed using the graded response model. A study of the dimensionality of the instrument was conducted by analyzing the polychoric correlation matrix and full-information factor analysis. Multilog software was used to calibrate items and estimate the scores. Items relating to drug therapy are the most directly related to adherence while those relating to drug-free therapy need to be reworked because they have less psychometric information and low discrimination. The independence of items, the small number of levels in the scale and low explained variance in the adjustment of the models show the main weaknesses of the instrument analyzed. The "Item Response Theory" proved to be a relevant analysis technique because it evaluated respondents for adherence to treatment for hypertension, the level of difficulty of the items and their ability to discriminate between individuals with different levels of adherence, which generates a greater amount of information. The "Item Response Theory" analysis shows that the instrument is limited in measuring adherence to hypertension treatment and needs adjustment. The proper formulation of the items is important in order to accurately measure the desired latent trait.
Analysis of Iowa Data I: Initial Study and Findings. Research Report 80-1.

ERIC Educational Resources Information Center

Samejima, Fumiko; Trestman, Robert L.

The first step of the data analysis with respect to the eventual application of the various new methods in latent trait theory is here initiated. The data are a set of approximately 500 item responses of each of 7,439 examinees to the Iowa Tests of Basic Skills, Form 6, on one of three difficulty levels, which correspond to the ages of 11, 12 and…
Redintegration, task difficulty, and immediate serial recall tasks.

PubMed

Ritchie, Gabrielle; Tolan, Georgina Anne; Tehan, Gerald

2015-03-01

While current theoretical models remain somewhat inconclusive in their explanation of short-term memory (STM), many theories suggest at least a contribution of long-term memory (LTM) to the short-term system. A number of researchers refer to this process as redintegration (e.g., Schweickert, 1993). Under short-term recall conditions, the current study investigated the effects of redintegration and task difficulty in order to extend research conducted by Neale and Tehan (2007). Thirty participants in Experiment 1 and 26 participants in Experiment 2 completed a serial recall task in which retention interval, presentation rate, and articulatory suppression were used to modify task difficulty. Redintegration was examined by manipulating the characteristics of the to-be-remembered items; lexicality in Experiment 1 and wordlikeness in Experiment 2. Responses were scored based on correct-in-position recall, item scoring, and order accuracy scoring. In line with the Neale and Tehan results, as the difficulty of the task increased so did the effects of redintegration. This was evident in that the advantage for words in Experiment 1 and wordlikeness in Experiment 2 decreased as task difficulty increased. This relationship was observed for item but not order memory, and findings were discussed in relation to the theory of redintegration. (PsycINFO Database Record (c) 2015 APA, all rights reserved).
Measuring ability to assess claims about treatment effects: a latent trait analysis of items from the ‘Claim Evaluation Tools’ database using Rasch modelling

PubMed Central

Austvoll-Dahlgren, Astrid; Guttersrud, Øystein; Nsangi, Allen; Semakula, Daniel; Oxman, Andrew D

2017-01-01

Background The Claim Evaluation Tools database contains multiple-choice items for measuring people’s ability to apply the key concepts they need to know to be able to assess treatment claims. We assessed items from the database using Rasch analysis to develop an outcome measure to be used in two randomised trials in Uganda. Rasch analysis is a form of psychometric testing relying on Item Response Theory. It is a dynamic way of developing outcome measures that are valid and reliable. Objectives To assess the validity, reliability and responsiveness of 88 items addressing 22 key concepts using Rasch analysis. Participants We administered four sets of multiple-choice items in English to 1114 people in Uganda and Norway, of which 685 were children and 429 were adults (including 171 health professionals). We scored all items dichotomously. We explored summary and individual fit statistics using the RUMM2030 analysis package. We used SPSS to perform distractor analysis. Results Most items conformed well to the Rasch model, but some items needed revision. Overall, the four item sets had satisfactory reliability. We did not identify significant response dependence between any pairs of items and, overall, the magnitude of multidimensionality in the data was acceptable. The items had a high level of difficulty. Conclusion Most of the items conformed well to the Rasch model’s expectations. Following revision of some items, we concluded that most of the items were suitable for use in an outcome measure for evaluating the ability of children or adults to assess treatment claims. PMID:28550019
Comparison of Rating Scales in the Development of Patient-Reported Outcome Measures for Children with Eye Disorders.

PubMed

Hatt, Sarah R; Leske, David A; Wernimont, Suzanne M; Birch, Eileen E; Holmes, Jonathan M

2017-03-01

A rating scale is a critical component of patient-reported outcome instrument design, but the optimal rating scale format for pediatric use has not been investigated. We compared rating scale performance when administering potential questionnaire items to children with eye disorders and their parents. Three commonly used rating scales were evaluated: frequency (never, sometimes, often, always), severity (not at all, a little, some, a lot), and difficulty (not difficult, a little difficult, difficult, very difficult). Ten patient-derived items were formatted for each rating scale, and rating scale testing order was randomized. Both child and parent were asked to comment on any problems with, or a preference for, a particular scale. Any confusion about options or inability to answer was recorded. Twenty-one children, aged 5-17 years, with strabismus, amblyopia, or refractive error were recruited, each with one of their parents. Of the first 10 children, 4 (40%) had problems using the difficulty scale, compared with 1 (10%) using frequency, and none using severity. The difficulty scale was modified, replacing the word "difficult" with "hard." Eleven additional children (plus parents) then completed all 3 questionnaires. No children had problems using any scale. Four (36%) parents had problems using the difficulty ("hard") scale and 1 (9%) with frequency. Regarding preference, 6 (55%) of 11 children and 5 (50%) of 10 parents preferred using the frequency scale. Children and parents found the frequency scale and question format to be the most easily understood. Children and parents also expressed preference for the frequency scale, compared with the difficulty and severity scales. We recommend frequency rating scales for patient-reported outcome measures in pediatric populations.
Design and development of food safety knowledge and attitude scales for consumer food safety education.

PubMed

Medeiros, Lydia C; Hillers, Virginia N; Chen, Gang; Bergmann, Verna; Kendall, Patricia; Schroeder, Mary

2004-11-01

The objective of this study was to design and develop food safety knowledge and attitude scales based on food-handling guidelines developed by a national panel of food safety experts. Knowledge (n=43) and attitude (n=49) questions were developed and pilot-tested with a variety of consumer groups. Final questions were selected based on item analysis and on validity and reliability statistical tests. Knowledge questions were tested in Washington State with participants in low-income nutrition education programs (pretest/posttest n=58, test/retest n=19) and college students (pretest/posttest n=34). Attitude questions were tested in Ohio with nutrition education program participants (n=30) and college students (non-nutrition majors n=138, nutrition majors n=57). Item analysis, paired sample t tests, Pearson's correlation coefficients, and Cronbach's alpha were used. Reliability and validity tests of individual items and the question sets were used to reduce the scales to 18 knowledge questions and 10 attitude questions. The knowledge and attitude scales covered topics ranked as important by a national panel of experts and met most validity and reliability standards. The 18-item knowledge questionnaire had instructional sensitivity (mean score increase of more than three points after instruction), internal reliability (Cronbach's alpha >.75), and produced similar results in test-retest without intervention (coefficient of stability=.81). Knowledge of correct procedures for hand washing and avoiding cross-contamination was widespread before instruction. Knowledge was limited regarding avoiding food preparation while ill, cooking hamburgers, high-risk foods, and whether cooked rice and potatoes could be stored at room temperature. The 10-item attitude scale had an appropriate range of responses (item difficulty) and produced similar results in test-retest ( P
Item Information in the Rasch Model. Project Psychometric Aspects of Item Banking No. 34. Research Report 88-7.

ERIC Educational Resources Information Center

Engelen, Ron J. H.; And Others

Fisher's information measure for the item difficulty parameter in the Rasch model and its marginal and conditional formulations are investigated. It is shown that expected item information in the unconditional model equals information in the marginal model, provided the assumption of sampling examinees from an ability distribution is made. For the…
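
As background to the record above, the basic quantity involved can be sketched for a single dichotomous response with the examinee's ability treated as known. This shows only the standard per-response Fisher information for the difficulty parameter, P(1 - P), and does not reproduce the report's marginal or conditional formulations. The function names are hypothetical.

```python
import math


def rasch_probability(theta: float, b: float) -> float:
    """Probability of a correct response under the Rasch model: logistic(theta - b)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_information(theta: float, b: float) -> float:
    """Fisher information about b from one response at known ability theta.

    The log-likelihood is x * (theta - b) - log(1 + exp(theta - b)); its second
    derivative with respect to b is -P * (1 - P), so the information is P * (1 - P).
    """
    p = rasch_probability(theta, b)
    return p * (1.0 - p)


print(rasch_probability(0.0, 0.0))  # 0.5
print(item_information(0.0, 0.0))   # 0.25, the maximum, reached when theta == b
```

Information falls off symmetrically as ability moves away from the item's difficulty, so examinees well above or below an item's level contribute little to estimating that item's difficulty.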
Physics 30 Program Machine-Scorable Open-Ended Questions: Unit 2: Electric and Magnetic Forces. Diploma Examinations Program.

ERIC Educational Resources Information Center

Alberta Dept. of Education, Edmonton.

This document outlines the use of machine-scorable open-ended questions for the evaluation of Physics 30 in Alberta. Contents include: (1) an introduction to the questions; (2) sample instruction sheet; (3) fifteen sample items; (4) item information including the key, difficulty, and source of each item; (5) solutions to items having multiple…
Some Factors Underlying Individual Differences in Speech Recognition on PRESTO: A First Report

PubMed Central

Tamati, Terrin N.; Gilbert, Jaimie L.; Pisoni, David B.

2013-01-01

Background Previous studies investigating speech recognition in adverse listening conditions have found extensive variability among individual listeners. However, little is currently known about the core, underlying factors that influence speech recognition abilities. Purpose To investigate sensory, perceptual, and neurocognitive differences between good and poor listeners on PRESTO, a new high-variability sentence recognition test under adverse listening conditions. Research Design Participants who fell in the upper quartile (HiPRESTO listeners) or lower quartile (LoPRESTO listeners) on key word recognition on sentences from PRESTO in multitalker babble completed a battery of behavioral tasks and self-report questionnaires designed to investigate real-world hearing difficulties, indexical processing skills, and neurocognitive abilities. Study Sample Young, normal-hearing adults (N = 40) from the Indiana University community participated in the current study. Data Collection and Analysis Participants’ assessment of their own real-world hearing difficulties was measured with a self-report questionnaire on situational hearing and hearing health history. Indexical processing skills were assessed using a talker discrimination task, a gender discrimination task, and a forced-choice regional dialect categorization task. Neurocognitive abilities were measured with the Auditory Digit Span Forward (verbal short-term memory) and Digit Span Backward (verbal working memory) tests, the Stroop Color and Word Test (attention/inhibition), the WordFam word familiarity test (vocabulary size), the BRIEF-A self-report questionnaire on executive function, and two performance subtests of the WASI Performance IQ (non-verbal intelligence). Scores on self-report questionnaires and behavioral tasks were tallied and analyzed by listener group (HiPRESTO and LoPRESTO). Results The extreme groups did not differ overall on self-reported hearing difficulties in real-world listening environments. 
However, an item-by-item analysis of questions revealed that LoPRESTO listeners reported significantly greater difficulty understanding speakers in a public place. HiPRESTO listeners were significantly more accurate than LoPRESTO listeners at gender discrimination and regional dialect categorization, but they did not differ on talker discrimination accuracy or response time, or gender discrimination response time. HiPRESTO listeners also had longer forward and backward digit spans, higher word familiarity ratings on the WordFam test, and lower (better) scores for three individual items on the BRIEF-A questionnaire related to cognitive load. The two groups did not differ on the Stroop Color and Word Test or either of the WASI performance IQ subtests. Conclusions HiPRESTO listeners and LoPRESTO listeners differed in indexical processing abilities, short-term and working memory capacity, vocabulary size, and some domains of executive functioning. These findings suggest that individual differences in the ability to encode and maintain highly detailed episodic information in speech may underlie the variability observed in speech recognition performance in adverse listening conditions using high-variability PRESTO sentences in multitalker babble. PMID:24047949
Development and Validation of the Spanish Numeracy Understanding in Medicine Instrument.

PubMed

Jacobs, Elizabeth A; Walker, Cindy M; Miller, Tamara; Fletcher, Kathlyn E; Ganschow, Pamela S; Imbert, Diana; O'Connell, Maria; Neuner, Joan M; Schapira, Marilyn M

2016-11-01

The Spanish-speaking population in the U.S. is large and growing and is known to have lower health literacy than the English-speaking population. Less is known about the health numeracy of this population due to a lack of health numeracy measures in Spanish. We aimed to develop and validate a short and easy-to-use measure of health numeracy for Spanish-speaking adults: the Spanish Numeracy Understanding in Medicine Instrument (Spanish-NUMi). Items were generated based on qualitative studies in English- and Spanish-speaking adults and translated into Spanish using a group translation and consensus process. Candidate items for the Spanish NUMi were selected from an eight-item validated English Short NUMi. Differential Item Functioning (DIF) analysis was conducted to evaluate equivalence between English and Spanish items. Cronbach's alpha was computed as a measure of reliability and a Pearson's correlation was used to evaluate the association between test scores and the Spanish Test of Functional Health Literacy (S-TOFHLA) and education level. Two hundred and thirty-two Spanish-speaking Chicago residents were included in the study. The study population was diverse in age, gender, and level of education and 70% reported Mexico as their country of origin. Two items of the English eight-item Short NUMi demonstrated DIF and were dropped. The resulting six-item test had a Cronbach's alpha of 0.72, a range of difficulty using classical test statistics (percent correct: 0.48 to 0.86), and adequate discrimination (item-total score correlation: 0.34-0.49). Scores were positively correlated with print literacy as measured by the S-TOFHLA (r = 0.67; p < 0.001) and varied as predicted across grade level; mean scores for up to eighth grade, ninth through twelfth grade, and some college experience or more, respectively, were 2.48 (SD ± 1.64), 4.15 (SD ± 1.45), and 4.82 (SD ± 0.37). The Spanish NUMi is a reliable and valid measure of important numerical concepts used in communicating health information.
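The classical test statistics named in the abstract above (percent correct as item difficulty, item-total correlation as discrimination, Cronbach's alpha as reliability) can be illustrated with a minimal Python sketch. The response matrix and function names below are hypothetical and are not taken from the study; the item-total correlation shown is the uncorrected form, in which the item is included in the total score.

```python
from math import sqrt
from statistics import mean, pvariance

# Hypothetical 0/1 scored responses: one row per respondent, one column per item.
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]

def item_difficulty(rows):
    """Classical difficulty: proportion of respondents answering each item correctly."""
    return [mean(col) for col in zip(*rows)]

def cronbach_alpha(rows):
    """Cronbach's alpha from item variances and the variance of the total score."""
    k = len(rows[0])
    sum_item_vars = sum(pvariance(col) for col in zip(*rows))
    total_var = pvariance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum_item_vars / total_var)

def item_total_correlation(rows, i):
    """Uncorrected Pearson correlation between item i and the total score."""
    xs = [r[i] for r in rows]
    ts = [sum(r) for r in rows]
    mx, mt = mean(xs), mean(ts)
    cov = sum((x - mx) * (t - mt) for x, t in zip(xs, ts))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((t - mt) ** 2 for t in ts))
```

A higher proportion correct means an easier item, so a reported range of 0.48 to 0.86 describes items running from moderately hard to easy.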
Work Functioning Among Firefighters: A Comparison Between Self-Reported Limitations and Functional Task Performance.

PubMed

MacDermid, Joy C; Tang, Kenneth; Sinden, Kathryn E; D'Amico, Robert

2018-05-25

Purpose Performance-based and disease indicators have been widely studied in firefighters; self-reported work role limitations have not. The aim of this study was to describe the distributions and correlations of a generic self-reported Work Limitations Questionnaire (WLQ-26) and firefighting-specific task performance-based tests. Methods Active firefighters from the City of Hamilton Fire Services (n = 293) were recruited. Participants completed the WLQ-26 to quantify on-the-job difficulties over five work domains: work scheduling (4 items), output demands (7 items), physical demands (8 items), mental demands (4 items), and social demands (3 items). A subset of participants (n = 149) were also assessed on hose drag and stair climb with a high-rise pack performance-based tests. Descriptive statistics and correlations were used to compare item/subscale performance; and to describe the inter-relationships between tests. Results The mean WLQ-26 item scores (/5) ranged from 4.1 to 4.4 (median = 5 for all items); most firefighters (54.5-80.5%) selected "difficult none of the time" response option on all items. A substantial ceiling effect was observed across all five WLQ-26 subscales as 44.0-55.6% were in the highest category. Subscale means ranged from 61.8 (social demands) to 78.7 (output demands and physical demands). Internal consistency exceeded 0.90 on all subscales. For the hose drag task, the mean time-to-completion was 48.0 s (SD = 14.5; range 20.4-95.0). For the stair climb task, the mean time-to-completion was 76.7 s (SD = 37.2; range 21.0-218.0). There were no significant correlations between self-report work limitations and performance of firefighting tasks. Conclusions The WLQ-26 measured five domains, but had ceiling effects in firefighters. Performance-based testing showed wider score range, lacked ceiling effects and did not correlate to the WLQ-26. 
A firefighter-specific, self-report role functioning scale may be needed to identify compromised work role capabilities in firefighters.
Fractionating the Neural Substrates of Incidental Recognition Memory

ERIC Educational Resources Information Center

Greene, Ciara M.; Vidaki, Kleio; Soto, David

2015-01-01

Familiar stimuli are typically accompanied by decreases in neural response relative to the presentation of novel items, but these studies often include explicit instructions to discriminate old and new items; this creates difficulties in partialling out the contribution of top-down intentional orientation to the items based on recognition goals.…
Teacher Perceived Difficulty in Implementing Differentiated Instructional Strategies in Primary School

ERIC Educational Resources Information Center

Gaitas, Sérgio; Alves Martins, Margarida

2017-01-01

This study analyses teacher perceived difficulty in implementing differentiated instructional strategies in regular classes. The participants were 273 Portuguese primary school teachers with teaching experience ranging from 1 to 33 years. A 39-item questionnaire was used to evaluate teacher perceived difficulty in relation to different…
Measuring and Predicting Graded Reader Difficulty

ERIC Educational Resources Information Center

Holster, Trevor A.; Lake, J. W.; Pellowe, William R.

2017-01-01

This study used many-faceted Rasch measurement to investigate the difficulty of graded readers using a 3-item survey. Book difficulty was compared with Kyoto Level, Yomiyasusa Level, Lexile Level, book length, mean sentence length, and mean word frequency. Word frequency and Kyoto Level were found to be ineffective in predicting students'…
Critical success factors in awareness of and choice towards low vision rehabilitation.

PubMed

Fraser, Sarah A; Johnson, Aaron P; Wittich, Walter; Overbury, Olga

2015-01-01

The goal of the current study was to examine the critical factors indicative of an individual's choice to access low vision rehabilitation services. Seven hundred and forty-nine visually impaired individuals, from the Montreal Barriers Study, completed a structured interview and questionnaires (on visual function, coping, depression, satisfaction with life). Seventy-five factors from the interview and questionnaires were entered into a data-driven Classification and Regression Tree Analysis in order to determine the best predictors of awareness group: positive personal choice (I knew and I went), negative personal choice (I knew and did not go), and lack of information (Nobody told me, and I did not know). Having a response of moderate to no difficulty on item 6 (reading signs) of the Visual Function Index 14 (VF-14) indicated that the person had made a positive personal choice to seek rehabilitation, whereas reporting a great deal of difficulty on this item was associated with a lack of information on low vision rehabilitation. In addition to this factor, symptom duration of under nine years, moderate difficulty or less on item 5 (seeing steps or curbs) of the VF-14, and an indication of little difficulty or less on item 3 (reading large print) of the VF-14 further identified those who were more likely to have made a positive personal choice. Individuals in the lack of information group also reported greater difficulty on items 3 and 5 of the VF-14 and were more likely to be male. The duration-of-symptoms factor suggests that, even in the positive choice group, it may be best to offer rehabilitation services early. Being male and responding moderate difficulty or greater to the VF-14 questions about far, medium-distance and near situations involving vision was associated with individuals that lack information. Consequently, these individuals may need additional education about the benefits of low vision services in order to make a positive personal choice. 
© 2014 The Authors Ophthalmic & Physiological Optics © 2014 The College of Optometrists.
Validity of a Protocol for Adult Self-Report of Dyslexia and Related Difficulties

PubMed Central

Snowling, Margaret; Dawes, Piers; Nash, Hannah; Hulme, Charles

2012-01-01

Background There is an increased prevalence of reading and related difficulties in children of dyslexic parents. In order to understand the causes of these difficulties, it is important to quantify the risk factors passed from parents to their offspring. Method 417 adults completed a protocol comprising a 15-item questionnaire rating reading and related skills and a scale assessing ADHD symptoms; 344 completed reading, nonword reading and spelling tests. Results A confirmatory factor analysis with four factors (Reading, Word Finding, Attention and Hyperactivity) provided a reasonable fit to the data. The Reading Factor showed robust correlations with measured literacy skills. Adults who reported as dyslexic, or rated their reading difficulties as more severe, gained lower scores on objective measures of literacy skills. Although the sensitivity of the new scale was acceptable, it tended to miss some cases of low literacy. Conclusions Self-report scales of reading and of attention difficulties are useful for identifying adults with reading and attention difficulties which may confer risks on their children of related problems. It is important for research following children at family risk of dyslexia to be aware of these effects. Copyright © 2012 John Wiley & Sons, Ltd. PMID:22271419
Validity of a protocol for adult self-report of dyslexia and related difficulties.

PubMed

Snowling, Margaret; Dawes, Piers; Nash, Hannah; Hulme, Charles

2012-02-01

There is an increased prevalence of reading and related difficulties in children of dyslexic parents. In order to understand the causes of these difficulties, it is important to quantify the risk factors passed from parents to their offspring. 417 adults completed a protocol comprising a 15-item questionnaire rating reading and related skills and a scale assessing ADHD symptoms; 344 completed reading, nonword reading and spelling tests. A confirmatory factor analysis with four factors (Reading, Word Finding, Attention and Hyperactivity) provided a reasonable fit to the data. The Reading Factor showed robust correlations with measured literacy skills. Adults who reported as dyslexic, or rated their reading difficulties as more severe, gained lower scores on objective measures of literacy skills. Although the sensitivity of the new scale was acceptable, it tended to miss some cases of low literacy. Self-report scales of reading and of attention difficulties are useful for identifying adults with reading and attention difficulties which may confer risks on their children of related problems. It is important for research following children at family risk of dyslexia to be aware of these effects. Copyright © 2011 John Wiley & Sons, Ltd.

Classical test theory and Rasch analysis validation of the Upper Limb Functional Index in subjects with upper limb musculoskeletal disorders.

PubMed

Bravini, Elisabetta; Franchignoni, Franco; Giordano, Andrea; Sartorio, Francesco; Ferriero, Giorgio; Vercelli, Stefano; Foti, Calogero

2015-01-01

To perform a comprehensive analysis of the psychometric properties and dimensionality of the Upper Limb Functional Index (ULFI) using both classical test theory and Rasch analysis (RA). Prospective, single-group observational design. Freestanding rehabilitation center. Convenience sample of Italian-speaking subjects with upper limb musculoskeletal disorders (N=174). Not applicable. The Italian version of the ULFI. Data were analyzed using parallel analysis, exploratory factor analysis, and RA for evaluating dimensionality, functioning of rating scale categories, item fit, hierarchy of item difficulties, and reliability indices. Parallel analysis revealed 2 factors explaining 32.5% and 10.7% of the response variance. RA confirmed the failure of the unidimensionality assumption, and 6 items out of the 25 misfitted the Rasch model. When the analysis was rerun excluding the misfitting items, the scale showed acceptable fit values, loading meaningfully to a single factor. Item separation reliability and person separation reliability were .98 and .89, respectively. Cronbach alpha was .92. RA revealed weakness of the scale concerning dimensionality and internal construct validity. However, a set of 19 ULFI items defined through the statistical process demonstrated a unidimensional structure, good psychometric properties, and clinical meaningfulness. These findings represent a useful starting point for further analyses of the tool (based on modern psychometric approaches and confirmatory factor analysis) in larger samples, including different patient populations and nationalities. Copyright © 2015 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Development of the Assessment of Belief Conflict in Relationship-14 (ABCR-14)

PubMed Central

Kyougoku, Makoto; Teraoka, Mutsumi; Masuda, Noriko; Ooura, Mariko; Abe, Yasushi

2015-01-01

Purpose Nurses and other healthcare workers frequently experience belief conflict, one of the most important, new stress-related problems in both academic and clinical fields. Methods In this study, using a sample of 1,683 nursing practitioners, we developed The Assessment of Belief Conflict in Relationship-14 (ABCR-14), a new scale that assesses belief conflict in the healthcare field. Standard psychometric procedures were used to develop and test the scale, including a qualitative framework concept and item-pool development, item reduction, and scale development. We analyzed the psychometric properties of ABCR-14 according to entropy, polyserial correlation coefficient, exploratory factor analysis, confirmatory factor analysis, average variance extracted, Cronbach’s alpha, Pearson product-moment correlation coefficient, and multidimensional item response theory (MIRT). Results The results of the analysis supported a three-factor model consisting of 14 items. The validity and reliability of ABCR-14 was suggested by evidence from high construct validity, structural validity, hypothesis testing, internal consistency reliability, and concurrent validity. The result of the MIRT offered strong support for good item response of item slope parameters and difficulty parameters. However, the ABCR-14 Likert scale might need to be explored from the MIRT point of view. Yet, as mentioned above, there is sufficient evidence to support that ABCR-14 has high validity and reliability. Conclusion The ABCR-14 demonstrates good psychometric properties for nursing belief conflict. Further studies are recommended to confirm its application in clinical practice. PMID:26247356
The Contribution of Prospective Memory Performance to the Neuropsychological Assessment of Mild Cognitive Impairment.

PubMed

Lee, Stephen; Ong, Ben; Pike, Kerryn E; Mullaly, Elizabeth; Rand, Elizabeth; Storey, Elsdon; Ames, David; Saling, Michael; Clare, Linda; Kinsella, Glynda J

2016-01-01

Prospective memory difficulties are a feature of the amnestic form of mild cognitive impairment (aMCI). Although comprehensive test batteries of prospective memory are suitable for clinical practice, they are lengthy, which has detracted from their widespread clinical use. Our aim was to investigate the utility of a brief screening measure of prospective memory, which can be incorporated into a clinical neuropsychological assessment. Seventy-seven healthy older adults (HOA) and 77 participants with aMCI were administered a neuropsychological test battery, including a prospective memory screening measure (Envelope Task), a retrospective memory measure (CVLT-II), and a multi-item subjective memory questionnaire (Prospective and Retrospective Memory Questionnaire; PRMQ) and a single-item subjective memory scale. Compared with HOA participants, participants with aMCI performed poorly on the Envelope Task (η(2) = .38), which provided good discrimination of the aMCI and HOA groups (AUC = .83). In the aMCI group, there was a small but significant relationship between the Envelope Task and the single-item subjective rating of memory, with the Envelope Task accounting for 5-6% of the variance in subjective memory after accounting for emotional status. This relationship of prospective memory and subjective memory was not significant for the multi-item questionnaire (PRMQ); and, retrospective memory was not a significant predictor of self-rated memory, single-item, or multi-item. A brief screening measure of prospective memory, the Envelope Task, provides useful support to traditional memory measures in detecting aMCI.
New evidence of factor structure and measurement invariance of the SDQ across five European nations.

PubMed

Ortuño-Sierra, Javier; Fonseca-Pedrero, Eduardo; Aritio-Solana, Rebeca; Velasco, Alvaro Moreno; de Luis, Edurne Chocarro; Schumann, Gunter; Cattrell, Anna; Flor, Herta; Nees, Frauke; Banaschewski, Tobias; Bokde, Arun; Whelan, Rob; Buechel, Christian; Bromberg, Uli; Conrod, Patricia; Frouin, Vincent; Papadopoulos, Dimitri; Gallinat, Juergen; Garavan, Hugh; Heinz, Andreas; Walter, Henrik; Struve, Maren; Gowland, Penny; Paus, Tomáš; Poustka, Luise; Martinot, Jean-Luc; Paillère-Martinot, Marie-Laure; Vetter, Nora C; Smolka, Michael N; Lawrence, Claire

2015-12-01

The main purpose of the present study was to analyse the internal structure and to test the measurement invariance of the Strengths and Difficulties Questionnaire (SDQ), self-reported version, in five European countries. The sample consisted of 3012 adolescents aged between 12 and 17 years (M = 14.20; SD = 0.83). The five-factor model (with correlated errors added), and the five-factor model (with correlated errors added) with the reverse-worded items allowed to cross-load on the Prosocial subscale, displayed adequate goodness-of-fit indices. Multi-group confirmatory factor analysis showed that the five-factor model (with correlated errors added) had partial strong measurement invariance across countries. A total of 11 of the 25 items were non-invariant across samples. The level of internal consistency of the Total difficulties score was 0.84, ranging between 0.69 and 0.78 for the SDQ subscales. The findings indicate that the SDQ's subscales need to be modified in various ways for screening emotional and behavioural problems in the five European countries that were analysed.
Psychometric analyses to improve the Dutch ICF Activity Inventory.

PubMed

Bruijning, Janna E; van Rens, Ger; Knol, Dirk; van Nispen, Ruth

2013-08-01

In the past, rehabilitation centers for the visually impaired used unstructured or semistructured methods to assess rehabilitation needs of their patients. Recently, an extensive instrument, the Dutch ICF Activity Inventory (D-AI), was developed to systematically investigate rehabilitation needs of visually impaired adults and to evaluate rehabilitation outcomes. The purpose of this study was to investigate the underlying factor structure and other psychometric properties to shorten and improve the D-AI. The D-AI was administered to 241 visually impaired persons who recently enrolled in a multidisciplinary rehabilitation center. The D-AI uses graded scores to assess the importance and difficulty of 65 rehabilitation goals. For high-priority goals (e.g., daily meal preparation), the difficulty of underlying tasks (e.g., read recipes, cut vegetables) was assessed. To reduce underlying task items (>950), descriptive statistics were investigated and factor analyses were performed for several goals. The internal consistency reliability and test-retest reliability of the D-AI were investigated by calculating Cronbach α and Cohen (weighted) κ. Finally, consensus-based discussions were used to shorten and improve the D-AI. Except for one goal, factor analysis model parameters were at least reasonable. Internal consistency reliability was satisfactory (range, 0.74 to 0.93). In total, 60% of the 65 goal importance items and 84.4% of the goal difficulty items showed moderate to almost perfect κ values (≥0.40). After consensus-based discussions, a new D-AI was produced, containing 48 goals and less than 500 tasks. The analyses were an important step in the validation process of the D-AI and to develop a more feasible assessment tool to investigate rehabilitation needs of visually impaired persons in a systematic way. The D-AI is currently implemented in all Dutch rehabilitation centers serving all visually impaired adults with various rehabilitation needs.
Cross-cultural adaptation and construct validity of the Korean version of a physical activity measure for community-dwelling elderly.

PubMed

Choi, Bongsam

2018-01-01

[Purpose] This study aimed to cross-culturally adapt and validate the Korean version of a physical activity measure (K-PAM) for community-dwelling elderly. [Subjects and Methods] One hundred and thirty-eight community-dwelling elderly people, 32 males and 106 females, participated in the study. All participants were asked to fill out a fifty-one-item questionnaire measuring perceived difficulty in the activities of daily living (ADL) for the elderly. The one-parameter model of item response theory (Rasch analysis) was applied to determine the construct validity and to inspect item-level psychometric properties of the 51 ADL items of the K-PAM. [Results] Person separation reliability (analogous to Cronbach's alpha) for internal consistency ranged from 0.93 to 0.94. A total of 16 items misfit the Rasch model. After misfit item deletion, 35 ADL items of the K-PAM were placed in an empirically meaningful hierarchy from easy to hard. The item-person map analysis showed that item difficulty was well matched to elderly people with moderate and low ability, apart from ceiling effects at the high end. [Conclusion] The cross-culturally adapted K-PAM was shown to be sufficient for establishing construct validity and stable psychometric properties, as confirmed by person separation reliability and fit statistics.
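As a rough illustration of the one-parameter (Rasch) model mentioned in the abstract above, the following Python sketch uses the dichotomous form, in which the probability of endorsing an item depends only on the difference between person ability and item difficulty (both in logits). The item names and difficulty values are hypothetical and are not K-PAM estimates; the study's rating-scale data would need a polytomous extension of this model.

```python
from math import exp

def rasch_probability(theta, b):
    """Dichotomous Rasch model: P(success) for ability theta and item difficulty b."""
    return 1.0 / (1.0 + exp(-(theta - b)))

# Hypothetical item difficulties in logits (not taken from the study).
items = {"walking indoors": -1.5, "climbing stairs": 0.0, "running": 2.0}

# Item hierarchy from easy to hard: sort the items by difficulty.
hierarchy = sorted(items, key=items.get)

# For a person of average ability (theta = 0), harder items are less likely to be passed.
probabilities = {name: rasch_probability(0.0, b) for name, b in items.items()}
```

When ability equals difficulty the probability is exactly 0.5, which is why an item-person map can show whether item difficulties are well targeted to the sample.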
Differential age-related effects on conjunctive and relational visual short-term memory binding.

PubMed

Bastin, Christine

2017-12-28

An age-related associative deficit has been described in visual short-term binding memory tasks. However, separate studies have suggested that ageing disrupts relational binding (to associate distinct items or item and context) more than conjunctive binding (to integrate features within an object). The current study directly compared relational and conjunctive binding with a short-term memory task for object-colour associations in 30 young and 30 older adults. Participants studied a number of object-colour associations corresponding to their individual object span level in a relational task in which objects were associated to colour patches and a conjunctive task where colour was integrated into the object. Memory for individual items and for associations was tested with a recognition memory test. Evidence for an age-related associative deficit was observed in the relational binding task, but not in the conjunctive binding task. This differential impact of ageing on relational and conjunctive short-term binding is discussed by reference to two underlying age-related cognitive difficulties: diminished hippocampally dependent binding and attentional resources.
Analysis of the psychometric properties of the American Orthopaedic Foot and Ankle Society Score (AOFAS) in rheumatoid arthritis patients: application of the Rasch model.

PubMed

Conceição, Cristiano Sena da; Neto, Mansueto Gomes; Neto, Anolino Costa; Mendes, Selena M D; Baptista, Abrahão Fontes; Sá, Kátia Nunes

2016-01-01

To test the reliability and validity of the AOFAS in a sample of rheumatoid arthritis patients. The scale was applied to rheumatoid arthritis patients twice by interviewer 1 and once by interviewer 2. The AOFAS was subjected to test-retest reliability analysis (with 20 rheumatoid arthritis subjects). The psychometric properties were investigated using Rasch analysis on 33 rheumatoid arthritis patients. Intra-Class Correlation Coefficient (ICC) were (0.90
Spanish translation and linguistic validation of the quality of life in neurological disorders (Neuro-QoL) measurement system.

PubMed

Correia, H; Pérez, B; Arnold, B; Wong, Alex W K; Lai, J S; Kallen, M; Cella, D

2015-03-01

The quality of life in neurological disorders (Neuro-QoL) measurement system is a 470-item compilation of health-related quality of life domains for adults and children with neurological disorders. It was developed and cognitively debriefed in English and Spanish, with general population and clinical samples in the USA. This paper describes the Spanish translation and linguistic validation process. The translation methodology combined forward and back-translations, multiple reviews, and cognitive debriefing with 30 adult and 30 pediatric Spanish-speaking respondents in the USA. The adult Fatigue bank was later also tested in Spain and Argentina. A universal approach to translation was adopted to produce a Spanish version that can be used in various countries. Translators from several countries were involved in the process. Cognitive debriefing results indicated that most of the 470 Spanish items were well understood. Translations were revised as needed where difficulty was reported or where participants' comments revealed misunderstanding of an item's intended meaning. Additional testing of the universal Spanish adult Fatigue item bank in Spain and Argentina confirmed good understanding of the items and that no country-specific word changes were necessary. All the adult and pediatric Neuro-QoL measures have been linguistically validated with Spanish speakers in the USA. Instruments are available for use at www.assessmentcenter.net.
The Curiosity and Exploration Inventory-II: Development, Factor Structure, and Psychometrics

PubMed Central

Kashdan, Todd B.; Gallagher, Matthew W.; Silvia, Paul J.; Winterstein, Beate P.; Breen, William E.; Terhar, Daniel; Steger, Michael F.

2009-01-01

Given curiosity’s fundamental role in motivation, learning, and well-being, we sought to refine the measurement of trait curiosity with an improved version of the Curiosity and Exploration Inventory (CEI; Kashdan, Rose, & Fincham, 2004). A preliminary pool of 36 items was administered to 311 undergraduate students, who also completed measures of emotion, emotion regulation, personality, and well-being. Factor analyses indicated a two factor model—motivation to seek out knowledge and new experiences (Stretching; 5 items) and a willingness to embrace the novel, uncertain, and unpredictable nature of everyday life (Embracing; 5 items). In two additional samples (ns = 150 and 119), we cross-validated this factor structure and provided initial evidence for construct validity. This includes positive correlations with personal growth, openness to experience, autonomy, purpose in life, self-acceptance, psychological flexibility, positive affect, and positive social relations, among others. Applying item response theory (IRT) to these samples (n = 578), we showed that the items have good discrimination and a desirable breadth of difficulty. The item information functions and test information function were centered near zero, indicating that the scale assesses the mid-range of the latent curiosity trait most reliably. The findings thus far provide good evidence for the psychometric properties of the 10-item CEI-II. PMID:20160913
A Graphical Approach to Item Analysis. Research Report. ETS RR-04-10

ERIC Educational Resources Information Center

Livingston, Samuel A.; Dorans, Neil J.

2004-01-01

This paper describes an approach to item analysis that is based on the estimation of a set of response curves for each item. The response curves show, at a glance, the difficulty and the discriminating power of the item and the popularity of each distractor, at any level of the criterion variable (e.g., total score). The curves are estimated by…
Psychometric Properties of the Children's Depression Inventory: An Item Response Theory Analysis across Age in a Nonclinical, Longitudinal, Adolescent Sample

ERIC Educational Resources Information Center

Lee, Young-Sun; Krishnan, Anita; Park, Yoon Soo

2012-01-01

The purpose of this study was to investigate psychometric properties of the Children's Depression Inventory within a nonclinical and longitudinal sample (8th and 12th grades). Using the Rasch rating scale, most items represented one dimension. There was adequate separation among items and no overlap between ranges of item difficulties with latent…
Development and Validation of a Novel Generic Health-related Quality of Life Instrument With 20 Items (HINT-20).

PubMed

Jo, Min-Woo; Lee, Hyeon-Jeong; Kim, Soo Young; Kim, Seon-Ha; Chang, Hyejung; Ahn, Jeonghoon; Ock, Minsu

2017-01-01

Few attempts have been made to develop a generic health-related quality of life (HRQoL) instrument and to examine its validity and reliability in Korea. We aimed to do this in our present study. After a literature review of existing generic HRQoL instruments, a focus group discussion, in-depth interviews, and expert consultations, we selected 30 tentative items for a new HRQoL measure. These items were evaluated by assessing their ceiling effects, difficulty, and redundancy in the first survey. To validate the HRQoL instrument that was developed, known-groups validity and convergent/discriminant validity were evaluated and its test-retest reliability was examined in the second survey. Of the 30 items originally assessed for the HRQoL instrument, four were excluded due to high ceiling effects and six were removed due to redundancy. We ultimately developed a HRQoL instrument with a reduced number of 20 items, known as the Health-related Quality of Life Instrument with 20 items (HINT-20), incorporating physical, mental, social, and positive health dimensions. The results of the HINT-20 for known-groups validity were poorer in women, the elderly, and those with a low income. For convergent/discriminant validity, the correlation coefficients of items (except vitality) in the physical health dimension with the physical component summary of the Short Form 36 version 2 (SF-36v2) were generally higher than the correlations of those items with the mental component summary of the SF-36v2, and vice versa. Regarding test-retest reliability, the intraclass correlation coefficient of the total HINT-20 score was 0.813 (p<0.001). A novel generic HRQoL instrument, the HINT-20, was developed for the Korean general population and showed acceptable validity and reliability.
Assessing student understanding of measurement and uncertainty

NASA Astrophysics Data System (ADS)

Abbott, David Scot

A test to assess student understanding of measurement and uncertainty has been developed and administered to more than 500 students at two large research universities. The aim is two-fold: (1) to assess what students learn in the first semester of introductory physics labs and (2) to uncover patterns in student reasoning and practice. The forty-minute, eleven-item test focuses on direct measurement and student attitudes toward multiple measurements. After one revision cycle using think-aloud interviews, the test was administered to three groups of students: students enrolled in traditional lab sections of first semester physics at North Carolina State University (NCSU), students in an experimental (SCALE-UP) section of first semester physics at NCSU, and students in first semester physics at the University of North Carolina at Chapel Hill. The results were analyzed using a mixture of qualitative and quantitative methods. In the traditional NCSU labs, where students receive no instruction in uncertainty and measurement, students show no improvement on any of the areas examined by the test. In SCALE-UP and at UNC, students show statistically significant gains in most areas of the test. Gains on specific test items in SCALE-UP and at UNC correspond to areas of instructional emphasis. Test items were grouped into four main aspects of performance: "point/set" reasoning, meaning of spread, ruler reading and "stacking." Student performance on the pretest was examined to identify links between these aspects. Items within each aspect are correlated to one another, sometimes quite strongly, but items from different aspects rarely show statistically significant correlation. Taken together, these results suggest that student difficulties may not be linked to a single underlying cause. The study shows that current instruction techniques improve student understanding, but that many students exit the introductory physics lab course without an appreciation or coherent understanding of the concept of measurement uncertainty.
Differential Gender Effects in the Relationship between Perceived Immune Functioning and Autistic Traits.

PubMed

Mackus, Marlou; Kruijff, Deborah de; Otten, Leila S; Kraneveld, Aletta D; Garssen, Johan; Verster, Joris C

2017-04-12

Altered immune functioning has been demonstrated in individuals with autism spectrum disorder (ASD). The current study explores the relationship between perceived immune functioning and experiencing ASD traits in healthy young adults. N = 410 students from Utrecht University completed a survey on immune functioning and autistic traits. In addition to a 1-item perceived immune functioning rating, the Immune Function Questionnaire (IFQ) was completed to assess perceived immune functioning. The Dutch translation of the Autism-Spectrum Quotient (AQ) was completed to examine variation in autistic traits, including the domains "social insights and behavior", "difficulties with change", "communication", "phantasy and imagination", and "detail orientation". The 1-item perceived immune functioning score did not significantly correlate with the total AQ score. However, a significant negative correlation was found between perceived immune functioning and the AQ subscale "difficulties with change" (r = -0.119, p = 0.019). In women, 1-item perceived immune functioning correlated significantly with the AQ subscales "difficulties with change" (r = -0.149, p = 0.029) and "communication" (r = -0.145, p = 0.032). In men, none of the AQ subscales significantly correlated with 1-item perceived immune functioning. In conclusion, a modest relationship between perceived immune functioning and several autistic traits was found.
Education on electrical phenomena involved in electroporation-based therapies and treatments: a blended learning approach.

PubMed

Čorović, Selma; Mahnič-Kalamiza, Samo; Miklavčič, Damijan

2016-04-07

Electroporation-based applications require multidisciplinary expertise and collaboration of experts with different professional backgrounds in engineering and science. Beginning in 2003, an international scientific workshop and postgraduate course on electroporation-based technologies and treatments (EBTT) has been organized at the University of Ljubljana to facilitate transfer of knowledge from leading experts to researchers, students and newcomers in the field of electroporation. In this paper we present one of the integral parts of EBTT: an e-learning practical work we developed to complement delivery of knowledge via lectures and laboratory work, thus providing a blended learning approach on electrical phenomena involved in electroporation-based therapies and treatments. The learning effect was assessed via a pre- and post e-learning examination test composed of 10 multiple choice questions (i.e. items). The e-learning practical work session and both of the e-learning examination tests were carried out after the live EBTT lectures and other laboratory work. Statistical analysis was performed to compare and evaluate the learning effect measured in two groups of students: (1) electrical engineers and (2) natural scientists (i.e. medical doctors, biologists and chemists) undergoing the e-learning practical work in the 2011-2014 academic years. Item analysis was performed to assess the difficulty of each item of the examination test. The results of our study show that the total score on the post examination test significantly improved and the item difficulty in both experimental groups decreased. The natural scientists reached the same level of knowledge (no statistical difference in total post-examination test score) on the post-course test as the electrical engineers did, although the engineers started with a statistically higher total pre-test examination score, as expected.
The main objective of this study was to investigate whether the educational content the e-learning practical work presented to the students with different professional backgrounds enhanced their knowledge acquired via lectures during EBTT. We compared the learning effect assessed in two experimental groups undergoing the e-learning practical work: electrical engineers and natural scientists. The same level of knowledge on the post-course examination was reached in both groups. The results indicate that our e-learning platform supported by blended learning approach provides an effective learning tool for populations with mixed professional backgrounds and thus plays an important role in bridging the gap between scientific domains involved in electroporation-based technologies and treatments.
Regression Effects in Angoff Ratings: Examples from Credentialing Exams

ERIC Educational Resources Information Center

Wyse, Adam E.

2018-01-01

This article discusses regression effects that are commonly observed in Angoff ratings where panelists tend to think that hard items are easier than they are and easy items are more difficult than they are in comparison to estimated item difficulties. Analyses of data from two credentialing exams illustrate these regression effects and the…
Validation of the Erlangen Test of Activities of Daily Living in Persons with Mild Dementia or Mild Cognitive Impairment (ETAM).

PubMed

Luttenberger, Katharina; Reppermund, Simone; Schmiedeberg-Sohn, Anke; Book, Stephanie; Graessel, Elmar

2016-05-26

There are currently no valid, fast, and easy-to-administer performance tests that are designed to assess the capacities to perform activities of daily living in persons with mild dementia and mild cognitive impairment (MCI). However, such measures are urgently needed for determining individual support needs as well as the efficacy of interventions. The aim of the present study was therefore to validate the Erlangen Test of Activities of Daily Living in Persons with Mild Dementia and Mild Cognitive Impairment (ETAM), a performance test that is based on the International Classification of Functioning and Health (ICF), which assesses the relevant domains of living in older adults with MCI and mild dementia who live independently. The 10 ICF-based items on the research version of the ETAM were tested in a final sample of 81 persons with MCI or mild dementia. The items were selected for the final version in accordance with 6 criteria: 1) all domains must be represented and have equal weight, 2) all items must load on the same factor, 3) item difficulties and item discriminatory powers, 4) convergent validity (Bayer Activities of Daily Living Scale [B-ADL]) and discriminant validity (Mini Mental State Examination [MMSE], Geriatric Depression Scale 15 [GDS-15]), 5) inter-rater reliabilities of the individual items, 6) as little material as possible. Retest reliability was also examined. Cohen's ds were calculated to determine the magnitudes of the differences in ETAM scores between participants diagnosed with different grades of severity of cognitive impairment. The final version of the ETAM consists of 6 items that cover the five ICF domains communication, mobility, self-care, domestic life (assessed by two 3-point items), and major life areas (specifically, the economic life sub-category) and load on a single factor. The maximum achievable score is 30 points (6 points per domain). 
The average administration time was 35 min, 19 of which were needed for pure item performance. The internal consistency was α = .71. The three-week test-retest reliability was r = .78, and the inter-rater reliability was r = .97. The ETAM also provided satisfactory discrimination between healthy individuals and persons with MCI or mild dementia as well as between persons with mild and moderate dementia. The 6-item final version of the ETAM shows satisfactory psychometric characteristics and can be administered quickly. It is therefore suitable for use in both clinical practice and research.
A Five-Year Evaluation of Examination Structure in a Cardiovascular Pharmacotherapy Course

PubMed Central

Kolar, Claire; Janke, Kristin K.

2015-01-01

Objective. To evaluate the composition and effectiveness as an assessment tool of a criterion-referenced examination comprised of clinical cases tied to practice decisions, to examine the effect of varying audience response system (ARS) questions on student examination preparation, and to articulate guidelines for structuring examinations to maximize evaluation of student learning. Design. Multiple-choice items developed over 5 years were evaluated using Bloom’s Taxonomy classification, point biserial correlation, item difficulty, and grade distribution. In addition, examination items were classified into categories based on similarity to items used in ARS preparation. Assessment. As the number of items directly tied to clinical practice rose, Bloom’s Taxonomy level and item difficulty also rose. In examination years where Bloom’s levels were high but preparation was minimal, average grade distribution was lower compared with years in which student preparation was higher. Conclusion. Criterion-referenced examinations can benefit from systematic evaluation of their composition and effectiveness as assessment tools. Calculated design and delivery of classroom preparation is an asset in improving examination performance on rigorous, practice-relevant examinations. PMID:27168611
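The two item statistics named in the abstract above, item difficulty and point-biserial correlation, can be illustrated with a minimal Python sketch. This is not the authors' analysis; the response matrix is hypothetical, item difficulty is taken in its classical sense (proportion answering correctly), and the point-biserial is computed against the uncorrected total score, which still includes the item itself.

```python
import math

def item_difficulty(item_scores):
    """Classical item difficulty: proportion of examinees scoring 1 on the item."""
    return sum(item_scores) / len(item_scores)

def point_biserial(item_scores, total_scores):
    """Pearson correlation between a 0/1 item score and the total test score."""
    n = len(item_scores)
    mean_x = sum(item_scores) / n
    mean_y = sum(total_scores) / n
    cov = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(item_scores, total_scores))
    var_x = sum((x - mean_x) ** 2 for x in item_scores)
    var_y = sum((y - mean_y) ** 2 for y in total_scores)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: rows are students, columns are items (1 = correct).
responses = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 0],
]
totals = [sum(row) for row in responses]

for j in range(len(responses[0])):
    item = [row[j] for row in responses]
    print(j, item_difficulty(item), point_biserial(item, totals))
```

Note that a higher proportion correct means an easier item, so "item difficulty rose" in the classical sense usually corresponds to this proportion falling.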
Effects of spacing of item repetitions in continuous recognition memory: does item retrieval difficulty promote item retention in older adults?

PubMed

Kılıç, Aslı; Hoyer, William J; Howard, Marc W

2013-01-01

BACKGROUND/STUDY CONTEXT: Older adults exhibit an age-related deficit in item memory as a function of the length of the retention interval, but older adults and young adults usually show roughly equivalent benefits due to the spacing of item repetitions in continuous memory tasks. The current experiment investigates the seemingly paradoxical effects of retention interval and spacing in young and older adults using a continuous recognition memory procedure. Fifty young adults and 52 older adults gave memory confidence ratings to words that were presented once (P1), twice (P2), or three times (P3), and the effects of the lag length and retention interval were assessed at P2 and at P3, respectively. Response times at P2 were disproportionately longer for older adults than for younger adults as a function of the number of items occurring between P1 and P2, suggestive of age-related loss in item memory. Ratings of confidence in memory responses revealed that older adults remembered fewer items at P2 with a high degree of certainty. Confidence ratings given at P3 suggested that young and older adults derived equivalent benefits from the spacing between P1 and P2. Findings of this study support theoretical accounts that suggest that recursive reminding and/or item retrieval difficulty promote item retention in older adults.

The Social, Emotional and Behavioural Difficulties of Primary School Children with Poor Attendance Records

ERIC Educational Resources Information Center

Carroll, H. C. M.

2013-01-01

Two complementary studies of poor and better attenders are presented. To measure emotional and behavioural difficulties (EBD) different teacher-completed rating scales were employed, and to determine social difficulties, the studies used sociometry and some items from the scales. One study had a longitudinal design. It revealed that, after…
Applying the Rule Space Model to Develop a Learning Progression for Thermochemistry

NASA Astrophysics Data System (ADS)

Chen, Fu; Zhang, Shanshan; Guo, Yanfang; Xin, Tao

2017-12-01

We used the Rule Space Model, a cognitive diagnostic model, to measure the learning progression for thermochemistry for senior high school students. We extracted five attributes and proposed their hierarchical relationships to model the construct of thermochemistry at four levels using a hypothesized learning progression. For this study, we developed 24 test items addressing the attributes of exothermic and endothermic reactions, chemical bonds and heat quantity change, reaction heat and enthalpy, thermochemical equations, and Hess's law. The test was administered to a sample base of 694 senior high school students taught in 3 schools across 2 cities. Results based on the Rule Space Model analysis indicated that (1) the test items developed by the Rule Space Model were of high psychometric quality for good analysis of difficulties, discriminations, reliabilities, and validities; (2) the Rule Space Model analysis classified the students into seven different attribute mastery patterns; and (3) the initial hypothesized learning progression was modified by the attribute mastery patterns and the learning paths to be more precise and detailed.
Verbal fluency in bilingual Spanish/English Alzheimer's disease patients.

PubMed

Salvatierra, Judy; Rosselli, Monica; Acevedo, Amarilis; Duara, Ranjan

2007-01-01

Studies have demonstrated that in verbal fluency tests, monolinguals with Alzheimer's disease (AD) show greater difficulties retrieving words based on semantic rather than phonemic rules. The present study aimed to determine whether this difficulty was reproduced in both languages of Spanish/English bilinguals with mild to moderate AD whose primary language was Spanish. Performance on semantic and phonemic verbal fluency of 11 bilingual AD patients was compared to the performance of 11 cognitively normal, elderly bilingual individuals matched for gender, age, level of education, and degree of bilingualism. Cognitively normal subjects retrieved significantly more items under the semantic condition compared to the phonemic, whereas the performance of AD patients was similar under both conditions, suggesting greater decline in semantic verbal fluency tests. This pattern was produced in both languages, implying a related semantic decline in both languages. Results from this study should be considered preliminary because of the small sample size.
Development and validation of brief scales to measure emotional and behavioural problems among Chinese adolescents

PubMed Central

Shen, Minxue; Hu, Ming; Sun, Zhenqiu

2017-01-01

Objectives: To develop and validate brief scales to measure common emotional and behavioural problems among adolescents in the examination-oriented education system and collectivistic culture of China. Setting: Middle schools in Hunan province. Participants: 5442 middle school students aged 11–19 years were sampled. 4727 valid questionnaires were collected and used for validation of the scales. The final sample included 2408 boys and 2319 girls. Primary and secondary outcome measures: The tools were assessed by the item response theory, classical test theory (reliability and construct validity) and differential item functioning. Results: Four scales to measure anxiety, depression, study problem and sociality problem were established. Exploratory factor analysis showed that each scale had two solutions. Confirmatory factor analysis showed acceptable to good model fit for each scale. Internal consistency and test–retest reliability of all scales were above 0.7. Item response theory showed that all items had acceptable discrimination parameters and most items had appropriate difficulty parameters. 10 items demonstrated differential item functioning with respect to gender. Conclusions: Four brief scales were developed and validated among adolescents in middle schools of China. The scales have good psychometric properties with minor differential item functioning. They can be used in middle school settings, and will help school officials to assess the students’ emotional/behavioural problems. PMID:28062469
Retrieval monitoring and anosognosia in Alzheimer's disease.

PubMed

Gallo, David A; Chen, Jennifer M; Wiseman, Amy L; Schacter, Daniel L; Budson, Andrew E

2007-09-01

This study explored the relationship between episodic memory and anosognosia (a lack of deficit awareness) among patients with mild Alzheimer's disease (AD). Participants studied words and pictures for subsequent memory tests. Healthy older adults made fewer false recognition errors when trying to remember pictures compared with words, suggesting that the perceptual distinctiveness of picture memories enhanced retrieval monitoring (the distinctiveness heuristic). In contrast, although participants with AD could discriminate between studied and nonstudied items, they had difficulty recollecting the specific presentation formats (words or pictures), and they had limited use of the distinctiveness heuristic. Critically, the demands of the memory test modulated the relationship between memory accuracy and anosognosia. Greater anosognosia was associated with impaired memory accuracy when participants with AD tried to remember words but not when they tried to remember pictures. These data further delineate the retrieval monitoring difficulties among individuals with AD and suggest that anosognosia measures are most likely to correlate with memory tests that require the effortful retrieval of nondistinctive information. (PsycINFO Database Record (c) 2007 APA, all rights reserved).
Middle school students' reading comprehension of mathematical texts and algebraic equations

NASA Astrophysics Data System (ADS)

Duru, Adem; Koklu, Onder

2011-06-01

In this study, middle school students' abilities to translate mathematical texts into algebraic representations and vice versa were investigated. In addition, students' difficulties in making such translations and the potential sources for these difficulties were also explored. Both qualitative and quantitative methods were used to collect data for this study: questionnaire and clinical interviews. The questionnaire consisted of two general types of items: (1) selected-response (multiple-choice) items for which the respondent selects from multiple options and (2) open-ended items for which the respondent constructs a response. In order to further investigate the students' strategies while they were translating the given mathematical texts to algebraic equations and vice versa, five randomly chosen students were interviewed. Data were collected in the 2007-2008 school year from 185 middle-school students in five teachers' classrooms in three different schools in the city of Adıyaman, Turkey. After the analysis of data, it was found that students who participated in this study had difficulties in translating the mathematical texts into algebraic equations by using symbols. It was also observed that these students had difficulties in translating the symbolic representations into mathematical texts because of their weak reading comprehension. In addition, the findings of this research revealed that students' difficulties in translating the given mathematical texts into symbolic representations or vice versa come from different sources.
Measuring ability to assess claims about treatment effects: a latent trait analysis of items from the 'Claim Evaluation Tools' database using Rasch modelling.

PubMed

Austvoll-Dahlgren, Astrid; Guttersrud, Øystein; Nsangi, Allen; Semakula, Daniel; Oxman, Andrew D

2017-05-25

The Claim Evaluation Tools database contains multiple-choice items for measuring people's ability to apply the key concepts they need to know to be able to assess treatment claims. We assessed items from the database using Rasch analysis to develop an outcome measure to be used in two randomised trials in Uganda. Rasch analysis is a form of psychometric testing relying on Item Response Theory. It is a dynamic way of developing outcome measures that are valid and reliable. We aimed to assess the validity, reliability and responsiveness of 88 items addressing 22 key concepts using Rasch analysis. We administered four sets of multiple-choice items in English to 1114 people in Uganda and Norway, of which 685 were children and 429 were adults (including 171 health professionals). We scored all items dichotomously. We explored summary and individual fit statistics using the RUMM2030 analysis package. We used SPSS to perform distractor analysis. Most items conformed well to the Rasch model, but some items needed revision. Overall, the four item sets had satisfactory reliability. We did not identify significant response dependence between any pairs of items and, overall, the magnitude of multidimensionality in the data was acceptable. The items had a high level of difficulty. Most of the items conformed well to the Rasch model's expectations. Following revision of some items, we concluded that most of the items were suitable for use in an outcome measure for evaluating the ability of children or adults to assess treatment claims. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
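For readers unfamiliar with the dichotomous Rasch model referred to in the abstract above, the following minimal Python sketch shows its response function: the probability of a correct answer depends only on the difference between person ability (theta) and item difficulty (b), both in logits. The function name and the numeric values are illustrative assumptions; this is not the RUMM2030 estimation procedure the authors used.

```python
import math

def rasch_probability(theta, b):
    """P(correct) under the dichotomous Rasch model.

    theta: person ability in logits (hypothetical value)
    b:     item difficulty in logits (hypothetical value)
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# A person whose ability equals the item difficulty has a 50% chance of success;
# the chance drops as the item gets harder relative to the person.
for difficulty in (-1.0, 0.0, 1.0, 2.0):
    print(difficulty, round(rasch_probability(0.0, difficulty), 3))
```

Under this model, a set of items with "a high level of difficulty" is one where most b values sit above the typical theta in the sample, so most success probabilities fall below 0.5.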
A Comparison of Two Low-Stakes Methods for Administering a Program-Level Biology Concept Assessment.

PubMed

Couch, Brian A; Knight, Jennifer K

2015-12-01

Concept assessments are used commonly in undergraduate science courses to assess student learning and diagnose areas of student difficulty. While most concept assessments align with the content of individual courses or course topics, some concept assessments have been developed for use at the programmatic level to gauge student progress and achievement over a series of courses or an entire major. The broad scope of a program-level assessment, which exceeds the content of any single course, creates several test administration issues, including finding a suitable time for students to take the assessment and adequately incentivizing student participation. These logistical considerations must also be weighed against test security and the ability of students to use unauthorized resources that could compromise test validity. To understand how potential administration methods affect student outcomes, we administered the Molecular Biology Capstone Assessment (MBCA) to three pairs of matched upper-division courses in two ways: an online assessment taken by students outside of class and a paper-based assessment taken during class. We found that overall test scores were not significantly different and that individual item difficulties were highly correlated between these two administration methods. However, in-class administration resulted in reduced completion rates of items at the end of the assessment. Taken together, these results suggest that an online, outside-of-class administration produces scores that are comparable to a paper-based, in-class format and has the added advantages that instructors do not have to dedicate class time and students are more likely to complete the entire assessment.
Improving patients' understanding of terms and phrases commonly used in self-reported measures of sexual function.

PubMed

Alexander, Angel M; Flynn, Kathryn E; Hahn, Elizabeth A; Jeffery, Diana D; Keefe, Francis J; Reeve, Bryce B; Schultz, Wesley; Reese, Jennifer Barsky; Shelby, Rebecca A; Weinfurt, Kevin P

2014-08-01

There is a significant gap in research regarding the readability and comprehension of existing sexual function measures. Patient-reported outcome measures may use terms not well understood by respondents with low literacy. This study aims to test comprehension of words and phrases typically used in sexual function measures to improve validity for all individuals, including those with low literacy. We recruited 20 men and 28 women for cognitive interviews on version 2.0 of the Patient-Reported Outcome Measurement Information System® (PROMIS®) Sexual Function and Satisfaction measures. We assessed participants' reading level using the word reading subtest of the Wide Range Achievement Test. Sixteen participants were classified as having low literacy. In the first round of cognitive interviews, each survey item was reviewed by five or more people, at least two of whom had lower than a ninth-grade reading level (low literacy). Patient feedback was incorporated into a revised version of the items. In the second round of interviews, an additional three or more people (at least one with low literacy) reviewed each revised item. Participants with low literacy had difficulty comprehending terms such as aroused, orgasm, erection, ejaculation, incontinence, and vaginal penetration. Women across a range of literacy levels had difficulty with clinical terms like labia and clitoris. We modified unclear terms to include parenthetical descriptors or slang equivalents, which generally improved comprehension. Common words and phrases used across measures of self-reported sexual function are not universally understood. Researchers should appreciate these misunderstandings as a potential source of error in studies using self-reported measures of sexual function. This study also provides evidence for the importance of including individuals with low literacy in cognitive pretesting during the measure development. © 2014 International Society for Sexual Medicine.
Investigating the Performance of Omega Index According to Item Parameters and Ability Levels

ERIC Educational Resources Information Center

Sunbul, Onder; Yormaz, Seha

2018-01-01

Purpose: Several studies can be found in the literature that investigate the performance of ω under various conditions. However, no study for the effects of item difficulty, item discrimination, and ability restrictions on the performance of ω could be found. The current study aims to investigate the performance of ω for the conditions given below.…
Application of Item Analysis to Assess Multiple-Choice Examinations in the Mississippi Master Cattle Producer Program

ERIC Educational Resources Information Center

Parish, Jane A.; Karisch, Brandi B.

2013-01-01

Item analysis can serve as a useful tool in improving multiple-choice questions used in Extension programming. It can identify gaps between instruction and assessment. An item analysis of Mississippi Master Cattle Producer program multiple-choice examination responses was performed to determine the difficulty of individual examinations, assess the…
Exploring the Manifestations of Anxiety in Children with Autism Spectrum Disorders

ERIC Educational Resources Information Center

Hallett, Victoria; Lecavalier, Luc; Sukhodolsky, Denis G.; Cipriano, Noreen; Aman, Michael G.; McCracken, James T.; McDougle, Christopher J.; Tierney, Elaine; King, Bryan H.; Hollander, Eric; Sikich, Linmarie; Bregman, Joel; Anagnostou, Evdokia; Donnelly, Craig; Katsovich, Lily; Dukes, Kimberly; Vitiello, Benedetto; Gadow, Kenneth; Scahill, Lawrence

2013-01-01

This study explores the manifestation and measurement of anxiety symptoms in 415 children with ASDs on a 20-item, parent-rated, DSM-IV referenced anxiety scale. In both high and low-functioning children (IQ above vs. below 70), commonly endorsed items assessed restlessness, tension and sleep difficulties. Items requiring verbal expression of worry…
Using an analytical hierarchy process (AHP) for weighting items of a measurement scale: a pilot study.

PubMed

Benaïm, C; Perennou, D-A; Pelissier, J-Y; Daures, J-P

2010-02-01

Many clinical scales contain items that are scored separately prior to being compiled into a single score. However, if the items have different degrees of importance, they should be weighted differently before being compiled. The principal aims of this study were to show how the "analytic hierarchy process" (AHP), which has never been used for this purpose, can be applied to weighting the six items of the "London handicap scale", and to compare the AHP to the "conjoint analysis" (CA), which was previously implemented by Harwood et al. (1994) [1]. In order to assess the relative importance of the six items, we submitted AHP and CA to a group of 10 physiatrists. We compared the methods in terms of item ranking according to importance, assessment of fictitious patients based on weights determined by each method, and perceived difficulty by the physiatrist. For both techniques, "Physical independence" (PHY) was the best-weighted item, but other ranks varied depending on the technique. AHP was better than CA in terms of accuracy (global assessment of the clinical status) and perceived difficulty. AHP may be used to reveal the importance that experts assign to the items of a multidimensional scale, and to calculate the appropriate weights for specific items. For this purpose, AHP seems to be more accurate than CA.
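As a rough illustration of how AHP turns pairwise importance judgments into item weights, the sketch below uses the row geometric-mean method, a common approximation to the principal-eigenvector weights. It is not the procedure reported in the abstract above: the three-item comparison matrix is hypothetical (the London handicap scale has six items), and the function name is an assumption made for this example.

```python
import math

def ahp_weights(pairwise):
    """Approximate AHP priority weights by the row geometric-mean method.

    pairwise[i][j] states how many times more important item i is than item j,
    so pairwise[j][i] should be the reciprocal and the diagonal should be 1.
    """
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]

# Hypothetical judgments for three items: A is twice as important as B
# and four times as important as C; B is twice as important as C.
comparisons = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights = ahp_weights(comparisons)
print([round(w, 3) for w in weights])
```

For a perfectly consistent matrix such as this one, the geometric-mean weights coincide with the eigenvector weights; with real expert judgments the two can differ slightly, and a consistency check is normally reported alongside the weights.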
Validity of a Protocol for Adult Self-Report of Dyslexia and Related Difficulties

ERIC Educational Resources Information Center

Snowling, Margaret; Dawes, Piers; Nash, Hannah; Hulme, Charles

2012-01-01

Background: There is an increased prevalence of reading and related difficulties in children of dyslexic parents. In order to understand the causes of these difficulties, it is important to quantify the risk factors passed from parents to their offspring. Method: 417 adults completed a protocol comprising a 15-item questionnaire rating reading and…
Item-level psychometrics of the ADL instrument of the Korean National Survey on persons with physical disabilities.

PubMed

Hong, Ickpyo; Lee, Mi Jung; Kim, Moon Young; Park, Hae Yean

2017-10-01

The aim of this study is to investigate the psychometrics of the 12 items of an instrument assessing activities of daily living (ADL) using an item response theory model. A total of 648 adults with physical disabilities and having difficulties in ADLs were retrieved from the 2014 Korean National Survey on People with Disabilities. The psychometric testing included factor analysis, internal consistency, precision, and differential item functioning (DIF) across categories including sex, older age, marital status, and physical impairment area. The sample had a mean age of 69.7 years (SD = 13.7). The majority of the sample had lower extremity impairments (62.0%) and had at least 2.1 chronic conditions. The instrument demonstrated a unidimensional construct and good internal consistency (Cronbach's alpha = 0.95). The instrument precisely estimated person measures within a wide range of theta values (-2.22 logits < θ < 0.27 logits) with a reliability of 0.9. Only the changing position item demonstrated misfit (χ² = 36.6, df = 17, p = 0.0038), and the dressing item demonstrated DIF on the impairment type (upper extremity/others, McFadden's Pseudo R² > 5.0%). Our findings indicate that the dressing item would need to be modified to improve its psychometrics. Overall, the ADL instrument demonstrates good psychometrics, and thus, it may be used as a standardized instrument for measuring disability in rehabilitation contexts. However, the findings are limited to adults with physical disabilities. Future studies should replicate psychometric testing for survey respondents with other disorders and for children.
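The internal-consistency figure quoted in the abstract above (Cronbach's alpha) can be computed from a respondent-by-item score matrix as in the minimal Python sketch below. The data are hypothetical and unrelated to the Korean survey, the function name is an assumption, and sample variances (n - 1 denominator) are used throughout.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(scores[0])  # number of items
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical ratings: five respondents, three items.
scores = [
    [3, 4, 3],
    [2, 2, 1],
    [4, 4, 4],
    [1, 2, 2],
    [3, 3, 4],
]
alpha = cronbach_alpha(scores)
print(round(alpha, 3))
```

Alpha approaches 1 when items rise and fall together across respondents, which is why a value such as 0.95 is read as strong internal consistency.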
[Development and validation of a questionnaire on knowledge and personal hygiene habits in childhood (HICORIN®)].

PubMed

Moreno-Martínez, Francisco José; Ruzafa-Martínez, María; Ramos-Morcillo, Antonio Jesús; Gómez García, Carmen Isabel; Hernández-Susarte, Ana María

2015-01-01

To develop and validate a questionnaire on the integral assessment of the habits and knowledge in personal hygiene in children aged 7 to 12 years in the educational, social and health environment. Cross-sectional study for the validation of a questionnaire. One primary and secondary school and one children's home in the Region of Murcia, Spain. A total of 86 children were included (80 from a primary and secondary school; 6 from a children's home), as well as 7 experts. Content validation by experts; qualitative assessment; identification of difficulties related to some questions; item response analysis; and test-retest reliability. After the literature search, 20 tools that included items related to child body hygiene were obtained. The researchers selected 34 items and drafted 48 additional ones. After content validation by the experts, the questionnaire (HICORIN®) was reduced to 63 items, and consisted of 7 dimensions of child personal hygiene (skin, hair, hands, oral, feet, ears, and intimate hygiene). After the assessment with the children, some terms were adapted to improve their understanding. Only two items had non-response rates that exceeded 10%. The test-retest showed that 84.1% of the items had between very good and moderate reliability. HICORIN® is a reliable and valid instrument that integrally assesses the habits and knowledge in personal hygiene in children aged 7 to 12 years. It is applicable in educational, social and health environments and in children from different socioeconomic levels. Copyright © 2014 Elsevier España, S.L.U. All rights reserved.
Developing Item Response Theory-Based Short Forms to Measure the Social Impact of Burn Injuries.

PubMed

Marino, Molly E; Dore, Emily C; Ni, Pengsheng; Ryan, Colleen M; Schneider, Jeffrey C; Acton, Amy; Jette, Alan M; Kazis, Lewis E

2018-03-01

To develop self-reported short forms for the Life Impact Burn Recovery Evaluation (LIBRE) Profile. Short forms based on the item parameters of discrimination and average difficulty. A support network for burn survivors, peer support networks, social media, and mailings. Burn survivors (N=601) older than 18 years. Not applicable. The LIBRE Profile. Ten-item short forms were developed to cover the 6 LIBRE Profile scales: Relationships with Family & Friends, Social Interactions, Social Activities, Work & Employment, Romantic Relationships, and Sexual Relationships. Ceiling effects were ≤15% for all scales; floor effects were <1% for all scales. The marginal reliability of the short forms ranged from .85 to .89. The LIBRE Profile-Short Forms demonstrated credible psychometric properties. The short form version provides a viable alternative to administering the LIBRE Profile when resources do not allow computer or Internet access. The full item bank, computerized adaptive test, and short forms are all scored along the same metric, and therefore scores are comparable regardless of the mode of administration. Copyright © 2017 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Measuring Graph Comprehension, Critique, and Construction in Science

NASA Astrophysics Data System (ADS)

Lai, Kevin; Cabrera, Julio; Vitale, Jonathan M.; Madhok, Jacquie; Tinker, Robert; Linn, Marcia C.

2016-08-01

Interpreting and creating graphs plays a critical role in scientific practice. The K-12 Next Generation Science Standards call for students to use graphs for scientific modeling, reasoning, and communication. To measure progress on this dimension, we need valid and reliable measures of graph understanding in science. In this research, we designed items to measure graph comprehension, critique, and construction and developed scoring rubrics based on the knowledge integration (KI) framework. We administered the items to over 460 middle school students. We found that the items formed a coherent scale and had good reliability using both item response theory and classical test theory. The KI scoring rubric showed that most students had difficulty linking graphs features to science concepts, especially when asked to critique or construct graphs. In addition, students with limited access to computers as well as those who speak a language other than English at home have less integrated understanding than others. These findings point to the need to increase the integration of graphing into science instruction. The results suggest directions for further research leading to comprehensive assessments of graph understanding.
Developing a fluid intelligence scale through a combination of Rasch modeling and cognitive psychology.

PubMed

Primi, Ricardo

2014-09-01

Ability testing has been criticized because understanding of the construct being assessed is incomplete and because the testing has not yet been satisfactorily improved in accordance with new knowledge from cognitive psychology. This article contributes to the solution of this problem through the application of item response theory and Susan Embretson's cognitive design system for test development in the development of a fluid intelligence scale. This study is based on findings from cognitive psychology; instead of focusing on the development of a test, it focuses on the definition of a variable for the creation of a criterion-referenced measure for fluid intelligence. A geometric matrix item bank with 26 items was analyzed with data from 2,797 undergraduate students. The main result was a criterion-referenced scale that was based on information from item features that were linked to cognitive components, such as storage capacity, goal management, and abstraction; this information was used to create the descriptions of selected levels of a fluid intelligence scale. The scale proposed that the levels of fluid intelligence range from the ability to solve problems containing a limited number of bits of information with obvious relationships through the ability to solve problems that involve abstract relationships under conditions that are confounded with an information overload and distraction by mixed noise. This scale can be employed in future research to provide interpretations for the measurements of the cognitive processes mastered and the types of difficulty experienced by examinees. PsycINFO Database Record (c) 2014 APA, all rights reserved.
Applying Rasch model analysis in the development of the Cantonese tone identification test (CANTIT).

PubMed

Lee, Kathy Y S; Lam, Joffee H S; Chan, Kit T Y; van Hasselt, Charles Andrew; Tong, Michael C F

2017-01-01

Applying Rasch analysis to evaluate the internal structure of a lexical tone perception test known as the Cantonese Tone Identification Test (CANTIT). A 75-item pool (CANTIT-75) with pictures and sound tracks was developed. Respondents were required to make a four-alternative forced choice on each item. A short version of 30 items (CANTIT-30) was developed based on fit statistics, difficulty estimates, and content evaluation. Internal structure was evaluated by fit statistics and Rasch Factor Analysis (RFA). 200 children with normal hearing and 141 children with hearing impairment were recruited. For CANTIT-75, all infit and 97% of outfit values were < 2.0. RFA revealed 40.1% of total variance was explained by the Rasch measure. The first residual component explained 2.5% of total variance, with an eigenvalue of 3.1. For CANTIT-30, all infit and outfit values were < 2.0. The Rasch measure explained 38.8% of total variance, and the first residual component explained 3.9% of total variance, with an eigenvalue of 1.9. The Rasch model provides excellent guidance for the development of short forms. Both CANTIT-75 and CANTIT-30 possess satisfactory internal structure as construct validity evidence in measuring the lexical tone identification ability of Cantonese speakers.

Conditional statistical inference with multistage testing designs.

PubMed

Zwitser, Robert J; Maris, Gunter

2015-03-01

In this paper it is demonstrated how statistical inference from multistage test designs can be made based on the conditional likelihood. Special attention is given to parameter estimation, as well as the evaluation of model fit. Two reasons are provided why the fit of simple measurement models is expected to be better in adaptive designs, compared to linear designs: more parameters are available for the same number of observations; and undesirable response behavior, like slipping and guessing, might be avoided owing to a better match between item difficulty and examinee proficiency. The results are illustrated with simulated data, as well as with real data.
A critique of Rasch residual fit statistics.

PubMed

Karabatsos, G

2000-01-01

In test analysis involving the Rasch model, a large degree of importance is placed on the "objective" measurement of individual abilities and item difficulties. The degree to which the objectivity properties are attained, of course, depends on the degree to which the data fit the Rasch model. It is therefore important to utilize fit statistics that accurately and reliably detect the person-item response inconsistencies that threaten the measurement objectivity of persons and items. Given this argument, it is somewhat surprising that there is far more emphasis placed in the objective measurement of person and items than there is in the measurement quality of Rasch fit statistics. This paper provides a critical analysis of the residual fit statistics of the Rasch model, arguably the most often used fit statistics, in an effort to illustrate that the task of Rasch fit analysis is not as simple and straightforward as it appears to be. The faulty statistical properties of the residual fit statistics do not allow either a convenient or a straightforward approach to Rasch fit analysis. For instance, given a residual fit statistic, the use of a single minimum critical value for misfit diagnosis across different testing situations, where the situations vary in sample and test properties, leads to both the overdetection and underdetection of misfit. To improve this situation, it is argued that psychometricians need to implement residual-free Rasch fit statistics that are based on the number of Guttman response errors, or use indices that are statistically optimal in detecting measurement disturbances.
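For readers unfamiliar with the statistics under critique, here is a minimal Python sketch of how residual-based fit is conventionally computed for one dichotomous item. It assumes the residual fit statistics in question are the usual unweighted (outfit) and information-weighted (infit) mean squares; the abstract itself does not name them, and the function names `rasch_prob` and `item_fit` are hypothetical.

```python
import math

def rasch_prob(theta, b):
    """Rasch probability of a correct response for ability theta and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_fit(responses, thetas, b):
    """Return (infit, outfit) mean squares for one item.

    responses: 0/1 scores, one per person (assumed layout)
    thetas:    person ability estimates in logits, same order
    b:         item difficulty estimate in logits
    """
    sum_sq_resid = 0.0   # sum of squared raw residuals
    sum_var = 0.0        # sum of model variances p(1-p)
    sum_z2 = 0.0         # sum of squared standardized residuals
    for x, theta in zip(responses, thetas):
        p = rasch_prob(theta, b)
        var = p * (1.0 - p)
        sq = (x - p) ** 2
        sum_sq_resid += sq
        sum_var += var
        sum_z2 += sq / var
    outfit = sum_z2 / len(responses)   # unweighted mean square
    infit = sum_sq_resid / sum_var     # information-weighted mean square
    return infit, outfit

if __name__ == "__main__":
    # Illustrative (made-up) data: five persons answering one item of difficulty 0.
    infit, outfit = item_fit([1, 0, 1, 1, 0], [1.2, -0.8, 0.3, 2.0, -1.5], 0.0)
    print(f"infit={infit:.3f} outfit={outfit:.3f}")
```

Both values have an expected value near 1 when the data fit the model. Comparing them against one fixed cutoff across samples and tests is the practice the abstract argues against.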
[SOMS-2: translation into Portuguese of the screening for Somatoform Disorders].

PubMed

Fabião, Cristina; Costa E Silva, Carolina; Fleming, Manuela; Barbosa, António

2008-01-01

The diagnosis of Somatization Disorder (SD) requires the presence of somatic medically unexplained symptoms (MUS) which must be assessed so that organic diseases may be excluded. SOMS-2 is a self-report measure for SD that assesses medically unexplained symptoms by requiring participants to answer affirmatively and qualify any of the complaints as MUS, only if they have obtained from their doctor the opinion that the said complaint is not due to an organic disease. According to the authors, original SOMS-2 has a good internal consistency with Cronbach's α = .87 and a good correlation between self-ratings and interview (r = .75). After obtaining the author's permission, translation from and into English has been made by experienced translators. The resulting questionnaire has been used on a small group of patients. Afterwards the items in which there were difficulties in understanding during the pretest were identified and experienced practitioners were asked for suggestions. The resulting version was answered by 123 primary health care patients (sample I). After some modifications of the SOMS-2, another group of 190 primary health care patients answered the questionnaire (sample II). Most patients, in the first sample, found it difficult to understand that, in order to answer affirmatively it was necessary to answer three questions: 1) is the symptom present? 2) has your doctor found no clear causes for the symptom? 3) does the symptom affect your well-being? The difficulties in understanding items 21 and 45 (pre-test) were confirmed. Items 11, 28 and 38 were more easily understood when worded differently. In sample I, less than 5% of positive answers were given to items 20, 21, 23, 40, 43, 45, and 51.
Probably because of the low education level of the Portuguese population which this sample reflects, difficulties in carrying out the instructions given at the beginning made it advisable to modify the SOMS-2, so that the three implicit questions in each question of the SOMS-2 were divided into two columns (two explicit questions). Simultaneously attention must continue on controlling severity criterion (the third implicit question). After phase I, the items with an answer rate of less than 5% were eliminated. The majority of them are coincident with the low answer rate items found by the authors of the original version. The next step is to study the internal consistency and the correlation between results of self-ratings and interview, of the resulting version, in order to establish the validity of the SOMS-2 in these populations.
Evaluation of the International Outcome Inventory for Hearing Aids in a veteran sample.

PubMed

Smith, Sherri L; Noe, Colleen M; Alexander, Genevieve C

2009-06-01

The International Outcome Inventory for Hearing Aids (IOI-HA) was developed as a global hearing aid outcome measure targeting seven outcome domains. The published norms were based on a private-pay sample who were fitted with analog hearing aids. The purpose of this study was to evaluate the psychometric properties of the IOI-HA and to establish normative data in a veteran sample. Survey. The participants were 131 male veterans (mean age of 74.3 years, SD = 7.4) who were issued hearing aids with digital signal processing (DSP). Hearing aids with DSP that were fitted bilaterally between 2005 and 2007. Veterans were mailed two copies of the IOI-HA. The participants were instructed to complete the first copy of the questionnaire immediately and the second copy in two weeks. The completed questionnaires were mailed to the laboratory. The psychometric properties of the questionnaire were evaluated. As suggested by Cox and colleagues, the participants were divided into two categories based on their unaided subjective hearing difficulty. The two categories were (1) those with less hearing difficulty (none-to-moderate category) and (2) those who report more hearing difficulty (moderately severe+ category). The norms from the current veteran sample then were compared to the original, published sample. For each hearing difficulty category, the critical difference values were calculated for each item and for the total score. A factor analysis showed that the IOI-HA in the veteran sample had the identical subscale structure as reported in the original sample. For the total scale, the internal consistency was good (Cronbach's alpha = 0.83), and the test-retest reliability was high (lambda = 0.94). Group and individual norms were developed for both hearing difficulty categories in the veteran sample. For each IOI-HA item, the critical difference scores were < 1.0.
This finding suggests that for any item on the IOI-HA, there is a 95 percent chance that an observed change of one response unit between two test sessions reflects a true change in outcome for a given domain. The results of this study confirmed that the psychometric properties of the IOI-HA questionnaire are strong and are essentially the same for the veteran sample and the original private-pay sample. The veteran norms, however, produced higher outcomes than those established originally, possibly because of differences in the population samples and/or hearing aid technology. Clinical and research applications of the current findings are presented. Based on the results from the current study, the norms established here should replace the original norms for use in veterans with current hearing aid technology.
A SIMPLE FRAILTY QUESTIONNAIRE (FRAIL) PREDICTS OUTCOMES IN MIDDLE AGED AFRICAN AMERICANS

PubMed Central

MORLEY, J.E.; MALMSTROM, T.K.; MILLER, D.K.

2015-01-01

Objective To validate the FRAIL scale. Design Longitudinal study. Setting Community. Participants Representative sample of African Americans age 49 to 65 years at onset of study. Measurements The 5-item FRAIL scale (Fatigue, Resistance, Ambulation, Illnesses, & Loss of Weight), at baseline and activities of daily living (ADLs), instrumental activities of daily living (IADLs), mortality, short physical performance battery (SPPB), gait speed, one-leg stand, grip strength and injurious falls at baseline and 9 years. Blood tests for CRP, SIL6R, STNFR1, STNFR2 and 25 (OH) vitamin D at baseline. Results Cross-sectionally the FRAIL scale correlated significantly with IADL difficulties, SPPB, grip strength and one-leg stand among participants with no baseline ADL difficulties (N=703) and those outcomes plus gait speed in those with no baseline ADL dependencies (N=883). TNFR1 was increased in pre-frail and frail subjects and CRP in some subgroups. Longitudinally (N=423 with no baseline ADL difficulties or N=528 with no baseline ADL dependencies), and adjusted for the baseline value for each outcome, being pre-frail at baseline significantly predicted future ADL difficulties, worse one-leg stand scores, and mortality in both groups, plus IADL difficulties in the dependence-excluded group. Being frail at baseline significantly predicted future ADL difficulties, IADL difficulties, and mortality in both groups, plus worse SPPB in the dependence-excluded group. Conclusion This study has validated the FRAIL scale in a late middle-aged African American population. This simple 5-question scale is an excellent screening test for clinicians to identify frail persons at risk of developing disability as well as decline in health functioning and mortality. PMID:22836700
The Dominance Concept Inventory: A Tool for Assessing Undergraduate Student Alternative Conceptions about Dominance in Mendelian and Population Genetics

PubMed Central

Perez, Kathryn E.; Price, Rebecca M.

2014-01-01

Despite the impact of genetics on daily life, biology undergraduates understand some key genetics concepts poorly. One concept requiring attention is dominance, which many students understand as a fixed property of an allele or trait and regularly conflate with frequency in a population or selective advantage. We present the Dominance Concept Inventory (DCI), an instrument to gather data on selected alternative conceptions about dominance. During development of the 16-item test, we used expert surveys (n = 12), student interviews (n = 42), and field tests (n = 1763) from introductory and advanced biology undergraduates at public and private, majority- and minority-serving, 2- and 4-yr institutions in the United States. In the final field test across all subject populations (n = 709), item difficulty ranged from 0.08 to 0.84 (0.51 ± 0.049 SEM), while item discrimination ranged from 0.11 to 0.82 (0.50 ± 0.048 SEM). Internal reliability (Cronbach's alpha) was 0.77, while test–retest reliability values were 0.74 (product moment correlation) and 0.77 (intraclass correlation). The prevalence of alternative conceptions in the field tests shows that introductory and advanced students retain confusion about dominance after instruction. All measures support the DCI as a useful instrument for measuring undergraduate biology student understanding and alternative conceptions about dominance. PMID:26086665
Are Faculty Predictions or Item Taxonomies Useful for Estimating the Outcome of Multiple-Choice Examinations?

ERIC Educational Resources Information Center

Kibble, Jonathan D.; Johnson, Teresa

2011-01-01

The purpose of this study was to evaluate whether multiple-choice item difficulty could be predicted either by a subjective judgment by the question author or by applying a learning taxonomy to the items. Eight physiology faculty members teaching an upper-level undergraduate human physiology course consented to participate in the study. The…
A measure of early physical functioning (EPF) post-stroke.

PubMed

Finch, Lois E; Higgins, Johanne; Wood-Dauphinee, Sharon; Mayo, Nancy E

2008-07-01

To develop a comprehensive measure of Early Physical Functioning (EPF) post-stroke quantified through Rasch analysis and conceptualized using the International Classification of Functioning Disability and Health (ICF). An observational cohort study. A cohort of 262 subjects (mean age 71.6 (standard deviation 12.5) years) hospitalized post-acute stroke. Functional assessments were made within 3 days of stroke with items from valid and reliable indices commonly utilized to evaluate stroke survivors. Information on important variables was also collected. Principal component and Rasch analysis confirmed the factor structure, and dimensionality of the measure. Rasch analysis combined items across ICF components to develop the measure. Items were deleted iteratively, those retained fit the model and were related to the construct; reliability and validity were assessed. A 38-item unidimensional measure of the EPF met all Rasch model requirements. The item difficulty matched the person ability (mean person measure: -0.31; standard error 0.37 logits), reliability of the person-item-hierarchy was excellent at 0.97. Initial validity was adequate. The 38-item EPF measure was developed. It expands the range of assessment post acute stroke; it covers a broad spectrum of difficulty with good initial psychometric properties that, once revalidated, can assist in planning and evaluating early interventions.
CTTITEM: SAS macro and SPSS syntax for classical item analysis.

PubMed

Lei, Pui-Wa; Wu, Qiong

2007-08-01

This article describes the functions of a SAS macro and an SPSS syntax that produce common statistics for conventional item analysis including Cronbach's alpha, item difficulty index (p-value or item mean), and item discrimination indices (D-index, point biserial and biserial correlations for dichotomous items and item-total correlation for polytomous items). These programs represent an improvement over the existing SAS and SPSS item analysis routines in terms of completeness and user-friendliness. To promote routine evaluations of item qualities in instrument development of any scale, the programs are available at no charge for interested users. The program codes along with a brief user's manual that contains instructions and examples are downloadable from suen.ed.psu.edu/-pwlei/plei.htm.
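The statistics named above can be illustrated with a short, dependency-free Python sketch. It is not the CTTITEM macro or syntax; the function names are hypothetical, the input layout (one list of 0/1 scores per examinee) is an assumption, and the discrimination index shown is the corrected item-rest correlation, which for dichotomous items is a point-biserial correlation.

```python
import math

def _mean(xs):
    return sum(xs) / len(xs)

def _var(xs):
    """Sample variance (n - 1 denominator)."""
    m = _mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def _pearson(xs, ys):
    """Pearson correlation; returns NaN if either variable is constant."""
    mx, my = _mean(xs), _mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)

def item_analysis(responses):
    """Classical item analysis for dichotomous (0/1) items.

    responses: list of rows, one row of item scores per examinee.
    Returns (difficulty, discrimination, alpha).
    """
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    difficulty, discrimination, item_vars = [], [], []
    for j in range(n_items):
        col = [row[j] for row in responses]
        difficulty.append(_mean(col))               # p-value: proportion correct
        item_vars.append(_var(col))
        rest = [t - x for t, x in zip(totals, col)]  # total score without item j
        discrimination.append(_pearson(col, rest))
    # Cronbach's alpha from item variances and total-score variance
    alpha = n_items / (n_items - 1) * (1 - sum(item_vars) / _var(totals))
    return difficulty, discrimination, alpha

if __name__ == "__main__":
    # Illustrative (made-up) responses: 4 examinees, 3 items.
    data = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
    p, r, alpha = item_analysis(data)
    print(p, r, alpha)
```

Correlating each item with the rest score instead of the full total keeps the item from inflating its own discrimination index, which matters most on short tests.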
Memory for Self-Performed Actions in Individuals with Asperger Syndrome

PubMed Central

Zalla, Tiziana; Daprati, Elena; Sav, Anca-Maria; Chaste, Pauline; Nico, Daniele; Leboyer, Marion

2010-01-01

Memory for action is enhanced if individuals are allowed to perform the corresponding movements, compared to when they simply listen to them (enactment effect). Previous studies have shown that individuals with Autism Spectrum Disorders (ASD) have difficulties with processes involving the self, such as autobiographical memories and self performed actions. The present study aimed at assessing memory for action in Asperger Syndrome (AS). We investigated whether adults with AS would benefit from the enactment effect when recalling a list of previously performed items vs. items that were only visually and verbally experienced through three experimental tasks (Free Recall, Old/New Recognition and Source Memory). The results showed that while performance on Recognition and Source Memory tasks was preserved in individuals with AS, the enactment effect for self-performed actions was not consistently present, as revealed by the lower number of performed actions being recalled on the Free Recall test, as compared to adults with typical development. Subtle difficulties in encoding specific motor and proprioceptive signals during action execution in individuals with AS might affect retrieval of relevant personal episodic information. These disturbances might be associated to an impaired action monitoring system. PMID:20967277
[Transcultural adaptation into Spanish of the Patient empowerment in long-term conditions questionnaire].

PubMed

Garcimartin, Paloma; Pardo-Cladellas, Yolanda; Verdú-Rotellar, Jose-Maria; Delgado-Hito, Pilar; Astals-Vizcaino, Monica; Comin-Colet, Josep

2017-12-22

To describe the process of translation and cultural adaptation of the Patient empowerment in long-term conditions questionnaire into Spanish. Translation, cross-cultural adaptation, and pilot testing (cognitive debriefing). Location: primary and hospital care. Ten patients admitted to a cardiology department of a university hospital. Main measurements: 1) direct translation, 2) conciliation and synthesis of the versions by an expert panel, 3) back-translation, 4) agreement on the back-translated version with the author of the original version, 5) analysis of comprehensibility through cognitive interviews. There were no differences between the direct-translated versions. The expert panel introduced changes in 23 out of the 47 items of the questionnaire. The author of the original version agreed with the version of the back-translation. In the cognitive interviews, patients reported high difficulty in one item and low difficulty in 4. The Spanish version of the Patient Empowerment in long-term conditions questionnaire is semantically and conceptually equivalent to the original tool. The assessment of the psychometric properties of the Spanish version of the questionnaire will be carried out at a later stage. Copyright © 2017 The Authors. Published by Elsevier España, S.L.U. All rights reserved.
Memory for self-performed actions in individuals with Asperger syndrome.

PubMed

Zalla, Tiziana; Daprati, Elena; Sav, Anca-Maria; Chaste, Pauline; Nico, Daniele; Leboyer, Marion

2010-10-12

Memory for action is enhanced if individuals are allowed to perform the corresponding movements, compared to when they simply listen to them (enactment effect). Previous studies have shown that individuals with Autism Spectrum Disorders (ASD) have difficulties with processes involving the self, such as autobiographical memories and self performed actions. The present study aimed at assessing memory for action in Asperger Syndrome (AS). We investigated whether adults with AS would benefit from the enactment effect when recalling a list of previously performed items vs. items that were only visually and verbally experienced through three experimental tasks (Free Recall, Old/New Recognition and Source Memory). The results showed that while performance on Recognition and Source Memory tasks was preserved in individuals with AS, the enactment effect for self-performed actions was not consistently present, as revealed by the lower number of performed actions being recalled on the Free Recall test, as compared to adults with typical development. Subtle difficulties in encoding specific motor and proprioceptive signals during action execution in individuals with AS might affect retrieval of relevant personal episodic information. These disturbances might be associated to an impaired action monitoring system.
Validation of the Spanish Short Self-Regulation Questionnaire (SSSRQ) through Rasch Analysis.

PubMed

Garzón Umerenkova, Angélica; de la Fuente Arias, Jesús; Martínez-Vicente, José Manuel; Zapata Sevillano, Lucía; Pichardo, Mari Carmen; García-Berbén, Ana Belén

2017-01-01

Background: The aim of the study was to psychometrically characterize the Spanish Short Self-Regulation Questionnaire (SSSRQ) through Rasch analysis. Materials and Methods: 831 Spanish university students (262 men), between 17 and 39 years of age and ranging from the first to the 5th year of studies, completed the SSSRQ questionnaire. Confirmatory factor analysis (CFA) was carried out in order to establish structural adequacy. Afterward, by means of the Rasch model, a study of each subscale was conducted to test for dimensionality, fit of the sample questions, functionality of the response categories, reliability and estimation of Differential Item Functioning by gender and course. Results: The four subscales comply with the unidimensionality criteria, the questions are in line with the model, the response categories operate properly and the reliability of the sample is acceptable. Nonetheless, the test could benefit from the inclusion of additional items of both high and low difficulty in order to increase construct validity, discrimination and reliability for the respondents. Several items with differences in gender and course were also identified. Discussion: The results evidence the need and adequacy of this complementary psychometric analysis strategy, in relation to the CFA to enhance the instrument.
Validation of the Spanish Short Self-Regulation Questionnaire (SSSRQ) through Rasch Analysis

PubMed Central

Garzón Umerenkova, Angélica; de la Fuente Arias, Jesús; Martínez-Vicente, José Manuel; Zapata Sevillano, Lucía; Pichardo, Mari Carmen; García-Berbén, Ana Belén

2017-01-01

Background: The aim of the study was to psychometrically characterize the Spanish Short Self-Regulation Questionnaire (SSSRQ) through Rasch analysis. Materials and Methods: 831 Spanish university students (262 men), between 17 and 39 years of age and ranging from the first to the 5th year of studies, completed the SSSRQ questionnaire. Confirmatory factor analysis (CFA) was carried out in order to establish structural adequacy. Afterward, by means of the Rasch model, a study of each subscale was conducted to test for dimensionality, fit of the sample questions, functionality of the response categories, reliability and estimation of Differential Item Functioning by gender and course. Results: The four subscales comply with the unidimensionality criteria, the questions are in line with the model, the response categories operate properly and the reliability of the sample is acceptable. Nonetheless, the test could benefit from the inclusion of additional items of both high and low difficulty in order to increase construct validity, discrimination and reliability for the respondents. Several items with differences in gender and course were also identified. Discussion: The results evidence the need and adequacy of this complementary psychometric analysis strategy, in relation to the CFA to enhance the instrument. PMID:28298898
Rasch Measurement of Collaborative Problem Solving in an Online Environment.

PubMed

Harding, Susan-Marie E; Griffin, Patrick E

2016-01-01

This paper describes an approach to the assessment of human to human collaborative problem solving using a set of online interactive tasks completed by student dyads. Within the dyad, roles were nominated as either A or B and students selected their own roles. The question as to whether role selection affected individual student performance measures is addressed. Process stream data was captured from 3402 students in six countries who explored the problem space by clicking, dragging the mouse, moving the cursor and collaborating with their partner through a chat box window. Process stream data were explored to identify behavioural indicators that represented elements of a conceptual framework. These indicative behaviours were coded into a series of dichotomous items. These items represented actions and chats performed by students. The frequency of occurrence was used as a proxy measure of item difficulty. Then given a measure of item difficulty, student ability could be estimated using the difficulty estimates of the range of items demonstrated by the student. The Rasch simple logistic model was used to review the indicators to identify those that were consistent with the assumptions of the model and were invariant across national samples, language, curriculum and age of the student. The data were analysed using a one and two dimension, one parameter model. Rasch separation reliability, fit to the model, distribution of students and items on the underpinning construct, estimates for each country and the effect of role differences are reported. This study provides evidence that collaborative problem solving can be assessed in an online environment involving human to human interaction using behavioural indicators shown to have a consistent relationship between the estimate of student ability, and the probability of demonstrating the behaviour.
Why Japanese workers show low work engagement: An item response theory analysis of the Utrecht Work Engagement scale

PubMed Central

2010-01-01

With the globalization of occupational health psychology, more and more researchers are interested in applying employee well-being like work engagement (i.e., a positive, fulfilling, work-related state of mind that is characterized by vigor, dedication, and absorption) to diverse populations. Accurate measurement contributes to our further understanding and to the generalizability of the concept of work engagement across different cultures. The present study investigated the measurement accuracy of the Japanese and the original Dutch versions of the Utrecht Work Engagement Scale (9-item version, UWES-9) and the comparability of this scale between both countries. Item Response Theory (IRT) was applied to the data from Japan (N = 2,339) and the Netherlands (N = 13,406). Reliability of the scale was evaluated at various levels of the latent trait (i.e., work engagement) based on the test information function (TIF) and the standard error of measurement (SEM). The Japanese version had difficulty in differentiating respondents with extremely low work engagement, whereas the original Dutch version had difficulty in differentiating respondents with high work engagement. The measurement accuracy of both versions was not similar. Suppression of positive affect among Japanese people and self-enhancement (the general sensitivity to positive self-relevant information) among Dutch people may have caused decreased measurement accuracy. Hence, we should be cautious when interpreting low engagement scores among Japanese as well as high engagement scores among western employees. PMID:21054839
Why Japanese workers show low work engagement: An item response theory analysis of the Utrecht Work Engagement scale.

PubMed

Shimazu, Akihito; Schaufeli, Wilmar B; Miyanaka, Daisuke; Iwata, Noboru

2010-11-05

With the globalization of occupational health psychology, more and more researchers are interested in applying employee well-being like work engagement (i.e., a positive, fulfilling, work-related state of mind that is characterized by vigor, dedication, and absorption) to diverse populations. Accurate measurement contributes to our further understanding and to the generalizability of the concept of work engagement across different cultures. The present study investigated the measurement accuracy of the Japanese and the original Dutch versions of the Utrecht Work Engagement Scale (9-item version, UWES-9) and the comparability of this scale between both countries. Item Response Theory (IRT) was applied to the data from Japan (N = 2,339) and the Netherlands (N = 13,406). Reliability of the scale was evaluated at various levels of the latent trait (i.e., work engagement) based on the test information function (TIF) and the standard error of measurement (SEM). The Japanese version had difficulty in differentiating respondents with extremely low work engagement, whereas the original Dutch version had difficulty in differentiating respondents with high work engagement. The measurement accuracy of both versions was not similar. Suppression of positive affect among Japanese people and self-enhancement (the general sensitivity to positive self-relevant information) among Dutch people may have caused decreased measurement accuracy. Hence, we should be cautious when interpreting low engagement scores among Japanese as well as high engagement scores among western employees.
Measuring Acceptance of Sleep Difficulties: The Development of the Sleep Problem Acceptance Questionnaire.

PubMed

Bothelius, Kristoffer; Jernelöv, Susanna; Fredrikson, Mats; McCracken, Lance M; Kaldo, Viktor

2015-11-01

Acceptance may be an important therapeutic process in sleep medicine, but valid psychometric instruments measuring acceptance related to sleep difficulties are lacking. The purpose of this study was to develop a measure of acceptance in insomnia, and to examine its factor structure as well as construct validity. In a cross-sectional design, a principal component analysis for item reduction was conducted on a first sample (A) and a confirmatory factor analysis on a second sample (B). Construct validity was tested on a combined sample (C). Questionnaire items were derived from a measure of acceptance in chronic pain, and data were gathered through screening or available from pretreatment assessments in four insomnia treatment trials, administered online, via bibliotherapy and in primary care. Adults with insomnia: 372 in sample A and 215 in sample B. Sample C (n = 820) included sample A and B with another 233 participants added. Construct validity was assessed through relations with established acceptance and sleep scales. The principal component analysis presented a two-factor solution with eight items, explaining 65.9% of the total variance. The confirmatory factor analysis supported the solution. Acceptance of sleep problems was more closely related to subjective symptoms and consequences of insomnia than to diary description of sleep, or to acceptance of general private events. The Sleep Problem Acceptance Questionnaire (SPAQ), containing the subscales "Activity Engagement" and "Willingness", is a valid tool to assess acceptance of insomnia. © 2015 Associated Professional Sleep Societies, LLC.
Applying item response theory and computer adaptive testing: the challenges for health outcomes assessment.

PubMed

Fayers, Peter M

2007-01-01

We review the papers presented at the NCI/DIA conference, to identify areas of controversy and uncertainty, and to highlight those aspects of item response theory (IRT) and computer adaptive testing (CAT) that require theoretical or empirical research in order to justify their application to patient reported outcomes (PROs). IRT and CAT offer exciting potential for the development of a new generation of PRO instruments. However, most of the research into these techniques has been in non-healthcare settings, notably in education. Educational tests are very different from PRO instruments, and consequently problematic issues arise when adapting IRT and CAT to healthcare research. Clinical scales differ appreciably from educational tests, and symptoms have characteristics distinctly different from examination questions. This affects the transferring of IRT technology. Particular areas of concern when applying IRT to PROs include inadequate software, difficulties in selecting models and communicating results, insufficient testing of local independence and other assumptions, and a need of guidelines for estimating sample size requirements. Similar concerns apply to differential item functioning (DIF), which is an important application of IRT. Multidimensional IRT is likely to be advantageous only for closely related PRO dimensions. Although IRT and CAT provide appreciable potential benefits, there is a need for circumspection. Not all PRO scales are necessarily appropriate targets for this methodology. Traditional psychometric methods, and especially qualitative methods, continue to have an important role alongside IRT. Research should be funded to address the specific concerns that have been identified.
Overall quality of life and difficulty paying for ostomy supplies in the Veterans Affairs ostomy health-related quality of life study: an exploratory analysis.

PubMed

Coons, Stephen Joel; Chongpison, Yuda; Wendel, Christopher S; Grant, Marcia; Krouse, Robert S

2007-09-01

To explore whether there was a significant relationship between difficulty paying for ostomy supplies and overall quality of life among a sample of ostomates receiving care from the Veterans Health Administration (VHA). The data were collected as part of the Veterans Affairs (VA) Ostomy Health-Related Quality of Life Study, in which 511 respondents (239 cases, 272 controls) completed a survey instrument that included the modified City of Hope Quality of Life (mCOH-QOL) Ostomy questionnaire, SF-36V, and sociodemographic items. Responses from the 239 cases (ie, patients with intestinal stomas) were used in this analysis. The modified City of Hope Quality of Life Ostomy questionnaire item, "How good is your overall quality of life?," was the dependent variable for this analysis. The primary independent variable was the response (yes/no) to the item, "If you pay for any of the (ostomy) costs, is it difficult for you?" A hierarchical regression model was used to examine whether difficulty paying was significantly related to overall quality of life after adjusting for age, income, race/ethnicity, and physical health. After accounting for the proportion of variance explained by age, income, race/ethnicity, and physical health, the additional proportion of variance explained by difficulty paying was statistically significant. Individuals reporting difficulty paying had a roughly 1 point lower (ie, beta-coefficient = -1.052; SE = 0.481) overall quality of life score on the 11-point scale. We found a significant association between difficulty paying for ostomy supplies and overall quality of life. Although the cross-sectional study design does not allow causal inference, the results suggest a relationship that merits further examination.

Is there a reliable factorial structure in the 20-item Toronto Alexithymia Scale? A comparison of factor models in clinical and normal adult samples.

PubMed

Müller, Jochen; Bühner, Markus; Ellgring, Heiner

2003-12-01

The 20-item Toronto Alexithymia Scale (TAS-20) is the most widely used instrument for measuring alexithymia. However, different studies did not always yield identical factor structures of this scale. The present study aims at clarifying some discrepant results. Maximum likelihood confirmatory factor analyses of a German version of the TAS-20 were conducted on data from a clinical sample (N=204) and a sample of normal adults (N=224). Five different models with one to four factors were compared. A four-factor model with factors (F1) "Difficulty identifying feelings", (F2) "Difficulty describing feelings", (F3) "Low importance of emotion" and (F4) "Pragmatic thinking" and a three-factor model with the combined factor "Difficulties in identifying and describing feelings" described the data best. Factors related to "externally oriented thinking" provided no acceptable level of reliability. Results from the present and other studies indicate that the factorial structure of the TAS-20 may vary across samples. Whether factor structures different from the common three-factor structure are an exception in some mainly clinical populations or a common phenomenon outside student populations has still to be determined. For a further exploration of the factor structure of the TAS-20 in different populations, it would be important not only to test the fit of the common three-factor model, but also to consider other competing solutions like the models of the present study.
Generation and associative encoding in young and old adults: the effect of the strength of association between cues and targets on a cued recall task.

PubMed

Taconnat, Laurence; Froger, Charlotte; Sacher, Mathilde; Isingrini, Michel

2008-01-01

The generation effect (i.e., better recall of the generated items than the read items) was investigated with a between-list design in young and elderly participants. The generation task difficulty was manipulated by varying the strength of association between cues and targets. Overall, strong associates were better recalled than weak associates. However, the results showed different generation effect patterns according to strength of association and age, with a greater generation effect for weak associates in younger adults only. These findings suggest that generating weak associates leads to more elaborated encoding, but that elderly adults cannot use this elaborated encoding as well as younger adults to recall the target words at test.
The Influence of Task Demands, Verbal Ability and Executive Functions on Item and Source Memory in Autism Spectrum Disorder

ERIC Educational Resources Information Center

Semino, Sara; Ring, Melanie; Bowler, Dermot M.; Gaigg, Sebastian B.

2018-01-01

Autism Spectrum Disorder (ASD) is generally associated with difficulties in contextual source memory but not single item memory. There are surprising inconsistencies in the literature, however, that the current study seeks to address by examining item and source memory in age and ability matched groups of 22 ASD and 21 comparison adults. Results…
Spanish translation and linguistic validation of the quality of life in neurological disorders (Neuro-QoL) measurement system

PubMed Central

Pérez, B.; Arnold, B.; Wong, Alex W. K.; Lai, JS; Kallen, M.; Cella, D.

2017-01-01

Introduction The quality of life in neurological disorders (Neuro-QoL) measurement system is a 470-item compilation of health-related quality of life domains for adults and children with neurological disorders. It was developed and cognitively debriefed in English and Spanish, with general population and clinical samples in the USA. This paper describes the Spanish translation and linguistic validation process. Methods The translation methodology combined forward and back-translations, multiple reviews, and cognitive debriefing with 30 adult and 30 pediatric Spanish-speaking respondents in the USA. The adult Fatigue bank was later also tested in Spain and Argentina. A universal approach to translation was adopted to produce a Spanish version that can be used in various countries. Translators from several countries were involved in the process. Results Cognitive debriefing results indicated that most of the 470 Spanish items were well understood. Translations were revised as needed where difficulty was reported or where participants’ comments revealed misunderstanding of an item’s intended meaning. Additional testing of the universal Spanish adult Fatigue item bank in Spain and Argentina confirmed good understanding of the items and that no country-specific word changes were necessary. Conclusion All the adult and pediatric Neuro-QoL measures have been linguistically validated with Spanish speakers in the USA. Instruments are available for use at www.assessmentcenter.net. PMID:25236708
The Effect of Mental Rotation on Surgical Pathological Diagnosis.

PubMed

Park, Heejung; Kim, Hyun Soo; Cha, Yoon Jin; Choi, Junjeong; Minn, Yangki; Kim, Kyung Sik; Kim, Se Hoon

2018-05-01

Pathological diagnosis involves very delicate and complex consequent processing that is conducted by a pathologist. The recognition of false patterns might be an important cause of misdiagnosis in the field of surgical pathology. In this study, we evaluated the influence of visual and cognitive bias in surgical pathologic diagnosis, focusing on the influence of "mental rotation." We designed three sets of the same images of uterine cervix biopsied specimens (original, left to right mirror images, and 180-degree rotated images), and recruited 32 pathologists to diagnose the 3 set items individually. First, the items were found to be adequate for analysis by classical test theory, generalizability theory, and item response theory. The results showed no statistically significant differences in difficulty, discrimination indices, and response duration time between the image sets. Mental rotation did not influence the pathologists' diagnosis in practice. Interestingly, outliers were more frequent in rotated image sets, suggesting that the mental rotation process may influence the pathological diagnoses of a few individual pathologists. © Copyright: Yonsei University College of Medicine 2018.
Assessing the 16 hour intern shift limit: Results of a multi-center, mixed-methods study of residents and faculty in general surgery.

PubMed

Coverdill, James E; Alseidi, Adnan; Borgstrom, David C; Dent, Daniel L; Dumire, Russell; Fryer, Jonathan; Hartranft, Thomas H; Holsten, Steven B; Nelson, M Timothy; Shabahang, Mohsen M; Sherman, Stanley R; Termuhlen, Paula M; Woods, Randy J; Mellinger, John D

2018-02-01

The study explores how residents and faculty assess the ACGME's 16-h limit on intern shifts. Questionnaire response rates were 76% for residents (N = 291) and 71% for faculty (N = 279) in 13 general surgery residency programs. Results include means, percentage in agreement, and statistical tests for 15 questionnaire items. Semi-structured interviews conducted with 39 residents and 43 faculty were analyzed for main themes. Few view the intern shift limit as a positive change. Views differ (P < 0.01) for residents and faculty on 12 of 15 item means and across PGY levels on all 15 items. Interviews indicate concerns about losses with respect to education and professional development, difficulties when interns transition to their second year, and how intern shifts may be more fatiguing than expected. The 16-h limit on intern shifts has remained a source of concern and an educational challenge for residents and faculty. Copyright © 2017 Elsevier Inc. All rights reserved.
Can manual ability be measured with a generic ABILHAND scale? A cross-sectional study conducted on six diagnostic groups

PubMed Central

Arnould, Carlyne; Vandervelde, Laure; Batcho, Charles Sèbiyo; Penta, Massimo; Thonnard, Jean-Louis

2012-01-01

Objectives Several ABILHAND Rasch-built manual ability scales were previously developed for chronic stroke (CS), cerebral palsy (CP), rheumatoid arthritis (RA), systemic sclerosis (SSc) and neuromuscular disorders (NMD). The present study aimed to explore the applicability of a generic manual ability scale unbiased by diagnosis and to study the nature of manual ability across diagnoses. Design Cross-sectional study. Setting Outpatient clinic homes (CS, CP, RA), specialised centres (CP), reference centres (CP, NMD) and university hospitals (SSc). Participants 762 patients from six diagnostic groups: 103 CS adults, 113 CP children, 112 RA adults, 156 SSc adults, 124 NMD children and 124 NMD adults. Primary and secondary outcome measures Manual ability as measured by the ABILHAND disease-specific questionnaires, diagnosis and nature (ie, uni-manual or bi-manual involvement and proximal or distal joints involvement) of the ABILHAND manual activities. Results The difficulties of most manual activities were diagnosis dependent. A principal component analysis highlighted that 57% of the variance in the item difficulty between diagnoses was explained by the symmetric or asymmetric nature of the disorders. A generic scale was constructed, from a metric point of view, with 11 items sharing a common difficulty among diagnoses and 41 items displaying a category-specific location (asymmetric: CS, CP; and symmetric: RA, SSc, NMD). This generic scale showed that CP and NMD children had significantly less manual ability than RA patients, who had significantly less manual ability than CS, SSc and NMD adults. However, the generic scale was less discriminative and responsive to small deficits than disease-specific instruments. Conclusions Our finding that most of the manual item difficulties were disease-dependent emphasises the danger of using generic scales without prior investigation of item invariance across diagnostic groups. Nevertheless, a generic manual ability scale could be developed by adjusting and accounting for activities perceived differently in various disorders. PMID:23117570
Lawton IADL scale in dementia: can item response theory make it more informative?

PubMed

McGrory, Sarah; Shenkin, Susan D; Austin, Elizabeth J; Starr, John M

2014-07-01

Impairment of functional abilities represents a crucial component of dementia diagnosis. Current functional measures rely on the traditional aggregate method of summing raw scores. While this summary score provides a quick representation of a person's ability, it disregards useful information on the item level. To use item response theory (IRT) methods to increase the interpretive power of the Lawton Instrumental Activities of Daily Living (IADL) scale by establishing a hierarchy of item 'difficulty' and 'discrimination'. This cross-sectional study applied IRT methods to the analysis of IADL outcomes. Participants were 202 members of the Scottish Dementia Research Interest Register (mean age = 76.39, range = 56-93, SD = 7.89 years) with complete itemised data available. A Mokken scale with good reliability (Molenaar Sijtsma statistic 0.79) was obtained, satisfying the IRT assumption that the items comprise a single unidimensional scale. The eight items in the scale could be placed on a hierarchy of 'difficulty' (H coefficient = 0.55), with 'Shopping' being the most 'difficult' item and 'Telephone use' being the least 'difficult' item. 'Shopping' was the most discriminatory item, differentiating well between patients of different levels of ability. IRT methods are capable of providing more information about functional impairment than a summed score. 'Shopping' and 'Telephone use' were identified as items that reveal key information about a patient's level of ability, and could be useful screening questions for clinicians. © The Author 2013. Published by Oxford University Press on behalf of the British Geriatrics Society. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
An investigation into introductory astronomy students' difficulties with cosmology, and the development, validation, and efficacy of a new suite of cosmology lecture-tutorials

NASA Astrophysics Data System (ADS)

Wallace, Colin S.

This study reports the results of the first systematic investigation into Astro 101 students' conceptual and reasoning difficulties with cosmology. We developed four surveys with which we measured students' conceptual knowledge of the Big Bang, the expansion and evolution of the universe, and the evidence for dark matter. Our classical test theory and item response theory analyses of over 2300 students' pre- and post-instruction responses, combined with daily classroom observations, videotapes of students working in class, and one-on-one semi-structured think-aloud interviews with nineteen Astro 101 students, revealed several common learning difficulties. In order to help students overcome these difficulties, we used our results to inform the development of a new suite of cosmology lecture-tutorials. In our initial testing of the new lecture-tutorials at the University of Colorado at Boulder and the University of Arizona, we found many cases in which students who used the lecture-tutorials achieved higher learning gains (as measured by our surveys) at statistically significant levels than students who did not. Subsequent use of the lecture-tutorials at a variety of colleges and universities across the United States produced a wide range of learning gains, suggesting that instructors' pedagogical practices and implementations of the lecture-tutorials significantly affect whether or not students achieve high learning gains.
Development and Validation of a Novel Generic Health-related Quality of Life Instrument With 20 Items (HINT-20)

PubMed Central

2017-01-01

Objectives Few attempts have been made to develop a generic health-related quality of life (HRQoL) instrument and to examine its validity and reliability in Korea. We aimed to do this in our present study. Methods After a literature review of existing generic HRQoL instruments, a focus group discussion, in-depth interviews, and expert consultations, we selected 30 tentative items for a new HRQoL measure. These items were evaluated by assessing their ceiling effects, difficulty, and redundancy in the first survey. To validate the HRQoL instrument that was developed, known-groups validity and convergent/discriminant validity were evaluated and its test-retest reliability was examined in the second survey. Results Of the 30 items originally assessed for the HRQoL instrument, four were excluded due to high ceiling effects and six were removed due to redundancy. We ultimately developed a HRQoL instrument with a reduced number of 20 items, known as the Health-related Quality of Life Instrument with 20 items (HINT-20), incorporating physical, mental, social, and positive health dimensions. The results of the HINT-20 for known-groups validity were poorer in women, the elderly, and those with a low income. For convergent/discriminant validity, the correlation coefficients of items (except vitality) in the physical health dimension with the physical component summary of the Short Form 36 version 2 (SF-36v2) were generally higher than the correlations of those items with the mental component summary of the SF-36v2, and vice versa. Regarding test-retest reliability, the intraclass correlation coefficient of the total HINT-20 score was 0.813 (p<0.001). Conclusions A novel generic HRQoL instrument, the HINT-20, was developed for the Korean general population and showed acceptable validity and reliability. PMID:28173686
The Relation between Item Identification Difficulty and Elaborative Conceptual Processing for Children and Adults.

ERIC Educational Resources Information Center

Ackerman, Brian P.; And Others

1990-01-01

Results of four experiments show that developmental differences in elaborative conceptual processing at acquisition and retrieval contribute independently to developmental increases in recall. Item identification processes for both words and pictures constrain children's elaborative processing. The constraints are time limited. (RH)
Measurement Equivalence in ADL and IADL Difficulty Across International Surveys of Aging: Findings From the HRS, SHARE, and ELSA

PubMed Central

Kasper, Judith D.; Brandt, Jason; Pezzin, Liliana E.

2012-01-01

Objective. To examine the measurement equivalence of items on disability across three international surveys of aging. Method. Data for persons aged 65 and older were drawn from the Health and Retirement Survey (HRS, n = 10,905), English Longitudinal Study of Aging (ELSA, n = 5,437), and Survey of Health, Ageing and Retirement in Europe (SHARE, n = 13,408). Differential item functioning (DIF) was assessed using item response theory (IRT) methods for activities of daily living (ADL) and instrumental activities of daily living (IADL) items. Results. HRS and SHARE exhibited measurement equivalence, but 6 of 11 items in ELSA demonstrated meaningful DIF. At the scale level, this item-level DIF affected scores reflecting greater disability. IRT methods also spread out score distributions and shifted scores higher (toward greater disability). Results for mean disability differences by demographic characteristics, using original and DIF-adjusted scores, were the same overall but differed for some subgroup comparisons involving ELSA. Discussion. Testing and adjusting for DIF is one means of minimizing measurement error in cross-national survey comparisons. IRT methods were used to evaluate potential measurement bias in disability comparisons across three international surveys of aging. The analysis also suggested DIF was mitigated for scales including both ADL and IADL and that summary indexes (counts of limitations) likely underestimate mean disability in these international populations. PMID:22156662
The revised Stress Measurement of Female Marriage Immigrants in Korea: Evaluation of the psychometric properties.

PubMed

Park, Min Hee; Yang, Sook Ja; Chee, Yeon Kyung

2016-01-01

The twenty-one item Stress Measurement of Female Marriage Immigrants (SMFMI) was developed to assess stress of female marriage immigrants in Korea. This study reports the psychometric properties of a revised SMFMI (SMFMI-R) for application with female marriage immigrants to Korea who were raising children. Participants were 190 female marriage immigrants from China, Vietnam, the Philippines, and other Asian countries, who were recruited using convenience sampling between November 2013 and December 2013. Survey questionnaires were translated into study participants' native languages (Chinese, Vietnamese, and English). Principal component analysis yielded nineteen items in four factors (family, parenting, cultural, and economic stress), explaining 63.5% of the variance, which was slightly better than the original scale. Confirmatory factor analysis indicated adequate fit for the four-factor model. Based on classic test theory and item response theory, strong support was provided for item discrimination, item difficulty, and internal consistency (Cronbach's alpha = 0.923). SMFMI-R scores were negatively associated with Korean proficiency and subjective economic status. The SMFMI-R is a valid, reliable, and comprehensive measure of stress for female marriage immigrants and can provide useful information to develop intervention programs for those who may be at risk for emotional stress.
Anesthesiology Journal club assessment by means of semantic changes.

PubMed

Vieira, Joaquim Edson; Torres, Marcelo Luís Abramides; Pose, Regina Albanese; Auler, José Otávio Costa Junior

2014-01-01

The interactive approach of a journal club has been described in the medical education literature. The aim of this investigation is to present an assessment of journal club as a tool to address the question whether residents read more and critically. This study reports the performance of medical residents in anesthesiology from the Clinics Hospital - University of São Paulo Medical School. All medical residents were invited to answer five questions derived from discussed papers. The answer sheet consisted of an affirmative statement with a Likert type scale (totally disagree-disagree-not sure-agree-totally agree), each related to one of the chosen articles. The results were evaluated by means of item analysis - difficulty index and discrimination power. Residents filled out one hundred and seventy-three evaluations in the months of December 2011 (n=51), July 2012 (n=66) and December 2012 (n=56). The first exam presented all items with straight statements; the second and third exams presented mixed items. Separating "totally agree" from "agree" increased the difficulty indices, but did not improve the discrimination power. The use of a journal club assessment with straight and inverted statements and by means of a five-point scale for agreement has been shown to increase its item difficulty and discrimination power. This may reflect involvement either with the reading or the discussion during the journal meeting. Copyright © 2013 Sociedade Brasileira de Anestesiologia. Published by Elsevier Editora Ltda. All rights reserved.
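Illustrative sketch (not taken from the paper): the abstract above evaluates items by difficulty index and discrimination power. In classical item analysis the difficulty index is commonly the proportion of respondents scoring the item correct, and a common discrimination index is the difference in that proportion between the highest- and lowest-scoring groups. The Python below assumes dichotomously scored items and the conventional 27% upper and lower groups. How the study scored its Likert responses is not stated beyond the abstract, so the function names and the grouping rule are assumptions.

```python
def difficulty_index(item_scores):
    """Proportion of respondents who scored the item correct (1)."""
    return sum(item_scores) / len(item_scores)


def discrimination_index(item_scores, total_scores, fraction=0.27):
    """Upper-group minus lower-group proportion correct.

    Respondents are ranked by total score; the top and bottom `fraction`
    of them (at least one person each) form the comparison groups.
    """
    n = len(item_scores)
    group_size = max(1, round(n * fraction))
    ranked = sorted(range(n), key=lambda i: total_scores[i])
    lower, upper = ranked[:group_size], ranked[-group_size:]
    p_upper = sum(item_scores[i] for i in upper) / group_size
    p_lower = sum(item_scores[i] for i in lower) / group_size
    return p_upper - p_lower


if __name__ == "__main__":
    # Hypothetical data: ten respondents, one item, and their total scores.
    item = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
    totals = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    print(difficulty_index(item), discrimination_index(item, totals))
```

Note that under this convention a higher difficulty index means an easier item, since more respondents answered it correctly.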
The dialysis orders objective structured clinical examination (OSCE): a formative assessment for nephrology fellows.

PubMed

Prince, Lisa K; Campbell, Ruth C; Gao, Sam W; Kendrick, Jessica; Lebrun, Christopher J; Little, Dustin J; Mahoney, David L; Maursetter, Laura A; Nee, Robert; Saddler, Mark; Watson, Maura A; Yuan, Christina M

2018-04-01

Few quantitative nephrology-specific simulations assess fellow competency. We describe the development and initial validation of a formative objective structured clinical examination (OSCE) assessing fellow competence in ordering acute dialysis. The three test scenarios were acute continuous renal replacement therapy, chronic dialysis initiation in moderate uremia and acute dialysis in end-stage renal disease-associated hyperkalemia. The test committee included five academic nephrologists and four clinically practicing nephrologists outside of academia. There were 49 test items (58 points). A passing score was 46/58 points. No item had median relevance less than 'important'. The content validity index was 0.91. Ninety-five percent of positive-point items were easy-medium difficulty. Preliminary validation was by 10 board-certified volunteers, not test committee members, a median of 3.5 years from graduation. The mean score was 49 [95% confidence interval (CI) 46-51], κ = 0.68 (95% CI 0.59-0.77), Cronbach's α = 0.84. We subsequently administered the test to 25 fellows. The mean score was 44 (95% CI 43-45); 36% passed the test. Fellows scored significantly less than validators (P < 0.001). Of evidence-based questions, 72% were answered correctly by validators and 54% by fellows (P = 0.018). Fellows and validators scored least well on the acute hyperkalemia question. In self-assessing proficiency, 71% of fellows surveyed agreed or strongly agreed that the OSCE was useful. The OSCE may be used to formatively assess fellow proficiency in three common areas of acute dialysis practice. Further validation studies are in progress.
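Illustrative sketch (not taken from the paper): the abstract above reports internal consistency as Cronbach's α. The standard formula is α = k / (k - 1) × (1 - Σ item variances / variance of total scores), where k is the number of items. The Python below implements that formula with sample variances. The function name and the toy data are hypothetical and unrelated to the OSCE results.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])
    # Sample variance of each item across respondents.
    item_variances = [variance([person[j] for person in responses]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(person) for person in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Hypothetical scores for four respondents on three items.
    scores = [[2, 3, 3], [4, 4, 5], [1, 2, 1], [3, 3, 4]]
    print(round(cronbach_alpha(scores), 3))
```

Items that rise and fall together across respondents push α toward 1, while unrelated items push it toward 0.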
A Study of General Education Astronomy Students' Understandings of Cosmology. Part III. Evaluating Four Conceptual Cosmology Surveys: An Item Response Theory Approach

ERIC Educational Resources Information Center

Wallace, Colin S.; Prather, Edward E.; Duncan, Douglas K.

2012-01-01

This is the third of five papers detailing our national study of general education astronomy students' conceptual and reasoning difficulties with cosmology. In this paper, we use item response theory to analyze students' responses to three out of the four conceptual cosmology surveys we developed. The specific item response theory model we use is…
An international measure of awareness and beliefs about cancer: development and testing of the ABC

PubMed Central

Simon, Alice E; Forbes, Lindsay J L; Boniface, David; Warburton, Fiona; Brain, Kate E; Dessaix, Anita; Donnelly, Michael; Haynes, Kerry; Hvidberg, Line; Lagerlund, Magdalena; Petermann, Lisa; Tishelman, Carol; Vedsted, Peter; Vigmostad, Maria Nyre; Wardle, Jane; Ramirez, Amanda J

2012-01-01

Objectives To develop an internationally validated measure of cancer awareness and beliefs; the awareness and beliefs about cancer (ABC) measure. Design and setting Items modified from existing measures were assessed by a working group in six countries (Australia, Canada, Denmark, Norway, Sweden and the UK). Validation studies were completed in the UK, and cross-sectional surveys of the general population were carried out in the six participating countries. Participants Testing in UK English included cognitive interviewing for face validity (N=10), calculation of content validity indexes (six assessors), and assessment of test–retest reliability (N=97). Conceptual and cultural equivalence of modified (Canadian and Australian) and translated (Danish, Norwegian, Swedish and Canadian French) ABC versions were tested quantitatively for equivalence of meaning (≥4 assessors per country) and in bilingual cognitive interviews (three interviews per translation). Response patterns were assessed in surveys of adults aged 50+ years (N≥2000) in each country. Main outcomes Psychometric properties were evaluated through tests of validity and reliability, conceptual and cultural equivalence and systematic item analysis. Test–retest reliability used weighted-κ and intraclass correlations. Construction and validation of aggregate scores was by factor analysis for (1) beliefs about cancer outcomes, (2) beliefs about barriers to symptomatic presentation, and item summation for (3) awareness of cancer symptoms and (4) awareness of cancer risk factors. Results The English ABC had acceptable test–retest reliability and content validity. International assessments of equivalence identified a small number of items where wording needed adjustment. Survey response patterns showed that items performed well in terms of difficulty and discrimination across countries except for awareness of cancer outcomes in Australia. Aggregate scores had consistent factor structures across countries. 
Conclusions The ABC is a reliable and valid international measure of cancer awareness and beliefs. The methods used to validate and harmonise the ABC may serve as a methodological guide in international survey research. PMID:23253874
Controlling Guessing Bias in the Dichotomous Rasch Model Applied to a Large-Scale, Vertically Scaled Testing Program

PubMed Central

Andrich, David; Marais, Ida; Humphry, Stephen Mark

2015-01-01

Recent research has shown how the statistical bias in Rasch model difficulty estimates induced by guessing in multiple-choice items can be eliminated. Using vertical scaling of a high-profile national reading test, it is shown that the dominant effect of removing such bias is a nonlinear change in the unit of scale across the continuum. The consequence is that the proficiencies of the more proficient students are increased relative to those of the less proficient. Not controlling the guessing bias underestimates the progress of students across 7 years of schooling with important educational implications. PMID:29795871
Measuring student learning using initial and final concept test in an STEM course

NASA Astrophysics Data System (ADS)

Kaw, Autar; Yalcin, Ali

2012-06-01

Effective assessment is a cornerstone in measuring student learning in higher education. For a course in Numerical Methods, a concept test was used as an assessment tool to measure student learning and its improvement during the course. The concept test comprised 16 multiple-choice questions and was given at the beginning and end of the course for three semesters. Values of Hake's gain index, a measure of learning gains from pre- to post-tests, of 0.36 to 0.41 were recorded. The validity and reliability of the concept test were checked via standard measures such as Cronbach's alpha, content and criterion-related validity, item characteristic curves, and difficulty and discrimination indices. The performance of various subgroups, defined by pre-requisite grades, transfer status, gender and age, was also studied.
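For readers unfamiliar with Hake's gain index, the following is a minimal sketch of the usual normalized-gain formula, g = (post - pre) / (100 - pre), with scores expressed as percentages. The function name and the example scores are hypothetical and are not taken from the study, which may have computed the index per student or per class.

```python
def hake_gain(pre_percent: float, post_percent: float) -> float:
    """Normalized gain g = (post - pre) / (100 - pre), scores as percentages.

    This is the commonly cited form of Hake's index; how the study
    aggregated scores before applying it is an assumption left open here.
    """
    if not 0.0 <= pre_percent < 100.0:
        raise ValueError("pre-test percentage must be in [0, 100)")
    return (post_percent - pre_percent) / (100.0 - pre_percent)


# Hypothetical class averages, chosen only for illustration.
gain = hake_gain(pre_percent=40.0, post_percent=64.0)
print(round(gain, 2))  # 0.4
```

A gain of 0.4 in this illustration means the class recovered 40% of the points it could still have gained after the pre-test.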
A Rasch measure of teachers' views of teacher-student relationships in the primary school.

PubMed

Leitao, Natalie; Waugh, Russell F

2012-01-01

This study investigated teacher-student relationships from the teachers' point of view at Perth metropolitan schools in Western Australia. The study identified three key social and emotional aspects that affect teacher-student relationships, namely, Connectedness, Availability and Communication. Data were collected by questionnaire (N = 139) with stem-items answered in three perspectives: (1) Idealistic: this is what I would like to happen; (2) Capability: this is what I am capable of; and (3) Behaviour: this is what actually happens, using four ordered response categories: not at all (score 1), some of the time (score 2), most of the time (score 3), and almost always (score 4). Data were analysed with a Rasch measurement model and a uni-dimensional, linear scale with 24 items, ordered from easy to hard, was created. The data were shown to be highly reliable, so that valid inferences could be made from the scale. The Person Separation Index (akin to a reliability index) was 0.93; there was good global teacher and item fit to the measurement model; there was good item fit; the targeting of the item difficulties against the teacher measures was good, and the response categories were answered consistently and logically. Teachers said that the ideal items were all easier than their corresponding capability items which were in turn easier than the behaviour items (where the items fitted the model), as conceptualized. The easiest ideal items were: I like this child and This child and I get along well together. The hardest ideal item (but still easy) was: I am available for this child. The easiest behaviour item (but still hard) was: This child and I get along well together. The hardest behaviour item (and very hard) was: I am interested to learn about this child's personal thoughts, feelings and experiences. The difficulties of the items supported the conceptual structure of the variable.

Universal Ontology: Attentive Tracking of Objects and Substances across Languages and over Development

ERIC Educational Resources Information Center

Cacchione, Trix; Indino, Marcello; Fujita, Kazuo; Itakura, Shoji; Matsuno, Toyomi; Schaub, Simone; Amici, Federica

2014-01-01

Previous research has demonstrated that adults are successful at visually tracking rigidly moving items, but experience great difficulties when tracking substance-like "pouring" items. Using a comparative approach, we investigated whether the presence/absence of the grammatical count-mass distinction influences adults and children's…
Decimal Fraction Arithmetic: Logical Error Analysis and Its Validation.

ERIC Educational Resources Information Center

Standiford, Sally N.; And Others

This report illustrates procedures of item construction for addition and subtraction examples involving decimal fractions. Using a procedural network of skills required to solve such examples, an item characteristic matrix of skills analysis was developed to describe the characteristics of the content domain by projected student difficulties. Then…
Reproduction of Inflectional Markers in French-Speaking Children with Reading Impairment

ERIC Educational Resources Information Center

St-Pierre, Marie-Catherine; Beland, Renee

2010-01-01

Purpose: Children with reading impairment (RI) experience difficulties in oral and written production of inflectional markers. The origin of these difficulties is not well documented in French. According to some authors, acquisition of irregular items by typically developing children is predicted by token frequency, whereas acquisition of regular…
Conceptualizing and Measuring Weekend versus Weekday Alcohol Use: Item Response Theory and Confirmatory Factor Analysis

PubMed Central

Lac, Andrew; Handren, Lindsay; Crano, William D.

2018-01-01

Culturally, people tend to abstain from alcohol intake during the weekdays and wait to consume in greater frequency and quantity during the weekends. The current research sought to empirically justify the days representing weekday versus weekend alcohol consumption. In study 1 (N = 419), item response theory was applied to a two-parameter (difficulty and discrimination) model that evaluated the days of drinking (frequency) during the typical 7-day week. Item characteristic curves were most similar for Monday, Tuesday, and Wednesday (prototypical weekday) and for Friday and Saturday (prototypical weekend). Thursday and Sunday, however, exhibited item characteristics that bordered the properties of weekday and weekend consumption. In study 2 (N = 403), confirmatory factor analysis was applied to test six hypothesized measurement structures representing drinks per day (quantity) during the typical week. The measurement model producing the strongest fit indices was a correlated two-factor structure involving separate weekday and weekend factors that permitted Thursday and Sunday to double load on both dimensions. The proper conceptualization and accurate measurement of the days demarcating the normative boundaries of “dry” weekdays and “wet” weekends are imperative to inform research and prevention efforts targeting temporal alcohol intake patterns. PMID:27488456
Conceptualizing and Measuring Weekend versus Weekday Alcohol Use: Item Response Theory and Confirmatory Factor Analysis.

PubMed

Lac, Andrew; Handren, Lindsay; Crano, William D

2016-10-01

Culturally, people tend to abstain from alcohol intake during the weekdays and wait to consume in greater frequency and quantity during the weekends. The current research sought to empirically justify the days representing weekday versus weekend alcohol consumption. In study 1 (N = 419), item response theory was applied to a two-parameter (difficulty and discrimination) model that evaluated the days of drinking (frequency) during the typical 7-day week. Item characteristic curves were most similar for Monday, Tuesday, and Wednesday (prototypical weekday) and for Friday and Saturday (prototypical weekend). Thursday and Sunday, however, exhibited item characteristics that bordered the properties of weekday and weekend consumption. In study 2 (N = 403), confirmatory factor analysis was applied to test six hypothesized measurement structures representing drinks per day (quantity) during the typical week. The measurement model producing the strongest fit indices was a correlated two-factor structure involving separate weekday and weekend factors that permitted Thursday and Sunday to double load on both dimensions. The proper conceptualization and accurate measurement of the days demarcating the normative boundaries of "dry" weekdays and "wet" weekends are imperative to inform research and prevention efforts targeting temporal alcohol intake patterns.
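The two-parameter model mentioned in this abstract can be illustrated with the standard two-parameter logistic item characteristic curve, P(theta) = 1 / (1 + exp(-a (theta - b))), where a is discrimination and b is difficulty. The sketch below is an illustration only: the parameter values are hypothetical, not estimates from the study, and the authors' exact parameterization is not given in the abstract.

```python
import math


def icc_2pl(theta: float, a: float, b: float) -> float:
    """Probability of endorsing an item under the 2PL model.

    theta: respondent's latent trait level
    a:     item discrimination (slope)
    b:     item difficulty (location where P = 0.5)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical parameters: a "hard" item (high b) and an "easy" item (low b).
hard_item = {"a": 1.5, "b": 1.0}
easy_item = {"a": 1.5, "b": -0.5}

for theta in (-1.0, 0.0, 1.0):
    p_hard = icc_2pl(theta, **hard_item)
    p_easy = icc_2pl(theta, **easy_item)
    print(f"theta={theta:+.1f}  hard={p_hard:.2f}  easy={p_easy:.2f}")
```

Items whose curves lie close together across the trait range, as described for Monday through Wednesday, would have similar a and b values in this parameterization.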
Effects of Perceptual and Contextual Enrichment on Visual Confrontation Naming in Adult Aging

PubMed Central

Rogalski, Yvonne; Peelle, Jonathan E.; Reilly, Jamie

2013-01-01

Purpose The purpose of this study was to determine the effects of enriching line drawings with color/texture and environmental context as a facilitator of naming speed and accuracy in older adults. Method Twenty young and 23 older adults named high-frequency picture stimuli from the Boston Naming Test (Kaplan, Goodglass, & Weintraub, 2001) under three conditions: (a) black-and-white items, (b) colorized-texturized items, and (c) scene-primed colored items (e.g., “hammock” preceded 1,000 ms by a backyard scene). Results With respect to speeded naming latencies, mixed-model analyses of variance revealed that young adults did not benefit from colorization-texturization but did show scene-priming effects. In contrast, older adults failed to show facilitation effects from either colorized-texturized or scene-primed items. Moreover, older adults were consistently slower to initiate naming than were their younger counterparts across all conditions. Conclusions Perceptual and contextual enrichment of sparse line drawings does not appear to facilitate visual confrontation naming in older adults, whereas younger adults do tend to show benefits of scene priming. We interpret these findings as generally supportive of a processing speed account of age-related object picture-naming difficulty. PMID:21498581
A psychometric evaluation of the Arm Motor Ability Test.

PubMed

O'Dell, Michael W; Kim, Grace; Rivera, Lisa; Fieo, Robert; Christos, Paul; Polistena, Caitlin; Fitzgerald, Kerri; Gorga, Delia

2013-06-01

To further examine the psychometric properties of a 9-item version of the Arm Motor Ability Test (AMAT-9) in persons with stroke. Thirty-two community-dwelling persons > 6 months post-stroke undergoing robotics treatment (mean age = 56.0 years, time post-stroke = 4.1 years, National Institutes of Health Stroke Scale score = 4.1, and AMAT-9 score = 1.22). Construct validity (including Rasch analyses) used baseline data prior to treatment (n = 32). Standardized response mean was calculated for subjects completing the protocol (n = 29). The Wolf Motor Function Test (WMFT), Fugl-Meyer Assessment (FMA), Action Research Arm Test (ARAT), and Stroke Impact Scale (SIS) were also administered. Spearman-rank correlation coefficients between AMAT-9 and the WMFT, FMA, and ARAT were strong (0.78-0.79, all p < 0.001). The correlation between the AMAT-9 and SIS Hand Function sub-score was stronger than that between the AMAT-9 and the Communication sub-score (0.40, p = 0.025 and -0.16, p = 0.39, respectively). Rasch analyses provided evidence for an appropriate hierarchical structure of item difficulties, unidimensionality, and good reliability. The AMAT demonstrated a comparable standardized response mean of 0.98. The AMAT-9 is valid and responsive among subjects scoring in the lower range of the scale. It has the advantage of assessing function and by eliminating the standing item from the previous iteration, it may be more easily used with severely impaired patients.
[Psychometric properties of Q-DIO, an instrument to measure the quality of documented nursing diagnoses, interventions and outcomes].

PubMed

Müller-Staub, Maria; Lunney, Margaret; Lavin, Mary Ann; Needham, Ian; Odenbreit, Matthias; van Achterberg, Theo

2010-04-01

The instrument Q-DIO was developed in 2005 and 2006 to measure the quality of documented nursing diagnoses, interventions, and nursing-sensitive patient outcomes. The aim of this study was to test the psychometric properties of the Q-DIO (Quality of nursing Diagnoses, Interventions and Outcomes). Instrument testing included internal consistency, test-retest reliability, interrater reliability, item analyses, and an assessment of objectivity. To obtain variation in scores, a stratified random sample of 60 nursing documentations was drawn. The strata represented 30 nursing documentations with and 30 without application of theory-based, standardised nursing language. Internal consistency of the subscale nursing diagnoses as process showed Cronbach's Alpha 0.83 [0.78, 0.88]; nursing diagnoses as product 0.98 [0.94, 0.99]; nursing interventions 0.90 [0.85, 0.94]; and nursing-sensitive patient outcomes 0.99 [0.95, 0.99]. With Cohen's Kappa of 0.95, the intrarater reliability was good. The interrater reliability showed a Kappa of 0.94 [0.90, 0.96]. Item analyses confirmed the fulfilment of criteria for degree of difficulty and discriminative validity of the items. In this study, Q-DIO has been shown to be a reliable instrument. It allows measuring the documented quality of nursing diagnoses, interventions and outcomes with and without implementation of theory-based, standardised nursing languages. Studies for further testing of Q-DIO in other settings are recommended. The results implicitly support the use of nursing classifications such as NANDA, NIC and NOC.
The effects of a visualization-centered curriculum on conceptual understanding and representational competence in high school biology

NASA Astrophysics Data System (ADS)

Wilder, Anna

The purpose of this study was to investigate the effects of a visualization-centered curriculum, Hemoglobin: A Case of Double Identity, on conceptual understanding and representational competence in high school biology. Sixty-nine students enrolled in three sections of freshman biology taught by the same teacher participated in this study. Online Chemscape Chime computer-based molecular visualizations were incorporated into the 10-week curriculum to introduce students to fundamental structure and function relationships. Measures used in this study included a Hemoglobin Structure and Function Test, Mental Imagery Questionnaire, Exam Difficulty Survey, the Student Assessment of Learning Gains, the Group Assessment of Logical Thinking, the Attitude Toward Science in School Assessment, audiotapes of student interviews, students' artifacts, weekly unit activity surveys, informal researcher observations and a teacher's weekly questionnaire. The Hemoglobin Structure and Function Test, consisting of Parts A and B, was administered as a pre and posttest. Part A used exclusively verbal test items to measure conceptual understanding, while Part B used visual-verbal test items to measure conceptual understanding and representational competence. Results of the Hemoglobin Structure and Function pre and posttest revealed statistically significant gains in conceptual understanding and representational competence, suggesting the visualization-centered curriculum implemented in this study was effective in supporting positive learning outcomes. The large positive correlation between posttest results on Part A, comprised of all-verbal test items, and Part B, using visual-verbal test items, suggests this curriculum supported students' mutual development of conceptual understanding and representational competence. 
Evidence based on student interviews, Student Assessment of Learning Gains ratings and weekly activity surveys indicated positive attitudes toward the use of Chemscape Chime software and the computer-based molecular visualization activities as learning tools. Evidence from these same sources also indicated that students felt computer-based molecular visualization activities in conjunction with other classroom activities supported their learning. Implications for instructional design are discussed.
Relationship Between Difficulties in Daily Activities and Falling: Loco-Check as a Self-Assessment of Fall Risk.

PubMed

Akahane, Manabu; Maeyashiki, Akie; Yoshihara, Shingo; Tanaka, Yasuhito; Imamura, Tomoaki

2016-06-20

People aged 65 years or older accounted for 25.1% of the Japanese population in 2013, and this characterizes the country as a "super-aging society." With increased aging, fall-related injuries are becoming important in Japan, because such injuries underlie the necessity for nursing care services. If people could evaluate their risk of falling using a simple self-check test, they would be able to take preventive measures such as exercise, muscle training, walking with a cane, or renovation of their surroundings to remove impediments. Loco-check is a checklist measure of early locomotive syndrome (circumstances in which elderly people need nursing care service or are at high risk of requiring the service within a short time), prepared by the Japanese Orthopaedic Association (JOA) in 2007, but it is unclear if there is any association between this measure and falls. To investigate the association between falls during the previous year and the 7 "loco-check" daily activity items and the total number of items endorsed, and sleep duration. We conducted an Internet panel survey. Subjects were 624 persons aged between 30 and 90 years. The general health condition of the participants, including their experience of falling, daily activities, and sleep duration, was investigated. A multivariate analysis was carried out using logistic regression to investigate the relationship between falls in the previous year and difficulties with specific daily activities and total number of difficulties (loco-check) endorsed, and sleep duration, adjusting for sex and age. One-fourth of participants (157 persons) experienced at least one fall during the previous year. Fall rate of females (94/312: 30.1%) was significantly higher than that of males (63/312: 20.2%). Fall rate of persons aged more than 65 years (80/242: 33.1%) was significantly higher than that of younger persons (77/382: 20.2%). 
Logistic regression analysis revealed that daily activities such as "impossibility of getting across the road at a crossing before the traffic light changes" are significantly related to falling. Logistic regression analysis also demonstrated a relationship between the number of items endorsed on loco-check and incidence of falling, wherein persons who endorsed 4 or more items appear to be at higher risk for falls. However, logistic regression found no significant relationship between sleep duration and falling. Our study demonstrated a relationship between the number of loco-check items endorsed and the incidence of falling in the previous year. Endorsement of 4 or more items appeared to signal a high risk for falls. The short self-administered checklist can be a valuable tool for assessing the risk of falling and for initiating preventive measures.
Bank of Items for H.S.C. Biology Level III and Division 1 with Computerised Self-Moderation and Error Analysis Procedures Using the Items from the Bank.

ERIC Educational Resources Information Center

Palmer, D. G.

This publication presents an organized collection of biology questions, designed for use in evaluation at the secondary level in Tasmania. Each item has been tried for quality and is accompanied by its difficulty percentage as well as by its content area and the mental processes required to answer it. The content areas include: Diversity,…
The Genetics Concept Assessment: a new concept inventory for gauging student understanding of genetics.

PubMed

Smith, Michelle K; Wood, William B; Knight, Jennifer K

2008-01-01

We have designed, developed, and validated a 25-question Genetics Concept Assessment (GCA) to test achievement of nine broad learning goals in majors and nonmajors undergraduate genetics courses. Written in everyday language with minimal jargon, the GCA is intended for use as a pre- and posttest to measure student learning gains. The assessment was reviewed by genetics experts, validated by student interviews, and taken by >600 students at three institutions. Normalized learning gains on the GCA were positively correlated with averaged exam scores, suggesting that the GCA measures understanding of topics relevant to instructors. Statistical analysis of our results shows that differences in the item difficulty and item discrimination index values between different questions on pre- and posttests can be used to distinguish between concepts that are well or poorly learned during a course.
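As background for the item difficulty and item discrimination index values mentioned in this abstract, here is a minimal sketch of one common classical-test-theory convention: difficulty as the proportion of correct answers, and discrimination as the difference in that proportion between the top and bottom 27% of examinees by total score. The abstract does not state which formulas the authors used, so the 27% split, the function names and the data below are all assumptions for illustration.

```python
def item_difficulty(item_scores):
    """Proportion of examinees answering the item correctly (1 = correct)."""
    return sum(item_scores) / len(item_scores)


def discrimination_index(item_scores, total_scores, fraction=0.27):
    """Upper-group minus lower-group proportion correct.

    Groups are the top and bottom `fraction` of examinees ranked by total
    test score; 27% is a conventional choice, assumed here.
    """
    n = len(total_scores)
    k = max(1, round(n * fraction))
    order = sorted(range(n), key=lambda i: total_scores[i])  # ascending
    lower, upper = order[:k], order[-k:]
    p_upper = sum(item_scores[i] for i in upper) / k
    p_lower = sum(item_scores[i] for i in lower) / k
    return p_upper - p_lower


# Hypothetical data for ten students: one item's scores and test totals.
item = [1, 1, 1, 1, 0, 1, 0, 0, 1, 0]
totals = [24, 22, 21, 19, 18, 15, 12, 10, 9, 7]

print(item_difficulty(item))                          # 0.6
print(round(discrimination_index(item, totals), 2))   # 0.67
```

Computing both values separately on pre-test and post-test responses gives the kind of per-question comparison the abstract describes.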
The Genetics Concept Assessment: A New Concept Inventory for Gauging Student Understanding of Genetics

PubMed Central

Smith, Michelle K.; Wood, William B.; Knight, Jennifer K.

2008-01-01

We have designed, developed, and validated a 25-question Genetics Concept Assessment (GCA) to test achievement of nine broad learning goals in majors and nonmajors undergraduate genetics courses. Written in everyday language with minimal jargon, the GCA is intended for use as a pre- and posttest to measure student learning gains. The assessment was reviewed by genetics experts, validated by student interviews, and taken by >600 students at three institutions. Normalized learning gains on the GCA were positively correlated with averaged exam scores, suggesting that the GCA measures understanding of topics relevant to instructors. Statistical analysis of our results shows that differences in the item difficulty and item discrimination index values between different questions on pre- and posttests can be used to distinguish between concepts that are well or poorly learned during a course. PMID:19047428
Measuring impairments of functioning and health in patients with axial spondyloarthritis by using the ASAS Health Index and the Environmental Item Set: translation and cross-cultural adaptation into 15 languages.

PubMed

Kiltz, U; van der Heijde, D; Boonen, A; Bautista-Molano, W; Burgos-Vargas, R; Chiowchanwisawakit, P; Duruoz, T; El-Zorkany, B; Essers, I; Gaydukova, I; Géher, P; Gossec, L; Grazio, S; Gu, J; Khan, M A; Kim, T J; Maksymowych, W P; Marzo-Ortega, H; Navarro-Compán, V; Olivieri, I; Patrikos, D; Pimentel-Santos, F M; Schirmer, M; van den Bosch, F; Weber, U; Zochling, J; Braun, J

2016-01-01

The Assessments of SpondyloArthritis international society Health Index (ASAS HI) measures functioning and health in patients with spondyloarthritis (SpA) across 17 aspects of health and 9 environmental factors (EF). The objective was to translate and adapt the original English version of the ASAS HI, including the EF Item Set, cross-culturally into 15 languages. Translation and cross-cultural adaptation has been carried out following the forward-backward procedure. In the cognitive debriefing, 10 patients/country across a broad spectrum of sociodemographic background, were included. The ASAS HI and the EF Item Set were translated into Arabic, Chinese, Croatian, Dutch, French, German, Greek, Hungarian, Italian, Korean, Portuguese, Russian, Spanish, Thai and Turkish. Some difficulties were experienced with translation of the contextual factors indicating that these concepts may be more culturally-dependent. A total of 215 patients with axial SpA across 23 countries (62.3% men, mean (SD) age 42.4 (13.9) years) participated in the field test. Cognitive debriefing showed that items of the ASAS HI and EF Item Set are clear, relevant and comprehensive. All versions were accepted with minor modifications with respect to item wording and response option. The wording of three items had to be adapted to improve clarity. As a result of cognitive debriefing, a new response option 'not applicable' was added to two items of the ASAS HI to improve appropriateness. This study showed that the items of the ASAS HI including the EFs were readily adaptable throughout all countries, indicating that the concepts covered were comprehensive, clear and meaningful in different cultures.
Measuring impairments of functioning and health in patients with axial spondyloarthritis by using the ASAS Health Index and the Environmental Item Set: translation and cross-cultural adaptation into 15 languages

PubMed Central

Kiltz, U; van der Heijde, D; Boonen, A; Bautista-Molano, W; Burgos-Vargas, R; Chiowchanwisawakit, P; Duruoz, T; El-Zorkany, B; Essers, I; Gaydukova, I; Géher, P; Gossec, L; Grazio, S; Gu, J; Khan, M A; Kim, T J; Maksymowych, W P; Marzo-Ortega, H; Navarro-Compán, V; Olivieri, I; Patrikos, D; Pimentel-Santos, F M; Schirmer, M; van den Bosch, F; Weber, U; Zochling, J; Braun, J

2016-01-01

Introduction The Assessments of SpondyloArthritis international society Health Index (ASAS HI) measures functioning and health in patients with spondyloarthritis (SpA) across 17 aspects of health and 9 environmental factors (EF). The objective was to translate and adapt the original English version of the ASAS HI, including the EF Item Set, cross-culturally into 15 languages. Methods Translation and cross-cultural adaptation has been carried out following the forward–backward procedure. In the cognitive debriefing, 10 patients/country across a broad spectrum of sociodemographic background, were included. Results The ASAS HI and the EF Item Set were translated into Arabic, Chinese, Croatian, Dutch, French, German, Greek, Hungarian, Italian, Korean, Portuguese, Russian, Spanish, Thai and Turkish. Some difficulties were experienced with translation of the contextual factors indicating that these concepts may be more culturally-dependent. A total of 215 patients with axial SpA across 23 countries (62.3% men, mean (SD) age 42.4 (13.9) years) participated in the field test. Cognitive debriefing showed that items of the ASAS HI and EF Item Set are clear, relevant and comprehensive. All versions were accepted with minor modifications with respect to item wording and response option. The wording of three items had to be adapted to improve clarity. As a result of cognitive debriefing, a new response option ‘not applicable’ was added to two items of the ASAS HI to improve appropriateness. Discussion This study showed that the items of the ASAS HI including the EFs were readily adaptable throughout all countries, indicating that the concepts covered were comprehensive, clear and meaningful in different cultures. PMID:27752358
Cross-national health comparisons using the Rasch model: findings from the 2012 US Health and Retirement Study and the 2012 Mexican Health and Aging Study.

PubMed

Hong, Ickpyo; Reistetter, Timothy A; Díaz-Venegas, Carlos; Michaels-Obregon, Alejandra; Wong, Rebeca

2018-05-10

Cross-national comparisons of patterns of population aging have emerged as comparable national micro-data have become available. This study creates a metric using Rasch analysis and determines the health of American and Mexican older adult populations. Secondary data analysis using representative samples aged 50 and older from 2012 U.S. Health and Retirement Study (n = 20,554); 2012 Mexican Health and Aging Study (n = 14,448). We developed a function measurement scale using Rasch analysis of 22 daily tasks and physical function questions. We tested psychometrics of the scale including factor analysis, fit statistics, internal consistency, and item difficulty. We investigated differences in function using multiple linear regression controlling for demographics. Lastly, we conducted subgroup analyses for chronic conditions. The created common metric demonstrated a unidimensional structure with good item fit, an acceptable precision (person reliability = 0.78), and an item difficulty hierarchy. The American adults appeared less functional than adults in Mexico (β = -0.26, p < 0.0001) and across two chronic conditions (arthritis, β = -0.36; lung problems, β = -0.62; all p < 0.05). However, American adults with stroke were more functional than Mexican adults (β = 0.46, p = 0.047). The Rasch model indicates that Mexican adults were more functional than Americans at the population level and across two chronic conditions (arthritis and lung problems). Future studies would need to elucidate other factors affecting the function differences between the two countries.
Rasch Analysis of a New Hierarchical Scoring System for Evaluating Hand Function on the Motor Assessment Scale for Stroke

PubMed Central

Sabari, Joyce S.; Woodbury, Michelle; Velozo, Craig A.

2014-01-01

Objectives. (1) To develop two independent measurement scales for use as items assessing hand movements and hand activities within the Motor Assessment Scale (MAS), an existing instrument used for clinical assessment of motor performance in stroke survivors; (2) To examine the psychometric properties of these new measurement scales. Design. Scale development, followed by a multicenter observational study. Setting. Inpatient and outpatient occupational therapy programs in eight hospital and rehabilitation facilities in the United States and Canada. Participants. Patients (N = 332) receiving stroke rehabilitation following left (52%) or right (48%) cerebrovascular accident; mean age 64.2 years (sd 15); median 1 month since stroke onset. Intervention. Not applicable. Main Outcome Measures. Data were tested for unidimensionality and reliability, and behavioral criteria were ordered according to difficulty level with Rasch analysis. Results. The new scales assessing hand movements and hand activities met Rasch expectations of unidimensionality and reliability. Conclusion. Following a multistep process of test development, analysis, and refinement, we have redesigned the two scales that comprise the hand function items on the MAS. The hand movement scale contains an empirically validated 10-behavior hierarchy and the hand activities item contains an empirically validated 8-behavior hierarchy. PMID:25177513
The Necessity of the Medial Temporal Lobe for Statistical Learning

PubMed Central

Schapiro, Anna C.; Gregory, Emma; Landau, Barbara; McCloskey, Michael; Turk-Browne, Nicholas B.

2014-01-01

The sensory input that we experience is highly patterned, and we are experts at detecting these regularities. Although the extraction of such regularities, or statistical learning (SL), is typically viewed as a cortical process, recent studies have implicated the medial temporal lobe (MTL), including the hippocampus. These studies have employed fMRI, leaving open the possibility that the MTL is involved but not necessary for SL. Here, we examined this issue in a case study of LSJ, a patient with complete bilateral hippocampal loss and broader MTL damage. In Experiments 1 and 2, LSJ and matched control participants were passively exposed to a continuous sequence of shapes, syllables, scenes, or tones containing temporal regularities in the co-occurrence of items. In a subsequent test phase, the control groups exhibited reliable SL in all conditions, successfully discriminating regularities from recombinations of the same items into novel foil sequences. LSJ, however, exhibited no SL, failing to discriminate regularities from foils. Experiment 3 ruled out more general explanations for this failure, such as inattention during exposure or difficulty following test instructions, by showing that LSJ could discriminate which individual items had been exposed. These findings provide converging support for the importance of the MTL in extracting temporal regularities. PMID:24456393
How Task Features Impact Evidence from Assessments Embedded in Simulations and Games

ERIC Educational Resources Information Center

Almond, Russell G.; Kim, Yoon Jeon; Velasquez, Gertrudes; Shute, Valerie J.

2014-01-01

One of the key ideas of evidence-centered assessment design (ECD) is that task features can be deliberately manipulated to change the psychometric properties of items. ECD identifies a number of roles that task-feature variables can play, including determining the focus of evidence, guiding form creation, determining item difficulty and…
An Eye-Movement Study of Relational Memory in Adults with Autism Spectrum Disorder

ERIC Educational Resources Information Center

Ring, Melanie; Bowler, Dermot M.; Gaigg, Sebastian B.

2017-01-01

Persons with Autism Spectrum Disorder (ASD) demonstrate good memory for single items but difficulties remembering contextual information related to these items. Recently, we found compromised explicit but intact implicit retrieval of object-location information in ASD (Ring et al. "Autism Res" 8(5):609-619, 2015). Eye-movement data…
