difficulty item discrimination: Topics by Science.gov

Sample records for difficulty item discrimination

Item analysis of examinations in the Faculty of Medicine of Tunis.

PubMed

Hermi, Amene; Achour, Wafa

2016-04-01

Introduction: Item analysis is the process of collecting, summarizing and using information from students' responses to assess the quality of test items. This study used this approach to evaluate the quality of the items and examinations given in the Faculty of Medicine of Tunis (FMT). Methods: This study concerned the examinations of 2012-2013 (principal session). It analyzed 3138 items from 66 examinations, of which 46 were multidisciplinary (187 disciplines). A total of 2515 students took the examinations. The "AnItem.xls" file was used for the analysis, which focused on difficulty, discrimination and internal consistency. Results: Mean difficulty for all examinations was optimal (mean difficulty index: 0.59). The majority of items (89.17%) were either easy or of acceptable difficulty. Mean discrimination for all examinations was moderate (mean item discrimination coefficient: 0.28), with poor discrimination in 23.62% of items. Maximal discrimination occurred in disciplines with a difficulty index between 0.4 and 0.6. « Ideal » items represented 27.02%. Mean internal consistency for all examinations was acceptable (Cronbach's alpha: 0.79). Disciplines with unacceptable internal consistency (68.45%) each contained at most 33 items, and their alpha correlated positively with their number of questions. Distributions were mostly platykurtic (72.73%) and negatively skewed (89.39%). The first year of studies had the best parameters. Conclusion: Our examinations had acceptable internal consistency and good levels of difficulty and discrimination. They tended toward easiness and discriminated mainly among students of medium level. Item analysis is useful as a guide for item writers to improve the overall quality of questions in the future.
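The difficulty index and Cronbach's alpha reported above are standard classical-test-theory quantities, but the abstract does not give the exact formulas used inside the "AnItem.xls" workbook. The following is only a minimal sketch assuming the textbook definitions: it takes a hypothetical 0/1 score matrix (rows = students, columns = items) and computes each item's proportion correct and the test's alpha.

```python
from statistics import pvariance

def difficulty_indices(scores):
    """Proportion of examinees answering each item correctly (classical difficulty index)."""
    n = len(scores)
    return [sum(row[j] for row in scores) / n for j in range(len(scores[0]))]

def cronbach_alpha(scores):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(scores[0])
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical responses: 4 students x 3 items (1 = correct, 0 = incorrect).
scores = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(difficulty_indices(scores))  # [0.75, 0.5, 0.25]
print(cronbach_alpha(scores))      # 0.75
```

Population variances are used throughout; because alpha is a ratio of variances, using sample variances for both numerator and denominator would give the same value.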
Relationship between item difficulty and discrimination indices in true/false-type multiple choice questions of a para-clinical multidisciplinary paper.

PubMed

Sim, Si-Mui; Rasiah, Raja Isaiah

2006-02-01

This paper reports the relationship between the difficulty level and the discrimination power of true/false-type multiple-choice questions (MCQs) in a multidisciplinary paper for the para-clinical year of an undergraduate medical programme. MCQ items in papers taken from Year II Parts A, B and C examinations for Sessions 2001/02, and Part B examinations for 2002/03 and 2003/04, were analysed to obtain their difficulty indices and discrimination indices. Each paper consisted of 250 true/false items (50 questions of 5 items each) on topics drawn from different disciplines. The questions were first constructed and vetted by the individual departments before being submitted to a central committee, where the final selection of the MCQs was made based purely on the academic judgement of the committee. There was a wide distribution of item difficulty indices in all the MCQ papers analysed. Furthermore, the relationship between the difficulty index (P) and discrimination index (D) of the MCQ items in a paper was not linear but dome-shaped. Maximal discrimination (D = 51% to 71%) occurred with moderately easy/difficult items (P = 40% to 74%). On average, about 38% of the MCQ items in each paper were "very easy" (P ≥ 75%), while about 9% were "very difficult" (P < 25%). About two-thirds of these very easy/difficult items had "very poor" or even negative discrimination (D ≤ 20%). MCQ items that demonstrate good discriminating potential tend to be moderately difficult, and moderately-to-very difficult items are more likely to show negative discrimination. There is a need to evaluate the effectiveness of our MCQ items.
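Sim and Rasiah report P and D as percentages, but the abstract does not say how the high- and low-scoring groups were formed. The sketch below assumes the common upper/lower-group convention, taking the top and bottom 27% of examinees by total score (the `fraction` parameter is that assumption). D is then the proportion correct in the upper group minus that in the lower group, so it runs from -1 (negative discrimination) to +1; multiply P and D by 100 to express them as percentages. The function name and data are hypothetical.

```python
def upper_lower_indices(item, totals, fraction=0.27):
    """Return (P, D) for one 0/1 item.

    item:     0/1 responses to this item, one per examinee.
    totals:   total test scores in the same examinee order.
    fraction: share of examinees placed in each extreme group (assumed 27%).
    """
    n = len(item)
    g = max(1, round(n * fraction))                    # size of each extreme group
    order = sorted(range(n), key=lambda i: totals[i])  # examinees by ascending total
    lower, upper = order[:g], order[-g:]
    p = sum(item) / n                                  # difficulty index (proportion correct)
    d = (sum(item[i] for i in upper) - sum(item[i] for i in lower)) / g
    return p, d

# Hypothetical data for 10 examinees, listed from highest to lowest total score.
item = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
totals = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
print(upper_lower_indices(item, totals))  # (0.4, 1.0)
```

An item answered correctly only by the weakest examinees would produce a negative D, which is the pattern the authors flag among the moderately-to-very difficult items.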
Psychometric properties of the Global Operative Assessment of Laparoscopic Skills (GOALS) using item response theory.

PubMed

Watanabe, Yusuke; Madani, Amin; Ito, Yoichi M; Bilgic, Elif; McKendy, Katherine M; Feldman, Liane S; Fried, Gerald M; Vassiliou, Melina C

2017-02-01

The extent to which each item assessed using the Global Operative Assessment of Laparoscopic Skills (GOALS) contributes to the total score remains unknown. The purpose of this study was to evaluate the level of difficulty and discriminative ability of each of the 5 GOALS items using item response theory (IRT). A total of 396 GOALS assessments for a variety of laparoscopic procedures over a 12-year time period were included. Threshold parameters of item difficulty and discrimination power were estimated for each item using IRT. The higher slope parameters seen with "bimanual dexterity" and "efficiency" are indicative of greater discriminative ability than "depth perception", "tissue handling", and "autonomy". IRT psychometric analysis indicates that the 5 GOALS items do not demonstrate uniform difficulty and discriminative power, suggesting that they should not be scored equally. "Bimanual dexterity" and "efficiency" seem to have stronger discrimination. Weighted scores based on these findings could improve the accuracy of assessing individual laparoscopic skills. Copyright © 2016 Elsevier Inc. All rights reserved.
Do Images Influence Assessment in Anatomy? Exploring the Effect of Images on Item Difficulty and Item Discrimination

ERIC Educational Resources Information Center

Vorstenbosch, Marc A. T. M.; Klaassen, Tim P. F. M.; Kooloos, Jan G. M.; Bolhuis, Sanneke M.; Laan, Roland F. J. M.

2013-01-01

Anatomists often use images in assessments and examinations. This study aims to investigate the influence of different types of images on item difficulty and item discrimination in written assessments. A total of 210 of 460 students volunteered for an extra assessment in a gross anatomy course. This assessment contained 39 test items grouped in…
The Definition of Difficulty and Discrimination for Multidimensional Item Response Theory Models.

ERIC Educational Resources Information Center

Reckase, Mark D.; McKinley, Robert L.

A study was undertaken to develop guidelines for the interpretation of the parameters of three multidimensional item response theory models and to determine the relationship between the parameters and traditional concepts of item difficulty and discrimination. The three models considered were multidimensional extensions of the one-, two-, and…
Outcome-based self-assessment on a team-teaching subject in the medical school

PubMed Central

Cho, Sa Sun

2014-01-01

We attempted to investigate why students received worse grades in gross anatomy and how the teaching method could be improved, since there were gaps between teaching and learning under the recently changed integrated curriculum. General characteristics of students and exploratory factors for testing validity were compared between 2011 and 2012. Students were asked to complete a short survey with a Likert scale. The results were as follows: although the percentage of acceptable items was similar between professors, professor C preferred questions with adequate item discrimination and inappropriate item difficulty, whereas professor Y preferred adequate item discrimination and appropriate item difficulty, with statistical significance (P<0.01). The survey revealed that 26.5% of all students gave up on professor Y's gross anatomy exam, irrespective of year. These results suggest that students' academic achievement was affected by the corrected item difficulty rather than by item discrimination. Therefore, professors in a team-teaching subject should reach a consensus on item difficulty along with proper teaching methods. PMID:25548724
The Arabic Version of The Depression Anxiety Stress Scale-21: Cumulative scaling and discriminant-validation testing.

PubMed

Ali, Amira Mohammed; Ahmed, Anwar; Sharaf, Amira; Kawakami, Norito; Abdeldayem, Samia M; Green, Joseph

2017-12-01

This study aimed to examine the validity of the Arabic version of the Depression Anxiety Stress Scale-21 (DASS-21) in 149 illicit drug users. We calculated α coefficient, inter-item and item-total correlations, coefficients of reproducibility and scalability (CR and CS), item difficulty and discrimination indices. The DASS-21 had an acceptable reliability; but values of the CR and the CS were less than acceptable. Items varied in difficulty and discrimination; some items are candidates for elimination. The DASS-21 is a probabilistic and not a deterministic measure of distress; it has problematic items and needs further investigations. Copyright © 2017 Elsevier B.V. All rights reserved.
Fitting the Rasch Model to Account for Variation in Item Discrimination

ERIC Educational Resources Information Center

Weitzman, R. A.

2009-01-01

Building on the Kelley and Gulliksen versions of classical test theory, this article shows that a logistic model having only a single item parameter can account for varying item discrimination, as well as difficulty, by using item-test correlations to adjust incorrect-correct (0-1) item responses prior to an initial model fit. The fit occurs…
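Weitzman's adjustment of the 0-1 responses via item-test correlations is not reproduced here. As background, the sketch below shows the one-parameter logistic (Rasch) model that the article builds on: the probability of a correct answer depends only on the difference between the examinee's ability θ and the item's difficulty b (both in logits), so every item shares the same slope. The function names and values are illustrative.

```python
import math

def rasch_prob(theta, b):
    """P(correct) under the Rasch model for ability theta and item difficulty b (logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def expected_score(theta, difficulties):
    """Expected number-correct score of one examinee over a set of Rasch items."""
    return sum(rasch_prob(theta, b) for b in difficulties)

print(rasch_prob(0.0, 0.0))  # 0.5: ability equals difficulty
print(expected_score(0.0, [-1.0, 0.0, 1.0]))
```

Because the slope is fixed, the basic model by itself cannot represent items that differ in discrimination, which is the limitation the article sets out to work around.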
Item selection via Bayesian IRT models.

PubMed

Arima, Serena

2015-02-10

With reference to a questionnaire that aimed to assess the quality of life of dysarthric speakers, we investigate the usefulness of a model-based procedure for reducing the number of items. We propose a mixed cumulative logit model, known in the psychometrics literature as the graded response model: responses to different items are modelled as a function of individual latent traits and of item characteristics, such as their difficulty and their discrimination power. We jointly model the discrimination and difficulty parameters using a k-component mixture of normal distributions. Mixture components correspond to disjoint groups of items. Items that belong to the same group can be considered equivalent in terms of both difficulty and discrimination power. According to decision criteria, we select a subset of items such that the reduced questionnaire provides the same information as the complete questionnaire. The model is estimated using a Bayesian approach, and the choice of the number of mixture components is justified according to information criteria. We illustrate the proposed approach on data collected for 104 dysarthric patients by local health authorities in Lecce and in Milan. Copyright © 2014 John Wiley & Sons, Ltd.
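The mixture prior and Bayesian estimation are beyond a short example, but the graded response model at the core of the approach can be sketched. For an item with ordered categories 0..m-1, a slope a (discrimination) and increasing thresholds b_1 < ... < b_{m-1} (difficulties), the probability of responding in category k or higher is a logistic curve, and each category probability is the difference between adjacent cumulative curves. The function name and parameter values are illustrative, not estimates from the dysarthria data.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under Samejima's graded response model.

    theta:      latent trait of the respondent.
    a:          item discrimination (slope).
    thresholds: increasing difficulty parameters b_1 < ... < b_{m-1}.
    """
    # Cumulative probabilities P(X >= k): 1 for k = 0, logistic curves in between,
    # and 0 beyond the top category.
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]

# A hypothetical 4-category item evaluated at theta = 0.
print(grm_category_probs(0.0, 1.5, [-1.0, 0.0, 1.0]))
```

In this parameterization, two items with similar a and similar thresholds produce nearly identical category curves, which is the sense in which items grouped in the same mixture component can be treated as interchangeable.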
Do item-writing flaws reduce examinations psychometric quality?

PubMed

Pais, João; Silva, Artur; Guimarães, Bruno; Povo, Ana; Coelho, Elisabete; Silva-Pereira, Fernanda; Lourinho, Isabel; Ferreira, Maria Amélia; Severo, Milton

2016-08-11

The psychometric characteristics of multiple-choice questions (MCQ) change when their anatomical sites and the presence of item-writing flaws (IWF) are taken into account. The aim is to understand the impact of the anatomical sites and of the presence of IWF on the psychometric qualities of the MCQ. 800 Clinical Anatomy MCQ from eight examinations were classified as standard or flawed items and according to one of eight anatomical sites. An item was classified as flawed if it violated at least one of the principles of item writing. The difficulty and discrimination indices of each item were obtained. 55.8% of the MCQ were flawed items. The anatomical site of the items explained 6.2% and 3.2% of the difficulty and discrimination parameters, respectively, and the IWF explained 2.8% and 0.8%, respectively. The impact of the IWF was heterogeneous: the Writing the Stem and Writing the Choices categories had a negative impact (higher difficulty and lower discrimination), while the other categories had no impact. The anatomical site effect was larger than the IWF effect on the psychometric characteristics of the examination. When constructing MCQ, the focus should be first on the topic/area of the items and only then on the presence of IWF.
Factors Affecting Item Difficulty in English Listening Comprehension Tests

ERIC Educational Resources Information Center

Sung, Pei-Ju; Lin, Su-Wei; Hung, Pi-Hsia

2015-01-01

Task difficulty is a critical issue affecting test developers. Controlling or balancing the item difficulty of an assessment improves its validity and discrimination. Test developers construct tests from the cognitive perspective, by making the test constructing process more scientific and efficient; thus, the scores obtained more precisely…
A Comparison between Discrimination Indices and Item-Response Theory Using the Rasch Model in a Clinical Course Written Examination of a Medical School.

PubMed

Park, Jong Cook; Kim, Kwang Sig

2012-03-01

The reliability of a test is determined by the characteristics of its items. Item analysis is achieved through classical test theory and item response theory. The purpose of the study was to compare discrimination indices with item response theory using the Rasch model. Thirty-one 4th-year medical school students participated in the clinical course written examination, which included 22 A-type items and 3 R-type items. The point-biserial correlation coefficient (C(pbs)) was compared to the method of extreme groups (D), the biserial correlation coefficient (C(bs)), the item-total correlation coefficient (C(it)), and the corrected item-total correlation coefficient (C(cit)). The Rasch model was applied to estimate item difficulty and examinee ability and to calculate item fit statistics using joint maximum likelihood. The explanatory power (r²) of C(pbs) decreased in the following order: C(cit) (1.00), C(it) (0.99), C(bs) (0.94), and D (0.45). Difficulty logits ranged from -0.82 to 0.80 (standard errors 0.37 to 0.76), and ability logits ranged from -3.69 to 3.19 (standard errors 0.45 to 1.03). Items 9 and 23 had outfit ≥ 1.3. Students 1, 5, 7, 18, 26, 30, and 32 had fit ≥ 1.3. C(pbs), C(cit), and C(it) are good discrimination parameters. The Rasch model can estimate the item difficulty parameter and the examinee ability parameter with standard errors. The fit statistics can identify bad items and unpredictable examinee responses.
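The abstract does not spell out how each correlation index was computed, so the following is a minimal sketch under the usual definitions. The point-biserial correlation is the Pearson correlation between a 0/1 item and a continuous score; when that score is the total test score it coincides with the uncorrected item-total correlation. The corrected item-total correlation removes the item from the total first, so the item is not correlated with itself. Function names and data are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def item_total_correlations(scores, j):
    """Return (uncorrected, corrected) item-total correlations for item j of a 0/1 matrix."""
    item = [row[j] for row in scores]
    totals = [sum(row) for row in scores]
    rest = [t - x for t, x in zip(totals, item)]  # total score with item j removed
    return pearson(item, totals), pearson(item, rest)

# Hypothetical responses: 4 examinees x 3 items.
scores = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(item_total_correlations(scores, 0))
```

The corrected value is smaller here because the uncorrected total contains the item itself; with 25 items, as in the study, the difference shrinks, which is consistent with the near-perfect r² values reported between these indices.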
Modeling the Severity of Drinking Consequences in First-Year College Women: An Item Response Theory Analysis of the Rutgers Alcohol Problem Index

PubMed Central

Cohn, Amy M.; Hagman, Brett T.; Graff, Fiona S.; Noel, Nora E.

2011-01-01

Objective: The present study examined the latent continuum of alcohol-related negative consequences among first-year college women using methods from item response theory and classical test theory. Method: Participants (N = 315) were college women in their freshman year who reported consuming any alcohol in the past 90 days and who completed assessments of alcohol consumption and alcohol-related negative consequences using the Rutgers Alcohol Problem Index. Results: Item response theory analyses showed poor model fit for five items identified in the Rutgers Alcohol Problem Index. Two-parameter item response theory logistic models were applied to the remaining 18 items to examine estimates of item difficulty (i.e., severity) and discrimination parameters. The item difficulty parameters ranged from 0.591 to 2.031, and the discrimination parameters ranged from 0.321 to 2.371. Classical test theory analyses indicated that the omission of the five misfit items did not significantly alter the psychometric properties of the construct. Conclusions: Findings suggest that those consequences that had greater severity and discrimination parameters may be used as screening items to identify female problem drinkers at risk for an alcohol use disorder. PMID:22051212
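The two-parameter logistic (2PL) model gives each retained item a difficulty (severity) b and a discrimination a. A minimal sketch, assuming the logistic form without the 1.7 scaling constant that some software applies: the probability of endorsing an item rises with the latent trait θ, and the item's Fisher information a²P(1 − P) peaks at θ = b with height a²/4. The function names are hypothetical, and the example pairs the endpoints of the reported discrimination range with an arbitrary b = 2.0 rather than with any actual RAPI item.

```python
import math

def p_2pl(theta, a, b):
    """P(endorse) under the 2PL model: discrimination a, difficulty/severity b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def info_2pl(theta, a, b):
    """Fisher information of a 2PL item at trait level theta: a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

# Hypothetical items at the ends of the reported discrimination range, both with b = 2.0.
for a in (0.321, 2.371):
    print(a, round(info_2pl(2.0, a, 2.0), 4))
```

A highly discriminating item concentrates its information near its severity level, which fits the authors' suggestion that high-severity, high-discrimination consequences could serve as screening items.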
Difficulty and Discriminability of Introductory Psychology Test Items.

ERIC Educational Resources Information Center

Scialfa, Charles; Legare, Connie; Wenger, Larry; Dingley, Louis

2001-01-01

Analyzes multiple-choice questions provided in test banks for introductory psychology textbooks. Study 1 offered a consistent picture of the objective difficulty of multiple-choice tests for introductory psychology students, while both studies 1 and 2 indicated that test items taken from commercial test banks have poor psychometric properties.…
Systemic factors of errors in the case identification process of the national routine health information system: A case study of Modified Field Health Services Information System in the Philippines

PubMed Central

2011-01-01

Background: The quality of data in national health information systems has been questionable in most developing countries. However, the mechanisms of errors in the case identification process are not fully understood. This study aimed to investigate the mechanisms of errors in the case identification process in the existing routine health information system (RHIS) in the Philippines by measuring the risk of committing errors for health program indicators used in the Field Health Services Information System (FHSIS 1996), and characterizing those indicators accordingly. Methods: A structured questionnaire on the definitions of 12 selected indicators in the FHSIS was administered to 132 health workers in 14 selected municipalities in the province of Palawan. The proportion of correct answers (difficulty index) and the disparity between the proportions of correct answers in the higher- and lower-scoring groups (discrimination index) were calculated, and the patterns of wrong answers for each of the 12 items were abstracted from 113 valid responses. Results: None of the 12 items reached a difficulty index of 1.00. The average difficulty index of the 12 items was 0.266, and discrimination indices of 0.216 and above showed a significant difference. Relative to these two cut-offs, six items showed no discrimination, with lower difficulty indices of 0.035 (4/113) to 0.195 (22/113); two items showed positive discrimination with lower difficulty indices of 0.142 (16/113) and 0.248 (28/113); and four items showed positive discrimination with higher difficulty indices of 0.469 (53/113) to 0.673 (76/113).
Conclusions: The results suggest three types of indicator definitions: those that are (1) unsupported by the current conditions in the health system, i.e., (a) data are required from a facility that cannot directly generate the data, and (b) the definition of an indicator is not consistent with its corresponding program; (2) incomplete or ambiguous, allowing several interpretations; and (3) complete yet easily misunderstood by health workers. Taking systemic factors into account, the case identification step needs to be reviewed and designed to generate the intended data in health information systems. PMID:21995369
Is the Factor Observed in Investigations on the Item-Position Effect Actually the Difficulty Factor?

PubMed

Schweizer, Karl; Troche, Stefan

2018-02-01

In confirmatory factor analysis, quite similar measurement models serve to detect the difficulty factor and the factor due to the item-position effect. The item-position effect refers to the increasing dependency among the responses to successively presented items of a test, whereas the difficulty factor is ascribed to the wide range of item difficulties. The similarity of the measurement models hampers the dissociation of these factors. Since the item-position effect should theoretically be independent of the item difficulties, statistical ex post manipulation of the difficulties should make it possible to discriminate between the two types of factors. This method was investigated in two studies. In the first study, Advanced Progressive Matrices (APM) data of 300 participants were investigated. As expected, the factor thought to be due to the item-position effect was observed. In the second study, using data simulated to show the major characteristics of the APM data, the wide range of item difficulties was set to zero to reduce the likelihood of detecting the difficulty factor. Despite this reduction, the factor, now identified as the item-position factor, was observed in virtually all simulated datasets.
Two-item same/different discrimination in rhesus monkeys (Macaca mulatta).

PubMed

Basile, Benjamin M; Moylan, Emily J; Charles, David P; Murray, Elisabeth A

2015-11-01

Almost all nonhuman animals can recognize when one item is the same as another item. It is less clear whether nonhuman animals possess abstract concepts of "same" and "different" that can be divorced from perceptual similarity. Pigeons and monkeys show inconsistent performance, and often surprising difficulty, in laboratory tests of same/different learning that involve only two items. Previous results from tests using multi-item arrays suggest that nonhumans compute sameness along a continuous scale of perceptual variability, which would explain the difficulty of making two-item same/different judgments. Here, we provide evidence that rhesus monkeys can learn a two-item same/different discrimination similar to those on which monkeys and pigeons have previously failed. Monkeys' performance transferred to novel stimuli and was not affected by perceptual variations in stimulus size, rotation, view, or luminance. Success without the use of multi-item arrays, and the lack of effect of perceptual variability, suggests a computation of sameness that is more categorical, and perhaps more abstract, than previously thought.
A Review of Classical Methods of Item Analysis.

ERIC Educational Resources Information Center

French, Christine L.

Item analysis is a very important consideration in the test development process. It is a statistical procedure to analyze test items that combines methods used to evaluate the important characteristics of test items, such as difficulty, discrimination, and distractibility of the items in a test. This paper reviews some of the classical methods for…
An instrument to measure nurses' knowledge in palliative care: Validation of the Spanish version of Palliative Care Quiz for Nurses

PubMed Central

2017-01-01

Background: Palliative care is nowadays essential in nursing care, due to the increasing number of patients who require attention in the final stages of their lives. Nurses need to acquire specific knowledge and abilities to provide quality palliative care. The Palliative Care Quiz for Nurses (PCQN) is a questionnaire that evaluates nurses' basic knowledge about palliative care, but its adaptation into Spanish and the analysis of its effectiveness and utility for Spanish culture are lacking. Purpose: To report the adaptation into Spanish and the psychometric analysis of the Palliative Care Quiz for Nurses. Method: The Palliative Care Quiz for Nurses-Spanish Version (PCQN-SV) was obtained through a process including translation, back-translation, comparison with versions in other languages, expert revision, and a pilot study. Content validity and reliability of the questionnaire were analyzed. Difficulty and discrimination indexes of each item were also calculated according to Item Response Theory (IRT). Findings: Adequate internal consistency was found (S-CVI = 0.83); a Cronbach's alpha coefficient of 0.67 and a KR-20 of 0.72 reflected the reliability of the PCQN-SV. The questionnaire had a global difficulty index of 0.55, with six items that could be considered difficult or very difficult and five items that could be considered easy or very easy. The discrimination indexes of the 20 items show that eight items are good or very good, while six items discriminate poorly between good and bad respondents.
Discussion: Although the PCQN-SV shows internal consistency, reliability and difficulty indexes similar to those obtained by versions of the PCQN in other languages, reformulating the items with the lowest content validity or discrimination indexes, and those that proved difficult to comprehend, should be considered in order to improve the PCQN-SV. Conclusion: The PCQN-SV is a useful Spanish-language instrument for measuring Spanish nurses’ knowledge in palliative care, and it is adequate for establishing international comparisons. PMID:28545037
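The abstract reports both Cronbach's alpha and KR-20 for the PCQN-SV. KR-20 is the special case of alpha for dichotomously scored (right/wrong) items, with each item's variance written as p(1 − p), where p is the item's proportion correct. A minimal sketch follows; the response matrix is hypothetical. It uses the population variance of total scores to match p(1 − p); implementations that use the sample (n − 1) variance for the total will give slightly different values.

```python
from statistics import pvariance

def kr20(scores):
    """Kuder-Richardson Formula 20 for a matrix of 0/1 item scores (rows = respondents)."""
    n, k = len(scores), len(scores[0])
    p = [sum(row[j] for row in scores) / n for j in range(k)]  # proportion correct per item
    sum_pq = sum(pj * (1 - pj) for pj in p)                   # sum of item variances
    total_var = pvariance([sum(row) for row in scores])       # variance of total scores
    return k / (k - 1) * (1 - sum_pq / total_var)

# Hypothetical answers of 4 nurses to 3 true/false items (1 = correct).
print(kr20([[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]))  # 0.75
```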
An instrument to measure nurses' knowledge in palliative care: Validation of the Spanish version of Palliative Care Quiz for Nurses.

PubMed

Chover-Sierra, Elena; Martínez-Sabater, Antonio; Lapeña-Moñux, Yolanda Raquel

2017-01-01

Palliative care is nowadays essential in nursing care, due to the increasing number of patients who require attention in the final stages of their lives. Nurses need to acquire specific knowledge and abilities to provide quality palliative care. The Palliative Care Quiz for Nurses (PCQN) is a questionnaire that evaluates nurses' basic knowledge about palliative care, but its adaptation into Spanish and the analysis of its effectiveness and utility for Spanish culture are lacking. To report the adaptation into Spanish and the psychometric analysis of the Palliative Care Quiz for Nurses. The Palliative Care Quiz for Nurses-Spanish Version (PCQN-SV) was obtained through a process including translation, back-translation, comparison with versions in other languages, expert revision, and a pilot study. Content validity and reliability of the questionnaire were analyzed. Difficulty and discrimination indexes of each item were also calculated according to Item Response Theory (IRT). Adequate internal consistency was found (S-CVI = 0.83); a Cronbach's alpha coefficient of 0.67 and a KR-20 of 0.72 reflected the reliability of the PCQN-SV. The questionnaire had a global difficulty index of 0.55, with six items that could be considered difficult or very difficult and five items that could be considered easy or very easy. The discrimination indexes of the 20 items show that eight items are good or very good, while six items discriminate poorly between good and bad respondents. Although the PCQN-SV shows internal consistency, reliability and difficulty indexes similar to those obtained by versions of the PCQN in other languages, reformulating the items with the lowest content validity or discrimination indexes, and those that proved difficult to comprehend, should be considered in order to improve the PCQN-SV.
The PCQN-SV is a useful Spanish language instrument for measuring Spanish nurses' knowledge in palliative care and it is adequate to establish international comparisons.

An Information Analysis of 2-, 3-, and 4-Word Verbal Discrimination Learning.

ERIC Educational Resources Information Center

Arima, James K.; Gray, Francis D.

Information theory was used to quantify the difficulty of verbal discrimination (VD) learning tasks and to measure VD performance. Words for VD items were selected with high background frequency and equal a priori probabilities of being selected as a first response. Three VD lists containing only 2-, 3-, or 4-word items were created and equated for…
Measuring Student Learning with Item Response Theory

ERIC Educational Resources Information Center

Lee, Young-Jin; Palazzo, David J.; Warnakulasooriya, Rasil; Pritchard, David E.

2008-01-01

We investigate short-term learning from hints and feedback in a Web-based physics tutoring system. Both the skill of students and the difficulty and discrimination of items were determined by applying item response theory (IRT) to the first answers of students who are working on for-credit homework items in an introductory Newtonian physics…
Fractionating the Neural Substrates of Incidental Recognition Memory

ERIC Educational Resources Information Center

Greene, Ciara M.; Vidaki, Kleio; Soto, David

2015-01-01

Familiar stimuli are typically accompanied by decreases in neural response relative to the presentation of novel items, but these studies often include explicit instructions to discriminate old and new items; this creates difficulties in partialling out the contribution of top-down intentional orientation to the items based on recognition goals.…
Relevance of Item Analysis in Standardizing an Achievement Test in Teaching of Physical Science in B.Ed Syllabus

ERIC Educational Resources Information Center

Marie, S. Maria Josephine Arokia; Edannur, Sreekala

2015-01-01

This paper focused on the analysis of test items constructed in the paper of teaching Physical Science for B.Ed. class. It involved the analysis of difficulty level and discrimination power of each test item. Item analysis allows selecting or omitting items from the test, but more importantly item analysis is a tool to help the item writer improve…
A Comparison of Alternate-Choice and True-False Item Forms Used in Classroom Examinations.

ERIC Educational Resources Information Center

Maihoff, N. A.; Mehrens, Wm. A.

A comparison is presented of alternate-choice and true-false item forms used in an undergraduate natural science course. The alternate-choice item is a modified two-choice multiple-choice item in which the two responses are included within the question stem. This study (1) compared the difficulty level, discrimination level, reliability, and…
Anesthesiology Journal club assessment by means of semantic changes.

PubMed

Vieira, Joaquim Edson; Torres, Marcelo Luís Abramides; Pose, Regina Albanese; Auler, José Otávio Costa Junior

2014-01-01

The interactive approach of a journal club has been described in the medical education literature. The aim of this investigation is to present an assessment of the journal club as a tool to address whether residents read more, and more critically. This study reports the performance of medical residents in anesthesiology from the Clinics Hospital - University of São Paulo Medical School. All medical residents were invited to answer five questions derived from discussed papers. The answer sheet consisted of affirmative statements, each related to one of the chosen articles, rated on a Likert-type scale (totally disagree-disagree-not sure-agree-totally agree). The results were evaluated by means of item analysis: difficulty index and discrimination power. Residents completed 173 evaluations in December 2011 (n=51), July 2012 (n=66) and December 2012 (n=56). The first exam presented all items as straight statements; the second and third exams presented mixed items. Separating "totally agree" from "agree" increased the difficulty indices but did not improve the discrimination power. The use of a journal club assessment with straight and inverted statements and a five-point agreement scale has been shown to increase its item difficulty and discrimination power. This may reflect involvement either with the reading or with the discussion during the journal meeting. Copyright © 2013 Sociedade Brasileira de Anestesiologia. Published by Elsevier Editora Ltda. All rights reserved.
Using Reliability and Item Analysis to Evaluate a Teacher-Developed Test in Educational Measurement and Evaluation

ERIC Educational Resources Information Center

Quaigrain, Kennedy; Arhin, Ato Kwamina

2017-01-01

Item analysis is essential in improving items which will be used again in later tests; it can also be used to eliminate misleading items in a test. The study focused on item and test quality and explored the relationship between difficulty index (p-value) and discrimination index (DI) with distractor efficiency (DE). The study was conducted among…
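Distractor efficiency (DE) is often computed by counting a distractor as non-functional when fewer than 5% of examinees choose it, then reporting the share of distractors that remain functional. That 5% threshold is a convention common in the item-analysis literature rather than a criterion stated in this excerpt, so the sketch below exposes it as a parameter. The function name and response data are hypothetical.

```python
from collections import Counter

def distractor_efficiency(choices, key, options, threshold=0.05):
    """Share of an item's distractors chosen by at least `threshold` of examinees.

    choices: selected option for each examinee, e.g. "A", "B", ...
    key:     the correct option.
    options: all options offered for the item.
    """
    counts = Counter(choices)
    n = len(choices)
    distractors = [o for o in options if o != key]
    functional = [o for o in distractors if counts[o] / n >= threshold]
    return len(functional) / len(distractors)

# Hypothetical item answered by 100 examinees; distractor "D" draws only 1%.
choices = ["A"] * 60 + ["B"] * 25 + ["C"] * 14 + ["D"] * 1
print(distractor_efficiency(choices, "A", ["A", "B", "C", "D"]))  # 2 of 3 distractors functional
```

Counting with `Counter` means an option nobody chose simply gets a count of zero and is flagged as non-functional, rather than raising a missing-key error.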
A Comparison of Traditional Test Blueprinting and Item Development to Assessment Engineering in a Licensure Context

ERIC Educational Resources Information Center

Masters, James S.

2010-01-01

With the need for larger and larger banks of items to support adaptive testing and to meet security concerns, large-scale item generation is a requirement for many certification and licensure programs. As part of the mass production of items, it is critical that the difficulty and the discrimination of the items be known without the need for…
Assessing the Conceptual Understanding about Heat and Thermodynamics at Undergraduate Level

ERIC Educational Resources Information Center

Kulkarni, Vasudeo Digambar; Tambade, Popat Savaleram

2013-01-01

In this study, a Thermodynamic Concept Test (TCT) was designed to assess students' conceptual understanding of heat and thermodynamics at the undergraduate level. Statistical indices such as the item difficulty index, item discrimination index and point-biserial coefficient were used to assess the TCT. For each item of the test these indices were…
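The point-biserial coefficient listed above is the Pearson correlation between a dichotomous item score and the total test score. The following is a minimal sketch that assumes 0/1 item scoring and uses the population standard deviation of total scores. Many item-analysis tools instead report a corrected item-total correlation, which removes the item from the total. This version does not do that. The function name is hypothetical.

```python
from statistics import mean, pstdev
from typing import Sequence


def point_biserial(item_scores: Sequence[int], total_scores: Sequence[float]) -> float:
    """Point-biserial correlation between a 0/1 item and total test scores.

    Assumes both correct and incorrect groups are non-empty and that
    total scores are not all identical.
    """
    n = len(item_scores)
    correct = [t for i, t in zip(item_scores, total_scores) if i == 1]
    wrong = [t for i, t in zip(item_scores, total_scores) if i == 0]
    p = len(correct) / n  # proportion answering correctly
    q = 1 - p
    s = pstdev(total_scores)  # population SD of total scores
    # r_pb = (M_correct - M_wrong) / s * sqrt(p * q)
    return (mean(correct) - mean(wrong)) / s * (p * q) ** 0.5
```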
Conceptual thresholds for same and different in old-(Macaca mulatta) and new-world (Cebus apella) monkeys

PubMed Central

Flemming, Timothy M.

2011-01-01

Learning of the relational same/different (S/D) concept has been demonstrated to be largely dependent upon stimulus sets containing more than two items for pigeons and old-world monkeys. Stimulus arrays containing several images for use in same/different discrimination procedures (e.g. 16 identical images vs. 16 nonidentical images) have been shown to facilitate and even be necessary for learning of relational concepts (Flemming, Beran & Washburn, 2007; Wasserman, Young & Fagot, 2001; Young, Wasserman & Garner, 1997). In the present study, we investigate the threshold at which a new-world primate, the capuchin (Cebus apella), may be able to make such a discrimination. Utilizing a method of increasing entropy, rather than conventional procedures of decreasing entropy, we demonstrate unique evidence that capuchin monkeys are readily capable of making 2-item relational S/D conditional discriminations. In another experiment, we examine the supposed level of difficulty in making S/D discriminations by rhesus monkeys (Macaca mulatta). Whereas pigeons (Columba livia) and baboons (Papio papio) have shown marked difficulty simultaneously discriminating same from different arrays at all when composed of fewer than 8 items each, rhesus monkeys seem to understand that pairs of stimuli connote sameness and difference just the same (Flemming et al., 2007). With sustained accurate performance of 2-item S/D discriminations, both experienced and task-naïve rhesus monkeys appear quite certain in their conceptual knowledge of same and different. We conclude that learning of the same/different relational concept may be less dependent upon high levels of entropy contrast than originally hypothesized for nonhuman primates. PMID:21238555
A Graphical Approach to Item Analysis. Research Report. ETS RR-04-10

ERIC Educational Resources Information Center

Livingston, Samuel A.; Dorans, Neil J.

2004-01-01

This paper describes an approach to item analysis that is based on the estimation of a set of response curves for each item. The response curves show, at a glance, the difficulty and the discriminating power of the item and the popularity of each distractor, at any level of the criterion variable (e.g., total score). The curves are estimated by…
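The response curves described above are smoothed estimates. A cruder, unsmoothed version of the same idea tabulates the proportion of examinees choosing each option within bands of the criterion variable, which here is total score. The sketch below shows only that simplified tabulation and does not reproduce the ETS estimation method. The banding scheme and the function name are assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def option_proportions_by_score(
    choices: List[str], totals: List[int], band_width: int = 5
) -> Dict[int, Dict[str, float]]:
    """Proportion of examinees choosing each option within each total-score band.

    Keys of the result are the lower bounds of the score bands.
    """
    groups: Dict[int, List[str]] = defaultdict(list)
    for choice, total in zip(choices, totals):
        groups[total // band_width].append(choice)  # band index
    result: Dict[int, Dict[str, float]] = {}
    for band in sorted(groups):
        counts = Counter(groups[band])
        n = len(groups[band])
        result[band * band_width] = {opt: c / n for opt, c in counts.items()}
    return result
```

Plotting each option's proportion against the band lower bounds gives a rough picture of difficulty (the key's curve), discrimination (its slope) and distractor popularity.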
Investigating the Performance of Omega Index According to Item Parameters and Ability Levels

ERIC Educational Resources Information Center

Sunbul, Onder; Yormaz, Seha

2018-01-01

Purpose: Several studies in the literature investigate the performance of ω under various conditions. However, no study of the effects of item difficulty, item discrimination, and ability restrictions on the performance of ω could be found. The current study aims to investigate the performance of ω for the conditions given below.…
Building an Evaluation Scale using Item Response Theory.

PubMed

Lalor, John P; Wu, Hao; Yu, Hong

2016-11-01

Evaluation of NLP methods requires testing against a previously vetted gold-standard test set and reporting standard metrics (accuracy/precision/recall/F1). The current assumption is that all items in a given test set are equal with regard to difficulty and discriminating power. We propose Item Response Theory (IRT) from psychometrics as an alternative means for gold-standard test-set generation and NLP system evaluation. IRT is able to describe characteristics of individual items (their difficulty and discriminating power) and can account for these characteristics in its estimation of human intelligence or ability for an NLP task. In this paper, we demonstrate IRT by generating a gold-standard test set for Recognizing Textual Entailment. By collecting a large number of human responses and fitting our IRT model, we show that our IRT model compares NLP systems with the performance in a human population and is able to provide more insight into system performance than standard evaluation metrics. We show that a high accuracy score does not always imply a high IRT score, which depends on the item characteristics and the response pattern.
Building an Evaluation Scale using Item Response Theory

PubMed Central

Lalor, John P.; Wu, Hao; Yu, Hong

2016-01-01

Evaluation of NLP methods requires testing against a previously vetted gold-standard test set and reporting standard metrics (accuracy/precision/recall/F1). The current assumption is that all items in a given test set are equal with regard to difficulty and discriminating power. We propose Item Response Theory (IRT) from psychometrics as an alternative means for gold-standard test-set generation and NLP system evaluation. IRT is able to describe characteristics of individual items (their difficulty and discriminating power) and can account for these characteristics in its estimation of human intelligence or ability for an NLP task. In this paper, we demonstrate IRT by generating a gold-standard test set for Recognizing Textual Entailment. By collecting a large number of human responses and fitting our IRT model, we show that our IRT model compares NLP systems with the performance in a human population and is able to provide more insight into system performance than standard evaluation metrics. We show that a high accuracy score does not always imply a high IRT score, which depends on the item characteristics and the response pattern. PMID:28004039
An Investigation of the Impact of Guessing on Coefficient α and Reliability

PubMed Central

2014-01-01

Guessing is known to influence the test reliability of multiple-choice tests. Although there are many studies that have examined the impact of guessing, they used rather restrictive assumptions (e.g., parallel test assumptions, homogeneous inter-item correlations, homogeneous item difficulty, and homogeneous guessing levels across items) to evaluate the relation between guessing and test reliability. Based on the item response theory (IRT) framework, this study investigated the extent of the impact of guessing on reliability under more realistic conditions where item difficulty, item discrimination, and guessing levels actually vary across items with three different test lengths (TL). By accommodating multiple item characteristics simultaneously, this study also focused on examining interaction effects between guessing and other variables entered in the simulation to be more realistic. The simulation of the more realistic conditions and calculations of reliability and classical test theory (CTT) item statistics were facilitated by expressing CTT item statistics, coefficient α, and reliability in terms of IRT model parameters. In addition to the general negative impact of guessing on reliability, results showed interaction effects between TL and guessing and between guessing and test difficulty.
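Coefficient α, whose sensitivity to guessing is studied above, is computed from item and total-score variances as α = k/(k − 1) · (1 − Σσᵢ²/σ_X²). Here k is the number of items, σᵢ² is the variance of item i, and σ_X² is the variance of total scores. The following is a minimal sketch for a complete examinee-by-item score matrix, assuming no missing data. It computes α from observed scores only and does not model guessing. The function name is hypothetical.

```python
from statistics import pvariance
from typing import List


def cronbach_alpha(scores: List[List[float]]) -> float:
    """Coefficient alpha; scores has one row per examinee, one column per item."""
    k = len(scores[0])
    # Variance of each item across examinees (population variance, used consistently).
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    # Variance of examinees' total scores.
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

Whether population or sample variances are used does not change α, provided the same choice is applied to the item and total variances.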
Applying Item Response Theory to the Development of a Screening Adaptation of the Goldman-Fristoe Test of Articulation-Second Edition

ERIC Educational Resources Information Center

Brackenbury, Tim; Zickar, Michael J.; Munson, Benjamin; Storkel, Holly L.

2017-01-01

Purpose: Item response theory (IRT) is a psychometric approach to measurement that uses latent trait abilities (e.g., speech sound production skills) to model performance on individual items that vary by difficulty and discrimination. An IRT analysis was applied to preschoolers' productions of the words on the Goldman-Fristoe Test of…
Can Item Analysis of MCQs Accomplish the Need of a Proper Assessment Strategy for Curriculum Improvement in Medical Education?

ERIC Educational Resources Information Center

Pawade, Yogesh R.; Diwase, Dipti S.

2016-01-01

Item analysis of Multiple Choice Questions (MCQs) is the process of collecting, summarizing and utilizing information from students' responses to evaluate the quality of test items. Difficulty Index (p-value), Discrimination Index (DI) and Distractor Efficiency (DE) are the parameters which help to evaluate the quality of MCQs used in an…
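Distractor efficiency (DE) is commonly computed by flagging a distractor as non-functional when fewer than 5% of examinees choose it, then reporting the percentage of distractors that are functional. The 5% cutoff is a widely used convention assumed here and may not match the one used in the study. The function name is hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def distractor_efficiency(
    choices: List[str], key: str, options: List[str], threshold: float = 0.05
) -> Tuple[List[str], float]:
    """Return (non-functional distractors, distractor efficiency in percent)."""
    n = len(choices)
    counts = Counter(choices)
    distractors = [opt for opt in options if opt != key]
    # A distractor is non-functional if chosen by fewer than `threshold` of examinees.
    nonfunctional = [d for d in distractors if counts[d] / n < threshold]
    functional = len(distractors) - len(nonfunctional)
    return nonfunctional, 100.0 * functional / len(distractors)
```

For a four-option item with three distractors, DE can only take the values 0, 33.3, 66.7 or 100%.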
Evaluation of five guidelines for option development in multiple-choice item-writing.

PubMed

Martínez, Rafael J; Moreno, Rafael; Martín, Irene; Trigo, M Eva

2009-05-01

This paper evaluates certain guidelines for writing multiple-choice test items. The analysis of the responses of 5013 subjects to 630 items from 21 university classroom achievement tests suggests that an option should not differ in terms of heterogeneous content because such error has a slight but harmful effect on item discrimination. This also occurs with the "None of the above" option when it is the correct one. In contrast, results do not show the supposedly negative effects of a different-length option, the use of specific determiners, or the use of the "All of the above" option, which not only decreases difficulty but also improves discrimination when it is the correct option.
Differential Item Functioning in Primary Healthcare Evaluation Instruments by French/English Version, Educational Level and Urban/Rural Location

PubMed Central

Haggerty, Jeannie L.; Bouharaoui, Fatima; Santor, Darcy A.

2011-01-01

Evaluating the extent to which groups or subgroups of individuals differ with respect to primary healthcare experience depends on first ruling out the possibility of bias. Objective: To determine whether item or subscale performance differs systematically between French/English, high/low education subgroups and urban/rural residency. Method: A sample of 645 adult users balanced by French/English language (in Quebec and Nova Scotia, respectively), high/low education and urban/rural residency responded to six validated instruments: the Primary Care Assessment Survey (PCAS); the Primary Care Assessment Tool – Short Form (PCAT-S); the Components of Primary Care Index (CPCI); the first version of the EUROPEP (EUROPEP-I); the Interpersonal Processes of Care Survey, version II (IPC-II); and part of the Veterans Affairs National Outpatient Customer Satisfaction Survey (VANOCSS). We normalized subscale scores to a 0-to-10 scale and tested for between-group differences using ANOVA tests. We used a parametric item response model to test for differences between subgroups in item discriminability and item difficulty. We re-examined group differences after removing items with differential item functioning. Results: Experience of care was assessed more positively in the English-speaking (Nova Scotia) than in the French-speaking (Quebec) respondents. We found differential English/French item functioning in 48% of the 153 items: discriminability in 20% and differential difficulty in 28%. English items were more discriminating generally than the French. Removing problematic items did not change the differences in French/English assessments. Differential item functioning by high/low education status affected 27% of items, with items being generally more discriminating in high-education groups. Between-group comparisons were unchanged. In contrast, only 9% of items showed differential item functioning by geography, affecting principally the accessibility attribute. 
Removing problematic items reversed a previously non-significant finding, revealing poorer first-contact access in rural than in urban areas. Conclusion: Differential item functioning does not bias or invalidate French/English comparisons on subscales, but additional development is required to make French and English items equivalent. These instruments are relatively robust by educational status and geography, but results suggest potential differences in the underlying construct in low-education and rural respondents. PMID:23205035
Identifying predictors of physics item difficulty: A linear regression approach

NASA Astrophysics Data System (ADS)

Mesic, Vanes; Muratovic, Hasnija

2011-06-01

Large-scale assessments of student achievement in physics are often approached with an intention to discriminate students based on the attained level of their physics competencies. Therefore, for purposes of test design, it is important that items display an acceptable discriminatory behavior. To that end, it is recommended to avoid extraordinarily difficult and very easy items. Knowing the factors that influence physics item difficulty makes it possible to model the item difficulty even before the first pilot study is conducted. Thus, by identifying predictors of physics item difficulty, we can improve the test-design process. Furthermore, we get additional qualitative feedback regarding the basic aspects of student cognitive achievement in physics that are directly responsible for the obtained, quantitative test results. In this study, we conducted a secondary analysis of data that came from two large-scale assessments of student physics achievement at the end of compulsory education in Bosnia and Herzegovina. First, we explored the concept of “physics competence” and performed a content analysis of 123 physics items that were included within the above-mentioned assessments. Thereafter, an item database was created. Items were described by variables which reflect some basic cognitive aspects of physics competence. For each of the assessments, Rasch item difficulties were calculated in separate analyses. In order to make the item difficulties from different assessments comparable, a virtual test equating procedure had to be implemented. Finally, a regression model of physics item difficulty was created. 
It has been shown that 61.2% of item difficulty variance can be explained by factors which reflect the automaticity, complexity, and modality of the knowledge structure that is relevant for generating the most probable correct solution, as well as by the divergence of required thinking and interference effects between intuitive and formal physics knowledge structures. Identified predictors point out the fundamental cognitive dimensions of student physics achievement at the end of compulsory education in Bosnia and Herzegovina, whose level of development influenced the test results within the conducted assessments.

Identifying Predictors of Physics Item Difficulty: A Linear Regression Approach

ERIC Educational Resources Information Center

Mesic, Vanes; Muratovic, Hasnija

2011-01-01

Large-scale assessments of student achievement in physics are often approached with an intention to discriminate students based on the attained level of their physics competencies. Therefore, for purposes of test design, it is important that items display an acceptable discriminatory behavior. To that end, it is recommended to avoid extraordinary…
Sources of Interactional Problems in a Survey of Racial/Ethnic Discrimination

PubMed Central

Johnson, Timothy P.; Shariff-Marco, Salma; Willis, Gordon; Cho, Young Ik; Breen, Nancy; Gee, Gilbert C.; Krieger, Nancy; Grant, David; Alegria, Margarita; Mays, Vickie M.; Williams, David R.; Landrine, Hope; Liu, Benmei; Reeve, Bryce B.; Takeuchi, David; Ponce, Ninez A.

2014-01-01

Cross-cultural variability in respondent processing of survey questions may bias results from multiethnic samples. We analyzed behavior codes, which identify difficulties in the interactions of respondents and interviewers, from a discrimination module contained within a field test of the 2007 California Health Interview Survey. In all, 553 (English) telephone interviews yielded 13,999 interactions involving 22 items. Multilevel logistic regression modeling revealed that respondent age and several item characteristics (response format, customized questions, length, and first item with new response format), but not race/ethnicity, were associated with interactional problems. These findings suggest that item function within a multi-cultural, albeit English language, survey may be largely influenced by question features, as opposed to respondent characteristics such as race/ethnicity. PMID:26166949
Greater loss of object than spatial mnemonic discrimination in aged adults.

PubMed

Reagh, Zachariah M; Ho, Huy D; Leal, Stephanie L; Noche, Jessica A; Chun, Amanda; Murray, Elizabeth A; Yassa, Michael A

2016-04-01

Previous studies across species have established that the aging process adversely affects certain memory-related brain regions earlier than others. Behavioral tasks targeted at the function of vulnerable regions can provide noninvasive methods for assessing the integrity of particular components of memory throughout the lifespan. The present study modified a previous task designed to separately but concurrently test detailed memory for object identity and spatial location. Memory for objects or items is thought to rely on perirhinal and lateral entorhinal cortices, among the first targets of Alzheimer's related neurodegeneration. In line with prior work, we split an aged adult sample into "impaired" and "unimpaired" groups on the basis of a standardized word-learning task. The "impaired" group showed widespread difficulty with memory discrimination, whereas the "unimpaired" group showed difficulty with object, but not spatial memory discrimination. These findings support the hypothesized greater age-related impacts on memory for objects or items in older adults, perhaps even with healthy aging. © 2015 Wiley Periodicals, Inc.
Item response theory analysis of the Utrecht Work Engagement Scale for Students (UWES-S) using a sample of Japanese university and college students majoring medical science, nursing, and natural science.

PubMed

Tsubakita, Takashi; Shimazaki, Kazuyo; Ito, Hiroshi; Kawazoe, Nobuo

2017-10-30

The Utrecht Work Engagement Scale for Students has been used internationally to assess students' academic engagement, but it has not been analyzed via item response theory. The purpose of this study was to conduct an item response theory analysis of the Japanese version of the Utrecht Work Engagement Scale for Students translated by the authors. Using a two-parameter model and Samejima's graded response model, difficulty and discrimination parameters were estimated after confirming the factor structure of the scale. The 14 items on the scale were analyzed with a sample of 3214 university and college students majoring in medical science, nursing, or natural science in Japan. The preliminary parameter estimation was conducted with the two-parameter model and indicated that three items should be removed because they had outlier parameters. Final parameter estimation was conducted using the remaining 11 items and indicated that all difficulty and discrimination parameters were acceptable. The test information curve suggested that the scale better assesses higher engagement than average engagement. The estimated parameters provide a basis for future comparative studies. The results also suggested that a 7-point Likert scale is too broad; thus, the scale should be modified to use fewer graded response categories.
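Samejima's graded response model, used above for the Likert-type items, represents an item with m ordered categories using m − 1 cumulative logistic curves, P(X ≥ k | θ) = 1/(1 + exp(−a(θ − b_k))). The probability of a particular category is the difference between adjacent cumulative curves. The following is a minimal sketch for a single item. It assumes strictly increasing thresholds b_1 < b_2 < ... < b_{m-1} and omits the 1.7 scaling constant. The function name is hypothetical.

```python
import math
from typing import List


def grm_category_probs(theta: float, a: float, thresholds: List[float]) -> List[float]:
    """Category probabilities for one item under Samejima's graded response model.

    theta: latent trait level of the respondent.
    a: item discrimination.
    thresholds: ordered b_1 < ... < b_{m-1} for an item with m categories.
    """
    # Cumulative probabilities P(X >= k), padded with 1 (k = 0) and 0 (k = m).
    cumulative = (
        [1.0]
        + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
        + [0.0]
    )
    # P(X = k) = P(X >= k) - P(X >= k + 1)
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]
```

A 7-point item needs six thresholds. If adjacent thresholds lie very close together, some categories are rarely the most probable response, which is one way a scale can turn out to be "too broad".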
Item analysis of three Spanish naming tests: a cross-cultural investigation.

PubMed

Marquez de la Plata, Carlos; Arango-Lasprilla, Juan Carlos; Alegret, Montse; Moreno, Alexander; Tárraga, Luis; Lara, Mar; Hewlitt, Margaret; Hynan, Linda; Cullum, C Munro

2009-01-01

Neuropsychological evaluations conducted in the United States and abroad commonly include the use of tests translated from English to Spanish. The use of translated naming tests for evaluating predominantly Spanish speakers has recently been challenged on the grounds that translating test items may compromise a test's construct validity. The Texas Spanish Naming Test (TNT) has been developed in Spanish specifically for use with Spanish speakers; however, it is unlikely that patients from diverse Spanish-speaking geographical regions will perform uniformly on a naming test. The present study evaluated and compared the internal consistency and patterns of item difficulty and discrimination for the TNT and two commonly used translated naming tests in three countries (i.e., United States, Colombia, Spain). Two hundred fifty-two subjects (136 demented, 116 nondemented) across three countries were administered the TNT, the Modified Boston Naming Test-Spanish, and the naming subtest from the CERAD. The TNT demonstrated better internal consistency than its counterparts, a better item difficulty pattern than the CERAD naming test, and a better item discrimination pattern than the MBNT-S across countries. Overall, all three Spanish naming tests differentiated nondemented and moderately demented individuals, but the results suggest the items of the TNT are most appropriate to use with Spanish speakers. Preliminary normative data for the three tests examined in each country are provided.
Validation of a clinical critical thinking skills test in nursing.

PubMed

Shin, Sujin; Jung, Dukyoo; Kim, Sungeun

2015-01-27

The purpose of this study was to develop a revised version of the clinical critical thinking skills test (CCTS) and to subsequently validate its performance. This study is a secondary analysis of the CCTS. Data were obtained from a convenience sample of 284 college students in June 2011. Thirty items were analyzed using item response theory and test reliability was assessed. Test-retest reliability was measured using the results of 20 nursing college and graduate school students in July 2013. The content validity of the revised items was analyzed by calculating the degree of agreement between instrument developer intention in item development and the judgments of six experts. To analyze response process validity, qualitative data related to the response processes of nine nursing college students obtained through cognitive interviews were analyzed. Out of the initial 30 items, 11 items were excluded after the analysis of difficulty and discrimination parameters. When the 19 items of the revised version of the CCTS were analyzed, levels of item difficulty were found to be relatively low and levels of discrimination were found to be appropriate or high. The degree of agreement between item developer intention and expert judgments equaled or exceeded 50%. From the above results, evidence of response process validity was demonstrated, indicating that subjects responded as intended by the test developer. The revised 19-item CCTS was found to have sufficient reliability and validity and will therefore represent a more convenient measurement of critical thinking ability.
Validation of a clinical critical thinking skills test in nursing

PubMed Central

2015-01-01

Purpose: The purpose of this study was to develop a revised version of the clinical critical thinking skills test (CCTS) and to subsequently validate its performance. Methods: This study is a secondary analysis of the CCTS. Data were obtained from a convenience sample of 284 college students in June 2011. Thirty items were analyzed using item response theory and test reliability was assessed. Test-retest reliability was measured using the results of 20 nursing college and graduate school students in July 2013. The content validity of the revised items was analyzed by calculating the degree of agreement between instrument developer intention in item development and the judgments of six experts. To analyze response process validity, qualitative data related to the response processes of nine nursing college students obtained through cognitive interviews were analyzed. Results: Out of the initial 30 items, 11 items were excluded after the analysis of difficulty and discrimination parameters. When the 19 items of the revised version of the CCTS were analyzed, levels of item difficulty were found to be relatively low and levels of discrimination were found to be appropriate or high. The degree of agreement between item developer intention and expert judgments equaled or exceeded 50%. Conclusion: From the above results, evidence of response process validity was demonstrated, indicating that subjects responded as intended by the test developer. The revised 19-item CCTS was found to have sufficient reliability and validity and will therefore represent a more convenient measurement of critical thinking ability. PMID:25622716
HIV/AIDS knowledge among men who have sex with men: applying the item response theory.

PubMed

Gomes, Raquel Regina de Freitas Magalhães; Batista, José Rodrigues; Ceccato, Maria das Graças Braga; Kerr, Lígia Regina Franco Sansigolo; Guimarães, Mark Drew Crosland

2014-04-01

To evaluate the level of HIV/AIDS knowledge among men who have sex with men in Brazil using the latent trait model estimated by Item Response Theory. Multicenter, cross-sectional study, carried out in ten Brazilian cities between 2008 and 2009. Adult men who have sex with men were recruited (n = 3,746) through Respondent Driven Sampling. HIV/AIDS knowledge was ascertained through ten statements by face-to-face interview and latent scores were obtained through two-parameter logistic modeling (difficulty and discrimination) using Item Response Theory. Differential item functioning was used to examine each item characteristic curve by age and schooling. Overall, the HIV/AIDS knowledge scores using Item Response Theory did not exceed 6.0 (scale 0-10), with mean and median values of 5.0 (SD = 0.9) and 5.3, respectively, with 40.7% of the sample with knowledge levels below the average. Some beliefs still exist in this population regarding the transmission of the virus by insect bites, by using public restrooms, and by sharing utensils during meals. With regard to the difficulty and discrimination parameters, eight items were located below the mean of the scale and were considered very easy, and four items presented very low discrimination parameters (< 0.34). The absence of difficult items contributed to the inaccuracy of the measurement of knowledge among those at the median level and above. Item Response Theory analysis, which focuses on the individual properties of each item, allows measures to be obtained that do not vary or depend on the questionnaire, which provides better ascertainment and accuracy of knowledge scores. Valid and reliable scales are essential for monitoring HIV/AIDS knowledge among the men who have sex with men population over time and in different geographic regions, and this psychometric model brings this advantage.
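The two-parameter logistic model mentioned above gives the probability of a correct answer as P(θ) = 1/(1 + exp(−a(θ − b))), where b is the item difficulty and a is the item discrimination. The item's information, a²P(1 − P), explains why items with very low a add little measurement precision, and why a test with no difficult items measures poorly above the median. The following is a minimal sketch in the logistic metric without the 1.7 scaling constant. The function names are hypothetical.

```python
import math


def p_correct_2pl(theta: float, a: float, b: float) -> float:
    """Probability of a correct response under the two-parameter logistic model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information_2pl(theta: float, a: float, b: float) -> float:
    """Fisher information contributed by a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = p_correct_2pl(theta, a, b)
    return a * a * p * (1.0 - p)
```

Information peaks at θ = b, where it equals a²/4. An item with a = 0.3 therefore contributes at most 0.0225, compared with 0.5625 for an item with a = 1.5.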
Pick-N Multiple Choice-Exams: A Comparison of Scoring Algorithms

ERIC Educational Resources Information Center

Bauer, Daniel; Holzer, Matthias; Kopp, Veronika; Fischer, Martin R.

2011-01-01

To compare different scoring algorithms for Pick-N multiple correct answer multiple-choice (MC) exams regarding test reliability, student performance, total item discrimination and item difficulty. Data from six end-of-term exams in internal medicine taken by 3rd-year medical students at Munich University from 2005 to 2008 were analysed (1,255 students,…
An Analysis of the Connectedness to Nature Scale Based on Item Response Theory.

PubMed

Pasca, Laura; Aragonés, Juan I; Coello, María T

2017-01-01

The Connectedness to Nature Scale (CNS) is used as a measure of the subjective cognitive connection between individuals and nature. However, to date, it has not been analyzed at the item level to confirm its quality. In the present study, we conduct such an analysis based on Item Response Theory. We employed data from previous studies using the Spanish-language version of the CNS, analyzing a sample of 1008 participants. The results show that seven items presented appropriate indices of discrimination and difficulty, in addition to a good fit. The remaining six have inadequate discrimination indices and do not present a good fit. A second study with 321 participants shows that the seven-item scale has adequate levels of reliability and validity. Therefore, it would be appropriate to use a reduced version of the scale after eliminating the items that display inappropriate behavior, since they may interfere with research results on connectedness to nature.
A New Functional Health Literacy Scale for Japanese Young Adults Based on Item Response Theory.

PubMed

Tsubakita, Takashi; Kawazoe, Nobuo; Kasano, Eri

2017-03-01

Health literacy predicts health outcomes. Despite concerns surrounding the health of Japanese young adults, to date there has been no objective assessment of health literacy in this population. This study aimed to develop a Functional Health Literacy Scale for Young Adults (funHLS-YA) based on item response theory. Each item in the scale requires participants to choose the most relevant term from 3 choices in relation to a target item, thus assessing objective rather than perceived health literacy. The 20-item scale was administered to 1816 university students and 1751 responded. Cronbach's α coefficient was .73. Difficulty and discrimination parameters of each item were estimated, resulting in the exclusion of 1 item. Some items showed different difficulty parameters for male and female participants, reflecting that some aspects of health literacy may differ by gender. The current 19-item version of funHLS-YA can reliably assess the objective health literacy of Japanese young adults.
Racial Discrimination and Ethnic Disparities in Sleep Disturbance: the 2002/03 New Zealand Health Survey.

PubMed

Paine, Sarah-Jane; Harris, Ricci; Cormack, Donna; Stanley, James

2016-02-01

Research on the relationship between racial discrimination and sleep is limited. The aims of this study were to: (1) examine the independent relationship between ethnicity, sex, age, socioeconomic position, experience of racial discrimination and self-reported sleep disturbances, and (2) determine the statistical contribution of experience of racial discrimination to ethnic disparities in sleep disturbances. The study used data from the 2002/03 New Zealand Health Survey, a nationally-representative, population-based survey of New Zealand adults (≥ 15 years). The sample included 4,108 self-identified Māori (indigenous New Zealanders) and 6,261 European adults. Outcome variables were difficulty falling asleep, frequent nocturnal awakenings, and early morning awakenings. Experiences of racial discrimination across five domains were used to assess overall racial discrimination "ever" and the level of exposure to racial discrimination. Socioeconomic position was measured using neighborhood deprivation, education, and equivalized household income. Māori had a higher prevalence of each sleep disturbance item than Europeans. Reported experiences of racial discrimination were independently associated with each sleep disturbance item, adjusted for ethnicity, sex, age group, and socioeconomic position. Sequential logistic regression models showed that racial discrimination and socioeconomic position explained most of the disparity in difficulty falling asleep and frequent nocturnal awakening between Māori and Europeans; however, ethnic differences in early morning awakenings remained. Racial discrimination may play an important role in ethnic disparities in sleep disturbances in New Zealand. Activities to improve the sleep health of non-dominant ethnic groups should consider the potentially multifarious ways in which racial discrimination can disturb sleep. © 2016 Associated Professional Sleep Societies, LLC.
A New Clinical Pain Knowledge Test for Nurses: Development and Psychometric Evaluation.

PubMed

Bernhofer, Esther I; St Marie, Barbara; Bena, James F

2017-08-01

All nurses care for patients with pain, and pain management knowledge and attitude surveys for nurses have been around since 1987. However, no validated knowledge test exists to measure postlicensure clinicians' knowledge of the core competencies of pain management in current complex patient populations. To develop and test the psychometric properties of an instrument designed to measure pain management knowledge of postlicensure nurses. Psychometric instrument validation. Four large Midwestern U.S. hospitals. Registered nurses employed full time and part time August 2015 to April 2016, aged M = 43.25 years; time as RN, M = 16.13 years. Prospective survey design using e-mail to invite nurses to take an electronic multiple choice pain knowledge test. Content validity of initial 36-item test "very good" (95.1% agreement). Completed tests that met analysis criteria, N = 747. Mean initial test score, 69.4% correct (range 27.8-97.2). After revision/removal of 13 unacceptable questions, mean test score was 50.4% correct (range 8.7-82.6). Initial test item percent difficulty range was 15.2%-98.1%; discrimination values range, 0.03-0.50; final test item percent difficulty range, 17.6%-91.1%, discrimination values range, -0.04 to 1.04. Split-half reliability final test was 0.66. A high decision consistency reliability was identified, with test cut-score of 75%. The final 23-item Clinical Pain Knowledge Test has acceptable discrimination, difficulty, decision consistency, reliability, and validity in the general clinical inpatient nurse population. This instrument will be useful in assessing pain management knowledge of clinical nurses to determine gaps in education, evaluate knowledge after pain management education, and measure research outcomes. Copyright © 2017 American Society for Pain Management Nursing. Published by Elsevier Inc. All rights reserved.
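Split-half reliability is often obtained by correlating scores on two halves of a test, for example odd-numbered versus even-numbered items. The correlation is then stepped up to full test length with the Spearman-Brown formula, 2r/(1 + r). The abstract does not say how the halves were formed or whether the reported 0.66 is corrected, so the odd/even split below is an assumption. The function names are hypothetical.

```python
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (assumes non-zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_brown(r: float, factor: float = 2.0) -> float:
    """Predicted reliability when test length is multiplied by `factor`."""
    return factor * r / (1 + (factor - 1) * r)


def split_half_reliability(scores: List[List[int]]) -> float:
    """Odd/even split-half reliability, corrected to full length via Spearman-Brown."""
    odd = [sum(row[0::2]) for row in scores]  # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in scores]  # items 2, 4, 6, ...
    return spearman_brown(pearson(odd, even))
```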
The Impact of Escape Alternative Position Change in Multiple-Choice Test on the Psychometric Properties of a Test and Its Items Parameters

ERIC Educational Resources Information Center

Hamadneh, Iyad Mohammed

2015-01-01

This study aimed at investigating the impact of changing the position of the escape alternative in a multiple-choice test on the psychometric properties of the test and its item parameters (difficulty, discrimination & guessing), and on the estimation of examinee ability. To achieve the study objectives, a 4-alternative multiple choice type achievement test…
Content Validity and Psychometric Characteristics of the "Knowledge about Older Patients Quiz" for Nurses Using Item Response Theory.

PubMed

Dikken, Jeroen; Hoogerduijn, Jita G; Kruitwagen, Cas; Schuurmans, Marieke J

2016-11-01

To assess the content validity and psychometric characteristics of the Knowledge about Older Patients Quiz (KOP-Q), which measures nurses' knowledge regarding older hospitalized adults and their certainty regarding this knowledge. Cross-sectional. Content validity: general hospitals. Psychometric characteristics: nursing school and general hospitals in the Netherlands. Content validity: 12 nurse specialists in geriatrics. Psychometric characteristics: 107 first-year and 78 final-year bachelor of nursing students, 148 registered nurses, and 20 nurse specialists in geriatrics. Content validity: The nurse specialists rated each item of the initial KOP-Q (52 items) on relevance. Ratings were used to calculate Item-Content Validity Index and average Scale-Content Validity Index (S-CVI/ave) scores. Items with insufficient content validity were removed. Psychometric characteristics: Ratings of students, nurses, and nurse specialists were used to test for differential item functioning (DIF) and unidimensionality before item characteristics (discrimination and difficulty) were examined using Item Response Theory. Finally, norm references were calculated and nomological validity was assessed. Content validity: Forty-three items remained after assessing content validity (S-CVI/ave = 0.90). Psychometric characteristics: Of the 43 items, two demonstrating ceiling effects and 11 distorting ability estimates (DIF) were subsequently excluded. Item characteristics were assessed for the remaining 30 items, all of which demonstrated good discrimination and difficulty parameters. Knowledge was positively correlated with certainty about this knowledge. The final 30-item KOP-Q is a valid, psychometrically sound, comprehensive instrument that can be used to assess the knowledge of nursing students, hospital nurses, and nurse specialists in geriatrics regarding older hospitalized adults. 
It can identify knowledge and certainty deficits for research purposes or serve as a tool in educational or quality improvement programs. © 2016, Copyright the Authors Journal compilation © 2016, The American Geriatrics Society.
Racial Discrimination and Ethnic Disparities in Sleep Disturbance: the 2002/03 New Zealand Health Survey

PubMed Central

Paine, Sarah-Jane; Harris, Ricci; Cormack, Donna; Stanley, James

2016-01-01

Study Objectives: Research on the relationship between racial discrimination and sleep is limited. The aims of this study were to: (1) examine the independent relationship between ethnicity, sex, age, socioeconomic position, experience of racial discrimination and self-reported sleep disturbances, and (2) determine the statistical contribution of experience of racial discrimination to ethnic disparities in sleep disturbances. Methods: The study used data from the 2002/03 New Zealand Health Survey, a nationally-representative, population-based survey of New Zealand adults (≥ 15 years). The sample included 4,108 self-identified Māori (indigenous New Zealanders) and 6,261 European adults. Outcome variables were difficulty falling asleep, frequent nocturnal awakenings, and early morning awakenings. Experiences of racial discrimination across five domains were used to assess overall racial discrimination “ever” and the level of exposure to racial discrimination. Socioeconomic position was measured using neighborhood deprivation, education, and equivalized household income. Results: Māori had a higher prevalence of each sleep disturbance item than Europeans. Reported experiences of racial discrimination were independently associated with each sleep disturbance item, adjusted for ethnicity, sex, age group, and socioeconomic position. Sequential logistic regression models showed that racial discrimination and socioeconomic position explained most of the disparity in difficulty falling asleep and frequent nocturnal awakening between Māori and Europeans; however, ethnic differences in early morning awakenings remained. Conclusions: Racial discrimination may play an important role in ethnic disparities in sleep disturbances in New Zealand. Activities to improve the sleep health of non-dominant ethnic groups should consider the potentially multifarious ways in which racial discrimination can disturb sleep. Citation: Paine SJ, Harris R, Cormack D, Stanley J. 
Racial discrimination and ethnic disparities in sleep disturbance: the 2002/03 New Zealand Health Survey. SLEEP 2016;39(2):477–485. PMID:26446108
Item Analysis of Three Spanish Naming Tests: A Cross-Cultural Investigation

PubMed Central

de la Plata, Carlos Marquez; Arango-Lasprilla, Juan Carlos; Alegret, Montse; Moreno, Alexander; Tárraga, Luis; Lara, Mar; Hewlitt, Margaret; Hynan, Linda; Cullum, C. Munro

2009-01-01

Neuropsychological evaluations conducted in the United States and abroad commonly include the use of tests translated from English to Spanish. The use of translated naming tests for evaluating predominately Spanish-speakers has recently been challenged on the grounds that translating test items may compromise a test’s construct validity. The Texas Spanish Naming Test (TNT) has been developed in Spanish specifically for use with Spanish-speakers; however, it is unlikely patients from diverse Spanish-speaking geographical regions will perform uniformly on a naming test. The present study evaluated and compared the internal consistency and patterns of item-difficulty and -discrimination for the TNT and two commonly used translated naming tests in three countries (i.e., United States, Colombia, Spain). Two hundred fifty-two subjects (126 demented, 116 nondemented) across three countries were administered the TNT, Modified Boston Naming Test-Spanish, and the naming subtest from the CERAD. The TNT demonstrated internal consistency superior to that of its counterparts, a better item difficulty pattern than the CERAD naming test, and a better item discrimination pattern than the MBNT-S across countries. Overall, all three Spanish naming tests differentiated nondemented and moderately demented individuals, but the results suggest the items of the TNT are most appropriate to use with Spanish-speakers. Preliminary normative data for the three tests examined in each country are provided. PMID:19208960
Colorado Learning Difficulties Questionnaire: Validation of a parent-report screening measure

PubMed Central

Willcutt, Erik G.; Boada, Richard; Riddle, Margaret W.; Chhabildas, Nomita; DeFries, John C.; Pennington, Bruce F.

2011-01-01

This study evaluated the internal structure and convergent and discriminant evidence for the Colorado Learning Difficulties Questionnaire (CLDQ), a 20-item parent-report rating scale that was developed to provide a brief screening measure for learning difficulties. CLDQ ratings were obtained from parents of children in two large community samples and two samples from clinics that specialize in the assessment of learning disabilities and related disorders (total N = 8,004). Exploratory and confirmatory factor analyses revealed five correlated but separable dimensions that were labeled reading, math, social cognition, social anxiety, and spatial difficulties. Results revealed strong convergent and discriminant evidence for the CLDQ Reading scale, suggesting that this scale may provide a useful method to screen for reading difficulties in both research studies and clinical settings. Results are also promising for the other four CLDQ scales, but additional research is needed to refine each of these measures. PMID:21574721
Item development process and analysis of 50 case-based items for implementation on the Korean Nursing Licensing Examination.

PubMed

Park, In Sook; Suh, Yeon Ok; Park, Hae Sook; Kang, So Young; Kim, Kwang Sung; Kim, Gyung Hee; Choi, Yeon-Hee; Kim, Hyun-Ju

2017-01-01

The purpose of this study was to improve the quality of items on the Korean Nursing Licensing Examination by developing and evaluating case-based items that reflect integrated nursing knowledge. We conducted a cross-sectional observational study to develop new case-based items. The methods for developing test items included expert workshops, brainstorming, and verification of content validity. After a mock examination of undergraduate nursing students using the newly developed case-based items, we evaluated the appropriateness of the items through classical test theory and item response theory. A total of 50 case-based items were developed for the mock examination, and content validity was evaluated. The question items integrated 34 discrete elements of integrated nursing knowledge. The mock examination was taken by 741 baccalaureate students in their fourth year of study at 13 universities. Their average score on the mock examination was 57.4, and the examination showed a reliability of 0.40. According to classical test theory, the average item difficulty was 57.4% (80%-100% for 12 items; 60%-80% for 13 items; and less than 60% for 25 items). The mean discrimination index was 0.19; it was above 0.30 for 11 items and between 0.20 and 0.29 for 15 items. According to item response theory, the item discrimination parameter (in the logistic model) was none for 10 items (0.00), very low for 20 items (0.01 to 0.34), low for 12 items (0.35 to 0.64), moderate for 6 items (0.65 to 1.34), high for 1 item (1.35 to 1.69), and very high for 1 item (above 1.70). Item difficulty was very easy for 24 items (below -2.0), easy for 8 items (-2.0 to -0.5), medium for 6 items (-0.5 to 0.5), hard for 3 items (0.5 to 2.0), and very hard for 9 items (2.0 or above). The goodness-of-fit test in terms of the 2-parameter item response model between the range of 2.0 to 0.5 revealed that 12 items had an ideal correct answer rate. 
We surmised that the low reliability of the mock examination was influenced by the timing of the test for the examinees and the inappropriate difficulty of the items. Our study suggested a methodology for the development of future case-based items for the Korean Nursing Licensing Examination.
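The IRT results above attach verbal labels to ranges of the discrimination parameter (a) and the difficulty parameter (b). Below is a minimal Python sketch of that labeling, using the cut-points quoted in the abstract. Two things are assumptions, not taken from the study: how values between the quoted ranges are assigned (for example 0.345, or exactly -0.5), and treating any a ≤ 0 as "none". The function names are hypothetical.

```python
from collections import Counter


def discrimination_label(a):
    """Verbal label for a discrimination parameter, per the quoted ranges."""
    if a <= 0.0:
        return "none"          # assumption: zero or negative a counts as "none"
    if a < 0.35:
        return "very low"      # 0.01 to 0.34
    if a < 0.65:
        return "low"           # 0.35 to 0.64
    if a < 1.35:
        return "moderate"      # 0.65 to 1.34
    if a < 1.70:
        return "high"          # 1.35 to 1.69
    return "very high"         # 1.70 and above


def difficulty_label(b):
    """Verbal label for a difficulty parameter, per the quoted ranges."""
    if b < -2.0:
        return "very easy"
    if b < -0.5:
        return "easy"
    if b < 0.5:
        return "medium"
    if b < 2.0:
        return "hard"
    return "very hard"


def tally(params):
    """Count items per (discrimination, difficulty) label pair."""
    return Counter((discrimination_label(a), difficulty_label(b)) for a, b in params)


if __name__ == "__main__":
    # Hypothetical (a, b) estimates, not the study's values
    print(tally([(0.0, -2.5), (0.2, -1.0), (0.9, 0.1), (1.8, 2.3)]))
```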
Evaluation of diagnostic criteria for panic attack using item response theory: findings from the National Comorbidity Survey in USA.

PubMed

Ietsugu, Tetsuji; Sukigara, Masune; Furukawa, Toshiaki A

2007-12-01

Dichotomous diagnostic systems such as the Diagnostic and Statistical Manual of Mental Disorders (DSM) and International Classification of Diseases (ICD) lose much important information about what each symptom can offer. This study explored the characteristics and performances of DSM-IV and ICD-10 diagnostic criteria items for panic attack using modern item response theory (IRT). The National Comorbidity Survey used the Composite International Diagnostic Interview to assess 14 DSM-IV and ICD-10 panic attack diagnostic criteria items in the general population in the USA. The dimensionality and measurement properties of these items were evaluated using dichotomous factor analysis and the two-parameter IRT model. A total of 1213 respondents reported at least one subsyndromal or syndromal panic attack in their lifetime. Factor analysis indicated that all items constitute a unidimensional construct. The two-parameter IRT model produced meaningful and interpretable results. Among items with high discrimination parameters, the difficulty parameter for "palpitation" was relatively low, while those for "choking," "fear of dying" and "paresthesia" were relatively high. Several items including "dry mouth" and "fear of losing control" had low discrimination parameters. The item characteristics of diagnostic criteria among help-seeking clinical populations may be different from those that we observed in the general population and deserve further examination. "Paresthesia," "choking" and "fear of dying" can be thought to be good indicators of severe panic attacks, while "palpitation" can discriminate well between cases and non-cases at low levels of panic attack severity. Items such as "dry mouth" would contribute less to the discrimination.
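The pattern described above, where "palpitation" discriminates well at low severity while "choking" or "fear of dying" flag severe attacks, follows from how the two-parameter model spreads item information along the latent trait. Under the 2PL, an item is most informative near its difficulty b, and its peak information is a²/4. Below is a minimal Python sketch. It assumes the logistic 2PL without the 1.7 scaling constant. The three (a, b) pairs are hypothetical values chosen for illustration, not the National Comorbidity Survey estimates.

```python
import math


def p_2pl(theta, a, b):
    """Probability of endorsing an item under the logistic 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at trait level theta: a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


# Hypothetical items (a, b) for illustration only
ITEMS = {
    "high a, low b": (2.0, -0.5),   # palpitation-like: informative at low severity
    "high a, high b": (2.0, 1.5),   # choking-like: informative at high severity
    "low a": (0.5, 0.0),            # dry-mouth-like: little information anywhere
}

if __name__ == "__main__":
    for theta in (-1.0, 0.0, 1.0, 2.0):
        row = {name: round(item_information(theta, a, b), 3)
               for name, (a, b) in ITEMS.items()}
        print(theta, row)
```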

Visual search among items of different salience: removal of visual attention mimics a lesion in extrastriate area V4.

PubMed

Braun, J

1994-02-01

In more than one respect, visual search for the most salient or the least salient item in a display are different kinds of visual tasks. The present work investigated whether this difference is primarily one of perceptual difficulty, or whether it is more fundamental and relates to visual attention. Display items of different salience were produced by varying either size, contrast, color saturation, or pattern. Perceptual masking was employed and, on average, mask onset was delayed longer in search for the least salient item than in search for the most salient item. As a result, the two types of visual search presented comparable perceptual difficulty, as judged by psychophysical measures of performance, effective stimulus contrast, and stability of decision criterion. To investigate the role of attention in the two types of search, observers attempted to carry out a letter discrimination and a search task concurrently. To discriminate the letters, observers had to direct visual attention at the center of the display and, thus, leave unattended the periphery, which contained target and distractors of the search task. In this situation, visual search for the least salient item was severely impaired while visual search for the most salient item was only moderately affected, demonstrating a fundamental difference with respect to visual attention. A qualitatively identical pattern of results was encountered by Schiller and Lee (1991), who used similar visual search tasks to assess the effect of a lesion in extrastriate area V4 of the macaque.
A Classical Test Theory Analysis of the Light and Spectroscopy Concept Inventory National Study Data Set

ERIC Educational Resources Information Center

Schlingman, Wayne M.; Prather, Edward E.; Wallace, Colin S.; Brissenden, Gina; Rudolph, Alexander L.

2012-01-01

This paper is the first in a series of investigations into the data from the recent national study using the Light and Spectroscopy Concept Inventory (LSCI). In this paper, we use classical test theory to form a framework of results that will be used to evaluate individual item difficulties, item discriminations, and the overall reliability of the…
An Analysis of the Connectedness to Nature Scale Based on Item Response Theory

PubMed Central

Pasca, Laura; Aragonés, Juan I.; Coello, María T.

2017-01-01

The Connectedness to Nature Scale (CNS) is used as a measure of the subjective cognitive connection between individuals and nature. However, to date, it has not been analyzed at the item level to confirm its quality. In the present study, we conduct such an analysis based on Item Response Theory. We employed data from previous studies using the Spanish-language version of the CNS, analyzing a sample of 1008 participants. The results show that seven items presented appropriate indices of discrimination and difficulty, in addition to a good fit. The remaining six have inadequate discrimination indices and do not present a good fit. A second study with 321 participants shows that the seven-item scale has adequate levels of reliability and validity. Therefore, it would be appropriate to use a reduced version of the scale after eliminating the items that display inappropriate behavior, since they may interfere with research results on connectedness to nature. PMID:28824509
CTTITEM: SAS macro and SPSS syntax for classical item analysis.

PubMed

Lei, Pui-Wa; Wu, Qiong

2007-08-01

This article describes the functions of a SAS macro and an SPSS syntax that produce common statistics for conventional item analysis including Cronbach's alpha, item difficulty index (p-value or item mean), and item discrimination indices (D-index, point biserial and biserial correlations for dichotomous items and item-total correlation for polytomous items). These programs represent an improvement over the existing SAS and SPSS item analysis routines in terms of completeness and user-friendliness. To promote routine evaluations of item qualities in instrument development of any scale, the programs are available at no charge for interested users. The program codes along with a brief user's manual that contains instructions and examples are downloadable from suen.ed.psu.edu/-pwlei/plei.htm.
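The statistics listed above can be sketched in a few lines. The following Python version is not a port of the SAS macro or SPSS syntax. It assumes a 0/1 scored response matrix (examinees by items). For the D-index, it forms upper and lower groups from the top and bottom 27% of total scores, a common convention assumed here. It computes the point-biserial as a Pearson correlation between the item and the total score, optionally with the item removed. It uses population variances throughout for Cronbach's alpha. All function names are hypothetical.

```python
import math
from statistics import fmean, pvariance


def pearson(x, y):
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def item_difficulty(resp):
    """Proportion correct per item (the item p-value / item mean)."""
    return [fmean(col) for col in zip(*resp)]


def d_index(resp, frac=0.27):
    """Upper-lower discrimination index: p(upper group) - p(lower group)."""
    totals = [sum(r) for r in resp]
    order = sorted(range(len(resp)), key=lambda i: totals[i], reverse=True)
    g = max(1, round(frac * len(resp)))   # group size (assumed 27% convention)
    upper = [resp[i] for i in order[:g]]
    lower = [resp[i] for i in order[-g:]]
    return [u - l for u, l in zip(item_difficulty(upper), item_difficulty(lower))]


def point_biserial(resp, j, corrected=False):
    """Correlation of item j with the total (or rest) score."""
    item = [r[j] for r in resp]
    total = [sum(r) - (r[j] if corrected else 0) for r in resp]
    return pearson(item, total)


def cronbach_alpha(resp):
    """k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(resp[0])
    item_vars = sum(pvariance(col) for col in zip(*resp))
    total_var = pvariance([sum(r) for r in resp])
    return k / (k - 1) * (1 - item_vars / total_var)
```

The corrected (rest-score) point-biserial removes the item from the total. This avoids inflating the correlation with the item's own contribution, which matters most for short tests.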
Accommodations for Multiple Choice Tests

ERIC Educational Resources Information Center

Trammell, Jack

2011-01-01

Students with learning or learning-related disabilities frequently struggle with multiple choice assessments due to difficulty discriminating between items, filtering out distracters, and framing a mental best answer. This Practice Brief suggests accommodations and strategies that disability service providers can utilize in conjunction with…
Normative Performance on the Brief Smell Identification Test (BSIT) in a Multi-Ethnic Bilingual Cohort: A Project FRONTIER Study

PubMed Central

Menon, Chloe; Westervelt, Holly James; Jahn, Danielle R.; Dressel, Jeffrey A.; O’Bryant, Sid E.

2013-01-01

The Brief Smell Identification Test (BSIT) is a commonly used measure of olfactory functioning in elderly populations. Few studies have provided normative data for this measure, and minimal data are available regarding the impact of sociodemographic factors on test scores. This study presents normative data for the BSIT in a sample of English- and Spanish-speaking Hispanic and non-Hispanic Whites. A Rasch analysis was also conducted to identify the items that best discriminated between varying levels of olfactory functioning, as measured by the BSIT. The total sample included 302 older adults seen as part of an ongoing study of rural cognitive aging, Project FRONTIER. Hierarchical regression analyses revealed that BSIT scores require adjustment by age and gender, but years of education, ethnicity, and language did not significantly influence BSIT performance. Four items best discriminated between varying levels of smell identification, accounting for 59.44% of total information provided by the measure. However, items did not represent a continuum of difficulty on the BSIT. The results of this study indicate that the BSIT appears to be well-suited for assessing odor identification deficits in older adults of diverse backgrounds, but that fine-tuning of this instrument may be recommended in light of its items’ difficulty and discrimination parameters. Clinical and empirical implications are discussed. PMID:23634698
Psychometric Properties of the Heart Disease Knowledge Scale: Evidence from Item and Confirmatory Factor Analyses

PubMed Central

Lim, Bee Chiu; Kueh, Yee Cheng; Arifin, Wan Nor; Ng, Kok Huan

2016-01-01

Background Heart disease knowledge is an important concept for health education, yet there is a lack of evidence on proper validated instruments used to measure levels of heart disease knowledge in the Malaysian context. Methods A cross-sectional survey design was used to examine the psychometric properties of the adapted English version of the Heart Disease Knowledge Questionnaire (HDKQ). Using proportionate cluster sampling, 788 undergraduate students at Universiti Sains Malaysia, Malaysia, were recruited and completed the HDKQ. Item analysis and confirmatory factor analysis (CFA) were used for the psychometric evaluation. Construct validity of the measurement model was included. Results Most of the students were Malay (48%), female (71%), and from the field of science (51%). An acceptable range was obtained with respect to both the difficulty and discrimination indices in the item analysis results. The difficulty index ranged from 0.12–0.91 and a discrimination index of ≥ 0.20 was reported for the final retained 23 items. The final CFA model showed an adequate fit to the data, yielding a 23-item, one-factor model [weighted least squares mean and variance adjusted scaled chi-square difference = 1.22, degrees of freedom = 2, P-value = 0.544, the root mean square error of approximation = 0.03 (90% confidence interval = 0.03, 0.04); close-fit P-value > 0.950]. Conclusion Adequate psychometric values were obtained for Malaysian undergraduate university students using the 23-item, one-factor model of the adapted HDKQ. PMID:27660543
Psychometric Properties of the Heart Disease Knowledge Scale: Evidence from Item and Confirmatory Factor Analyses.

PubMed

Lim, Bee Chiu; Kueh, Yee Cheng; Arifin, Wan Nor; Ng, Kok Huan

2016-07-01

Heart disease knowledge is an important concept for health education, yet there is a lack of evidence on proper validated instruments used to measure levels of heart disease knowledge in the Malaysian context. A cross-sectional survey design was used to examine the psychometric properties of the adapted English version of the Heart Disease Knowledge Questionnaire (HDKQ). Using proportionate cluster sampling, 788 undergraduate students at Universiti Sains Malaysia, Malaysia, were recruited and completed the HDKQ. Item analysis and confirmatory factor analysis (CFA) were used for the psychometric evaluation. Construct validity of the measurement model was included. Most of the students were Malay (48%), female (71%), and from the field of science (51%). An acceptable range was obtained with respect to both the difficulty and discrimination indices in the item analysis results. The difficulty index ranged from 0.12-0.91 and a discrimination index of ≥ 0.20 was reported for the final retained 23 items. The final CFA model showed an adequate fit to the data, yielding a 23-item, one-factor model [weighted least squares mean and variance adjusted scaled chi-square difference = 1.22, degrees of freedom = 2, P-value = 0.544, the root mean square error of approximation = 0.03 (90% confidence interval = 0.03, 0.04); close-fit P-value > 0.950]. Adequate psychometric values were obtained for Malaysian undergraduate university students using the 23-item, one-factor model of the adapted HDKQ.
Developing a situational judgment test blueprint for assessing the non-cognitive skills of applicants to the University of Utah School of Medicine, the United States

PubMed Central

2015-01-01

Purpose: The situational judgment test (SJT) shows promise for assessing the non-cognitive skills of medical school applicants, but has only been used in Europe. Since the admissions processes and education levels of applicants to medical school are different in the United States and in Europe, it is necessary to obtain validity evidence of the SJT based on a sample of United States applicants. Methods: Ninety SJT items were developed and Kane’s validity framework was used to create a test blueprint. A total of 489 applicants selected for assessment/interview day at the University of Utah School of Medicine during the 2014-2015 admissions cycle completed one of five SJTs, which assessed professionalism, coping with pressure, communication, patient focus, and teamwork. Item difficulty, each item’s discrimination index, internal consistency, and the categorization of items by two experts were used to create the test blueprint. Results: The majority of item scores were within an acceptable range of difficulty, as measured by the difficulty index (0.50-0.85) and had fair to good discrimination. However, internal consistency was low for each domain, and 63% of items appeared to assess multiple domains. The concordance of categorization between the two educational experts ranged from 24% to 76% across the five domains. Conclusion: The results of this study will help medical school admissions departments determine how to begin constructing an SJT. Further testing with a more representative sample is needed to determine if the SJT is a useful assessment tool for measuring the non-cognitive skills of medical school applicants. PMID:26582629
Item response theory analysis of the mechanics baseline test

NASA Astrophysics Data System (ADS)

Cardamone, Caroline N.; Abbott, Jonathan E.; Rayyan, Saif; Seaton, Daniel T.; Pawl, Andrew; Pritchard, David E.

2012-02-01

Item response theory is useful in both the development and evaluation of assessments and in computing standardized measures of student performance. In item response theory, individual parameters (difficulty, discrimination) for each item or question are fit by item response models. These parameters provide a means for evaluating a test and offer a better measure of student skill than a raw test score, because each skill calculation considers not only the number of questions answered correctly, but the individual properties of all questions answered. Here, we present the results from an analysis of the Mechanics Baseline Test given at MIT during 2005-2010. Using the item parameters, we identify questions on the Mechanics Baseline Test that are not effective in discriminating between MIT students of different abilities. We show that a limited subset of the highest quality questions on the Mechanics Baseline Test returns accurate measures of student skill. We compare student skills as determined by item response theory to the more traditional measurement of the raw score and show that a comparable measure of learning gain can be computed.
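The claim that an IRT skill estimate "considers not only the number of questions answered correctly, but the individual properties of all questions answered" can be made concrete with a maximum-likelihood ability estimate. Below is a minimal Python sketch. It assumes a logistic 2PL model with known item parameters and uses Newton-Raphson with a clamped step. The study's actual model and estimation software are not stated in the abstract, and the function names are hypothetical. Because all-correct or all-wrong patterns have no finite maximum-likelihood estimate, this sketch simply clamps θ to [-4, 4].

```python
import math


def p_2pl(theta, a, b):
    """Probability of a correct answer under the logistic 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def estimate_theta(responses, items, lo=-4.0, hi=4.0, tol=1e-8, max_iter=100):
    """ML ability estimate for one student; items is a list of (a, b) pairs."""
    theta = 0.0
    for _ in range(max_iter):
        grad = 0.0   # first derivative of the log-likelihood
        info = 0.0   # test information (negative second derivative)
        for u, (a, b) in zip(responses, items):
            p = p_2pl(theta, a, b)
            grad += a * (u - p)
            info += a * a * p * (1.0 - p)
        step = max(-1.0, min(1.0, grad / info))   # damped Newton step
        theta = min(hi, max(lo, theta + step))
        if abs(step) < tol:
            break
    return theta


if __name__ == "__main__":
    # Two hypothetical items of equal difficulty but different discrimination
    items = [(0.5, 0.0), (2.0, 0.0)]
    # Same raw score (1 of 2), different items answered correctly
    print(estimate_theta([1, 0], items), estimate_theta([0, 1], items))
```

Both hypothetical students have a raw score of 1, but the student who answered the more discriminating item correctly receives the higher estimate. Under the 2PL, the weighted score Σ aᵢuᵢ, not the raw score, drives the estimate. When all discriminations are equal, the two patterns give the same θ.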
Psychometric properties of DSM assessments of illicit drug abuse and dependence: results from the National Epidemiologic Survey on Alcohol and Related Conditions (NESARC).

PubMed

Lynskey, M T; Agrawal, A

2007-09-01

DSM-IV criteria for illicit drug abuse and dependence are largely based on criteria developed for alcohol use disorders and there is a lack of research evidence on the psychometric properties of these symptoms when applied to illicit drugs. This study utilizes data on abuse/dependence criteria for cannabis, cocaine, stimulants, sedatives, tranquilizers, opiates, hallucinogens and inhalants from the National Epidemiological Survey on Alcohol and Related Conditions (NESARC, n=43 093). Analyses included factor analysis to explore the dimensionality of illicit drug abuse and dependence criteria, calculation of item difficulty and discrimination within an item response framework and a descriptive analysis of 'diagnostic orphans': individuals meeting criteria for 1-2 dependence symptoms but not abuse. Rates of psychiatric disorders were compared across groups. Results favor a uni-dimensional construct for abuse/dependence on each of the eight drug classes. Factor loadings, item difficulty and discrimination were remarkably consistent across drug categories. For each drug category, between 29% and 51% of all individuals meeting criteria for at least one symptom did not receive a formal diagnosis of either abuse or dependence and were therefore classified as 'orphans'. Mean rates of disorder in these individuals suggested that illicit drug use disorders may be more adequately described along a spectrum of severity. While there were remarkable similarities across categories of illicit drugs, consideration of item difficulty suggested that some alterations to DSM regarding the relevant severity of specific abuse and dependence criteria may be warranted.
Development and Psychometric Properties of the Instrumental Activities of Daily Living: Compensation Scale

PubMed Central

Schmitter-Edgecombe, Maureen; Parsey, Carolyn; Lamb, Richard

2014-01-01

The Instrumental Activities of Daily Living – Compensation (IADL-C) scale was developed to capture early functional difficulties and to quantify compensatory strategy use that may mitigate functional decline in the aging population. The IADL-C was validated in a sample of cognitively healthy older adults (N=184) and individuals with mild cognitive impairment (MCI; N=92) and dementia (N=24). Factor analysis and Rasch item analysis led to the 27-item IADL-C informant questionnaire with four functional domain subscales (money and self-management, home daily living, travel and event memory, and social skills). The subscales demonstrated good internal consistency (Rasch reliability 0.80 to 0.93) and test-retest reliability (Spearman coefficients 0.70 to 0.91). The IADL-C total score and subscales showed convergent validity with other IADL measures, discriminant validity with psychosocial measures, and the ability to discriminate between diagnostic groups. The money and self-management subscale showed notable difficulties for individuals with MCI, whereas difficulties with home daily living became more prominent for dementia participants. Compensatory strategy use increased in the MCI group and decreased in the dementia group. PMID:25344901
The EvoDevoCI: A Concept Inventory for Gauging Students’ Understanding of Evolutionary Developmental Biology

PubMed Central

Perez, Kathryn E.; Hiatt, Anna; Davis, Gregory K.; Trujillo, Caleb; French, Donald P.; Terry, Mark; Price, Rebecca M.

2013-01-01

The American Association for the Advancement of Science 2011 report Vision and Change in Undergraduate Biology Education encourages the teaching of developmental biology as an important part of teaching evolution. Recently, however, we found that biology majors often lack the developmental knowledge needed to understand evolutionary developmental biology, or “evo-devo.” To assist in efforts to improve evo-devo instruction among undergraduate biology majors, we designed a concept inventory (CI) for evolutionary developmental biology, the EvoDevoCI. The CI measures student understanding of six core evo-devo concepts using four scenarios and 11 multiple-choice items, all inspired by authentic scientific examples. Distracters were designed to represent the common conceptual difficulties students have with each evo-devo concept. The tool was validated by experts and administered at four institutions to 1191 students during preliminary (n = 652) and final (n = 539) field trials. We used student responses to evaluate the readability, difficulty, discriminability, validity, and reliability of the EvoDevoCI, which included items ranging in difficulty from 0.22–0.55 and in discriminability from 0.19–0.38. Such measures suggest the EvoDevoCI is an effective tool for assessing student understanding of evo-devo concepts and the prevalence of associated common conceptual difficulties among both novice and advanced undergraduate biology majors. PMID:24297293
Incorporation of core competency questions into an annual national self-assessment examination for residents in physical medicine and rehabilitation: results and implications.

PubMed

Webster, Joseph B

2009-03-01

To determine the performance and change over time when incorporating questions in the core competency domains of practice-based learning and improvement (PBLI), systems-based practice (SBP), and professionalism (PROF) into the national PM&R Self-Assessment Examination for Residents (SAER). Prospective, longitudinal analysis. The national Self-Assessment Examination for Residents (SAER) in Physical Medicine and Rehabilitation, which is administered annually. Approximately 1100 PM&R residents who take the examination annually. Inclusion of progressively more challenging questions in the core competency domains of PBLI, SBP, and PROF. Individual test item level of difficulty (P value) and discrimination (point biserial index). Compared with the overall test, questions in the subtopic areas of PBLI, SBP, and PROF were relatively easier and less discriminating (correlation of resident performance on these domains compared with that on the total test). These differences became smaller during the 3-year time period. The difficulty level of the questions in each of the subtopic domains was raised during the 3 year period to a level close to the overall exam. Discrimination of the test items improved or remained stable. This study demonstrates that, with careful item writing and review, multiple-choice items in the PBLI, SBP, and PROF domains can be successfully incorporated into an annual, national self-assessment examination for residents. The addition of these questions had value in assessing competency while not compromising the overall validity and reliability of the exam. It is yet to be determined if resident performance on these questions corresponds to performance on other measures of competency in the areas of PBLI, SBP, and PROF.
Internet-based survey of factors associated with subjective feeling of insomnia, depression, and low health-related quality of life among Japanese adults with sleep difficulty.

PubMed

Aritake, Sayaka; Asaoka, Shoichi; Kagimura, Tatsuo; Shimura, Akiyoshi; Futenma, Kunihiro; Komada, Yoko; Inoue, Yuichi

2015-04-01

This study was conducted to determine what symptom components or conditions of insomnia are related to subjective feelings of insomnia, low health-related quality of life (HRQOL), or depression. Data from 7,027 Japanese adults obtained using an Internet-based questionnaire survey was analyzed to examine associations between demographic variables and each sleep difficulty symptom item on the Pittsburgh Sleep Quality Index (PSQI) with the presence/absence of subjective insomnia and scores on the Short Form-8 (SF-8) and Center for Epidemiologic Studies Depression Scale (CES-D). Prevalence of subjective insomnia was 12.2% (n = 860). Discriminant function analysis revealed that item scores for sleep quality, sleep latency, and sleep medication use on the PSQI and CES-D showed relatively high discriminant function coefficients for identifying positivity for the subjective feeling of insomnia. Among respondents with subjective insomnia, a low SF-8 physical component summary score was associated with higher age, depressive state, and PSQI items for sleep difficulty and daytime dysfunction, whereas a low SF-8 mental component summary score was associated with depressive state, PSQI sleep latency, sleeping medication use, and daytime dysfunction. Depressive state was significantly associated with sleep latency, sleeping medication use, and daytime dysfunction. Among insomnia symptom components, disturbed sleep quality and sleep onset insomnia may be specifically associated with subjective feelings of the disorder. The existence of a depressive state could be significantly associated with not only subjective insomnia but also mental and physical QOL. Our results also suggest that different components of sleep difficulty, as measured by the PSQI, might be associated with mental and physical QOL and depressive status.
KIDMAP--A Diagnostic Tool for Teachers.

ERIC Educational Resources Information Center

Lee, Yew Jin; Linacre, John M.; Yeoh, Oon Chye

While assessment is the bread and butter of the teaching profession, its practitioners usually do not extend analysis of test responses beyond simple measures such as facility or discrimination indices in classical test theory. Item response theory (IRT) has much to offer but its nonintuitive content and difficulty make it a formidable obstacle in…
Psychometrics of Multiple Choice Questions with Non-Functioning Distracters: Implications to Medical Education.

PubMed

Deepak, Kishore K; Al-Umran, Khalid Umran; Al-Sheikh, Mona H; Dkoli, B V; Al-Rubaish, Abdullah

2015-01-01

The functionality of distracters in a multiple choice question plays a very important role. We examined the frequency and impact of functioning and non-functioning distracters on psychometric properties of 5-option items in clinical disciplines. We analyzed item statistics of 1115 multiple choice questions from 15 summative assessments of undergraduate medical students and classified the items into five groups by their number of non-functioning distracters. We analyzed the effect of varying degrees of non-functionality, ranging from 0 to 4, on test reliability, difficulty index, discrimination index and point biserial correlation. The non-functionality of distracters inversely affected the test reliability and quality of items in a predictable manner. The non-functioning distracters made the items easier and lowered the discrimination index significantly. Three non-functional distracters in a 5-option MCQ significantly affected all psychometric properties (p < 0.5). The corrected point biserial correlation revealed that the items with 3 functional options were psychometrically as effective as 5-option items. Our study reveals that a multiple choice question with 3 functional options represents the lowest limit of item format that has adequate psychometric properties. Tests containing items with fewer functioning options have significantly lower reliability. Distracter function analysis and revision of non-functioning distracters can serve as important methods to improve the psychometrics and reliability of assessment.
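The abstract classifies items by their number of non-functioning distracters but does not state the cut-off it used. The sketch below assumes the widely used convention that a distracter is non-functioning when fewer than 5% of examinees choose it. Both the threshold and the function name are assumptions for illustration.

```python
from collections import Counter
from typing import Iterable, List


def non_functioning_distracters(choices: Iterable[str], options: List[str], key: str,
                                threshold: float = 0.05) -> List[str]:
    """Return the distracters (options other than `key`) chosen by fewer than `threshold` of examinees.

    The 5% default is an assumed convention, not the criterion stated in the study.
    """
    choices = list(choices)
    counts = Counter(choices)   # options nobody chose get a count of 0
    n = len(choices)
    return [opt for opt in options if opt != key and counts[opt] / n < threshold]
```

The result's length gives an item's group (0 to 4 non-functioning distracters for a 5-option item) in the classification described above. For a hypothetical item answered A (key) by 70 of 100 examinees, B by 20, C by 6, D by 3 and E by 1, distracters D and E are flagged.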
[Instrument to measure adherence in hypertensive patients: contribution of Item Response Theory].

PubMed

Rodrigues, Malvina Thaís Pacheco; Moreira, Thereza Maria Magalhaes; Vasconcelos, Alexandre Meira de; Andrade, Dalton Francisco de; Silva, Daniele Braz da; Barbetta, Pedro Alberto

2013-06-01

To analyze, by means of "Item Response Theory", an instrument to measure adherence to treatment for hypertension. Analytical study with 406 hypertensive patients with associated complications seen in primary care in Fortaleza, CE, Northeastern Brazil, in 2011, using "Item Response Theory". The stages were: dimensionality test, calibrating the items, processing data and creating a scale, analyzed using the graded response model. A study of the dimensionality of the instrument was conducted by analyzing the polychoric correlation matrix and by full-information factor analysis. Multilog software was used to calibrate items and estimate the scores. Items relating to drug therapy are the most directly related to adherence, while those relating to drug-free therapy need to be reworked because they have less psychometric information and low discrimination. The independence of items, the small number of levels in the scale and low explained variance in the adjustment of the models show the main weaknesses of the instrument analyzed. The "Item Response Theory" proved to be a relevant analysis technique because it evaluated respondents for adherence to treatment for hypertension, the level of difficulty of the items and their ability to discriminate between individuals with different levels of adherence, which generates a greater amount of information. Judged by the "Item Response Theory" analysis of its items, the instrument is limited in measuring adherence to hypertension treatment and needs adjustment. The proper formulation of the items is important in order to accurately measure the desired latent trait.
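The response model used in this study is presumably Samejima's graded response model for ordered categories. The study calibrated it in Multilog, and the sketch below does not reproduce that software. It only shows how the model turns one discrimination parameter and a set of ordered thresholds into category probabilities at a given trait level. The parameter values in the test are hypothetical.

```python
import math
from typing import List


def grm_category_probs(theta: float, a: float, thresholds: List[float]) -> List[float]:
    """Graded response model: probabilities of ordered categories 0..m-1 at trait level theta.

    `thresholds` are the m-1 strictly increasing boundary locations b_1 < ... < b_{m-1};
    `a` is the item discrimination.
    """
    if any(b2 <= b1 for b1, b2 in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # Cumulative probabilities P(X >= k), padded with 1.0 (k = 0) and 0.0 (beyond the top category)
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    # Each category's probability is the difference of adjacent cumulative probabilities
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]
```

A low `a`, like the drug-free therapy items described above, flattens these curves, so a response carries less information about the respondent's adherence level.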
The Standardization of the Concepts about Print into Greek

ERIC Educational Resources Information Center

Tafa, Eufimia

2009-01-01

The purpose of this study was to translate and standardize Concepts About Print (C.A.P.) into Greek, and to assess its psychometric properties. Particularly, this study evaluated the reliability and validity of the Greek version of C.A.P., and item difficulty and discrimination index and examined whether there were differences between boys and…
On Interpreting the Model Parameters for the Three Parameter Logistic Model

ERIC Educational Resources Information Center

Maris, Gunter; Bechger, Timo

2009-01-01

This paper addresses two problems relating to the interpretability of the model parameters in the three parameter logistic model. First, it is shown that if the values of the discrimination parameters are all the same, the remaining parameters are nonidentifiable in a nontrivial way that involves not only ability and item difficulty, but also the…
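For reference, the three parameter logistic model gives the probability of a correct response as c + (1 − c) / (1 + exp(−a(θ − b))). Here a is discrimination, b is difficulty and c is the lower asymptote. The minimal sketch below is written without the 1.7 scaling constant that some software applies. It only makes the roles of the parameters concrete and does not reproduce the paper's identifiability argument.

```python
import math


def p_3pl(theta: float, a: float, b: float, c: float) -> float:
    """Three-parameter logistic probability of a correct response.

    a: discrimination, b: difficulty, c: lower asymptote ("guessing").
    The optional scaling constant D (often 1.7) is omitted.
    """
    if not 0.0 <= c < 1.0:
        raise ValueError("c must lie in [0, 1)")
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))
```

At θ = b the probability is (1 + c)/2 rather than 0.5, and it approaches c, not 0, for very low ability. This is why b and c cannot be read the same way as the difficulty in a one- or two-parameter model.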

Modeling the Psychometric Properties of Complex Performance Assessment Tasks Using Confirmatory Factor Analysis: A Multistage Model for Calibrating Tasks

ERIC Educational Resources Information Center

Kahraman, Nilufer; De Champlain, Andre; Raymond, Mark

2012-01-01

Item-level information, such as difficulty and discrimination, is invaluable to test assembly, equating, and scoring practices. Estimating these parameters within the context of large-scale performance assessments is often hindered by the use of unbalanced designs for assigning examinees to tasks and raters because such designs result in very…
Examination of the item structure of the Alberta infant motor scale.

PubMed

Liao, Pai-Jun M; Campbell, Suzann K

2004-01-01

The Alberta Infant Motor Scale (AIMS) is a screening tool for identifying delayed motor development from birth to 18 months of age. The purpose of this study was to examine the psychometric structure of the AIMS, including the hierarchical scale of items and the precision for measuring infant ability at different ages. Ninety-seven infants with varying degrees of risk of developmental disability were recruited from three hospitals or from the community in the Chicago metropolitan area. Infants were tested on the AIMS at three, six, nine, and 12 months of age. The hierarchical structure and the range and distribution of item difficulty on the AIMS were analyzed using Rasch psychometric analysis. The Rasch analysis confirmed that items for each of the four testing positions (supine, prone, sitting, and standing) were arranged in increasing order of difficulty, but a ceiling effect was present. Gaps exist at six ability levels, indicating low precision of measurement for differentiating among infants after about nine months of age. The AIMS shows a ceiling effect, measures infant ability best from three to nine months of age, and has few items available for discriminating among infants after they pass the controlled lowering through standing item. Clinical impressions should be drawn with caution at ages when the precision of measurement is low.
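The Rasch model can illustrate the finding that measurement precision drops once infants pass the hardest items. In the sketch below, test information at ability θ is the sum of p(1 − p) over items, and the standard error of measurement is the inverse square root of that information. Abilities well above the hardest items therefore get large errors. The item difficulties in the test are hypothetical, not the AIMS calibration.

```python
import math
from typing import Sequence


def rasch_p(theta: float, b: float) -> float:
    """Rasch probability of passing an item of difficulty b at ability theta (both in logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def standard_error(theta: float, difficulties: Sequence[float]) -> float:
    """Standard error of measurement at theta: 1 / sqrt(test information),
    where each item contributes information p * (1 - p)."""
    info = sum(rasch_p(theta, b) * (1.0 - rasch_p(theta, b)) for b in difficulties)
    return 1.0 / math.sqrt(info)
```

With hypothetical difficulties of −2 to 2 logits, the error at θ = 4 is roughly twice that at θ = 0. This is the same pattern as a ceiling effect.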
Development of multiple choice pictorial test for measuring the dimensions of knowledge

NASA Astrophysics Data System (ADS)

Nahadi; Siswaningsih, Wiwi; Erna

2017-05-01

This study aims to develop a multiple choice pictorial test as a tool to measure the dimensions of knowledge in the chemical equilibrium topic. The method used is Research and Development, with validation conducted in the preliminary studies and model development. The product is a multiple choice pictorial test. The test consisted of 22 items and was administered to 64 grade XII high school students. The quality of the test was determined by its validity, reliability, difficulty index, discrimination power, and distractor effectiveness. The validity of the test was determined by CVR calculation using 8 validators (4 university teachers and 4 high school teachers), with an average CVR value of 0.89. The reliability of the test falls in the very high category, with a value of 0.87. In terms of discrimination power, 32% of items fall in the very good category, 59% in the good category, and 20% in the sufficient category. The test has a varying level of difficulty: 23% of items are in the difficult category, 50% in the medium category, and 27% in the easy category. For distractor effectiveness, 1% of items fall in the very poor category, 1% in poor, 4% in medium, 39% in good, and 55% in very good. The dimensions of knowledge measured consist of factual, conceptual, and procedural knowledge. Based on the questionnaire, students responded quite well to the developed test, and most of the students preferred this kind of multiple choice pictorial test, which uses pictures as an evaluation tool, over narrative tests dominated by text.
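The abstract reports a CVR from 8 validators but does not give the formula. The sketch below assumes Lawshe's content validity ratio, the usual meaning of CVR: (n_e − N/2) / (N/2), where n_e is the number of panelists rating an item essential and N is the panel size. The function names and the item ratings in the test are hypothetical.

```python
from typing import List


def content_validity_ratio(n_essential: int, n_panelists: int) -> float:
    """Lawshe's CVR = (n_e - N/2) / (N/2), ranging from -1 to +1."""
    if n_panelists <= 0 or not 0 <= n_essential <= n_panelists:
        raise ValueError("need n_panelists > 0 and 0 <= n_essential <= n_panelists")
    half = n_panelists / 2
    return (n_essential - half) / half


def mean_cvr(essential_counts: List[int], n_panelists: int) -> float:
    """Average CVR over items, given how many panelists rated each item essential."""
    return sum(content_validity_ratio(n, n_panelists) for n in essential_counts) / len(essential_counts)
```

With 8 panelists, each per-item CVR is a multiple of 0.25. For example, an item rated essential by 7 of 8 validators scores 0.75.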
Extending item response theory to online homework

NASA Astrophysics Data System (ADS)

Kortemeyer, Gerd

2014-06-01

Item response theory (IRT) is becoming an increasingly important tool for analyzing "big data" gathered from online educational venues. However, the mechanism was originally developed in traditional exam settings, and several of its assumptions are violated when it is deployed in the online realm. For a large-enrollment physics course for scientists and engineers, the study compares outcomes from IRT analyses of exam and homework data, and then proceeds to investigate the effects of each confounding factor introduced in the online realm. It is found that IRT yields the correct trends for learner ability and meaningful item parameters, yet overall agreement with exam data is moderate. It is also found that learner ability and item discrimination are robust over a wide range with respect to model assumptions and introduced noise. Item difficulty is also robust, but over a narrower range.
The second version of the L. V. Prasad-functional vision questionnaire.

PubMed

Gothwal, Vijaya K; Sumalini, Rebecca; Bharani, Seelam; Reddy, Shailaja P; Bagga, Deepak K

2012-11-01

The L. V. Prasad-Functional Vision Questionnaire (LVP-FVQ) was developed using Rasch analysis to assess self-reported difficulties in performing daily tasks in school children with visual impairment (VI) in India. However, the LVP-FVQ has psychometric problems of inadequate measurement precision and lack of detailed assessment of dimensionality. Furthermore, items pertaining to use of technology are lacking. The aim of this study was to present the development and validation of the second version of LVP-FVQ (LVP-FVQ II). Development of LVP-FVQ II involved extracting items from other similar questionnaires (albeit developed for Western populations) and focus group discussions of children with VI and their parents that resulted in a 32-item pilot questionnaire. Overall, six items from the LVP-FVQ were retained. The questionnaire underwent pilot testing in 25 such children, following which a 27-item LVP-FVQ II emerged, and this was administered to 150 children with VI. Response to each item was rated on a three-category scale. Rasch analysis was used to validate the LVP-FVQ II. Participants used the rating scale as intended. Four mobility-related items required deletion, as these did not contribute toward measurement of a single construct, indicating a secondary dimension. Deletion of the four items resulted in the 23-item unidimensional LVP-FVQ II, with good measurement precision, effective targeting of item difficulty to participant ability, and lack of notable differential item functioning. The LVP-FVQ II has high reliability, indicating that it is effectively able to discriminate between levels of visual disability among school children in India, and is valid across age, gender, duration of VI, and location of residence. Given the superior measurement properties and the interval-level scores, the LVP-FVQ II appears to offer advantages over LVP-FVQ in assessment of difficulties in performing daily tasks in this population. It can be adapted for use in other developing countries.
Development and evaluation of the Korean Health Literacy Instrument.

PubMed

Kang, Soo Jin; Lee, Tae Wha; Paasche-Orlow, Michael K; Kim, Gwang Suk; Won, Hee Kwan

2014-01-01

The purpose of this study is to develop and validate the Korean Health Literacy Instrument, which measures the capacity to understand and use health-related information and make informed health decisions in Korean adults. In Phase 1, 33 initial items were generated to measure functional, interactive, and critical health literacy with prose, document, and numeracy tasks. These items included content from health promotion, disease management, and health navigation contexts. Content validity assessment was conducted by an expert panel, and 11 items were excluded. In Phase 2, the 22 remaining items were administered to a convenience sample of 292 adults from community and clinical settings. Exploratory factor and item difficulty and discrimination analyses were conducted and four items with low discrimination were deleted. In Phase 3, the remaining 18 items were administered to a convenience sample of 315 adults 40-64 years of age from community and clinical settings. A confirmatory factor analysis was performed to test the construct validity of the instrument. The Korean Health Literacy Instrument has a range of 0 to 18. The mean score in our validation study was 11.98. The instrument exhibited an internal consistency reliability coefficient of 0.82, and a test-retest reliability of 0.89. The instrument is suitable for screening individuals who have limited health literacy skills. Future studies are needed to further define the psychometric properties and predictive validity of the Korean Health Literacy Instrument.
Response pattern of depressive symptoms among college students: What lies behind items of the Beck Depression Inventory-II?

PubMed

de Sá Junior, Antonio Reis; de Andrade, Arthur Guerra; Andrade, Laura Helena; Gorenstein, Clarice; Wang, Yuan-Pang

2018-07-01

This study examines the response pattern of depressive symptoms in a nationwide student sample, through item analyses of a rating scale by both classical test theory (CTT) and item response theory (IRT). The 21-item Beck Depression Inventory-II (BDI-II) was administered to 12,711 college students. First, the psychometric properties of the scale were described. Thereafter, the endorsement probability of depressive symptom in each scale item was analyzed through CTT and IRT. Graphical plots depicted the endorsement probability of scale items and intensity of depression. Three items of different difficulty level were compared through CTT and IRT approach. Four in five students reported the presence of depressive symptoms. The BDI-II items presented good reliability and were distributed along the symptomatic continuum of depression. Similarly, in both CTT and IRT approaches, the item 'changes in sleep' was easily endorsed, 'loss of interest' moderately and 'suicidal thoughts' hardly. Graphical representation of BDI-II of both methods showed much equivalence in terms of item discrimination and item difficulty. The item characteristic curve of the IRT method provided informative evaluation of item performance. The inventory was applied only in college students. Depressive symptoms were frequent psychopathological manifestations among college students. The performance of the BDI-II items indicated convergent results from both methods of analysis. While the CTT was easy to understand and to apply, the IRT was more complex to understand and to implement. Comprehensive assessment of the functioning of each BDI-II item might be helpful in efficient detection of depressive conditions in college students. Copyright © 2018 Elsevier B.V. All rights reserved.
The promise and challenge of including multimedia items in medical licensure examinations: some insights from an empirical trial.

PubMed

Shen, Linjun; Li, Feiming; Wattleworth, Roberta; Filipetto, Frank

2010-10-01

The Comprehensive Osteopathic Medical Licensing Examination conducted a trial of multimedia items in the 2008-2009 Level 3 testing cycle to determine (1) if multimedia items were able to test additional elements of medical knowledge and skills and (2) how to develop effective multimedia items. Forty-four content-matched multimedia and text multiple-choice items were randomly delivered to Level 3 candidates. Logistic regression and paired-samples t tests were used for pairwise and group-level comparisons, respectively. Nine pairs showed significant differences in difficulty and/or discrimination. Content analysis found that, if text narrations were less direct, multimedia materials could make items easier. When textbook terminology was replaced by multimedia presentations, multimedia items could become more difficult. Moreover, one multimedia item was found not to be uniformly difficult for candidates at different ability levels, possibly because the multimedia and text items tested different elements of the same concept. Multimedia items may be capable of measuring some constructs different from what text items can measure. Effective multimedia items with reasonable psychometric properties can be intentionally developed.
A Psychometric Review of Measures Assessing Discrimination Against Sexual Minorities.

PubMed

Morrison, Todd G; Bishop, C J; Morrison, Melanie A; Parker-Taneo, Kandice

2016-08-01

Discrimination against sexual minorities is widespread and has deleterious consequences on victims' psychological and physical wellbeing. However, a review of the psychometric properties of instruments measuring lesbian, gay, and bisexual (LGB) discrimination has not been conducted. The results of this review, which involved evaluating 162 articles, reveal that most have suboptimal psychometric properties. Specifically, myriad scales possess questionable content validity as (1) items are not created in collaboration with sexual minorities; (2) measures possess a small number of items and, thus, may not sufficiently represent the domain of interest; and (3) scales are "adapted" from measures designed to examine race- and gender-based discrimination. Additional limitations include (1) summed scores are computed, often in the absence of scale score reliability metrics; (2) summed scores operate from the questionable assumption that diverse forms of discrimination are necessarily interrelated; (3) the dimensionality of instruments presumed to consist of subscales is seldom tested; (4) tests of criterion-related validity are routinely omitted; and (5) formal tests of measures' construct validity are seldom provided, necessitating that one infer validity based on the results obtained. The absence of "gold standard" measures, the attendant difficulty in formulating a coherent picture of this body of research, and suggestions for psychometric improvements are noted.
Effects of Test Level Discrimination and Difficulty on Answer-Copying Indices

ERIC Educational Resources Information Center

Sunbul, Onder; Yormaz, Seha

2018-01-01

In this study, Type I error and power rates of the omega (ω) and GBT (generalized binomial test) indices were investigated for several nominal alpha levels and for 40- and 80-item test lengths with a sample size of 10,000 examinees under several test-level restrictions. As a result, Type I error rates of both indices were found to be below the acceptable…
Assessing adolescents' personality with the NEO PI-R.

PubMed

De Fruyt, F; Mervielde, I; Hoekstra, H A; Rolland, J P

2000-12-01

The suitability of the Revised NEO Personality Inventory (NEO PI-R) to assess adolescents' personality traits was investigated in an unselected heterogeneous sample of 469 adolescents aged 12 to 17 years. They were also administered the Hierarchical Personality Inventory for Children (HiPIC) to allow an examination of convergent and discriminant validity. The adult NEO PI-R factor structure proved to be highly replicable in the sample of adolescents, with all facet scales primarily loading on the expected factors, independent of the age group. Domain and facet internal consistency coefficients were comparable to those obtained in adult samples, with fewer than 12% of the items showing corrected item-facet correlations below an absolute value of .20. Although adolescents generally reported few difficulties with the comprehensibility of the items, they tended to report more problems with the Openness to Ideas (O5) and Openness to Values (O6) items. Correlations between NEO PI-R and HiPIC scales underscored the convergent and discriminant validity of the NEO facets and HiPIC scales. It was concluded that the NEO PI-R in its present form is useful for assessing adolescents' traits at the primary level, but additional research is necessary to infer the most appropriate facet-level structure.
The Utrecht questionnaire (U-CEP) measuring knowledge on clinical epidemiology proved to be valid.

PubMed

Kortekaas, Marlous F; Bartelink, Marie-Louise E L; de Groot, Esther; Korving, Helen; de Wit, Niek J; Grobbee, Diederick E; Hoes, Arno W

2017-02-01

Knowledge on clinical epidemiology is crucial to practice evidence-based medicine. We describe the development and validation of the Utrecht questionnaire on knowledge on Clinical epidemiology for Evidence-based Practice (U-CEP); an assessment tool to be used in the training of clinicians. The U-CEP was developed in two formats: two sets of 25 questions and a combined set of 50. The validation was performed among postgraduate general practice (GP) trainees, hospital trainees, GP supervisors, and experts. Internal consistency, internal reliability (item-total correlation), item discrimination index, item difficulty, content validity, construct validity, responsiveness, test-retest reliability, and feasibility were assessed. The questionnaire was externally validated. Internal consistency was good with a Cronbach alpha of 0.8. The median item-total correlation and mean item discrimination index were satisfactory. Both sets were perceived as relevant to clinical practice. Construct validity was good. Both sets were responsive but failed on test-retest reliability. One set took 24 minutes and the other 33 minutes to complete, on average. External GP trainees had comparable results. The U-CEP is a valid questionnaire to assess knowledge on clinical epidemiology, which is a prerequisite for practicing evidence-based medicine in daily clinical practice. Copyright © 2016 Elsevier Inc. All rights reserved.
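Internal consistency here is reported as Cronbach's alpha (0.8). For a respondents-by-items score matrix, alpha = k/(k − 1) × (1 − Σ item variances / variance of total scores), where k is the number of items. The sketch below computes it with population variances. Using sample variances instead gives the same value, because the n/(n − 1) factor cancels in the ratio. The function name and test data are hypothetical.

```python
from statistics import pvariance
from typing import List


def cronbach_alpha(scores: List[List[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

Alpha rises when items covary, because the total-score variance then exceeds the sum of the item variances. For this reason it is often read alongside item-total correlations, as in the validation above.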
The Necessity of the Medial Temporal Lobe for Statistical Learning

PubMed Central

Schapiro, Anna C.; Gregory, Emma; Landau, Barbara; McCloskey, Michael; Turk-Browne, Nicholas B.

2014-01-01

The sensory input that we experience is highly patterned, and we are experts at detecting these regularities. Although the extraction of such regularities, or statistical learning (SL), is typically viewed as a cortical process, recent studies have implicated the medial temporal lobe (MTL), including the hippocampus. These studies have employed fMRI, leaving open the possibility that the MTL is involved but not necessary for SL. Here, we examined this issue in a case study of LSJ, a patient with complete bilateral hippocampal loss and broader MTL damage. In Experiments 1 and 2, LSJ and matched control participants were passively exposed to a continuous sequence of shapes, syllables, scenes, or tones containing temporal regularities in the co-occurrence of items. In a subsequent test phase, the control groups exhibited reliable SL in all conditions, successfully discriminating regularities from recombinations of the same items into novel foil sequences. LSJ, however, exhibited no SL, failing to discriminate regularities from foils. Experiment 3 ruled out more general explanations for this failure, such as inattention during exposure or difficulty following test instructions, by showing that LSJ could discriminate which individual items had been exposed. These findings provide converging support for the importance of the MTL in extracting temporal regularities. PMID:24456393
Some factors underlying individual differences in speech recognition on PRESTO: a first report.

PubMed

Tamati, Terrin N; Gilbert, Jaimie L; Pisoni, David B

2013-01-01

Previous studies investigating speech recognition in adverse listening conditions have found extensive variability among individual listeners. However, little is currently known about the core underlying factors that influence speech recognition abilities. To investigate sensory, perceptual, and neurocognitive differences between good and poor listeners on the Perceptually Robust English Sentence Test Open-set (PRESTO), a new high-variability sentence recognition test under adverse listening conditions. Participants who fell in the upper quartile (HiPRESTO listeners) or lower quartile (LoPRESTO listeners) on key word recognition on sentences from PRESTO in multitalker babble completed a battery of behavioral tasks and self-report questionnaires designed to investigate real-world hearing difficulties, indexical processing skills, and neurocognitive abilities. Young, normal-hearing adults (N = 40) from the Indiana University community participated in the current study. Participants' assessment of their own real-world hearing difficulties was measured with a self-report questionnaire on situational hearing and hearing health history. Indexical processing skills were assessed using a talker discrimination task, a gender discrimination task, and a forced-choice regional dialect categorization task. Neurocognitive abilities were measured with the Auditory Digit Span Forward (verbal short-term memory) and Digit Span Backward (verbal working memory) tests, the Stroop Color and Word Test (attention/inhibition), the WordFam word familiarity test (vocabulary size), the Behavioral Rating Inventory of Executive Function-Adult Version (BRIEF-A) self-report questionnaire on executive function, and two performance subtests of the Wechsler Abbreviated Scale of Intelligence (WASI) Performance Intelligence Quotient (IQ; nonverbal intelligence). Scores on self-report questionnaires and behavioral tasks were tallied and analyzed by listener group (HiPRESTO and LoPRESTO). 
The extreme groups did not differ overall on self-reported hearing difficulties in real-world listening environments. However, an item-by-item analysis of questions revealed that LoPRESTO listeners reported significantly greater difficulty understanding speakers in a public place. HiPRESTO listeners were significantly more accurate than LoPRESTO listeners at gender discrimination and regional dialect categorization, but they did not differ on talker discrimination accuracy or response time, or gender discrimination response time. HiPRESTO listeners also had longer forward and backward digit spans, higher word familiarity ratings on the WordFam test, and lower (better) scores for three individual items on the BRIEF-A questionnaire related to cognitive load. The two groups did not differ on the Stroop Color and Word Test or either of the WASI performance IQ subtests. HiPRESTO listeners and LoPRESTO listeners differed in indexical processing abilities, short-term and working memory capacity, vocabulary size, and some domains of executive functioning. These findings suggest that individual differences in the ability to encode and maintain highly detailed episodic information in speech may underlie the variability observed in speech recognition performance in adverse listening conditions using high-variability PRESTO sentences in multitalker babble. American Academy of Audiology.
Some Factors Underlying Individual Differences in Speech Recognition on PRESTO: A First Report

PubMed Central

Tamati, Terrin N.; Gilbert, Jaimie L.; Pisoni, David B.

2013-01-01

Background Previous studies investigating speech recognition in adverse listening conditions have found extensive variability among individual listeners. However, little is currently known about the core, underlying factors that influence speech recognition abilities. Purpose To investigate sensory, perceptual, and neurocognitive differences between good and poor listeners on PRESTO, a new high-variability sentence recognition test under adverse listening conditions. Research Design Participants who fell in the upper quartile (HiPRESTO listeners) or lower quartile (LoPRESTO listeners) on key word recognition on sentences from PRESTO in multitalker babble completed a battery of behavioral tasks and self-report questionnaires designed to investigate real-world hearing difficulties, indexical processing skills, and neurocognitive abilities. Study Sample Young, normal-hearing adults (N = 40) from the Indiana University community participated in the current study. Data Collection and Analysis Participants’ assessment of their own real-world hearing difficulties was measured with a self-report questionnaire on situational hearing and hearing health history. Indexical processing skills were assessed using a talker discrimination task, a gender discrimination task, and a forced-choice regional dialect categorization task. Neurocognitive abilities were measured with the Auditory Digit Span Forward (verbal short-term memory) and Digit Span Backward (verbal working memory) tests, the Stroop Color and Word Test (attention/inhibition), the WordFam word familiarity test (vocabulary size), the BRIEF-A self-report questionnaire on executive function, and two performance subtests of the WASI Performance IQ (non-verbal intelligence). Scores on self-report questionnaires and behavioral tasks were tallied and analyzed by listener group (HiPRESTO and LoPRESTO). Results The extreme groups did not differ overall on self-reported hearing difficulties in real-world listening environments. 
However, an item-by-item analysis of questions revealed that LoPRESTO listeners reported significantly greater difficulty understanding speakers in a public place. HiPRESTO listeners were significantly more accurate than LoPRESTO listeners at gender discrimination and regional dialect categorization, but they did not differ on talker discrimination accuracy or response time, or gender discrimination response time. HiPRESTO listeners also had longer forward and backward digit spans, higher word familiarity ratings on the WordFam test, and lower (better) scores for three individual items on the BRIEF-A questionnaire related to cognitive load. The two groups did not differ on the Stroop Color and Word Test or either of the WASI performance IQ subtests. Conclusions HiPRESTO listeners and LoPRESTO listeners differed in indexical processing abilities, short-term and working memory capacity, vocabulary size, and some domains of executive functioning. These findings suggest that individual differences in the ability to encode and maintain highly detailed episodic information in speech may underlie the variability observed in speech recognition performance in adverse listening conditions using high-variability PRESTO sentences in multitalker babble. PMID:24047949
Psychometric properties of the neck disability index amongst patients with chronic neck pain using item response theory.

PubMed

Saltychev, Mikhail; Mattie, Ryan; McCormick, Zachary; Laimi, Katri

2017-05-13

The Neck Disability Index (NDI) is commonly used for clinical and research assessment for chronic neck pain, yet the original version of this tool has not undergone significant validity testing, and in particular, there has been minimal assessment using Item Response Theory. The goal of the present study was to investigate the psychometric properties of the original version of the NDI in a large sample of individuals with chronic neck pain by defining its internal consistency, construct structure and validity, and its ability to discriminate between different degrees of functional limitation. This is a cross-sectional cohort study of 585 consecutive patients with chronic neck pain seen in a university hospital rehabilitation clinic. Internal consistency was evaluated using Cronbach's alpha, construct structure was evaluated by exploratory factor analysis, and discrimination ability was determined by Item Response Theory. The NDI demonstrated good internal consistency assessed by Cronbach's alpha (0.87). The exploratory factor analysis identified only one factor with eigenvalue considered significant (cutoff 1.0). When analyzed by Item Response Theory, eight out of 10 items demonstrated almost ideal difficulty parameter estimates. In addition, eight out of 10 items showed high to perfect estimates of discrimination ability (overall range 0.8 to 2.9). Amongst patients with chronic neck pain, the NDI was found to have good internal consistency, have unidimensional properties, and an excellent ability to distinguish patients with different levels of perceived disability. Implications for Rehabilitation The Neck Disability Index has good internal consistency, unidimensional properties, and an excellent ability to distinguish patients with different levels of perceived disability. The Neck Disability Index is recommended for use when selecting patients for rehabilitation, setting rehabilitation goals, and measuring the outcome of intervention.
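Cronbach's alpha, reported above as 0.87 for the NDI, compares the sum of the item variances with the variance of the total score: alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores). The Python sketch below is an illustration, not the authors' procedure. `cronbach_alpha` is a hypothetical helper, rows are respondents, columns are items, and the toy data are invented.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items matrix of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores)).
    Population variances are used; sample variances give the same ratio
    because the n / (n - 1) factor cancels.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("alpha is undefined when every total score is equal")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: 5 respondents x 3 items scored 0-5 (NDI-style).
ndi_like = [
    [0, 1, 0],
    [2, 2, 1],
    [3, 2, 3],
    [4, 5, 4],
    [5, 4, 5],
]
print(round(cronbach_alpha(ndi_like), 3))
```

Items that rise and fall together push alpha toward 1. Items that vary independently of one another push it toward 0.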
Analyzing force concept inventory with item response theory

NASA Astrophysics Data System (ADS)

Wang, Jing; Bao, Lei

2010-10-01

Item response theory is a popular assessment method used in education. It rests on the assumption of a probability framework that relates students' innate ability and their performance on test questions. Item response theory transforms students' raw test scores into a scaled proficiency score, which can be used to compare results obtained with different test questions. The scaled score also addresses the issues of ceiling effects and guessing, which commonly exist in quantitative assessment. We used item response theory to analyze the force concept inventory (FCI). Our results show that item response theory can be useful for analyzing physics concept surveys such as the FCI and produces results about the individual questions and student performance that are beyond the capability of classical statistics. The theory yields detailed measurement parameters regarding the difficulty, discrimination features, and probability of correct guess for each of the FCI questions.
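The three item parameters named here (difficulty, discrimination and probability of a correct guess) match the standard three-parameter logistic (3PL) model: P(theta) = c + (1 - c) / (1 + exp(-a(theta - b))). The sketch below assumes that form without the optional 1.7 scaling constant. The function names and parameter values are illustrative and are not the FCI estimates from the study.

```python
import math


def logistic(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def p_correct_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model.

    theta: examinee ability; a: item discrimination (slope);
    b: item difficulty (location); c: pseudo-guessing lower asymptote.
    """
    return c + (1.0 - c) * logistic(a * (theta - b))


# Hypothetical five-option item: moderately hard, fairly discriminating.
a, b, c = 1.5, 0.5, 0.2
for theta in (-3, -1, 0, 0.5, 1, 3):
    print(f"theta={theta:+.1f}  P(correct)={p_correct_3pl(theta, a, b, c):.3f}")
```

The curve starts just above c for low-ability examinees and approaches 1 for high-ability examinees. It is steepest at theta = b, where a larger a makes the rise sharper.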
Informed choice: understanding knowledge in the context of screening uptake.

PubMed

Michie, Susan; Dormandy, Elizabeth; Marteau, Theresa M

2003-07-01

This study evaluates a scale measuring knowledge about a screening test and investigates the association between knowledge, uptake and attitudes towards screening. One thousand four hundred ninety-nine pregnant women completed the knowledge scale of the multidimensional measure of informed choice (MMIC). Three hundred forty-five of these women and 152 professionals providing antenatal care also rated the importance of the knowledge items. Item characteristic curves show that, with one exception, the knowledge items reflect a spread of difficulty and are able to discriminate between people. All items were seen as essential or helpful by both women and health professionals, with two items seen as particularly important and one as unimportant. There were some differences between health professionals, women with low risk results and women with high risk results. Knowledge was not associated with uptake, attitude, or the extent to which uptake was consistent with women's attitudes towards undergoing the test.
Lawton IADL scale in dementia: can item response theory make it more informative?

PubMed

McGrory, Sarah; Shenkin, Susan D; Austin, Elizabeth J; Starr, John M

2014-07-01

Impairment of functional abilities represents a crucial component of dementia diagnosis. Current functional measures rely on the traditional aggregate method of summing raw scores. While this summary score provides a quick representation of a person's ability, it disregards useful information at the item level. The aim was to use item response theory (IRT) methods to increase the interpretive power of the Lawton Instrumental Activities of Daily Living (IADL) scale by establishing a hierarchy of item 'difficulty' and 'discrimination'. This cross-sectional study applied IRT methods to the analysis of IADL outcomes. Participants were 202 members of the Scottish Dementia Research Interest Register (mean age = 76.39, range = 56-93, SD = 7.89 years) with complete itemised data available. A Mokken scale with good reliability (Molenaar Sijtsma statistic 0.79) was obtained, satisfying the IRT assumption that the items comprise a single unidimensional scale. The eight items in the scale could be placed on a hierarchy of 'difficulty' (H coefficient = 0.55), with 'Shopping' being the most 'difficult' item and 'Telephone use' being the least 'difficult' item. 'Shopping' was the most discriminatory item, differentiating well between patients of different levels of ability. IRT methods are capable of providing more information about functional impairment than a summed score. 'Shopping' and 'Telephone use' were identified as items that reveal key information about a patient's level of ability, and could be useful screening questions for clinicians. © The Author 2013. Published by Oxford University Press on behalf of the British Geriatrics Society. All rights reserved.
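The H coefficient used in Mokken scaling is Loevinger's scalability coefficient. For dichotomous items it compares the observed number of Guttman errors (passing a harder item while failing an easier one) with the number expected if the items were independent: H = 1 - sum(F) / sum(E). The sketch below assumes items scored 0/1. Polytomous Mokken analysis works with item steps and is not covered. `loevinger_h` is a hypothetical helper. In practice, packages such as the R package mokken (`coefH`) compute these coefficients.

```python
from itertools import combinations


def loevinger_h(scores):
    """Scale-level Loevinger H for dichotomous (0/1) items.

    For each item pair, the easier item has the higher proportion passing.
    A Guttman error is passing the harder item while failing the easier one.
    Expected errors assume independence given the marginal proportions.
    """
    n, k = len(scores), len(scores[0])
    p = [sum(row[j] for row in scores) / n for j in range(k)]
    observed = expected = 0.0
    for i, j in combinations(range(k), 2):
        easy, hard = (i, j) if p[i] >= p[j] else (j, i)
        observed += sum(1 for row in scores if row[hard] == 1 and row[easy] == 0)
        expected += n * p[hard] * (1 - p[easy])
    if expected == 0:
        raise ValueError("H is undefined when no Guttman errors are possible")
    return 1 - observed / expected


# Hypothetical IADL-style data: 6 patients x 4 items (1 = able, 0 = not able).
iadl_like = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(loevinger_h(iadl_like), 3))
```

H = 1 means a perfect Guttman pattern, and H = 0 means no more structure than chance. Mokken's conventional thresholds treat 0.5 and above as a strong scale.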
Development of Thermodynamic Conceptual Evaluation

NASA Astrophysics Data System (ADS)

Talaeb, P.; Wattanakasiwich, P.

2010-07-01

This research aims to develop a test for assessing student understanding of fundamental principles in thermodynamics. Misconceptions identified in previous physics education research were used to develop the test. Its topics include heat and temperature, the zeroth and first laws of thermodynamics, and thermodynamic processes. Content validity was analyzed by three physics experts. The test was then administered to freshmen, sophomores and juniors majoring in physics in order to determine item difficulty and item discrimination. A few items were eliminated from the test. Finally, the test will be administered to students taking the Physics I course in order to evaluate the effectiveness of Interactive Lecture Demonstrations, which will be used for the first time at Chiang Mai University.

The development and validation of a test of science critical thinking for fifth graders.

PubMed

Mapeala, Ruslan; Siew, Nyet Moi

2015-01-01

The paper describes the development and validation of the Test of Science Critical Thinking (TSCT), which measures three critical thinking skill constructs: comparing and contrasting, sequencing, and identifying cause and effect. The initial TSCT consisted of 55 multiple-choice items, each of which required participants to select a correct response and the correct critical thinking skill used for that response. Data were obtained from a purposive sample of 30 fifth graders in a pilot study carried out in a primary school in Sabah, Malaysia. Students took part in 9 weeks of teaching and learning activities using the Thinking Maps-aided Problem-Based Learning Module before they answered the TSCT. Analyses were conducted to check the difficulty index (p), discrimination index (d), internal consistency reliability, content validity, and face validity. Test-retest reliability data were analyzed separately for a group of fifth graders of similar ability. Findings of the pilot study showed that, of the initial 55 administered items, only the 30 items with a relatively good difficulty index (p) between 0.40 and 0.60 and a good discrimination index (d) between 0.20 and 1.00 were selected. The Kuder-Richardson reliability values were appropriate and relatively high: 0.70, 0.73 and 0.92 for identifying cause and effect, sequencing, and comparing and contrasting, respectively. The content validity index obtained from three expert judgments equalled or exceeded 0.95. In addition, test-retest reliability showed good, statistically significant correlations. From the above results, the selected 30-item TSCT was found to have sufficient reliability and validity and would therefore represent a useful tool for measuring critical thinking ability among fifth graders in primary science.
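The item-selection rule described here can be sketched in Python. It keeps items with a difficulty index p between 0.40 and 0.60 and a discrimination index d between 0.20 and 1.00. The abstract does not say how d was computed. This sketch assumes the common upper-lower group method, where d is the proportion correct in the top 27% of total scorers minus the proportion in the bottom 27%. The helper names and the toy response matrix are hypothetical.

```python
def item_indices(scores, group_frac=0.27):
    """Difficulty index p and upper-lower discrimination index d per item.

    scores: list of examinee rows, each a list of 0/1 item scores.
    p = proportion of all examinees answering the item correctly.
    d = proportion correct in the upper group minus the lower group, where
        the groups are the top and bottom `group_frac` of total scores.
    Ties at a group boundary are broken by input order (sorted is stable).
    """
    if not 0 < group_frac <= 0.5:
        raise ValueError("group_frac must be in (0, 0.5]")
    n, k = len(scores), len(scores[0])
    g = max(1, min(round(n * group_frac), n // 2))
    ranked = sorted(scores, key=sum)  # ascending total score
    lower, upper = ranked[:g], ranked[-g:]
    indices = []
    for j in range(k):
        p = sum(row[j] for row in scores) / n
        d = (sum(row[j] for row in upper) - sum(row[j] for row in lower)) / g
        indices.append((p, d))
    return indices


def select_items(indices, p_range=(0.40, 0.60), d_range=(0.20, 1.00)):
    """Positions of items whose p and d both fall inside the given ranges."""
    return [j for j, (p, d) in enumerate(indices)
            if p_range[0] <= p <= p_range[1] and d_range[0] <= d <= d_range[1]]


# Hypothetical 0/1 responses: 10 examinees x 3 items.
responses = [
    [1, 0, 0], [1, 0, 0], [1, 0, 0],
    [1, 0, 1], [1, 0, 1], [1, 1, 0], [1, 1, 0],
    [1, 1, 1], [1, 1, 1], [1, 1, 1],
]
stats = item_indices(responses)
for j, (p, d) in enumerate(stats):
    print(f"item {j}: p={p:.2f}, d={d:.2f}")
print("selected:", select_items(stats))
```

Item 0 is answered correctly by everyone (p = 1.00, d = 0.00), so it fails both criteria. The other two items fall in the target ranges.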
Three controversies over item disclosure in medical licensure examinations.

PubMed

Park, Yoon Soo; Yang, Eunbae B

2015-01-01

In response to views on the public's right to know, there is growing attention to item disclosure - release of items, answer keys, and performance data to the public - in medical licensure examinations and its potential impact on the test's ability to measure competence and select qualified candidates. Recent debates on this issue have sparked legislative action internationally, including in South Korea, with prior discussions among North American countries dating back over three decades. The purpose of this study is to identify and analyze three issues associated with item disclosure in medical licensure examinations - 1) fairness and validity, 2) impact on passing levels, and 3) utility of item disclosure - by synthesizing existing literature in relation to standards in testing. Historically, the controversy over item disclosure has centered on fairness and validity. Proponents of item disclosure stress test takers' right to know, while opponents argue from a validity perspective. Item disclosure may bias item characteristics, such as difficulty and discrimination, and has consequences for setting passing levels. To date, there has been limited research on the utility of item disclosure for large-scale testing. These issues require ongoing and careful consideration.
Accuracy of a Classical Test Theory-Based Procedure for Estimating the Reliability of a Multistage Test. Research Report. ETS RR-17-02

ERIC Educational Resources Information Center

Kim, Sooyeon; Livingston, Samuel A.

2017-01-01

The purpose of this simulation study was to assess the accuracy of a classical test theory (CTT)-based procedure for estimating the alternate-forms reliability of scores on a multistage test (MST) having 3 stages. We generated item difficulty and discrimination parameters for 10 parallel, nonoverlapping forms of the complete 3-stage test and…
Constructing objective tests

NASA Astrophysics Data System (ADS)

Aubrecht, Gordon J.; Aubrecht, Judith D.

1983-07-01

True-false or multiple-choice tests can be useful instruments for evaluating student progress. We examine strategies for planning objective tests that cover the material taught in science (physics) courses. We also examine strategies for writing test questions within a test blueprint. The statistical basis for judging the quality of test items is discussed. Reliability, difficulty, and discrimination indices are defined and examples presented. Our recommendations are easily put into practice.
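For dichotomously scored true-false or multiple-choice items, a standard reliability index is Kuder-Richardson formula 20: KR-20 = k / (k - 1) * (1 - sum(p_j * q_j) / variance of total scores). Here p_j is the proportion answering item j correctly and q_j = 1 - p_j. The abstract does not say which reliability formula the article uses, so treat this as an illustrative sketch. `kr20` and the quiz data are hypothetical.

```python
from statistics import pvariance


def kr20(scores):
    """Kuder-Richardson 20 reliability for a matrix of 0/1 item scores.

    KR-20 = k / (k - 1) * (1 - sum(p_j * q_j) / variance(total scores)),
    using the population variance of totals so that p_j * q_j is the
    matching population variance of item j.
    """
    n, k = len(scores), len(scores[0])
    if k < 2:
        raise ValueError("KR-20 needs at least two items")
    pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in scores) / n
        pq += p * (1 - p)
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("KR-20 is undefined when every total score is equal")
    return k / (k - 1) * (1 - pq / total_var)


# Hypothetical true-false quiz: 5 students x 4 items (1 = correct).
quiz = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 0],
]
print(round(kr20(quiz), 3))
```

For 0/1 items this value equals Cronbach's alpha computed with population variances, because p_j * q_j is exactly the population variance of item j.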
Validation of an instrument for assessing teacher knowledge of basic language constructs of literacy.

PubMed

Binks-Cantrell, Emily; Joshi, R Malatesha; Washburn, Erin K

2012-10-01

Recent national reports have stressed the importance of teacher knowledge in teaching reading. However, in the past, teachers' knowledge of language and literacy constructs has typically been assessed with instruments that have not been fully tested for validity. In the present study, an instrument was developed, and its reliability, item difficulty, and item discrimination were computed and examined to identify model fit by applying exploratory factor analysis. Such analyses showed that the instrument demonstrated adequate estimates of reliability in assessing teachers' knowledge of language constructs. The implications for professional development of in-service teachers as well as preservice teacher education are also discussed.
Analysis of Validity and Reliability of the Health Literacy Index for Female Marriage Immigrants (HLI-FMI).

PubMed

Yang, Sook Ja; Chee, Yeon Kyung; An, Jisook; Park, Min Hee; Jung, Sunok

2016-05-01

The purpose of this study was to obtain an independent evaluation of the factor structure of the 12-item Health Literacy Index for Female Marriage Immigrants (HLI-FMI), the first measure for assessing health literacy for FMIs in Korea. Participants were 250 Asian women who migrated from China, Vietnam, and the Philippines to marry. The HLI-FMI was originally developed and administered in Korean, and other questionnaires were translated into participants' native languages. The HLI-FMI consisted of 2 factors: (1) Access-Understand Health Literacy (7 items) and (2) Appraise-Apply Health Literacy (5 items); Cronbach's α = .73. Confirmatory factor analysis indicated adequate fit for the 2-factor model. HLI-FMI scores were positively associated with time since immigration and Korean proficiency. Based on classical test theory and item response theory, strong support was provided for item discrimination and item difficulty. Findings suggested that the HLI-FMI is an easily administered, reliable, and valid scale. © 2016 APJPH.
Development and validity of a questionnaire to test the knowledge of primary care personnel regarding nutrition in obese adolescents.

PubMed

de Pinho, Lucinéia; Moura, Paulo Henrique Tolentino; Silveira, Marise Fagundes; de Botelho, Ana Cristina Carvalho; Caldeira, Antônio Prates

2013-07-18

In light of its epidemic proportions in developed and developing countries, obesity is considered a serious public health issue. In order to increase knowledge concerning the ability of health care professionals to care for obese adolescents and to adopt more efficient preventive and control measures, a questionnaire was developed and validated to assess non-dietitian health professionals regarding their Knowledge of Nutrition in Obese Adolescents (KNOA). The development and evaluation of the questionnaire to assess the knowledge of primary care practitioners regarding nutrition in obese adolescents was carried out in five phases, as follows: 1) definition of study dimensions; 2) development of 42 questions and preliminary evaluation of the questionnaire by a panel of experts; 3) characterization and selection of primary care practitioners (35 dietitians and 265 non-dietitians) and measurement of questionnaire criteria by contrasting the responses of dietitians and non-dietitians; 4) reliability assessment by question exclusion based on item difficulty (too easy or too difficult for non-dietitian practitioners), item discrimination, internal consistency and reproducibility index determination; and 5) scoring of the completed questionnaires. Dietitians obtained higher scores than non-dietitians (Mann-Whitney U test, P < 0.05), confirming the validity of the questionnaire criteria. Item discrimination was assessed by correlating the score for each item with the total score, using a correlation coefficient of 0.2 as the minimum cutoff value. Item difficulty was controlled by excluding questions answered correctly by more than 90% of the non-dietitian subjects (too easy) or by fewer than 10% of them (too difficult). The final questionnaire contained 26 of the original 42 questions, increasing Cronbach's α from 0.788 to 0.807. Test-retest agreement between respondents was classified as good to very good (Kappa test, >0.60).
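The two screening rules in this abstract can be combined into one filter. The first requires an item-total correlation of at least 0.2. The second requires a proportion correct between 10% and 90%. This Python sketch assumes dichotomous 0/1 scoring and the uncorrected item-total correlation, with the item included in the total, as the abstract describes. Many analysts prefer a corrected version that removes the item from the total. `screen_items` and the data are hypothetical. In the study, difficulty was judged on the non-dietitian respondents, so the caller would pass only that subgroup.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns nan if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def screen_items(scores, r_min=0.2, p_low=0.10, p_high=0.90):
    """Keep items with p_low <= proportion correct <= p_high and item-total r >= r_min."""
    n, k = len(scores), len(scores[0])
    totals = [sum(row) for row in scores]
    kept = []
    for j in range(k):
        item = [row[j] for row in scores]
        p = sum(item) / n
        r = pearson(item, totals)  # nan never passes the >= test
        if p_low <= p <= p_high and r >= r_min:
            kept.append(j)
    return kept


# Hypothetical 0/1 answers: 6 respondents x 4 questions.
answers = [
    [1, 0, 0, 1], [1, 0, 0, 1], [1, 1, 0, 0],
    [1, 1, 1, 0], [1, 1, 1, 0], [1, 1, 1, 0],
]
print("kept questions:", screen_items(answers))
```

Question 0 is dropped as too easy because everyone answers it correctly. Question 3 is dropped because it correlates negatively with the total.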
The KNOA questionnaire developed for primary care practitioners is a valid, consistent and suitable instrument that can be applied over time, making it a promising tool for developing and guiding public health policies.
A Non-Parametric Item Response Theory Evaluation of the CAGE Instrument Among Older Adults.

PubMed

Abdin, Edimansyah; Sagayadevan, Vathsala; Vaingankar, Janhavi Ajit; Picco, Louisa; Chong, Siow Ann; Subramaniam, Mythily

2018-02-23

The validity of the CAGE has not yet been examined with item response theory (IRT) in the older adult population. This study aims to investigate the psychometric properties of the CAGE using both non-parametric and parametric IRT models, to assess whether there is any differential item functioning (DIF) by age, gender and ethnicity, and to examine the measurement precision at the cut-off scores. We used data from the Well-being of the Singapore Elderly study to conduct Mokken scaling analysis (MSA) and to fit dichotomous Rasch and 2-parameter logistic IRT models. The measurement precision at the cut-off scores was evaluated using classification accuracy (CA) and classification consistency (CC). The MSA showed that the overall scalability H index was 0.459, indicating a medium performing instrument. All items were found to be homogeneous, measuring the same construct and able to discriminate well between respondents with high levels of the construct and those with lower levels. The item discrimination ranged from 1.07 to 6.73, while the item difficulty ranged from 0.33 to 2.80. Significant DIF was found for 2 items across ethnic groups. More than 90% (CC and CA ranged from 92.5% to 94.3%) of the respondents were consistently and accurately classified by the CAGE cut-off scores of 2 and 3. The current study provides new evidence on the validity of the CAGE from the IRT perspective. This study provides valuable information on each item in the assessment of the overall severity of alcohol problems and on the precision of the cut-off scores in the older adult population.
An assessment of functioning and non-functioning distractors in multiple-choice questions: a descriptive analysis.

PubMed

Tarrant, Marie; Ware, James; Mohammed, Ahmed M

2009-07-07

Four- or five-option multiple choice questions (MCQs) are the standard in health-science disciplines, both on certification-level examinations and on in-house developed tests. Previous research has shown, however, that few MCQs have three or four functioning distractors. The purpose of this study was to investigate non-functioning distractors in teacher-developed tests in one nursing program in an English-language university in Hong Kong. Using item-analysis data, we assessed the proportion of non-functioning distractors on a sample of seven test papers administered to undergraduate nursing students. A total of 514 items were reviewed, including 2056 options (1542 distractors and 514 correct responses). Non-functioning options were defined as ones that were chosen by fewer than 5% of examinees and those with a positive option discrimination statistic. The proportion of items containing 0, 1, 2, and 3 functioning distractors was 12.3%, 34.8%, 39.1%, and 13.8% respectively. Overall, items contained an average of 1.54 (SD = 0.88) functioning distractors. Only 52.2% (n = 805) of all distractors were functioning effectively and 10.2% (n = 158) had a choice frequency of 0. Items with more functioning distractors were more difficult and more discriminating. The low frequency of items with three functioning distractors in the four-option items in this study suggests that teachers have difficulty developing plausible distractors for most MCQs. Test items should consist of as many options as is feasible given the item content and the number of plausible distractors; in most cases this would be three. Item analysis results can be used to identify and remove non-functioning distractors from MCQs that have been used in previous tests.
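The abstract gives two criteria for a non-functioning distractor: fewer than 5% of examinees choose it, or it has a positive option discrimination statistic. The sketch below applies those criteria to one item. The abstract does not define the option discrimination statistic. Here it is assumed to be the share of the top 27% of total scorers choosing the option minus the share of the bottom 27% choosing it. `option_stats` and the response data are hypothetical.

```python
def option_stats(responses, totals, key, options="ABCD",
                 min_share=0.05, group_frac=0.27):
    """Choice share, option discrimination and functioning flag per distractor.

    responses: option letter chosen by each examinee on one item.
    totals: each examinee's total test score, in the same order.
    key: the correct option, which is skipped.
    Option discrimination (assumed): upper-group share minus lower-group share.
    """
    n = len(responses)
    g = max(1, min(round(n * group_frac), n // 2))
    order = sorted(range(n), key=lambda i: totals[i])  # ascending total
    lower, upper = order[:g], order[-g:]
    stats = {}
    for opt in options:
        if opt == key:
            continue
        share = sum(r == opt for r in responses) / n
        disc = (sum(responses[i] == opt for i in upper)
                - sum(responses[i] == opt for i in lower)) / g
        # Non-functioning: chosen by < min_share of examinees OR positive discrimination.
        functioning = share >= min_share and disc <= 0
        stats[opt] = (share, disc, functioning)
    return stats


# Hypothetical four-option item keyed "A", answered by 10 examinees.
totals = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
responses = ["B", "B", "B", "A", "A", "A", "A", "A", "A", "C"]
for opt, (share, disc, ok) in option_stats(responses, totals, key="A").items():
    print(f"{opt}: chosen by {share:.0%}, discrimination {disc:+.2f}, "
          f"{'functioning' if ok else 'non-functioning'}")
```

In this toy item, only B functions. C attracts a high scorer, so its discrimination is positive. D is never chosen.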
Psychometrics of the preschool behavioral and emotional rating scale with children from early childhood special education settings.

PubMed

Lambert, Matthew C; Cress, Cynthia J; Epstein, Michael H

2015-01-01

In a previous study with a nationally representative sample, researchers found that the items of the Preschool Behavioral and Emotional Rating Scale can best be described by a four-factor structure model (Emotional Regulation, School Readiness, Social Confidence, and Family Involvement). The findings of this investigation replicate and extend these previous results with a national sample of children (N = 1,075) with disabilities enrolled in early childhood special education programs. Data were analyzed using classical test theory, Rasch modeling, and confirmatory factor analysis. Results confirmed that, for the most part, individual items were internally consistent within a four-factor model and showed consistent item difficulty, discrimination, and fit relative to their respective subscale scores. © 2015 Michigan Association for Infant Mental Health.
Assessment of Differential Item Functioning in the Experiences of Discrimination Index

PubMed Central

Cunningham, Timothy J.; Berkman, Lisa F.; Gortmaker, Steven L.; Kiefe, Catarina I.; Jacobs, David R.; Seeman, Teresa E.; Kawachi, Ichiro

2011-01-01

The psychometric properties of instruments used to measure self-reported experiences of discrimination in epidemiologic studies are rarely assessed, especially regarding construct validity. The authors used 2000–2001 data from the Coronary Artery Risk Development in Young Adults (CARDIA) Study to examine differential item functioning (DIF) in 2 versions of the Experiences of Discrimination (EOD) Index, an index measuring self-reported experiences of racial/ethnic and gender discrimination. DIF may confound interpretation of subgroup differences. Large DIF was observed for 2 of 7 racial/ethnic discrimination items: White participants reported more racial/ethnic discrimination for the “at school” item, and black participants reported more racial/ethnic discrimination for the “getting housing” item. The large DIF by race/ethnicity in the index for racial/ethnic discrimination probably reflects item impact and is the result of valid group differences between blacks and whites regarding their respective experiences of discrimination. The authors also observed large DIF by race/ethnicity for 3 of 7 gender discrimination items. This is more likely to have been due to item bias. Users of the EOD Index must consider the advantages and disadvantages of DIF adjustment (omitting items, constructing separate measures, and retaining items). The EOD Index has substantial usefulness as an instrument that can assess self-reported experiences of discrimination. PMID:22038104
Can manual ability be measured with a generic ABILHAND scale? A cross-sectional study conducted on six diagnostic groups

PubMed Central

Arnould, Carlyne; Vandervelde, Laure; Batcho, Charles Sèbiyo; Penta, Massimo; Thonnard, Jean-Louis

2012-01-01

Objectives Several ABILHAND Rasch-built manual ability scales were previously developed for chronic stroke (CS), cerebral palsy (CP), rheumatoid arthritis (RA), systemic sclerosis (SSc) and neuromuscular disorders (NMD). The present study aimed to explore the applicability of a generic manual ability scale unbiased by diagnosis and to study the nature of manual ability across diagnoses. Design Cross-sectional study. Setting Outpatient clinic homes (CS, CP, RA), specialised centres (CP), reference centres (CP, NMD) and university hospitals (SSc). Participants 762 patients from six diagnostic groups: 103 CS adults, 113 CP children, 112 RA adults, 156 SSc adults, 124 NMD children and 124 NMD adults. Primary and secondary outcome measures Manual ability as measured by the ABILHAND disease-specific questionnaires, diagnosis and nature (ie, uni-manual or bi-manual involvement and proximal or distal joints involvement) of the ABILHAND manual activities. Results The difficulties of most manual activities were diagnosis dependent. A principal component analysis highlighted that 57% of the variance in the item difficulty between diagnoses was explained by the symmetric or asymmetric nature of the disorders. A generic scale was constructed, from a metric point of view, with 11 items sharing a common difficulty among diagnoses and 41 items displaying a category-specific location (asymmetric: CS, CP; and symmetric: RA, SSc, NMD). This generic scale showed that CP and NMD children had significantly less manual ability than RA patients, who had significantly less manual ability than CS, SSc and NMD adults. However, the generic scale was less discriminative and responsive to small deficits than disease-specific instruments. Conclusions Our finding that most of the manual item difficulties were disease-dependent emphasises the danger of using generic scales without prior investigation of item invariance across diagnostic groups. 
Nevertheless, a generic manual ability scale could be developed by adjusting and accounting for activities perceived differently in various disorders. PMID:23117570
Psychometric evaluation of the pediatric and parent-proxy Patient-Reported Outcomes Measurement Information System and the Neurology and Traumatic Brain Injury Quality of Life measurement item banks in pediatric traumatic brain injury.

PubMed

Bertisch, Hilary; Rivara, Frederick P; Kisala, Pamela A; Wang, Jin; Yeates, Keith Owen; Durbin, Dennis; Zonfrillo, Mark R; Bell, Michael J; Temkin, Nancy; Tulsky, David S

2017-07-01

The primary objective is to provide evidence of convergent and discriminant validity for the pediatric and parent-proxy versions of the Patient-Reported Outcomes Measurement Information System (PROMIS) Anxiety, Depression, Anger, Peer Relations, Mobility, Pain Interference, and Fatigue item banks, the Neurology Quality of Life measurement system (Neuro-QOL) Cognition-General Concerns and Stigma item banks, and the Traumatic Brain Injury Quality of Life (TBI-QOL) Executive Function and Headache item banks in a pediatric traumatic brain injury (TBI) sample. Participants were 134 parent-child (ages 8-18 years) dyads. Children all sustained TBI and the dyads completed outcome ratings 6 months after injury at one of six medical centers across the United States. Ratings included PROMIS, Neuro-QOL, and TBI-QOL item banks, as well as the Pediatric Quality of Life inventory (PedsQL), the Health Behavior Inventory (HBI), and the Strengths and Difficulties Questionnaire (SDQ) as legacy criterion measures against which these item banks were validated. The PROMIS, Neuro-QOL, and TBI-QOL item banks demonstrated good convergent validity, as evidenced by moderate to strong correlations with comparable scales on the legacy measures. PROMIS, Neuro-QOL, and TBI-QOL item banks showed weaker correlations with ratings of unrelated constructs on legacy measures, providing evidence of discriminant validity. Our results indicate that the constructs measured by the PROMIS, Neuro-QOL, and TBI-QOL item banks are valid in our pediatric TBI sample and that it is appropriate to use these standardized scores for our primary study analyses.
Three controversies over item disclosure in medical licensure examinations

PubMed Central

Park, Yoon Soo; Yang, Eunbae B.

2015-01-01

In response to views on the public's right to know, there is growing attention to item disclosure – release of items, answer keys, and performance data to the public – in medical licensure examinations and its potential impact on the test's ability to measure competence and select qualified candidates. Recent debates on this issue have sparked legislative action internationally, including in South Korea, with prior discussions among North American countries dating back over three decades. The purpose of this study is to identify and analyze three issues associated with item disclosure in medical licensure examinations – 1) fairness and validity, 2) impact on passing levels, and 3) utility of item disclosure – by synthesizing existing literature in relation to standards in testing. Historically, the controversy over item disclosure has centered on fairness and validity. Proponents of item disclosure stress test takers’ right to know, while opponents argue from a validity perspective. Item disclosure may bias item characteristics, such as difficulty and discrimination, and has consequences for setting passing levels. To date, there has been limited research on the utility of item disclosure for large-scale testing. These issues require ongoing and careful consideration. PMID:26374693
Item validity vs. item discrimination index: a redundancy?

NASA Astrophysics Data System (ADS)

Panjaitan, R. L.; Irawati, R.; Sujana, A.; Hanifah, N.; Djuanda, D.

2018-03-01

In the literature on evaluation and test analysis, it is common to find calculations of both item validity and the item discrimination index (D), each with a different formula. Meanwhile, other sources state that the item discrimination index can be obtained by calculating the correlation between the testee’s score on a particular item and the testee’s score on the overall test, which is actually the same concept as item validity. Some research reports, especially undergraduate theses, tend to include both item validity and the item discrimination index in the instrument analysis. These concepts appear to overlap, since both reflect how well the test measures the examinees’ ability. In this paper, example results of data processing for item validity and the item discrimination index are compared. The paper discusses whether item validity and the item discrimination index can be represented by only one of them, or whether it is better to present both calculations for simple test analysis, especially in undergraduate theses that include test analyses.
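The overlap discussed here shows up when both statistics are computed on the same data. The sketch below computes two statistics for each item. The first is the upper-lower index D, using 27% extreme groups, a common convention assumed here. The second is the uncorrected item-total (point-biserial) correlation often reported as "item validity". The helper names and the response matrix are hypothetical. The two statistics usually agree in sign and rough ordering. They are not identical, because D uses only the extreme groups while the correlation uses every examinee.

```python
from math import sqrt


def point_biserial(item, totals):
    """Pearson correlation between a 0/1 item and total scores (item-total r).

    The item is included in the total (uncorrected). Returns nan for a
    constant item or constant totals.
    """
    n = len(item)
    mi, mt = sum(item) / n, sum(totals) / n
    cov = sum((a - mi) * (b - mt) for a, b in zip(item, totals))
    vi = sum((a - mi) ** 2 for a in item)
    vt = sum((b - mt) ** 2 for b in totals)
    if vi == 0 or vt == 0:
        return float("nan")
    return cov / sqrt(vi * vt)


def upper_lower_d(item, totals, group_frac=0.27):
    """Discrimination index D: proportion correct in the top group minus the bottom group."""
    n = len(item)
    g = max(1, min(round(n * group_frac), n // 2))
    order = sorted(range(n), key=lambda i: totals[i])  # ascending total
    lower, upper = order[:g], order[-g:]
    return (sum(item[i] for i in upper) - sum(item[i] for i in lower)) / g


# Hypothetical 0/1 responses: 6 examinees x 4 items.
scores = [
    [1, 0, 0, 1], [1, 0, 0, 1], [1, 1, 0, 0],
    [1, 1, 1, 0], [1, 1, 1, 0], [1, 1, 1, 0],
]
totals = [sum(row) for row in scores]
for j in range(len(scores[0])):
    item = [row[j] for row in scores]
    print(f"item {j}: D={upper_lower_d(item, totals):+.2f}, "
          f"r_item-total={point_biserial(item, totals):+.2f}")
```

Here items 1 and 2 both reach D = +1.00 but have different correlations (about 0.71 and 1.00). This is one reason a report might present both statistics rather than just one.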
Psychometric characteristics of Clinical Reasoning Problems (CRPs) and its correlation with routine multiple choice question (MCQ) in Cardiology department.

PubMed

Derakhshandeh, Zahra; Amini, Mitra; Kojuri, Javad; Dehbozorgian, Marziyeh

2018-01-01

Clinical reasoning is one of the most important skills in the process of training a medical student to become an efficient physician. Assessment of reasoning skills in a medical school program is important to direct students' learning. One of the tests for measuring clinical reasoning ability is Clinical Reasoning Problems (CRPs). The major aim of this study is to measure the psychometric qualities of CRPs and to determine the correlation between this test and the routine MCQ in the cardiology department of Shiraz Medical School. This was a descriptive study conducted on all cardiology residents of Shiraz Medical School. The study population consisted of 40 residents in 2014. The routine CRPs and MCQ tests were designed based on similar objectives and were carried out simultaneously. Reliability, item difficulty, item discrimination, and the correlation between each item and the total CRP score were all measured with Excel and SPSS software to check the psychometric properties of the CRPs test. Furthermore, we calculated the correlation between the CRPs test and the MCQ test. The mean differences in CRPs test score across residents' academic years (second, third and fourth year) were also evaluated by one-way analysis of variance (ANOVA) using SPSS software (version 20) (α=0.05). The mean and standard deviation of the CRPs score was 10.19 ± 3.39 out of 20; for the MCQ, it was 13.15 ± 3.81 out of 20. Item difficulty was in the range of 0.27-0.72; item discrimination was 0.30-0.75, with question No. 3 being the exception (0.24). The correlation between each item and the total CRP score was 0.26-0.87; the correlation between the CRPs test and the MCQ test was 0.68 (p<0.001). The reliability of the CRPs was 0.72 as calculated using Cronbach's alpha. The mean CRPs score differed among residents by academic year, and this difference was statistically significant (p<0.001).
The results of this present investigation revealed that CRPs could be reliable test for measuring clinical reasoning in residents. It can be included in cardiology residency assessment programs.
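The classical item statistics reported for the CRPs (item difficulty and item discrimination) can be computed directly from a score matrix. The sketch below is illustrative only. The abstract does not say which discrimination index was used, so the sketch assumes the common upper-minus-lower group index with 27% tails. It also divides by a hypothetical `max_score` parameter so that partially credited items map onto a 0-1 scale.

```python
from statistics import mean


def difficulty_index(item_scores, max_score=1.0):
    """Classical difficulty (p): mean item score as a proportion of the maximum."""
    return mean(item_scores) / max_score


def discrimination_index(item_scores, total_scores, max_score=1.0, fraction=0.27):
    """Upper-lower group index: item proportion in the top group minus the bottom group.

    The 27% tail fraction is a conventional choice, assumed here.
    """
    n = len(total_scores)
    k = max(1, round(n * fraction))
    # Rank examinees by total score (ascending) and take the two tails.
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower = [item_scores[i] for i in order[:k]]
    upper = [item_scores[i] for i in order[-k:]]
    return (mean(upper) - mean(lower)) / max_score


# Hypothetical toy data: six examinees answering one dichotomous item.
item = [1, 1, 1, 0, 0, 0]
totals = [10, 9, 8, 3, 2, 1]
print(difficulty_index(item))              # 0.5
print(discrimination_index(item, totals))  # 1.0
```

Values near 0.5 for difficulty and above about 0.3 for discrimination are often read as desirable. The cutoffs actually applied in the study are not stated in the abstract.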
Item Writer Judgments of Item Difficulty versus Actual Item Difficulty: A Case Study

ERIC Educational Resources Information Center

Sydorenko, Tetyana

2011-01-01

This study investigates how accurate one item writer can be on item difficulty estimates and whether factors affecting item writer judgments correspond to predictors of actual item difficulty. The items were based on conversational dialogs (presented as videos online) that focus on pragmatic functions. Thirty-five 2nd-, 3rd-, and 4th-year learners…
The Dysexecutive Questionnaire advanced: item and test score characteristics, 4-factor solution, and severity classification.

PubMed

Bodenburg, Sebastian; Dopslaff, Nina

2008-01-01

The Dysexecutive Questionnaire (DEX; Behavioral assessment of the dysexecutive syndrome, 1996) is a standardized instrument for measuring possible behavioral changes resulting from the dysexecutive syndrome. Although initially intended only as a qualitative instrument, the DEX has increasingly been used to address quantitative questions. Until now, no fundamental statistical analysis of the questionnaire's test quality had been carried out. The present study is based on an unselected sample of 191 patients with acquired brain injury and reports data on the quality of the items, the reliability, and the factorial structure of the DEX. Item 3 showed excessive item difficulty, whereas item 11 did not discriminate sufficiently. The reliability of the DEX in self-rating is r = 0.85. In addition to the test statistics, the study provides a clinical severity classification, based on quartile norms, for the overall scores of the four factors identified and of the questionnaire as a whole.
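A quartile-based severity classification amounts to computing three cutoffs from a norm sample and then placing a new score in one of four bands. The following is a minimal sketch that assumes a hypothetical norm sample. It returns numeric bands 1-4 because the abstract does not give the band labels. On the DEX, higher totals indicate more dysexecutive problems, so band 4 would be the most severe.

```python
from bisect import bisect_left
from statistics import quantiles


def quartile_cutoffs(norm_scores):
    """Return [Q1, median, Q3] of a reference (norm) sample of total scores."""
    return quantiles(norm_scores, n=4)


def quartile_band(score, cutoffs):
    """Map a total score to band 1-4; a score equal to a cutoff falls in the lower band."""
    return bisect_left(cutoffs, score) + 1


norms = list(range(1, 101))  # hypothetical norm sample, not DEX data
cuts = quartile_cutoffs(norms)
print(cuts)                     # [25.25, 50.5, 75.75]
print(quartile_band(60, cuts))  # 3
```

`statistics.quantiles` defaults to the "exclusive" interpolation method. The DEX norms may have used a different quantile convention.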
Development and Validation of a Novel Generic Health-related Quality of Life Instrument With 20 Items (HINT-20).

PubMed

Jo, Min-Woo; Lee, Hyeon-Jeong; Kim, Soo Young; Kim, Seon-Ha; Chang, Hyejung; Ahn, Jeonghoon; Ock, Minsu

2017-01-01

Few attempts have been made to develop a generic health-related quality of life (HRQoL) instrument and to examine its validity and reliability in Korea; we aimed to do so in the present study. After a literature review of existing generic HRQoL instruments, a focus group discussion, in-depth interviews, and expert consultations, we selected 30 tentative items for a new HRQoL measure. These items were evaluated by assessing their ceiling effects, difficulty, and redundancy in the first survey. To validate the resulting HRQoL instrument, known-groups validity and convergent/discriminant validity were evaluated, and its test-retest reliability was examined in the second survey. Of the 30 items originally assessed, four were excluded due to high ceiling effects and six were removed due to redundancy. We ultimately developed an HRQoL instrument with 20 items, the Health-related Quality of Life Instrument with 20 items (HINT-20), incorporating physical, mental, social, and positive health dimensions. In the known-groups validity analysis, HINT-20 results were poorer in women, the elderly, and those with a low income. For convergent/discriminant validity, the correlation coefficients of items in the physical health dimension (except vitality) with the physical component summary of the Short Form 36 version 2 (SF-36v2) were generally higher than their correlations with the mental component summary of the SF-36v2, and vice versa. Regarding test-retest reliability, the intraclass correlation coefficient of the total HINT-20 score was 0.813 (p<0.001). A novel generic HRQoL instrument, the HINT-20, was developed for the Korean general population and showed acceptable validity and reliability.
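Two of the item-screening criteria mentioned here, ceiling effects and redundancy, have simple operational forms:
- **Ceiling effect:** the share of respondents at the maximum score.
- **Redundancy:** a highly correlated item pair.

The sketch below is an illustration, not the HINT-20 procedure. The 0.7 correlation threshold is an assumption, since the abstract does not report the cutoffs used.

```python
from itertools import combinations
from math import sqrt


def ceiling_rate(scores, max_score):
    """Proportion of respondents scoring at the maximum of an item."""
    return sum(s == max_score for s in scores) / len(scores)


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either variable is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def redundant_pairs(items, threshold=0.7):
    """items: dict of item name -> list of scores; threshold is an assumed cutoff."""
    return [
        (a, b)
        for a, b in combinations(items, 2)
        if abs(pearson(items[a], items[b])) >= threshold
    ]


# Hypothetical responses from four people to three items.
demo = {"q1": [1, 2, 3, 4], "q2": [2, 4, 6, 8], "q3": [4, 1, 3, 2]}
print(ceiling_rate([5, 5, 4, 3], max_score=5))  # 0.5
print(redundant_pairs(demo))                    # [('q1', 'q2')]
```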
Validation of self-directed learning instrument and establishment of normative data for nursing students in taiwan: using polytomous item response theory.

PubMed

Cheng, Su-Fen; Lee-Hsieh, Jane; Turton, Michael A; Lin, Kuan-Chia

2014-06-01

Little research has investigated the establishment of norms for nursing students' self-directed learning (SDL) ability, recognized as an important capability for professional nurses. An item response theory (IRT) approach was used to establish norms for SDL abilities valid for the different nursing programs in Taiwan. The purposes of this study were (a) to use IRT with a graded response model to reexamine the SDL instrument, or the SDLI, originally developed by this research team using confirmatory factor analysis and (b) to establish SDL ability norms for the four different nursing education programs in Taiwan. Stratified random sampling with probability proportional to size was used. A minimum of 15% of students from the four different nursing education degree programs across Taiwan was selected. A total of 7,879 nursing students from 13 schools were recruited. The research instrument was the 20-item SDLI developed by Cheng, Kuo, Lin, and Lee-Hsieh (2010). IRT with the graded response model was used with a two-parameter logistic model (discrimination and difficulty) for the data analysis, calculated using MULTILOG. Norms were established using percentile rank. Analysis of item information and test information functions revealed that 18 items exhibited very high discrimination and two items had high discrimination. The test information function was higher in this range of scores, indicating greater precision in the estimate of nursing student SDL. Reliability fell between .80 and .94 for each domain and the SDLI as a whole. The total information function shows that the SDLI is appropriate for all nursing students, except for the top 2.5%. SDL ability norms were established for each nursing education program and for the nation as a whole. IRT is shown to be a potent and useful methodology for scale evaluation. The norms for SDL established in this research will provide practical standards for nursing educators and students in Taiwan.
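In Samejima's graded response model, each ordered category boundary is a 2PL logistic curve that shares the item's discrimination `a`. Category probabilities are the differences between adjacent cumulative curves. The sketch below shows that structure together with a percentile-rank function of the kind used to build norms. The parameter values are hypothetical, not SDLI estimates. The mid-point treatment of ties is an assumed convention, because the abstract does not define how percentile rank was computed.

```python
from math import exp


def grm_category_probs(theta, a, thresholds):
    """Graded response model: probabilities of categories 0..m-1.

    `thresholds` are the ordered boundary locations b_1 < ... < b_{m-1}; each
    boundary curve P(X >= k) is a 2PL logistic with the item's discrimination `a`.
    """
    # Cumulative probabilities P(X >= k), padded with 1 (k = 0) and 0 (k = m).
    cum = [1.0] + [1.0 / (1.0 + exp(-a * (theta - b))) for b in thresholds] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


def percentile_rank(norm_scores, score):
    """Percentile rank, counting tied scores at half weight (assumed convention)."""
    below = sum(s < score for s in norm_scores)
    equal = sum(s == score for s in norm_scores)
    return 100.0 * (below + 0.5 * equal) / len(norm_scores)


probs = grm_category_probs(theta=0.0, a=1.0, thresholds=[-1.0, 0.0, 1.0])
print([round(p, 3) for p in probs])      # [0.269, 0.231, 0.231, 0.269]
print(percentile_rank([1, 2, 3, 4], 3))  # 62.5
```

A larger `a` makes the boundary curves steeper. That steepness is what "very high discrimination" refers to in a graded response analysis.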

The Assessment of Physiotherapy Practice (APP) is a valid measure of professional competence of physiotherapy students: a cross-sectional study with Rasch analysis.

PubMed

Dalton, Megan; Davidson, Megan; Keating, Jenny

2011-01-01

Is the Assessment of Physiotherapy Practice (APP) a valid instrument for the assessment of entry-level competence in physiotherapy students? Cross-sectional study with Rasch analysis of initial (n=326) and validation samples (n=318). Students were assessed on completion of 4, 5, or 6-week clinical placements across one university semester. 298 clinical educators and 456 physiotherapy students at nine universities in Australia and New Zealand provided 644 completed APP instruments. APP data in both samples showed overall fit to a Rasch model of expected item functioning for interval scale measurement. Item 6 (Written communication) exhibited misfit in both samples, but was retained as an important element of competence. The hierarchy of item difficulty was the same in both samples with items related to professional behaviour and communication the easiest to achieve and items related to clinical reasoning the most difficult. Item difficulty was well targeted to person ability. No Differential Item Functioning was identified, indicating that the scale performed in a comparable way regardless of the student's age, gender or amount of prior clinical experience, and the educator's age, gender, or experience as an educator, or the type of facility, university, or clinical area. The instrument demonstrated unidimensionality confirming the appropriateness of summing the scale scores on each item to provide an overall score of clinical competence and was able to discriminate four levels of professional competence (Person Separation Index=0.96). Person ability and raw APP scores had a linear relationship (r(2)=0.99). Rasch analysis supports the interpretation that a student's APP score is an indication of their underlying level of professional competence in workplace practice. Copyright © 2011 Australian Physiotherapy Association. Published by .. All rights reserved.
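Two ideas in this Rasch analysis can be shown compactly:
- **Rasch probability:** the chance of success depends only on the difference between person ability and item difficulty.
- **Person Separation Index:** roughly, the share of observed person variance that is not measurement error.

The APP items are polytomous, so the study presumably used a polytomous Rasch variant. The dichotomous form below is a simplification for illustration. The PSI formula shown is one common formulation, assumed here rather than taken from the study.

```python
from math import exp
from statistics import mean, pvariance


def rasch_prob(theta, b):
    """Dichotomous Rasch model: P(success) for ability theta and item difficulty b (logits)."""
    return 1.0 / (1.0 + exp(-(theta - b)))


def person_separation_index(thetas, standard_errors):
    """(observed person variance - mean error variance) / observed person variance."""
    observed = pvariance(thetas)
    error = mean(se ** 2 for se in standard_errors)
    return (observed - error) / observed


# Hypothetical person estimates (logits) and their standard errors.
print(round(rasch_prob(1.0, 0.0), 3))                        # 0.731
print(person_separation_index([-2, -1, 0, 1, 2], [0.5] * 5))  # 0.875
```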
Detecting Differential Item Discrimination (DID) and the Consequences of Ignoring DID in Multilevel Item Response Models

ERIC Educational Resources Information Center

Lee, Woo-yeol; Cho, Sun-Joo

2017-01-01

Cross-level invariance in a multilevel item response model can be investigated by testing whether the within-level item discriminations are equal to the between-level item discriminations. Testing the cross-level invariance assumption is important to understand constructs in multilevel data. However, in most multilevel item response model…
Development and psychometric characteristics of the SCI-QOL Bladder Management Difficulties and Bowel Management Difficulties item banks and short forms and the SCI-QOL Bladder Complications scale.

PubMed

Tulsky, David S; Kisala, Pamela A; Tate, Denise G; Spungen, Ann M; Kirshblum, Steven C

2015-05-01

To describe the development and psychometric properties of the Spinal Cord Injury--Quality of Life (SCI-QOL) Bladder Management Difficulties and Bowel Management Difficulties item banks and Bladder Complications scale. Using a mixed-methods design, a pool of items assessing bladder- and bowel-related concerns was developed using focus groups with individuals with spinal cord injury (SCI) and SCI clinicians, cognitive interviews, and item response theory (IRT) analytic approaches, including tests of model fit and differential item functioning. Thirty-eight bladder items and 52 bowel items were tested at the University of Michigan, Kessler Foundation Research Center, the Rehabilitation Institute of Chicago, the University of Washington, Craig Hospital, and the James J. Peters VA Medical Center, Bronx, NY. Participants were 757 adults with traumatic SCI. The final item banks demonstrated unidimensionality (Bladder Management Difficulties CFI=0.965; RMSEA=0.093; Bowel Management Difficulties CFI=0.955; RMSEA=0.078) and acceptable fit to a graded response IRT model. The final calibrated Bladder Management Difficulties bank includes 15 items, and the final Bowel Management Difficulties item bank consists of 26 items. Additionally, 5 items related to urinary tract infections (UTI) did not fit with the larger Bladder Management Difficulties item bank but performed relatively well independently (CFI=0.992, RMSEA=0.050) and were thus retained as a separate scale. The SCI-QOL Bladder Management Difficulties and Bowel Management Difficulties item banks are psychometrically robust and are available as computer adaptive tests or short forms. The SCI-QOL Bladder Complications scale is a brief, fixed-length outcomes instrument for individuals with a UTI.
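The CFI and RMSEA values quoted above are functions of the model and baseline chi-square statistics. The sketch below uses the textbook formulas with the N - 1 convention for RMSEA. The inputs are hypothetical, since the abstract does not report the chi-squares, so the published values cannot be reproduced from it. Software that uses robust estimators for categorical data applies adjusted versions of these formulas.

```python
from math import sqrt


def rmsea(chi2, df, n):
    """Root mean square error of approximation (N - 1 convention)."""
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model, df_model, chi2_baseline, df_baseline):
    """Comparative fit index relative to the independence (baseline) model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_baseline - df_baseline, d_model)
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base


# Hypothetical fit statistics for illustration only.
print(round(rmsea(50.0, 20, 101), 4))        # 0.1225
print(round(cfi(50.0, 20, 500.0, 30), 4))    # 0.9362
```

Conventional rules of thumb treat CFI of 0.95 or more and RMSEA of 0.06 or less as good fit. These are guidelines, not thresholds stated in the study.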
Effect of clinically discriminating, evidence-based checklist items on the reliability of scores from an Internal Medicine residency OSCE.

PubMed

Daniels, Vijay J; Bordage, Georges; Gierl, Mark J; Yudkowsky, Rachel

2014-10-01

Objective structured clinical examinations (OSCEs) are used worldwide for summative examinations but often lack acceptable reliability. Research has shown that reliability of scores increases if OSCE checklists for medical students include only clinically relevant items. Also, checklists are often missing evidence-based items that high-achieving learners are more likely to use. The purpose of this study was to determine if limiting checklist items to clinically discriminating items and/or adding missing evidence-based items improved score reliability in an Internal Medicine residency OSCE. Six internists reviewed the traditional checklists of four OSCE stations classifying items as clinically discriminating or non-discriminating. Two independent reviewers augmented checklists with missing evidence-based items. We used generalizability theory to calculate overall reliability of faculty observer checklist scores from 45 first and second-year residents and predict how many 10-item stations would be required to reach a Phi coefficient of 0.8. Removing clinically non-discriminating items from the traditional checklist did not affect the number of stations (15) required to reach a Phi of 0.8 with 10 items. Focusing the checklist on only evidence-based clinically discriminating items increased test score reliability, needing 11 stations instead of 15 to reach 0.8; adding missing evidence-based clinically discriminating items to the traditional checklist modestly improved reliability (needing 14 instead of 15 stations). Checklists composed of evidence-based clinically discriminating items improved the reliability of checklist scores and reduced the number of stations needed for acceptable reliability. Educators should give preference to evidence-based items over non-evidence-based items when developing OSCE checklists.
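The "number of stations needed to reach Phi = 0.8" comes from a generalizability decision study. Averaging over more stations shrinks the absolute-error variance, which raises the dependability coefficient Phi. The sketch below assumes the simplest persons × stations design and uses hypothetical variance components. The study's actual design and estimates are not given in the abstract.

```python
def phi_coefficient(var_person, var_abs_error_single, n_stations):
    """Dependability (Phi) for a persons x stations design averaged over n stations.

    var_abs_error_single is the absolute-error variance of one station
    (station main effect plus person-by-station interaction and residual).
    """
    return var_person / (var_person + var_abs_error_single / n_stations)


def stations_needed(var_person, var_abs_error_single, target=0.8, max_stations=100):
    """Smallest number of stations whose projected Phi reaches `target`."""
    for n in range(1, max_stations + 1):
        # Small tolerance guards against floating-point round-off at the boundary.
        if phi_coefficient(var_person, var_abs_error_single, n) >= target - 1e-9:
            return n
    return None


# Hypothetical variance components: person = 1.0, single-station absolute error = 4.0.
print(stations_needed(1.0, 4.0))  # 16
```

Removing noisy checklist items lowers the per-station error variance. In this formula, lower error variance reduces the number of stations required, which is the effect the study reports for evidence-based checklists.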
Development and preliminary evaluation of a music-based attention assessment for patients with traumatic brain injury.

PubMed

Jeong, Eunju; Lesiuk, Teresa L

2011-01-01

Impairments in attention are commonly seen in individuals with traumatic brain injury (TBI). While visual attention assessment measurements have been rigorously developed and frequently used in cognitive neurorehabilitation, there is a paucity of auditory attention assessment measurements for patients with TBI. The purpose of this study was to field test a researcher-developed Music-based Attention Assessment (MAA), a melodic contour identification test designed to assess three different types of attention (i.e., sustained attention, selective attention, and divided attention), for patients with TBI. Additionally, this study aimed to evaluate the readability and comprehensibility of the test items and to examine the preliminary psychometric properties of the scale and test items. Fifteen patients diagnosed with TBI completed three different series of tasks in which they were required to identify melodic contours. The resulting data showed that (a) test items in each of the three subtests had easy to moderate item difficulty and acceptable to high item discrimination, (b) the musical characteristics (i.e., contour, congruence, and pitch interference) were associated with the level of item difficulty, and (c) the internal consistency of the MAA, computed as Cronbach's alpha, was .95. Subsequent studies using a larger sample of typical participants, along with individuals with TBI, are needed to confirm the construct validity and internal consistency of the MAA. In addition, the authors recommend examining the criterion validity of the MAA by correlating it with current neuropsychological attention assessment measurements.
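Cronbach's alpha compares the sum of the item variances with the variance of the total score. The formula is α = k/(k − 1) · (1 − Σσ²ᵢ / σ²ₜₒₜₐₗ), where k is the number of items. The sketch below uses hypothetical data. It applies population variances to both terms; sample variances would give the same ratio as long as they are used consistently.

```python
from statistics import pvariance


def cronbach_alpha(rows):
    """rows: one list of item scores per respondent (all rows the same length)."""
    k = len(rows[0])
    item_vars = [pvariance([row[j] for row in rows]) for j in range(k)]
    total_var = pvariance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical scores: four respondents, three items.
data = [[2, 3, 3], [1, 1, 2], [3, 3, 3], [0, 1, 1]]
print(round(cronbach_alpha(data), 4))  # 0.9618
```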
Development and validation of a new knowledge, attitude, belief and practice questionnaire on leptospirosis in Malaysia.

PubMed

Zahiruddin, Wan Mohd; Arifin, Wan Nor; Mohd-Nazri, Shafei; Sukeri, Surianti; Zawaha, Idris; Bakar, Rahman Abu; Hamat, Rukman Awang; Malina, Osman; Jamaludin, Tengku Zetty Maztura Tengku; Pathman, Arumugam; Mas-Harithulfadhli-Agus, Ab Rahman; Norazlin, Idris; Suhailah, Binti Samsudin; Saudi, Siti Nor Sakinah; Abdullah, Nurul Munirah; Nozmi, Noramira; Zainuddin, Abdul Wahab; Aziah, Daud

2018-03-07

In Malaysia, leptospirosis is considered an endemic disease, with sporadic outbreaks following rainy or flood seasons. The objective of this study was to develop and validate a new knowledge, attitude, belief and practice (KABP) questionnaire on leptospirosis for use in urban and rural populations in Malaysia. The study comprised development and validation stages. The development phase encompassed a literature review, expert panel review, focus-group testing, and evaluation. The validation phase consisted of exploratory and confirmatory parts to verify the psychometric properties of the questionnaire. A total of 214 and 759 participants were recruited from two Malaysian states, Kelantan and Selangor respectively, for the validation phase. The participants comprised urban and rural communities with a high reported incidence of leptospirosis. The knowledge section of the validation phase was analyzed using item response theory (IRT). The attitude and belief sections were analyzed using exploratory factor analysis (EFA) and confirmatory factor analysis (CFA). The development phase resulted in a questionnaire with four main sections: knowledge, attitude, belief, and practice. In the exploratory phase, the IRT analysis of knowledge about leptospirosis showed that the difficulty and discrimination values of the items were acceptable, with the exception of two items. Based on the EFA, the psychometric properties of the attitude, belief, and practice sections were poor. These sections were therefore revised, and no further factor analysis of the practice section was conducted. In the confirmatory stage, the difficulty and discrimination values of the items in the knowledge section remained within the acceptable range. The CFA of the attitude section resulted in a good-fitting two-factor model. The CFA of the belief section retained only a small number of items, although the analysis resulted in a good fit in the final three-factor model.
Based on the IRT analysis and factor analytic evidence, the knowledge and attitude sections of the KABP questionnaire on leptospirosis were psychometrically valid. However, the psychometric properties of the belief section were unsatisfactory, despite being revised after the initial validation study. Further development of this section is warranted in future studies.
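For dichotomous knowledge items, the difficulty and discrimination values come from the two-parameter logistic (2PL) model:
- **Difficulty `b`:** the ability at which the probability of a correct answer is 0.5.
- **Discrimination `a`:** sets how steeply that probability rises around `b`.

The sketch below is a minimal illustration. The model is written in the logistic metric without the 1.7 scaling constant. The flagging cutoffs (`min_a=0.5`, `b` within ±3) are hypothetical rules of thumb, not the ranges used in the study, which the abstract does not state.

```python
from math import exp


def p_correct_2pl(theta, a, b):
    """Two-parameter logistic (2PL) probability of a correct answer."""
    return 1.0 / (1.0 + exp(-a * (theta - b)))


def flag_item(a, b, min_a=0.5, b_range=(-3.0, 3.0)):
    """List reasons an item looks problematic under illustrative (assumed) cutoffs."""
    reasons = []
    if a < min_a:
        reasons.append("low discrimination")
    if not (b_range[0] <= b <= b_range[1]):
        reasons.append("extreme difficulty")
    return reasons


print(p_correct_2pl(theta=1.0, a=1.7, b=1.0))  # 0.5
print(flag_item(a=0.3, b=4.0))                 # ['low discrimination', 'extreme difficulty']
```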
Development and validity of a questionnaire to test the knowledge of primary care personnel regarding nutrition in obese adolescents

PubMed Central

2013-01-01

Background In light of its epidemic proportions in developed and developing countries, obesity is considered a serious public health issue. To increase knowledge of the ability of health care professionals to care for obese adolescents, and to adopt more efficient preventive and control measures, a questionnaire was developed and validated to assess non-dietitian health professionals' Knowledge of Nutrition in Obese Adolescents (KNOA). Methods The questionnaire assessing primary care practitioners' knowledge of nutrition in obese adolescents was developed and evaluated in five phases: 1) definition of study dimensions; 2) development of 42 questions and preliminary evaluation of the questionnaire by a panel of experts; 3) characterization and selection of primary care practitioners (35 dietitians and 265 non-dietitians) and measurement of questionnaire criteria by contrasting the responses of dietitians and non-dietitians; 4) reliability assessment by question exclusion based on item difficulty (too easy or too difficult for non-dietitian practitioners), item discrimination, internal consistency and reproducibility index determination; and 5) scoring of the completed questionnaires. Results Dietitians obtained higher scores than non-dietitians (Mann–Whitney U test, P < 0.05), confirming the validity of the questionnaire criteria. Item discrimination was assessed by correlating the score for each item with the total score, using a minimum correlation coefficient of 0.2 as the cutoff. Item difficulty was controlled by excluding questions answered correctly by more than 90% of the non-dietitian subjects (too easy) or by less than 10% of them (too difficult). The final questionnaire contained 26 of the original 42 questions, and Cronbach’s α increased from 0.788 to 0.807. Test-retest agreement between respondents was classified as good to very good (Kappa test, >0.60).
Conclusion The KNOA questionnaire developed for primary care practitioners is a valid, consistent and suitable instrument that can be applied over time, making it a promising tool for developing and guiding public health policies. PMID:23865564
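The KNOA exclusion rules can be applied mechanically to a matrix of 0/1 responses. An item is dropped if more than 90% or fewer than 10% of respondents answer it correctly, or if its item-total correlation is below 0.2. The sketch below follows those three thresholds from the abstract. It uses the uncorrected total score, as the abstract describes, so the total includes the item itself. The toy data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either variable is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def screen_items(responses, p_max=0.90, p_min=0.10, r_min=0.2):
    """Return indices of items kept; responses holds one 0/1 row per respondent."""
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    keep = []
    for j in range(n_items):
        col = [row[j] for row in responses]
        p = sum(col) / len(col)
        if p > p_max or p < p_min:  # too easy or too difficult
            continue
        if pearson(col, totals) < r_min:  # poor discrimination
            continue
        keep.append(j)
    return keep


# Hypothetical answers from six respondents to four items.
demo = [
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 1],
]
print(screen_items(demo))  # [1, 3]
```

In this toy example, item 0 is dropped as too easy (everyone answered correctly) and item 2 is dropped because it does not correlate with the total score.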
Defining and validating a short form Montreal Cognitive Assessment (s-MoCA) for use in neurodegenerative disease

PubMed Central

Roalf, David R; Moore, Tyler M; Wolk, David A; Arnold, Steven E; Mechanic-Hamilton, Dawn; Rick, Jacqueline; Kabadi, Sushila; Ruparel, Kosha; Chen-Plotkin, Alice S; Chahine, Lama M; Dahodwala, Nabila A; Duda, John E; Weintraub, Daniel A; Moberg, Paul J

2016-01-01

Introduction Screening for cognitive deficits is essential in neurodegenerative disease. Screening tests, such as the Montreal Cognitive Assessment (MoCA), are easily administered, correlate with neuropsychological performance and demonstrate diagnostic utility. Yet, administration time is too long for many clinical settings. Methods Item response theory and computerised adaptive testing simulation were employed to establish an abbreviated MoCA in 1850 well-characterised community-dwelling individuals with and without neurodegenerative disease. Results Eight MoCA items with high item discrimination and appropriate difficulty were identified for use in a short form (s-MoCA). The s-MoCA was highly correlated with the original MoCA and showed robust diagnostic classification, and cross-validation procedures substantiated these items. Discussion Early detection of cognitive impairment is an important clinical and public health concern, but administration of screening measures is limited by time constraints in demanding clinical settings. Here, we provide an s-MoCA that is valid across neurological disorders and can be administered in approximately 5 min. PMID:27071646
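Items with high discrimination and well-placed difficulty are the ones that carry the most Fisher information. For a 2PL item, information is a²·P·(1 − P), which peaks at θ = b. The sketch below ranks a hypothetical item bank by information summed over an ability grid and keeps the top k. This greedy ranking is a simplification assumed for illustration. The study itself used computerised adaptive testing simulation, and some MoCA items are not dichotomous.

```python
from math import exp


def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def select_short_form(bank, theta_grid, k):
    """Greedy pick of the k items with the most information summed over theta_grid.

    bank: dict mapping a (hypothetical) item name -> (a, b).
    """
    score = {
        name: sum(item_information(t, a, b) for t in theta_grid)
        for name, (a, b) in bank.items()
    }
    return sorted(score, key=score.get, reverse=True)[:k]


bank = {"A": (2.0, 0.0), "B": (0.5, 0.0), "C": (1.5, 3.0)}
print(item_information(0.0, 2.0, 0.0))                 # 1.0
print(select_short_form(bank, [-1.0, 0.0, 1.0], k=2))  # ['A', 'B']
```

Item C has a higher discrimination than B but is targeted far above the grid, so it contributes less information where the examinees are.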
Who was that masked man? Conjoint representations of intrinsic motions with actor appearance.

PubMed

Kersten, Alan W; Earles, Julie L; Negri, Leehe

2018-09-01

Motion plays an important role in recognising animate creatures. This research supports a distinction between intrinsic and extrinsic motions in their relationship to identifying information about the characters performing the motions. Participants viewed events involving costumed human characters. Intrinsic motions involved relative movements of a character's body parts, whereas extrinsic motions involved movements with respect to external landmarks. Participants were later tested for recognition of the motions and who had performed them. The critical test items involved familiar characters performing motions that had previously been performed by other characters. Participants falsely recognised extrinsic conjunction items, in which characters followed the paths of other characters, more often than intrinsic conjunction items, in which characters moved in the manner of other characters. In contrast, participants falsely recognised new extrinsic motions less often than new intrinsic motions, suggesting that they remembered extrinsic motions but had difficulty remembering who had performed them. Modelling of receiver operating characteristics indicated that participants discriminated old items from intrinsic conjunction items via familiarity, consistent with conjoint representations of intrinsic motion and identity information. In contrast, participants used recollection to distinguish old items from extrinsic conjunction items, consistent with separate but associated representations of extrinsic motion and identity information.
IRTs of the ABCs: Children's Letter Name Acquisition

PubMed Central

Piasta, Shayne B.; Anthony, Jason L.; Lonigan, Christopher J.; Francis, David J.

2015-01-01

We examined the developmental sequence of letter name knowledge acquisition by children from two to five years of age. Data from two samples representing diverse regions, ethnicity, and socioeconomic backgrounds (ns = 1,074 and 500) were analyzed using item response theory (IRT) and differential item functioning techniques. Results from factor analyses indicated that letter name knowledge represented a unidimensional skill; IRT results yielded significant differences between letters in both difficulty and discrimination. Results also indicated an approximate developmental sequence in letter name learning for the simplest and most challenging letters to learn, but with no clear sequence between these extremes. Findings also suggested that children were most likely to learn the letter of their own first initial first. We discuss implications for assessment and instruction. PMID:22710016
Quantity discrimination in canids: Dogs (Canis familiaris) and wolves (Canis lupus) compared.

PubMed

Miletto Petrazzini, Maria Elena; Wynne, Clive D L

2017-11-01

Accumulating evidence indicates that animals are able to discriminate between quantities. Recent studies have shown that dogs' and coyotes' ability to discriminate between quantities of food items decreases with increasing numerical ratio. Conversely, wolves' performance is not affected by numerical ratio. Cross-species comparisons are difficult because of differences in the methodologies employed, and hence it is still unclear whether domestication altered quantitative abilities in canids. Here we used the same procedure to compare pet dogs and wolves in a spontaneous food choice task. Subjects were presented with two quantities of food items and allowed to choose only one option. Four numerical contrasts of increasing difficulty (range 1-4) were used to assess the influence of numerical ratio on the performance of the two species. Dogs' accuracy was affected by numerical ratio, while no ratio effect was observed in wolves. These results align with previous findings and reinforce the idea of different quantitative competences in dogs and wolves. Although we cannot exclude that other variables might have played a role in shaping quantitative abilities in these two species, our results might suggest that the interspecific differences here reported may have arisen as a result of domestication. Copyright © 2017 Elsevier B.V. All rights reserved.
Developing an African youth psychosocial assessment: an application of item response theory.

PubMed

Betancourt, Theresa S; Yang, Frances; Bolton, Paul; Normand, Sharon-Lise

2014-06-01

This study aimed to refine a dimensional scale for measuring psychosocial adjustment in African youth using item response theory (IRT). A 60-item scale derived from qualitative data was administered to 667 war-affected adolescents (55% female). Exploratory factor analysis (EFA) determined the dimensionality of items based on goodness-of-fit indices. Items with loadings less than 0.4 were dropped. Confirmatory factor analysis (CFA) was used to confirm the scale's dimensionality found under the EFA. Item discrimination and difficulty were estimated using a graded response model for each subscale using weighted least squares means and variances. Predictive validity was examined through correlations between IRT scores (θ) for each subscale and ratings of functional impairment. All models were assessed using goodness-of-fit and comparative fit indices. Fisher's Information curves examined item precision at different underlying ranges of each trait. Original scale items were optimized and reconfigured into an empirically-robust 41-item scale, the African Youth Psychosocial Assessment (AYPA). Refined subscales assess internalizing and externalizing problems, prosocial attitudes/behaviors and somatic complaints without medical cause. The AYPA is a refined dimensional assessment of emotional and behavioral problems in African youth with good psychometric properties. Validation studies in other cultures are recommended. Copyright © 2014 John Wiley & Sons, Ltd.
Developing an African youth psychosocial assessment: an application of item response theory

PubMed Central

BETANCOURT, THERESA S.; YANG, FRANCES; BOLTON, PAUL; NORMAND, SHARON-LISE

2014-01-01

This study aimed to refine a dimensional scale for measuring psychosocial adjustment in African youth using item response theory (IRT). A 60-item scale derived from qualitative data was administered to 667 war-affected adolescents (55% female). Exploratory factor analysis (EFA) determined the dimensionality of items based on goodness-of-fit indices. Items with loadings less than 0.4 were dropped. Confirmatory factor analysis (CFA) was used to confirm the scale's dimensionality found under the EFA. Item discrimination and difficulty were estimated using a graded response model for each subscale using weighted least squares means and variances. Predictive validity was examined through correlations between IRT scores (θ) for each subscale and ratings of functional impairment. All models were assessed using goodness-of-fit and comparative fit indices. Fisher's Information curves examined item precision at different underlying ranges of each trait. Original scale items were optimized and reconfigured into an empirically-robust 41-item scale, the African Youth Psychosocial Assessment (AYPA). Refined subscales assess internalizing and externalizing problems, prosocial attitudes/behaviors and somatic complaints without medical cause. The AYPA is a refined dimensional assessment of emotional and behavioral problems in African youth with good psychometric properties. Validation studies in other cultures are recommended. PMID:24478113
A signal detection-item response theory model for evaluating neuropsychological measures.

PubMed

Thomas, Michael L; Brown, Gregory G; Gur, Ruben C; Moore, Tyler M; Patt, Virginie M; Risbrough, Victoria B; Baker, Dewleen G

2018-02-05

Models from signal detection theory are commonly used to score neuropsychological test data, especially tests of recognition memory. Here we show that certain item response theory models can be formulated as signal detection theory models, thus linking two complementary but distinct methodologies. We then use the approach to evaluate the validity (construct representation) of commonly used research measures, demonstrate the impact of conditional error on neuropsychological outcomes, and evaluate measurement bias. Signal detection-item response theory (SD-IRT) models were fitted to recognition memory data for words, faces, and objects. The sample consisted of U.S. Infantry Marines and Navy Corpsmen participating in the Marine Resiliency Study. Data comprised item responses to the Penn Face Memory Test (PFMT; N = 1,338), Penn Word Memory Test (PWMT; N = 1,331), and Visual Object Learning Test (VOLT; N = 1,249), and self-report of past head injury with loss of consciousness. SD-IRT models adequately fitted recognition memory item data across all modalities. Error varied systematically with ability estimates, and distributions of residuals from the regression of memory discrimination onto self-report of past head injury were positively skewed towards regions of larger measurement error. Analyses of differential item functioning revealed little evidence of systematic bias by level of education. SD-IRT models benefit from the measurement rigor of item response theory, which permits the modeling of item difficulty and examinee ability, and from signal detection theory, which provides an interpretive framework encompassing the experimentally validated constructs of memory discrimination and response bias. We used this approach to validate the construct representation of commonly used research measures and to demonstrate how nonoptimized item parameters can lead to erroneous conclusions when interpreting neuropsychological test data.
Future work might include the development of computerized adaptive tests and integration with mixture and random-effects models.
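The two signal detection constructs named here, memory discrimination and response bias, are classically summarized as d′ and c:
- **d′ = z(H) − z(F):** sensitivity, computed from the hit rate H and the false-alarm rate F, where z is the standard normal quantile function.
- **c = −(z(H) + z(F))/2:** response bias; positive values indicate a conservative bias.

The sketch below covers only these classical person-level indices, not the authors' SD-IRT model, which adds item-level parameters. The log-linear correction is one common way to avoid infinite z-scores when a rate is 0 or 1. It is included as an assumption, not as the study's procedure.

```python
from statistics import NormalDist

_z = NormalDist().inv_cdf  # standard normal quantile function


def d_prime(hit_rate, fa_rate):
    """Sensitivity (memory discrimination): z(H) - z(F)."""
    return _z(hit_rate) - _z(fa_rate)


def criterion_c(hit_rate, fa_rate):
    """Response bias: -(z(H) + z(F)) / 2; positive values indicate a conservative bias."""
    return -(_z(hit_rate) + _z(fa_rate)) / 2


def loglinear_rate(count, n_trials):
    """Log-linear correction that keeps rates away from 0 and 1."""
    return (count + 0.5) / (n_trials + 1)


# Hypothetical recognition test: 20 of 20 old items hit, 4 of 20 new items falsely endorsed.
h = loglinear_rate(20, 20)
f = loglinear_rate(4, 20)
print(round(d_prime(h, f), 2), round(criterion_c(h, f), 2))
```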
Development of the Sexual Minority Adolescent Stress Inventory

PubMed Central

Schrager, Sheree M.; Goldbach, Jeremy T.; Mamey, Mary Rose

2018-01-01

Although construct measurement is critical to explanatory research and intervention efforts, rigorous measure development remains a notable challenge. For example, though the primary theoretical model for understanding health disparities among sexual minority (e.g., lesbian, gay, bisexual) adolescents is minority stress theory, nearly all published studies of this population rely on minority stress measures with poor psychometric properties and development procedures. In response, we developed the Sexual Minority Adolescent Stress Inventory (SMASI) with N = 346 diverse adolescents ages 14–17, using a comprehensive approach to de novo measure development designed to produce a measure with desirable psychometric properties. After exploratory factor analysis on 102 candidate items informed by a modified Delphi process, we applied item response theory techniques to the remaining 72 items. Discrimination and difficulty parameters and item characteristic curves were estimated overall, within each of 12 initially derived factors, and across demographic subgroups. Two items were removed for excessive discrimination and three were removed following reliability analysis. The measure demonstrated configural and scalar invariance for gender and age; a three-item factor was excluded for demonstrating substantial differences by sexual identity and race/ethnicity. The final 64-item measure comprised 11 subscales and demonstrated excellent overall (α = 0.98), subscale (α range 0.75–0.96), and test–retest (scale r > 0.99; subscale r range 0.89–0.99) reliabilities. Subscales represented a mix of proximal and distal stressors, including domains of internalized homonegativity, identity management, intersectionality, and negative expectancies (proximal) and social marginalization, family rejection, homonegative climate, homonegative communication, negative disclosure experiences, religion, and work domains (distal). 
Thus, the SMASI development process illustrates a method to incorporate information from multiple sources, including item response theory models, to guide item selection in building a psychometrically sound measure. We posit that similar methods can be used to improve construct measurement across all areas of psychological research, particularly in areas where a strong theoretical framework exists but existing measures are limited. PMID:29599737
Psychometric characteristics of Clinical Reasoning Problems (CRPs) and its correlation with routine multiple choice question (MCQ) in Cardiology department

PubMed Central

DERAKHSHANDEH, ZAHRA; AMINI, MITRA; KOJURI, JAVAD; DEHBOZORGIAN, MARZIYEH

2018-01-01

Introduction: Clinical reasoning is one of the most important skills in the process of training a medical student to become an efficient physician. Assessment of reasoning skills in a medical school program is important to direct students’ learning. One of the tests for measuring clinical reasoning ability is the Clinical Reasoning Problems (CRPs) test. The major aim of this study is to measure the psychometric qualities of CRPs and to determine the correlation between this test and the routine MCQ in the cardiology department of Shiraz Medical School. Methods: This descriptive study was conducted on all cardiology residents of Shiraz Medical School. The study population consisted of 40 residents in 2014. The routine CRPs and MCQ tests were designed based on similar objectives and were administered simultaneously. Reliability, item difficulty, item discrimination, and the correlation between each item and the total CRPs score were calculated with Excel and SPSS software to check the psychometric properties of the CRPs test. Furthermore, we calculated the correlation between the CRPs test and the MCQ test. Differences in mean CRPs scores by residents’ academic year (second, third, and fourth year) were also evaluated by one-way analysis of variance (ANOVA) using SPSS software (version 20) (α = 0.05). Results: The mean and standard deviation of the CRPs score were 10.19 ± 3.39 out of 20; for the MCQ, they were 13.15 ± 3.81 out of 20. Item difficulty ranged from 0.27 to 0.72; item discrimination ranged from 0.30 to 0.75, with question No. 3 being the exception (0.24). The correlation between each item and the total CRPs score ranged from 0.26 to 0.87; the correlation between the CRPs test and the MCQ test was 0.68 (p < 0.001). The reliability of the CRPs, calculated with Cronbach's alpha, was 0.72. The mean CRPs score differed among residents by academic year, and this difference was statistically significant (p < 0.001). 
Conclusion: The results of this investigation revealed that CRPs could be a reliable test for measuring clinical reasoning in residents. It can be included in cardiology residency assessment programs. PMID:29344528
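The Cronbach's alpha reported above depends only on the item variances and the variance of total scores: alpha = k/(k - 1) × (1 - Σ item variances / total-score variance). As a minimal sketch, assuming a complete examinee-by-item score matrix with no missing responses, it can be computed as below; the function name `cronbach_alpha` and the toy data are hypothetical, not taken from the study.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of examinee rows, each holding k item scores."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item (column) across examinees.
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    # Variance of the examinees' total scores.
    total_var = pvariance([sum(row) for row in scores])
    # Population variances are used for both terms, so the n/(n-1) factor cancels.
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Two perfectly parallel items give alpha = 1; two unrelated items give alpha = 0.
print(cronbach_alpha([[1, 1], [2, 2], [3, 3]]))
print(cronbach_alpha([[1, 1], [2, 1], [1, 2], [2, 2]]))
```

Because alpha grows with the number of items as well as with inter-item covariance, a high value on a long instrument is not by itself evidence of unidimensionality.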
Assessing the Performance of Classical Test Theory Item Discrimination Estimators in Monte Carlo Simulations

ERIC Educational Resources Information Center

Bazaldua, Diego A. Luna; Lee, Young-Sun; Keller, Bryan; Fellers, Lauren

2017-01-01

The performance of various classical test theory (CTT) item discrimination estimators has been compared in the literature using both empirical and simulated data, with mixed findings on whether some discrimination estimators should be preferred over others. This study analyzes the performance of various item discrimination estimators in CTT:…
Discrimination, Other Psychosocial Stressors, and Self-Reported Sleep Duration and Difficulties

PubMed Central

Slopen, Natalie; Williams, David R.

2014-01-01

Objectives: To advance understanding of the relationship between discrimination and sleep duration and difficulties, with consideration of multiple dimensions of discrimination, and attention to concurrent stressors; and to examine the contribution of discrimination and other stressors to racial/ethnic differences in these outcomes. Design: Cross-sectional probability sample. Setting: Chicago, IL. Participants: There were 2,983 black, Hispanic, and white adults. Measurements and Results: Outcomes included self-reported sleep duration and difficulties. Discrimination, including racial and nonracial everyday and major experiences of discrimination, workplace harassment and incivilities, and other stressors were assessed via questionnaire. In models adjusted for sociodemographic characteristics, greater exposure to racial (β = -0.14) and nonracial (β = -0.08) everyday discrimination, major experiences of discrimination attributed to race/ethnicity (β = -0.17), and workplace harassment and incivilities (β = -0.14) were associated with shorter sleep (P < 0.05). The association between major experiences of discrimination attributed to race/ethnicity and sleep duration (β = -0.09, P < 0.05) was independent of concurrent stressors (i.e., acute events, childhood adversity, and financial, community, employment, and relationship stressors). Racial (β = 0.04) and nonracial (β = 0.05) everyday discrimination, racial (β = 0.04) and nonracial (β = 0.04) major experiences of discrimination, and workplace harassment and incivilities (β = 0.04) were also associated with more (log) sleep difficulties, and associations between racial and nonracial everyday discrimination and sleep difficulties remained after adjustment for other stressors (P < 0.05). Racial/ethnic differences in sleep duration and difficulties were not significant after adjustment for discrimination (P > 0.05). 
Conclusions: Discrimination was associated with shorter sleep and more sleep difficulties, independent of socioeconomic status and other stressors, and may account for some of the racial/ethnic differences in sleep. Citation: Slopen N; Williams DR. Discrimination, other psychosocial stressors, and self-reported sleep duration and difficulties. SLEEP 2014;37(1):147-156. PMID:24381373
Unidimensional IRT Item Parameter Estimates across Equivalent Test Forms with Confounding Specifications within Dimensions

ERIC Educational Resources Information Center

Matlock, Ki Lynn; Turner, Ronna

2016-01-01

When constructing multiple test forms, the number of items and the total test difficulty are often equivalent. Not all test developers match the number of items and/or average item difficulty within subcontent areas. In this simulation study, six test forms were constructed having an equal number of items and average item difficulty overall.…
Exploring Item Characteristics That Are Related to the Difficulty of TOEFL Dialogue Items. Research Reports. RR-79. RR-04-11

ERIC Educational Resources Information Center

Kostin, Irene

2004-01-01

The purpose of this study is to explore the relationship between a set of item characteristics and the difficulty of TOEFL® dialogue items. Identifying characteristics that are related to item difficulty has the potential to improve the efficiency of the item-writing process. The study employed 365 TOEFL dialogue items, which were coded on 49…

Statistical Approaches to the Study of Item Difficulty.

ERIC Educational Resources Information Center

Olson, John F.; And Others

Traditionally, item difficulty has been defined in terms of the performance of examinees. For test development purposes, a more useful concept would be some kind of intrinsic item difficulty, defined in terms of the item's content, context, or characteristics and the task demands set by the item. In this investigation, the measurement literature…
Modelling Question Difficulty in an A Level Physics Examination

ERIC Educational Resources Information Center

Crisp, Victoria; Grayson, Rebecca

2013-01-01

"Item difficulty modelling" is a technique used for a number of purposes such as to support future item development, to explore validity in relation to the constructs that influence difficulty and to predict the difficulty of items. This research attempted to explore the factors influencing question difficulty in a general qualification…
Using Rasch rating scale model to reassess the psychometric properties of the Persian version of the PedsQL™ 4.0 Generic Core Scales in school children.

PubMed

Jafari, Peyman; Bagheri, Zahra; Ayatollahi, Seyyed Mohamad Taghi; Soltani, Zahra

2012-03-13

Item response theory (IRT) is extensively used to develop adaptive instruments of health-related quality of life (HRQoL). However, each IRT model has its own function to estimate item and category parameters, and hence different results may be found using the same response categories with different IRT models. The present study used the Rasch rating scale model (RSM) to examine and reassess the psychometric properties of the Persian version of the PedsQL™ 4.0 Generic Core Scales. The PedsQL™ 4.0 Generic Core Scales were completed by 938 Iranian school children and their parents. Convergent, discriminant and construct validity of the instrument were assessed by classical test theory (CTT). The RSM was applied to investigate person and item reliability, item statistics and the ordering of response categories. The CTT method showed that the scaling success rate for convergent and discriminant validity was 100% in all domains with the exception of physical health in the child self-report. Moreover, confirmatory factor analysis supported a four-factor model similar to the original version. The RSM showed that 22 out of 23 items had acceptable infit and outfit statistics (between 0.6 and 1.4), person reliabilities were low, item reliabilities were high, and item difficulty ranged from -1.01 to 0.71 for the child self-report and from -0.68 to 0.43 for the parent proxy-report. The RSM also showed that successive response categories for all items were not located in the expected order. This study revealed that, in all domains, the five response categories did not perform adequately. It is not known whether this problem is a function of the meaning of the response choices in the Persian language or an artifact of a mostly healthy population that did not use the full range of the response categories. The response categories should be evaluated in further validation studies, especially in large samples of chronically ill patients.
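The infit and outfit statistics cited above compare observed responses with model-expected ones. As a simplified sketch, the code below uses the dichotomous Rasch model rather than the rating scale model used in the study, and treats person and item parameters as already known (in practice both are estimated). Outfit is the unweighted mean of squared standardized residuals; infit weights each squared residual by its model variance. The function names and values are illustrative assumptions.

```python
import math


def rasch_p(theta, b):
    """Probability of a correct/endorsed response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_fit(responses, thetas, b):
    """Return (infit, outfit) mean squares for one item with difficulty b."""
    sq_resid, variances = [], []
    for x, theta in zip(responses, thetas):
        p = rasch_p(theta, b)
        sq_resid.append((x - p) ** 2)
        variances.append(p * (1 - p))
    # Outfit: average of squared standardized residuals, sensitive to outliers.
    outfit = sum(r / v for r, v in zip(sq_resid, variances)) / len(sq_resid)
    # Infit: variance-weighted, dominated by persons near the item's difficulty.
    infit = sum(sq_resid) / sum(variances)
    return infit, outfit


# Hypothetical misfit: a low-ability person answers correctly, a high-ability person does not.
infit, outfit = item_fit([1, 0], [-2.0, 2.0], b=0.0)
print(f"infit={infit:.2f}, outfit={outfit:.2f}")
```

In this deliberately unexpected pattern both statistics equal e² (about 7.39), far outside the 0.6 to 1.4 range used in the study.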
Development and Validation of a Novel Generic Health-related Quality of Life Instrument With 20 Items (HINT-20)

PubMed Central

2017-01-01

Objectives: Few attempts have been made to develop a generic health-related quality of life (HRQoL) instrument and to examine its validity and reliability in Korea. We aimed to do this in the present study. Methods: After a literature review of existing generic HRQoL instruments, a focus group discussion, in-depth interviews, and expert consultations, we selected 30 tentative items for a new HRQoL measure. These items were evaluated by assessing their ceiling effects, difficulty, and redundancy in the first survey. To validate the HRQoL instrument that was developed, known-groups validity and convergent/discriminant validity were evaluated and its test-retest reliability was examined in the second survey. Results: Of the 30 items originally assessed for the HRQoL instrument, four were excluded due to high ceiling effects and six were removed due to redundancy. We ultimately developed an HRQoL instrument with a reduced number of 20 items, known as the Health-related Quality of Life Instrument with 20 items (HINT-20), incorporating physical, mental, social, and positive health dimensions. In the known-groups validity analysis, HINT-20 results were poorer for women, the elderly, and those with a low income. For convergent/discriminant validity, the correlation coefficients of items (except vitality) in the physical health dimension with the physical component summary of the Short Form 36 version 2 (SF-36v2) were generally higher than the correlations of those items with the mental component summary of the SF-36v2, and vice versa. Regarding test-retest reliability, the intraclass correlation coefficient of the total HINT-20 score was 0.813 (p<0.001). Conclusions: A novel generic HRQoL instrument, the HINT-20, was developed for the Korean general population and showed acceptable validity and reliability. PMID:28173686
Predicting Item Difficulty of Science National Curriculum Tests: The Case of Key Stage 2 Assessments

ERIC Educational Resources Information Center

El Masri, Yasmine H.; Ferrara, Steve; Foltz, Peter W.; Baird, Jo-Anne

2017-01-01

Predicting item difficulty is highly important in education for both teachers and item writers. Despite identifying a large number of explanatory variables, predicting item difficulty remains a challenge in educational assessment with empirical attempts rarely exceeding 25% of variance explained. This paper analyses 216 science items of key stage…
The Curiosity and Exploration Inventory-II: Development, Factor Structure, and Psychometrics

PubMed Central

Kashdan, Todd B.; Gallagher, Matthew W.; Silvia, Paul J.; Winterstein, Beate P.; Breen, William E.; Terhar, Daniel; Steger, Michael F.

2009-01-01

Given curiosity’s fundamental role in motivation, learning, and well-being, we sought to refine the measurement of trait curiosity with an improved version of the Curiosity and Exploration Inventory (CEI; Kashdan, Rose, & Fincham, 2004). A preliminary pool of 36 items was administered to 311 undergraduate students, who also completed measures of emotion, emotion regulation, personality, and well-being. Factor analyses indicated a two-factor model: motivation to seek out knowledge and new experiences (Stretching; 5 items) and a willingness to embrace the novel, uncertain, and unpredictable nature of everyday life (Embracing; 5 items). In two additional samples (ns = 150 and 119), we cross-validated this factor structure and provided initial evidence for construct validity. This includes positive correlations with personal growth, openness to experience, autonomy, purpose in life, self-acceptance, psychological flexibility, positive affect, and positive social relations, among others. Applying item response theory (IRT) to these samples (n = 578), we showed that the items have good discrimination and a desirable breadth of difficulty. The item information functions and test information function were centered near zero, indicating that the scale assesses the mid-range of the latent curiosity trait most reliably. The findings thus far provide good evidence for the psychometric properties of the 10-item CEI-II. PMID:20160913
An Improved Internal Consistency Reliability Estimate.

ERIC Educational Resources Information Center

Cliff, Norman

1984-01-01

The proposed coefficient is derived by assuming that the average Goodman-Kruskal gamma between items of identical difficulty would be the same for items of different difficulty. An estimate of covariance between items of identical difficulty leads to an estimate of the correlation between two tests with identical distributions of difficulty.…
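Cliff's coefficient builds on the Goodman-Kruskal gamma between pairs of items; the rest of the derivation is truncated in the abstract, so only that building block is sketched here. Gamma counts concordant and discordant pairs of examinees and ignores pairs tied on either item. The function name is hypothetical, and the quadratic loop over pairs is chosen for clarity rather than speed.

```python
from itertools import combinations


def goodman_kruskal_gamma(x, y):
    """Goodman-Kruskal gamma between two ordinal item-score vectors.

    gamma = (concordant - discordant) / (concordant + discordant);
    pairs of examinees tied on either item are ignored.
    """
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    if concordant + discordant == 0:
        raise ValueError("gamma is undefined when every pair is tied")
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical ordinal scores of six examinees on two items.
print(goodman_kruskal_gamma([0, 1, 1, 2, 2, 3], [0, 0, 1, 1, 2, 2]))
```

For two dichotomous items, gamma reduces to Yule's Q computed from their 2 × 2 table.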
Psychometric properties and reliability of the Assessment Screen to Identify Survivors Toolkit for Gender Based Violence (ASIST-GBV): results from humanitarian settings in Ethiopia and Colombia.

PubMed

Vu, Alexander; Wirtz, Andrea; Pham, Kiemanh; Singh, Sonal; Rubenstein, Leonard; Glass, Nancy; Perrin, Nancy

2016-01-01

Refugees and internally displaced persons (IDPs) affected by armed conflict are more vulnerable to some forms of sexual violence and other types of gender-based violence (GBV). A validated, brief and easy-to-administer screening tool will help service providers identify GBV survivors and refer them to appropriate GBV services. To date, no such GBV screening tool exists. We developed the 7-item ASIST-GBV screening tool from qualitative research that included individual interviews and focus groups with GBV refugee and IDP survivors. This study presents the psychometric properties of the ASIST-GBV with female refugees living in Ethiopia and IDPs in Colombia. Several strategies were used to validate ASIST-GBV, including a 3-month implementation to validate the brief screening tool with women/girls aged ≥15 years seeking health services in Ethiopia (N = 487) and female IDPs aged ≥18 years in Colombia (N = 511). High proportions of women screened positive for past-year GBV according to the ASIST-GBV: 50.6% in Ethiopia and 63.4% in Colombia. The factor analysis identified a single dimension, meaning that all items loaded on a single factor. Cronbach's α = 0.77. A 2-parameter logistic IRT model was used to estimate the precision and discriminating power of each item. Item difficulty varied across the continuum of GBV experiences in the following order (lowest to highest): threats of violence (0.690), physical violence (1.28), forced sex (2.49), coercive sex for survival (2.25), forced marriage (3.51), and forced pregnancy (6.33). Discrimination results showed that forced pregnancy was the item with the strongest ability to discriminate between different levels of GBV. Physical violence and forced sex also had higher levels of discrimination, with threats of violence discriminating among women at the low end of the GBV continuum and coercive sex for survival among women at the mid-range of the continuum. 
The findings demonstrate that the ASIST-GBV has strong psychometric properties and good reliability. The tool can be used to screen and identify female GBV survivors confidentially and efficiently among IDPs in Colombia and refugees in Ethiopia. Early identification of GBV survivors can enable safety planning, early referral for treatment, and psychosocial support to prevent the long-term harmful consequences of GBV.
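The 2-parameter logistic (2PL) model mentioned above describes each item with a discrimination a and a difficulty b: the probability of endorsement is P(θ) = 1 / (1 + exp(-a(θ - b))), and the item information a²P(θ)(1 - P(θ)) peaks at θ = b with height a²/4. The sketch below uses hypothetical parameters, not the study's estimates, and omits the 1.7 scaling constant that some software includes.

```python
import math


def p_2pl(theta, a, b):
    """2PL probability of endorsing an item at latent trait level theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def info_2pl(theta, a, b):
    """Fisher information of a 2PL item: a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


# Hypothetical (a, b) pairs chosen for illustration only.
items = {"low-threshold item": (1.0, 0.5), "high-threshold item": (2.0, 3.0)}
grid = [x / 10 for x in range(-40, 71)]  # theta from -4.0 to 7.0 in steps of 0.1
for name, (a, b) in items.items():
    peak = max(grid, key=lambda t: info_2pl(t, a, b))
    print(f"{name}: information peaks at theta = {peak:.1f}, height = {info_2pl(peak, a, b):.2f}")
```

A steeper, higher-threshold item concentrates its information further up the trait continuum, which is the pattern the abstract describes for items such as forced pregnancy.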
The Confounding Effects of Ability, Item Difficulty, and Content Balance within Multiple Dimensions on the Estimation of Unidimensional Thetas

ERIC Educational Resources Information Center

Matlock, Ki Lynn

2013-01-01

When test forms that have equal total test difficulty and number of items vary in difficulty and length within sub-content areas, an examinee's estimated score may vary across equivalent forms, depending on how well his or her true ability in each sub-content area aligns with the difficulty of items and number of items within these areas.…
Developing Item Response Theory-Based Short Forms to Measure the Social Impact of Burn Injuries.

PubMed

Marino, Molly E; Dore, Emily C; Ni, Pengsheng; Ryan, Colleen M; Schneider, Jeffrey C; Acton, Amy; Jette, Alan M; Kazis, Lewis E

2018-03-01

Objective: To develop self-reported short forms for the Life Impact Burn Recovery Evaluation (LIBRE) Profile. Design: Short forms based on the item parameters of discrimination and average difficulty. Setting: A support network for burn survivors, peer support networks, social media, and mailings. Participants: Burn survivors (N=601) older than 18 years. Interventions: Not applicable. Main Outcome Measures: The LIBRE Profile. Results: Ten-item short forms were developed to cover the 6 LIBRE Profile scales: Relationships with Family & Friends, Social Interactions, Social Activities, Work & Employment, Romantic Relationships, and Sexual Relationships. Ceiling effects were ≤15% for all scales; floor effects were <1% for all scales. The marginal reliability of the short forms ranged from .85 to .89. Conclusions: The LIBRE Profile-Short Forms demonstrated credible psychometric properties. The short form version provides a viable alternative to administering the LIBRE Profile when resources do not allow computer or Internet access. The full item bank, computerized adaptive test, and short forms are all scored along the same metric, and therefore scores are comparable regardless of the mode of administration. Copyright © 2017 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Item Information and Discrimination Functions for Trinary PCM Items.

ERIC Educational Resources Information Center

Akkermans, Wies; Muraki, Eiji

1997-01-01

For trinary partial credit items, the shape of the item information and item discrimination functions is examined in relation to the item parameters. Conditions under which these functions are unimodal or bimodal are discussed, and the locations and values of their maxima are derived. The practical relevance of the results is discussed.
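For a unit-slope partial credit item, the item information at θ equals the conditional variance of the item score, so its shape can be explored numerically. The sketch below computes category probabilities for a trinary item from two step parameters and counts local maxima of the information curve on a grid. It does not reproduce the analytic conditions derived in the paper or its item discrimination function; the step values, the grid, and the function names are illustrative assumptions. For a generalized partial credit item with slope a, the information would be multiplied by a².

```python
import math


def pcm_probs(theta, deltas):
    """Category probabilities (0..len(deltas)) for a unit-slope partial credit item.

    Category k has log-numerator sum_{j<=k} (theta - delta_j); category 0 has 0.
    """
    cumulative = [0.0]
    for d in deltas:
        cumulative.append(cumulative[-1] + (theta - d))
    top = max(cumulative)  # subtract the largest term for numerical stability
    weights = [math.exp(c - top) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


def pcm_info(theta, deltas):
    """Item information for a unit-slope PCM item: the conditional score variance."""
    probs = pcm_probs(theta, deltas)
    mean = sum(k * p for k, p in enumerate(probs))
    return sum((k - mean) ** 2 * p for k, p in enumerate(probs))


def local_maxima(deltas, lo=-6.0, hi=6.0, steps=1200):
    """Grid points (rounded to 0.01) where the information curve has a local maximum."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    vals = [pcm_info(t, deltas) for t in grid]
    return [round(grid[i], 2) for i in range(1, steps)
            if vals[i - 1] < vals[i] > vals[i + 1]]


print(local_maxima([0.0, 0.0]))   # coinciding steps
print(local_maxima([-3.0, 3.0]))  # widely separated steps
```

With coinciding steps the curve has a single peak at θ = 0; with steps at -3 and 3 it has two peaks, one near each step, and dips to about 0.09 at θ = 0.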
A new self-report inventory of dyslexia for students: criterion and construct validity.

PubMed

Tamboer, Peter; Vorst, Harrie C M

2015-02-01

The validity of a Dutch self-report inventory of dyslexia was ascertained in two samples of students. Six biographical questions, 20 general language statements and 56 specific language statements were based on a view of dyslexia as a multi-dimensional deficit. Dyslexia and non-dyslexia were assessed with two criteria: identification with test results (Sample 1) and classification using biographical information (both samples). Using discriminant analyses, these criteria were predicted with various groups of statements. Altogether, 11 discriminant functions were used to estimate the classification accuracy of the inventory. In Sample 1, 15 statements predicted the test criterion with a classification accuracy of 98%, and 18 statements predicted the biographical criterion with a classification accuracy of 97%. In Sample 2, 16 statements predicted the biographical criterion with a classification accuracy of 94%. Estimates of positive and negative predictive value were 89% and 99%, respectively. Items of various discriminant functions were factor analysed to find characteristic difficulties of students with dyslexia, resulting in a five-factor structure in Sample 1 and a four-factor structure in Sample 2. Answer bias was investigated with measures of internal consistency reliability. Fewer than 20 self-report items are sufficient to accurately classify students with and without dyslexia. This supports the usefulness of self-assessment of dyslexia as a valid alternative to diagnostic test batteries. Copyright © 2015 John Wiley & Sons, Ltd.
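The classification accuracy and predictive values reported above come from a 2 × 2 cross-classification of predicted versus criterion status. The sketch below shows the standard formulas; the function name and the counts in the usage lines are hypothetical, not the study's data. Positive and negative predictive values, unlike sensitivity and specificity, depend on the prevalence of dyslexia in the sample.

```python
def classification_summary(tp, fp, fn, tn):
    """Accuracy, sensitivity, specificity, PPV and NPV from a 2 x 2 table.

    tp/fn: criterion-positive cases classified positive/negative;
    fp/tn: criterion-negative cases classified positive/negative.
    """
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),  # share of positive classifications that are correct
        "npv": tn / (tn + fn),  # share of negative classifications that are correct
    }


# Hypothetical counts, not the study's data.
for name, value in classification_summary(tp=8, fp=2, fn=1, tn=89).items():
    print(f"{name}: {value:.3f}")
```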
Development and Psychometric Evaluation of the Gay Male Sexual Difficulties Scale.

PubMed

McDonagh, Lorraine K; Stewart, Ian; Morrison, Melanie A; Morrison, Todd G

2016-08-01

Sexual difficulties (i.e., disturbances in normal sexual responding) have the potential to significantly and negatively affect men's social and psychological well-being. However, a review of published measurement tools indicates that most have limited applicability to gay men, and none offers a nuanced understanding of sexual difficulties as experienced by members of this population. To address this omission, the Gay Male Sexual Difficulties Scale (GMSDS) was developed using a sequential mixed-methods approach. The 25-item GMSDS uses a 6-point frequency Likert-type response format and examines: difficulties with receptive and insertive anal intercourse (5 items each); erectile difficulties (4 items); foreskin difficulties (4 items); body embarrassment (4 items); and seminal fluid concerns (3 items). The dimensionality of the measure's scale scores, assessed using both exploratory and confirmatory factor analyses, as well as scale score reliability and validity (e.g., known-groups and convergent), were tested and deemed satisfactory. Limitations of the current series of studies and directions for future research are discussed.
[Development of critical thinking skill evaluation scale for nursing students].

PubMed

You, So Young; Kim, Nam Cho

2014-04-01

This study aimed to develop a Critical Thinking Skill Test for Nursing Students. The construct concepts were drawn from a literature review and in-depth interviews with hospital nurses, and surveys were conducted among students (n=607) from nursing colleges. The data were collected from September 13 to November 23, 2012 and analyzed using SAS version 9.2. Reliability was assessed with the KR-20 coefficient; items were analyzed with the difficulty index, discrimination index, and item-total correlation; and validity was examined with the known-groups technique. Four domains and 27 skills were identified, and 35 multiple-choice items were developed. Thirty multiple-choice items with content validity index scores higher than .80 were selected for the pretest. From the analysis of the pretest data, 30 modified items were selected for the main test. In the main test, the KR-20 coefficient was .70 and corrected item-total correlations ranged from .11 to .38. There was a statistically significant difference between the two academic systems (p=.001). The developed instrument is the first critical thinking skill test reflecting nursing perspectives in hospital settings and is expected to be used as a tool that contributes to improving the critical thinking ability of nursing students.
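KR-20, used above as the reliability coefficient, is the special case of Cronbach's alpha for dichotomously scored items: KR-20 = k/(k - 1) × (1 - Σ p_j q_j / σ²_X), where p_j is the proportion correct on item j, q_j = 1 - p_j, and σ²_X is the variance of total scores. This minimal sketch assumes a complete 0/1 response matrix and uses the population variance of total scores, consistent with p_j q_j being a population item variance; the function name and data are hypothetical.

```python
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson Formula 20 for 0/1 item scores (rows = examinees)."""
    n, k = len(responses), len(responses[0])
    if k < 2:
        raise ValueError("KR-20 needs at least two items")
    # Sum of item variances p_j * q_j, where p_j is the proportion correct on item j.
    pq_sum = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n
        pq_sum += p * (1 - p)
    # Population variance of total scores, matching the p*q item variances.
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - pq_sum / total_var)


# Hypothetical 0/1 responses for four examinees on three items.
data = [[1, 1, 1], [1, 1, 0], [0, 1, 0], [0, 0, 0]]
print(round(kr20(data), 3))  # 0.75
```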
Comparison of Alternate and Original Items on the Montreal Cognitive Assessment.

PubMed

Lebedeva, Elena; Huang, Mei; Koski, Lisa

2016-03-01

The Montreal Cognitive Assessment (MoCA) is a screening tool for mild cognitive impairment (MCI) in elderly individuals. We hypothesized that measurement error when using the new alternate MoCA versions to monitor change over time could be related to the use of items that are not of comparable difficulty to their corresponding originals of similar content. The objective of this study was to compare the difficulty of the alternate MoCA items with that of the original ones. Five selected items from alternate versions of the MoCA were included with items from the original MoCA administered adaptively to geriatric outpatients (N = 78). Rasch analysis was used to estimate the difficulty level of the items. None of the five items from the alternate versions matched the difficulty level of their corresponding original items. This study demonstrates the potential benefits of a Rasch analysis-based approach for selecting items during the development of parallel forms. The results suggest that better matching of items across MoCA forms by difficulty would result in higher sensitivity to changes in cognitive function over time.
The Effect of Mental Rotation on Surgical Pathological Diagnosis.

PubMed

Park, Heejung; Kim, Hyun Soo; Cha, Yoon Jin; Choi, Junjeong; Minn, Yangki; Kim, Kyung Sik; Kim, Se Hoon

2018-05-01

Pathological diagnosis involves very delicate and complex processing by the pathologist. The recognition of false patterns might be an important cause of misdiagnosis in surgical pathology. In this study, we evaluated the influence of visual and cognitive bias in surgical pathologic diagnosis, focusing on the influence of "mental rotation." We designed three sets of the same images of biopsied uterine cervix specimens (original, left-to-right mirror images, and 180-degree rotated images) and recruited 32 pathologists to diagnose the items in each of the 3 sets individually. First, the items were found to be adequate for analysis by classical test theory, generalizability theory, and item response theory. The results showed no statistically significant differences in difficulty, discrimination indices, or response duration between the image sets. Mental rotation did not influence the pathologists' diagnosis in practice. Interestingly, outliers were more frequent in the rotated image sets, suggesting that the mental rotation process may influence the pathological diagnoses of a few individual pathologists. © Copyright: Yonsei University College of Medicine 2018.
An Analysis of Factors Affecting the Difficulty of Dialogue Items in TOEFL Listening Comprehension. TOEFL Research Reports, 51.

ERIC Educational Resources Information Center

Nissan, Susan; And Others

One of the item types in the Listening Comprehension section of the Test of English as a Foreign Language (TOEFL) test is the dialogue. Because the dialogue item pool needs to have an appropriate balance of items at a range of difficulty levels, test developers have examined items at various difficulty levels in an attempt to identify their…
Is It Working? Distractor Analysis Results from the Test Of Astronomy STandards (TOAST) Assessment Instrument

NASA Astrophysics Data System (ADS)

Slater, Stephanie

2009-05-01

The Test Of Astronomy STandards (TOAST) assessment instrument is a multiple-choice survey tightly aligned to the consensus learning goals stated by the American Astronomical Society - Chair's Conference on ASTRO 101, the American Association for the Advancement of Science's Project 2061 Benchmarks, and the National Research Council's National Science Education Standards. Researchers from the Cognition in Astronomy, Physics and Earth sciences Research (CAPER) Team at the University of Wyoming's Science and Math Teaching Center (UWYO SMTC) have been conducting a question-by-question distractor analysis to determine the sensitivity and effectiveness of each item. In brief, the frequency with which each possible answer choice (known as a foil or distractor on a multiple-choice test) is selected is determined and compared to the existing literature on the teaching and learning of astronomy. In addition to having appropriate statistical difficulty and discrimination values, a well-functioning assessment item will show students selecting distractors in the relative proportions we expect based on known misconceptions and reasoning difficulties. Our distractor analysis suggests that all items are functioning as expected. These results add weight to the validity of the TOAST assessment instrument, which is designed to help instructors and researchers measure the impact of course-length instructional strategies for undergraduate science survey courses with learning goals tightly aligned to the consensus goals of the astronomy education community.
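The first step of the distractor analysis described above is tabulating how often each answer choice is selected. The sketch below computes those proportions overall and within upper and lower total-score groups (27% each, a conventional assumption not stated in the abstract), which makes it easy to spot a distractor that nobody chooses or one that attracts more high scorers than low scorers. Comparing the patterns with known misconceptions, the substantive step the abstract describes, is not automated here; the function name and data are hypothetical.

```python
from collections import Counter


def distractor_table(choices, totals, options, key, fraction=0.27):
    """Share of examinees choosing each option, overall and in upper/lower groups.

    choices: the option each examinee picked; totals: each examinee's total score;
    options: all answer choices offered; key: the correct option.
    """
    n = len(choices)
    g = max(1, round(fraction * n))
    # Rank examinees by total score (ties keep their original order).
    order = sorted(range(n), key=lambda i: totals[i])
    lower, upper = order[:g], order[-g:]
    overall = Counter(choices)
    up = Counter(choices[i] for i in upper)
    low = Counter(choices[i] for i in lower)
    return {
        opt: {
            "is_key": opt == key,
            "overall": overall[opt] / n,
            "upper": up[opt] / g,
            "lower": low[opt] / g,
        }
        for opt in options
    }


choices = ["A", "B", "A", "C", "A", "B"]
totals = [10, 2, 9, 3, 8, 1]
for opt, row in distractor_table(choices, totals, options="ABCD", key="A").items():
    print(opt, row)
```

Passing the full list of options ensures that a distractor chosen by nobody still appears in the table with a share of zero.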
Development of a Meiosis Concept Inventory

PubMed Central

Kalas, Pamela; O’Neill, Angie; Pollock, Carol; Birol, Gülnur

2013-01-01

We have designed, developed, and validated a 17-question Meiosis Concept Inventory (Meiosis CI) to diagnose student misconceptions on meiosis, which is a fundamental concept in genetics. We targeted large introductory biology and genetics courses and used published methodology for question development, which included the validation of questions by student interviews (n = 28), in-class testing of the questions by students (n = 193), and expert (n = 8) consensus on the correct answers. Our item analysis showed that the questions’ difficulty and discrimination indices were in agreement with published recommended standards and discriminated effectively between high- and low-scoring students. We foresee other institutions using the Meiosis CI as both a diagnostic tool and an instrument to assess teaching effectiveness and student progress, and invite instructors to visit http://q4b.biology.ubc.ca for more information. PMID:24297292

Development and validation of brief scales to measure emotional and behavioural problems among Chinese adolescents

PubMed Central

Shen, Minxue; Hu, Ming; Sun, Zhenqiu

2017-01-01

Objectives To develop and validate brief scales to measure common emotional and behavioural problems among adolescents in the examination-oriented education system and collectivistic culture of China. Setting Middle schools in Hunan province. Participants 5442 middle school students aged 11–19 years were sampled. A total of 4727 valid questionnaires were collected and used for validation of the scales. The final sample included 2408 boys and 2319 girls. Primary and secondary outcome measures The tools were assessed using item response theory, classical test theory (reliability and construct validity) and differential item functioning. Results Four scales to measure anxiety, depression, study problems and sociality problems were established. Exploratory factor analysis showed that each scale had two solutions. Confirmatory factor analysis showed acceptable to good model fit for each scale. Internal consistency and test–retest reliability of all scales were above 0.7. Item response theory showed that all items had acceptable discrimination parameters and most items had appropriate difficulty parameters. Ten items demonstrated differential item functioning with respect to gender. Conclusions Four brief scales were developed and validated among adolescents in middle schools of China. The scales have good psychometric properties with minor differential item functioning. They can be used in middle school settings and will help school officials to assess students' emotional and behavioural problems. PMID:28062469
Sources of difficulty in assessment: example of PISA science items

NASA Astrophysics Data System (ADS)

Le Hebel, Florence; Montpied, Pascale; Tiberghien, Andrée; Fontanieu, Valérie

2017-03-01

Understanding what makes a question difficult is a crucial concern in assessment. To study the difficulty of test questions, we focus on the case of PISA, which assesses to what degree 15-year-old students have acquired knowledge and skills essential for full participation in society. Our research aim is to identify characteristics of PISA science items that could influence an item's proficiency level. The study is based on an a priori item analysis and a statistical analysis. Results show that, of the item characteristics identified in our a priori analysis, only cognitive complexity and format have explanatory power for an item's proficiency level. The proficiency level cannot be explained by the dependence or independence of the information provided in the unit and/or item introduction, or by the competence assessed. We conclude that in PISA it appears possible to anticipate a high proficiency level, that is, low student scores, for items displaying high cognitive complexity. For items of middle or low cognitive complexity, however, the cognitive complexity level is not sufficient to predict item difficulty; other characteristics play a crucial role. We discuss anticipating difficulties in assessment from a broader perspective.
The Genetics Concept Assessment: a new concept inventory for gauging student understanding of genetics.

PubMed

Smith, Michelle K; Wood, William B; Knight, Jennifer K

2008-01-01

We have designed, developed, and validated a 25-question Genetics Concept Assessment (GCA) to test achievement of nine broad learning goals in majors and nonmajors undergraduate genetics courses. Written in everyday language with minimal jargon, the GCA is intended for use as a pre- and posttest to measure student learning gains. The assessment was reviewed by genetics experts, validated by student interviews, and taken by >600 students at three institutions. Normalized learning gains on the GCA were positively correlated with averaged exam scores, suggesting that the GCA measures understanding of topics relevant to instructors. Statistical analysis of our results shows that differences in the item difficulty and item discrimination index values between different questions on pre- and posttests can be used to distinguish between concepts that are well or poorly learned during a course.
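The GCA abstract reports normalized learning gains and pre/post changes in item difficulty without stating formulas. A minimal sketch follows, assuming the widely used Hake definition g = (post - pre) / (100 - pre) on percentage scores and difficulty as the proportion of students answering correctly; whether the authors averaged per-student gains or used class averages is not stated, and the function names are illustrative.

```python
def normalized_gain(pre_pct, post_pct):
    """Hake-style normalized learning gain with scores as percentages:
    g = (post - pre) / (100 - pre), the fraction of possible gain achieved."""
    if pre_pct >= 100:
        raise ValueError("a pre-test score of 100% leaves no room for gain")
    return (post_pct - pre_pct) / (100 - pre_pct)


def difficulty_shift(pre_item, post_item):
    """Change in proportion correct (difficulty index) for one item between
    pre-test and post-test, with 0/1 item scores."""
    return sum(post_item) / len(post_item) - sum(pre_item) / len(pre_item)


# Hypothetical class: 40% on the pre-test, 70% on the post-test.
print(normalized_gain(40, 70))
print(difficulty_shift([0, 0, 1, 0], [1, 1, 1, 0]))
```

A large positive shift on one item alongside a small shift on another is the kind of contrast the abstract uses to separate well-learned from poorly learned concepts.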
The Genetics Concept Assessment: A New Concept Inventory for Gauging Student Understanding of Genetics

PubMed Central

Wood, William B.; Knight, Jennifer K.

2008-01-01

We have designed, developed, and validated a 25-question Genetics Concept Assessment (GCA) to test achievement of nine broad learning goals in majors and nonmajors undergraduate genetics courses. Written in everyday language with minimal jargon, the GCA is intended for use as a pre- and posttest to measure student learning gains. The assessment was reviewed by genetics experts, validated by student interviews, and taken by >600 students at three institutions. Normalized learning gains on the GCA were positively correlated with averaged exam scores, suggesting that the GCA measures understanding of topics relevant to instructors. Statistical analysis of our results shows that differences in the item difficulty and item discrimination index values between different questions on pre- and posttests can be used to distinguish between concepts that are well or poorly learned during a course. PMID:19047428
Symptoms of anxiety in depression: assessment of item performance of the Hamilton Anxiety Rating Scale in patients with depression.

PubMed

Vaccarino, Anthony L; Evans, Kenneth R; Sills, Terrence L; Kalali, Amir H

2008-01-01

Although diagnostically dissociable, anxiety is strongly co-morbid with depression. To examine further the clinical symptoms of anxiety in major depressive disorder (MDD), a non-parametric item response analysis on "blinded" data from four pharmaceutical company clinical trials was performed on the Hamilton Anxiety Rating Scale (HAMA) across levels of depressive severity. The severity of depressive symptoms was assessed using the 17-item Hamilton Depression Rating Scale (HAMD). HAMA and HAMD measures were supplied for each patient on each of two post-screen visits (n=1,668 observations). Option characteristic curves were generated for all 14 HAMA items to determine the probability of scoring a particular option on the HAMA in relation to the total HAMD score. Additional analyses were conducted using Pearson's product-moment correlations. Results showed that anxiety-related symptomatology generally increased as a function of overall depressive severity, though there were clear differences between individual anxiety symptoms in their relationship with depressive severity. In particular, anxious mood, tension, insomnia, difficulties in concentration and memory, and depressed mood were found to discriminate over the full range of HAMD scores, increasing continuously with increases in depressive severity. By contrast, many somatic-related symptoms, including muscular, sensory, cardiovascular, respiratory, gastrointestinal, and genito-urinary symptoms, were manifested primarily at higher levels of depression and did not discriminate well at lower HAMD scores. These results indicate that anxiety is a core feature of depression, and that the relationship between anxiety-related symptoms and depression should be considered in the assessment of depression and in the evaluation of treatment strategies and outcomes.
Is the Factor Observed in Investigations on the Item-Position Effect Actually the Difficulty Factor?

ERIC Educational Resources Information Center

Schweizer, Karl; Troche, Stefan

2018-01-01

In confirmatory factor analysis quite similar models of measurement serve the detection of the difficulty factor and the factor due to the item-position effect. The item-position effect refers to the increasing dependency among the responses to successively presented items of a test whereas the difficulty factor is ascribed to the wide range of…
Student perception and post-exam analysis of one best MCQs and one correct MCQs: A comparative study.

PubMed

Adhi, Mohammad Idrees; Aly, Syed Moyn

2018-04-01

To find differences between One-Correct and One-Best multiple-choice questions in relation to student scores, post-exam item analysis results and student perception. This comparative cross-sectional study was conducted at the Dow University of Health Sciences, Karachi, from November 2010 to April 2011, and comprised medical students. Data were analysed using SPSS 18. Of the 207 participants, 16 (7.7%) were boys and 191 (92.3%) were girls. The mean score in Paper I was 18.62±4.7, while in Paper II it was 19.58±6.1. One-Best multiple-choice questions performed better than One-Correct questions. There was no statistically significant difference in the mean scores of the two papers or in the difficulty indices. Difficulty and discrimination indices correlated well in both papers. Cronbach's alpha of Paper I was 0.584 and that of Paper II was 0.696. Point-biserial values were better for Paper II than for Paper I. Most students expressed dissatisfaction with Paper II. One-Best multiple-choice questions showed better scores, higher reliability, and better item performance and correlation values.
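Point-biserial values like those compared between Paper I and Paper II are the correlation between a dichotomously scored item and examinees' total scores. A minimal sketch of the textbook formula follows; it uses the uncorrected total (the item is included in it), and the abstract does not say which variant was reported, so treat the choice as an assumption. The data are hypothetical.

```python
import math


def point_biserial(item_scores, total_scores):
    """Point-biserial correlation between a 0/1 item and the total score:
    r_pb = (M1 - M0) / s * sqrt(p * q), where M1 and M0 are the mean totals of
    examinees who got the item right and wrong, p and q = 1 - p are the
    proportions in each group, and s is the population SD of the totals."""
    n = len(total_scores)
    right = [t for x, t in zip(item_scores, total_scores) if x == 1]
    wrong = [t for x, t in zip(item_scores, total_scores) if x == 0]
    if not right or not wrong:
        raise ValueError("item must have both correct and incorrect responses")
    p = len(right) / n
    mean = sum(total_scores) / n
    s = math.sqrt(sum((t - mean) ** 2 for t in total_scores) / n)
    m1 = sum(right) / len(right)
    m0 = sum(wrong) / len(wrong)
    return (m1 - m0) / s * math.sqrt(p * (1 - p))


# Hypothetical: the two top scorers answered the item correctly.
print(point_biserial([1, 1, 0, 0], [4, 3, 2, 1]))
```

The result equals the ordinary Pearson correlation between the 0/1 item column and the totals; the formula above simply makes the group means explicit.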
Comparison of Alternate and Original Items on the Montreal Cognitive Assessment

PubMed Central

Lebedeva, Elena; Huang, Mei; Koski, Lisa

2016-01-01

Background The Montreal Cognitive Assessment (MoCA) is a screening tool for mild cognitive impairment (MCI) in elderly individuals. We hypothesized that measurement error when using the new alternate MoCA versions to monitor change over time could be related to the use of items that are not of comparable difficulty to their corresponding originals of similar content. The objective of this study was to compare the difficulty of the alternate MoCA items to the original ones. Methods Five selected items from alternate versions of the MoCA were included with items from the original MoCA administered adaptively to geriatric outpatients (N = 78). Rasch analysis was used to estimate the difficulty level of the items. Results None of the five items from the alternate versions matched the difficulty level of their corresponding original items. Conclusions This study demonstrates the potential benefits of a Rasch analysis-based approach for selecting items during the process of development of parallel forms. The results suggest that better match of the items from different MoCA forms by their difficulty would result in higher sensitivity to changes in cognitive function over time. PMID:27076861
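Rasch analysis places persons and items on a common logit scale: the probability of success is exp(θ - b) / (1 + exp(θ - b)), so an item's difficulty b is the ability at which success is 50% likely, and an alternate item matches its original when their b values agree. The sketch below is an illustration rather than the study's calibration: it evaluates that probability and computes crude centred log-odds starting values for b from proportions correct, which dedicated Rasch software would then refine by maximum likelihood. Function names and inputs are hypothetical.

```python
import math


def rasch_probability(theta, b):
    """Dichotomous Rasch model: P(correct) for ability theta and item
    difficulty b, both in logits."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def centred_log_odds_difficulties(p_correct):
    """Crude starting values for item difficulties: ln((1 - p) / p) for each
    item's proportion correct, shifted so the mean difficulty is 0.
    Proportions must lie strictly between 0 and 1."""
    raw = [math.log((1 - p) / p) for p in p_correct]
    shift = sum(raw) / len(raw)
    return [b - shift for b in raw]


# Hypothetical proportions correct for an original item and its alternate form.
print(centred_log_odds_difficulties([0.80, 0.55]))
print(rasch_probability(0.0, 0.0))
```

Items answered correctly by more examinees receive lower (easier) difficulty values, so a gap between an original item and its alternate shows up directly as a gap in b.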
The Caregiver Contribution to Heart Failure Self-Care (CACHS): Further Psychometric Testing of a Novel Instrument.

PubMed

Buck, Harleah G; Harkness, Karen; Ali, Muhammad Usman; Carroll, Sandra L; Kryworuchko, Jennifer; McGillion, Michael

2017-04-01

Caregivers (CGs) contribute important assistance with heart failure (HF) self-care, including daily maintenance, symptom monitoring, and management. Until CGs' contributions to self-care can be quantified, it is impossible to characterize them, account for their impact on patient outcomes, or perform meaningful cost analyses. The purpose of this study was to conduct psychometric testing and item reduction on the recently developed 34-item Caregiver Contribution to Heart Failure Self-care (CACHS) instrument using classical and item response theory methods. Fifty CGs (mean age 63 years ±12.84; 70% female) recruited from a HF clinic completed the CACHS in 2014, and the results were evaluated using classical test theory and item response theory. Items were to be deleted for low (<.05) or high (>.95) endorsement, low (<.3) or high (>.7) corrected item-total correlations, significant pairwise correlation coefficients, floor or ceiling effects, relatively low latent trait and item information function levels (<1.5 and p > .5), and differential item functioning. After analysis, 14 items were excluded, resulting in a 20-item instrument (self-care maintenance, eight items; monitoring, seven items; and management, five items). Most items demonstrated moderate to high discrimination (median 2.13, minimum .77, maximum 5.05) and appropriate item difficulty (-2.7 to 1.4). Internal consistency reliability was excellent (Cronbach α = .94, average inter-item correlation = .41) with no ceiling effects. The newly developed 20-item version of the CACHS is supported by rigorous instrument development and represents a novel instrument to measure CGs' contribution to HF self-care. © 2016 Wiley Periodicals, Inc.
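Two of the classical screening rules listed for the CACHS, endorsement rates outside .05 to .95 and corrected item-total correlations outside .3 to .7, can be applied mechanically. The sketch below assumes dichotomous (0/1) item scores for simplicity; the IRT-based, pairwise-correlation, and DIF criteria are not covered, and the function names and data are illustrative.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def corrected_item_total(data, j):
    """Correlate item j with the total of the remaining items, so the item
    does not inflate its own correlation."""
    item = [row[j] for row in data]
    rest = [sum(row) - row[j] for row in data]
    return pearson(item, rest)


def flag_items(data, end_lo=0.05, end_hi=0.95, r_lo=0.3, r_hi=0.7):
    """Return indices of 0/1 items whose endorsement rate or corrected
    item-total correlation falls outside the given bounds."""
    n = len(data)
    flagged = []
    for j in range(len(data[0])):
        endorsement = sum(row[j] for row in data) / n
        if not end_lo <= endorsement <= end_hi:
            flagged.append(j)  # near-constant item: correlation undefined or unstable
            continue
        r = corrected_item_total(data, j)
        if not r_lo <= r <= r_hi:
            flagged.append(j)
    return flagged


# Hypothetical responses: items 0 and 1 are redundant, item 2 is never endorsed.
data = [[1, 1, 0], [0, 0, 0], [1, 1, 0], [0, 0, 0]]
print(flag_items(data))
```

Here items 0 and 1 are flagged for a corrected item-total correlation above .7 (they duplicate each other) and item 2 for an endorsement rate below .05.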
Using the Nominal Response Model to Evaluate Response Category Discrimination in the PROMIS Emotional Distress Item Pools

ERIC Educational Resources Information Center

Preston, Kathleen; Reise, Steven; Cai, Li; Hays, Ron D.

2011-01-01

The authors used a nominal response item response theory model to estimate category boundary discrimination (CBD) parameters for items drawn from the Emotional Distress item pools (Depression, Anxiety, and Anger) developed in the Patient-Reported Outcomes Measurement Information Systems (PROMIS) project. For polytomous items with ordered response…
Comorbid Attentional Factors and Frequency Discrimination Performance in a Child with Reading Difficulties

ERIC Educational Resources Information Center

Sutcliffe, Paul

2006-01-01

This research investigated the frequency discrimination performance of a 6-year-old boy (MH) with language and attentional difficulties. MH had been reported to have literacy problems not paralleling an advanced verbal ability, and he showed difficulties in discriminating non-verbal tones of different frequencies in comparison with children of his…
Examining Differential Item Functions of Different Item Ordered Test Forms According to Item Difficulty Levels

ERIC Educational Resources Information Center

Çokluk, Ömay; Gül, Emrah; Dogan-Gül, Çilem

2016-01-01

The study aims to examine whether differential item functioning is displayed in three different test forms that have item orders of random and sequential versions (easy-to-hard and hard-to-easy), based on Classical Test Theory (CTT) and Item Response Theory (IRT) methods and bearing item difficulty levels in mind. In the correlational research, the…
Perceived discrimination, social support, and perceived stress among people living with HIV/AIDS in China.

PubMed

Su, Xiaoyou; Lau, Joseph T F; Mak, Winnie W S; Chen, Lin; Choi, K C; Song, Junmin; Zhang, Yan; Zhao, Guanglu; Feng, Tiejian; Chen, Xi; Liu, Chuliang; Liu, Jun; Liu, De; Cheng, Jinquan

2013-01-01

Perceived stress among people living with HIV/AIDS (PLWH) was associated with severe mental health problems and risk behaviors. Discrimination toward PLWH in China is prevalent. Both perceived discrimination and social support are determinants of the stress level among PLWH. Psychological support services for PLWH in China are scarce. It is unknown whether social support is a buffer between perceived discrimination and perceived stress. With written consent, this study surveyed 258 PLWH recruited from multiple sources in two cities in China. Instruments were validated in previous studies or the present study, including the perceived stress scale for PLWH (PSSHIV), the perceived social support scale (PSSS), and the perceived discrimination scale for PLWH (PDSHIV). Pearson correlations and multiple regression models were fit. PDSHIV was associated with the Overall Scale and all subscales of PSSHIV, whilst lower socioeconomic status in general and lower scores of PSSS were associated with various subscales of PSSHIV. The interaction term (PSSS×PDSHIV) was nonsignificant in modeling PSSHIV, hence no significant moderating effect was detected. Whilst perceived discrimination is a major source of stress and social support can reduce stress among PLWH in China, improved social support cannot buffer the stressful consequences of perceived discrimination. The results highlight the importance of reducing discrimination toward PLWH and the difficulty of alleviating its negative consequences. Improving mental health among PLWH in China is warranted, and fostering social support among PLWH remains important because it has direct effects on perceived stress.
Using Differential Item Functioning Procedures to Explore Sources of Item Difficulty and Group Performance Characteristics.

ERIC Educational Resources Information Center

Scheuneman, Janice Dowd; Gerritz, Kalle

1990-01-01

Differential item functioning (DIF) methodology for revealing sources of item difficulty and performance characteristics of different groups was explored. A total of 150 Scholastic Aptitude Test items and 132 Graduate Record Examination general test items were analyzed. DIF was evaluated for males and females and Blacks and Whites. (SLD)
Item Structural Properties as Predictors of Item Difficulty and Item Association.

ERIC Educational Resources Information Center

Solano-Flores, Guillermo

1993-01-01

Studied the ability of logical test design (LTD) to predict student performance in reading Roman numerals for 211 sixth graders in Mexico City tested on Roman numeral items varying on LTD-related and non-LTD-related variables. The LTD-related variable item iterativity was found to be the best predictor of item difficulty. (SLD)
Using Rasch rating scale model to reassess the psychometric properties of the Persian version of the PedsQL™ 4.0 Generic Core Scales in school children

PubMed Central

2012-01-01

Background Item response theory (IRT) is extensively used to develop adaptive instruments of health-related quality of life (HRQoL). However, each IRT model has its own function to estimate item and category parameters, and hence different results may be found using the same response categories with different IRT models. The present study used the Rasch rating scale model (RSM) to examine and reassess the psychometric properties of the Persian version of the PedsQL™ 4.0 Generic Core Scales. Methods The PedsQL™ 4.0 Generic Core Scales were completed by 938 Iranian school children and their parents. Convergent, discriminant and construct validity of the instrument were assessed by classical test theory (CTT). The RSM was applied to investigate person and item reliability, item statistics and ordering of response categories. Results The CTT method showed that the scaling success rates for convergent and discriminant validity were 100% in all domains, with the exception of physical health in the child self-report. Moreover, confirmatory factor analysis supported a four-factor model similar to the original version. The RSM showed that 22 of 23 items had acceptable infit and outfit statistics (<1.4, >0.6), person reliabilities were low, item reliabilities were high, and item difficulty ranged from -1.01 to 0.71 and from -0.68 to 0.43 for child self-report and parent proxy-report, respectively. The RSM also showed that, for all items, successive response categories were not located in the expected order. Conclusions This study revealed that, in all domains, the five response categories did not perform adequately. It is not known whether this problem is a function of the meaning of the response choices in the Persian language or an artifact of a mostly healthy population that did not use the full range of the response categories. The response categories should be evaluated in further validation studies, especially in large samples of chronically ill patients. PMID:22414135
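Infit and outfit mean-squares compare observed responses with those the Rasch model expects: outfit averages squared standardized residuals, while infit weights them by model variance so that off-target responses count less. The study used the polytomous rating scale model; for brevity this sketch shows only the dichotomous case (an assumption, though the same expected-score and variance logic carries over), with the 0.6 to 1.4 band taken from the abstract. Function names and data are illustrative.

```python
import math


def rasch_p(theta, b):
    """Dichotomous Rasch probability of a correct/endorsed response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_fit(responses, thetas, b):
    """Infit and outfit mean-squares for one item.

    outfit: unweighted mean of squared standardized residuals (x - P)^2 / W
    infit:  information-weighted version, sum((x - P)^2) / sum(W)
    where P is the model probability and W = P(1 - P) its variance."""
    sq = []
    var = []
    for x, theta in zip(responses, thetas):
        p = rasch_p(theta, b)
        sq.append((x - p) ** 2)
        var.append(p * (1 - p))
    outfit = sum(s / w for s, w in zip(sq, var)) / len(sq)
    infit = sum(sq) / sum(var)
    return infit, outfit


def acceptable_fit(infit, outfit, lo=0.6, hi=1.4):
    """Check both statistics against the band quoted in the abstract."""
    return lo < infit < hi and lo < outfit < hi


# Hypothetical: four persons at the item's own difficulty give fit values of 1.
infit, outfit = item_fit([1, 0, 1, 0], [0.0, 0.0, 0.0, 0.0], 0.0)
print(infit, outfit, acceptable_fit(infit, outfit))
```

A single able person (θ = 3) failing an easy item (b = 0) drives outfit up to about e³ ≈ 20, which illustrates why outfit is the more sensitive statistic to isolated surprising responses.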
Predicting Item Difficulty in a Reading Comprehension Test with an Artificial Neural Network.

ERIC Educational Resources Information Center

Perkins, Kyle; And Others

1995-01-01

This article reports the results of using a three-layer back propagation artificial neural network to predict item difficulty in a reading comprehension test. Three classes of variables were examined: text structure, propositional analysis, and cognitive demand. Results demonstrate that the networks can consistently predict item difficulty. (JL)
Multiple choice questions can be designed or revised to challenge learners' critical thinking.

PubMed

Tractenberg, Rochelle E; Gushta, Matthew M; Mulroney, Susan E; Weissinger, Peggy A

2013-12-01

Multiple choice (MC) questions from a graduate physiology course were evaluated by cognitive-psychology (but not physiology) experts, and analyzed statistically, in order to test the independence of content expertise and cognitive complexity ratings of MC items. Integration of higher order thinking into MC exams is important, but widely known to be challenging, perhaps especially when content experts must think like novices. Expertise in the domain (content) may actually impede the creation of higher-complexity items. Three cognitive psychology experts independently rated cognitive complexity for 252 multiple-choice physiology items using a six-level cognitive complexity matrix that was synthesized from the literature. Rasch modeling estimated item difficulties. The complexity ratings and difficulty estimates were then analyzed together to determine the relative contributions (and independence) of complexity and difficulty to the likelihood of correct answers on each item. Cognitive complexity was found to be statistically independent of difficulty estimates for 88% of items. Using the complexity matrix, modifications were identified to increase some item complexities by one level without affecting the item's difficulty. Cognitive complexity can effectively be rated by non-content experts. The six-level complexity matrix, if applied by faculty peer groups trained in cognitive complexity and without domain-specific expertise, could lead to improvements in the complexity targeted with item writing and revision. Targeting higher order thinking with MC questions can be achieved without changing item difficulties or other test characteristics, but this may be less likely if the content expert is left to assess items within their domain of expertise.
Diagnostic Utility of Craving in Predicting Nicotine Dependence: Impact of Craving Content and Item Stability

PubMed Central

2013-01-01

Introduction: Craving is useful in the diagnosis of drug dependence, but it is unclear how various items used to assess craving might influence the diagnostic performance of craving measures. This study determined the diagnostic performance of individual items and item subgroups of the 32-item Questionnaire on Smoking Urges (QSU) as a function of item wording, level of craving intensity, and item stability. Methods: Nondaily and daily smokers (n = 222) completed the QSU on 6 separate occasions, and item responses were averaged across the administrations. Nicotine dependence was assessed with the Wisconsin Inventory of Smoking Dependence Motives. The discriminative performance of the QSU items was evaluated with receiver-operating characteristic curves and area under the curve statistics. Results: Although each of the QSU items and selected subgroups of items significantly discriminated dependent from nondependent smokers, certain item subgroups outperformed others. There was no difference in discriminative performance between use of the specific terms urge and crave or between items assessing intention to smoke relative to those assessing desire to smoke, but there were significant differences in the two major factors represented on the QSU and in craving items reflecting more intense relative to less intense craving. Stability of the item scores was strongly related to the discriminative performance of craving. Conclusions: Items indexing stable, high-intensity aspects of craving that reflect the negative reinforcing effects of smoking will likely be most useful for diagnostic purposes. Future directions and implications are discussed. PMID:23817585
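The QSU analysis scored each item's ability to separate dependent from nondependent smokers with the ROC area under the curve. One standard way to compute AUC, assumed here and not necessarily the authors' software, is the Mann-Whitney formulation: the probability that a randomly chosen dependent smoker has a higher craving score than a randomly chosen nondependent smoker, with ties counted as one half. The function name and scores are hypothetical.

```python
def auc_mann_whitney(pos_scores, neg_scores):
    """ROC area under the curve as the probability that a randomly chosen
    positive case outscores a randomly chosen negative case (ties count 1/2)."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical averaged ratings on one craving item.
dependent = [5.2, 6.0, 4.8, 6.5]
nondependent = [3.1, 4.8, 2.7, 4.0]
print(auc_mann_whitney(dependent, nondependent))
```

An AUC of 0.5 means the item does no better than chance, and 1.0 means every dependent smoker outscores every nondependent smoker; averaging ratings over repeated administrations, as in the study, tends to reduce the noise that pulls AUC toward 0.5.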
Diagnostic utility of craving in predicting nicotine dependence: impact of craving content and item stability.

PubMed

Germeroth, Lisa J; Wray, Jennifer M; Gass, Julie C; Tiffany, Stephen T

2013-12-01

Craving is useful in the diagnosis of drug dependence, but it is unclear how various items used to assess craving might influence the diagnostic performance of craving measures. This study determined the diagnostic performance of individual items and item subgroups of the 32-item Questionnaire on Smoking Urges (QSU) as a function of item wording, level of craving intensity, and item stability. Nondaily and daily smokers (n = 222) completed the QSU on 6 separate occasions, and item responses were averaged across the administrations. Nicotine dependence was assessed with the Wisconsin Inventory of Smoking Dependence Motives. The discriminative performance of the QSU items was evaluated with receiver-operating characteristic curves and area under the curve statistics. Although each of the QSU items and selected subgroups of items significantly discriminated dependent from nondependent smokers, certain item subgroups outperformed others. There was no difference in discriminative performance between use of the specific terms urge and crave or between items assessing intention to smoke relative to those assessing desire to smoke, but there were significant differences in the two major factors represented on the QSU and in craving items reflecting more intense relative to less intense craving. Stability of the item scores was strongly related to the discriminative performance of craving. Items indexing stable, high-intensity aspects of craving that reflect the negative reinforcing effects of smoking will likely be most useful for diagnostic purposes. Future directions and implications are discussed.

Disruptive behaviors in the classroom: initial standardization data on a new teacher rating scale.

PubMed

Burns, G L; Owen, S M

1990-10-01

This study presents initial standardization data on the Sutter-Eyberg Student Behavior Inventory (SESBI), a teacher-completed measure of disruptive classroom behaviors. SESBIs were completed for 1116 children in kindergarten through fifth grade in a rural eastern Washington school district. Various analyses (Cronbach's alpha, corrected item-total correlations, average interitem correlations, principal components analyses) indicated that the SESBI provides a homogeneous measure of disruptive behaviors. Support was also found for three factors within the scale (i.e., overt aggression, oppositional behavior, and attentional difficulties). While the child's age did not have a significant effect on the SESBI, the child's gender did have a significant effect on scale scores as well as on most of the items, with males being rated more problematic than females. The SESBI was also able to discriminate between children in treatment for behavioral problems or learning disabilities and children not in treatment.
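Cronbach's alpha and the average inter-item correlation, two of the homogeneity indices listed for the SESBI, can be computed directly from a respondents-by-items score matrix. A minimal sketch follows, using population variances throughout (the n versus n - 1 choice cancels in alpha); the function names and data are hypothetical.

```python
from itertools import combinations
from statistics import mean, pstdev, pvariance


def cronbach_alpha(data):
    """Cronbach's alpha for a respondents x items score matrix:
    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)."""
    k = len(data[0])
    item_vars = [pvariance([row[j] for row in data]) for j in range(k)]
    total_var = pvariance([sum(row) for row in data])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def average_inter_item_r(data):
    """Mean Pearson correlation over all distinct pairs of items."""
    k = len(data[0])
    cols = [[row[j] for row in data] for j in range(k)]

    def r(x, y):
        mx, my = mean(x), mean(y)
        cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
        return cov / (pstdev(x) * pstdev(y))

    return mean([r(cols[i], cols[j]) for i, j in combinations(range(k), 2)])


# Hypothetical ratings from four teachers on three items.
ratings = [[1, 2, 2], [3, 3, 4], [5, 4, 5], [2, 2, 1]]
print(cronbach_alpha(ratings), average_inter_item_r(ratings))
```

Alpha rises both with the average inter-item correlation and with the number of items, which is why long scales can reach high alpha even when individual items correlate only moderately.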
Conceptualizing and Measuring Weekend versus Weekday Alcohol Use: Item Response Theory and Confirmatory Factor Analysis

PubMed Central

Handren, Lindsay; Crano, William D.

2018-01-01

Culturally, people tend to abstain from alcohol intake during the weekdays and wait to consume in greater frequency and quantity during the weekends. The current research sought to empirically justify the days representing weekday versus weekend alcohol consumption. In study 1 (N = 419), item response theory was applied to a two-parameter (difficulty and discrimination) model that evaluated the days of drinking (frequency) during the typical 7-day week. Item characteristic curves were most similar for Monday, Tuesday, and Wednesday (prototypical weekday) and for Friday and Saturday (prototypical weekend). Thursday and Sunday, however, exhibited item characteristics that bordered the properties of weekday and weekend consumption. In study 2 (N = 403), confirmatory factor analysis was applied to test six hypothesized measurement structures representing drinks per day (quantity) during the typical week. The measurement model producing the strongest fit indices was a correlated two-factor structure involving separate weekday and weekend factors that permitted Thursday and Sunday to double load on both dimensions. The proper conceptualization and accurate measurement of the days demarcating the normative boundaries of “dry” weekdays and “wet” weekends are imperative to inform research and prevention efforts targeting temporal alcohol intake patterns. PMID:27488456
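In the two-parameter logistic (2PL) model used in study 1, each item j has a discrimination a_j and a difficulty b_j, and the item characteristic curve is P(endorse | θ) = 1 / (1 + exp(-a_j(θ - b_j))). Items with similar a and b trace similar curves, which is the basis for grouping days. The sketch below assumes each day is scored as a binary drank/did-not-drink item (an assumption, since the abstract only says frequency); it evaluates the curve and the item information a²P(1 - P). The parameter values are hypothetical and estimation is not shown.

```python
import math


def icc_2pl(theta, a, b):
    """2PL item characteristic curve: probability of endorsing the item for a
    person at latent level theta, given discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information_2pl(theta, a, b):
    """Fisher information of a 2PL item at theta: a^2 * P * (1 - P).
    It peaks at theta = b, where P = 0.5 and the information equals a^2 / 4."""
    p = icc_2pl(theta, a, b)
    return a * a * p * (1 - p)


# Hypothetical parameters for two items: one steep curve, one shallow curve.
for name, a, b in [("Item A", 2.0, 0.0), ("Item B", 0.8, 0.5)]:
    curve = [round(icc_2pl(t, a, b), 2) for t in (-2, -1, 0, 1, 2)]
    print(name, curve, round(item_information_2pl(b, a, b), 3))
```

A higher a makes the curve steeper around b, so the item separates respondents just below and just above that level more sharply, at the cost of providing little information far from b.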
Conceptualizing and Measuring Weekend versus Weekday Alcohol Use: Item Response Theory and Confirmatory Factor Analysis.

PubMed

Lac, Andrew; Handren, Lindsay; Crano, William D

2016-10-01

Culturally, people tend to abstain from alcohol intake during the weekdays and wait to consume in greater frequency and quantity during the weekends. The current research sought to empirically justify the days representing weekday versus weekend alcohol consumption. In study 1 (N = 419), item response theory was applied to a two-parameter (difficulty and discrimination) model that evaluated the days of drinking (frequency) during the typical 7-day week. Item characteristic curves were most similar for Monday, Tuesday, and Wednesday (prototypical weekday) and for Friday and Saturday (prototypical weekend). Thursday and Sunday, however, exhibited item characteristics that bordered the properties of weekday and weekend consumption. In study 2 (N = 403), confirmatory factor analysis was applied to test six hypothesized measurement structures representing drinks per day (quantity) during the typical week. The measurement model producing the strongest fit indices was a correlated two-factor structure involving separate weekday and weekend factors that permitted Thursday and Sunday to double load on both dimensions. The proper conceptualization and accurate measurement of the days demarcating the normative boundaries of "dry" weekdays and "wet" weekends are imperative to inform research and prevention efforts targeting temporal alcohol intake patterns.
Validation of a General and Sport Nutrition Knowledge Questionnaire in Adolescents and Young Adults: GeSNK.

PubMed

Calella, Patrizia; Iacullo, Vittorio Maria; Valerio, Giuliana

2017-04-29

Good knowledge of nutrition is widely thought to be an important aspect of maintaining a balanced and healthy diet. The aim of this study was to develop and validate a new reliable tool to measure general and sport nutrition knowledge (GeSNK) in people who regularly practice sports at different levels. The development of the GeSNK was carried out in six phases: (1) item development and selection by a panel of experts; (2) a pilot study to assess item difficulty and item discrimination; (3) measurement of internal consistency; (4) reliability assessment with a 2-week test-retest analysis; (5) testing of concurrent validity by administering the questionnaire along with two other similar tools; (6) testing of construct validity by administering the questionnaire to three groups of young adults with different levels of general and sport nutrition knowledge. The final questionnaire consisted of 62 of the original 183 questions. It is a consistent, valid, and suitable instrument that can be applied over time, making it a promising tool for examining the relationship between nutrition knowledge, demographic characteristics, and dietary behavior in adolescents and young adults.
The Effects of Judgment-Based Stratum Classifications on the Efficiency of Stratum Scored CATs.

ERIC Educational Resources Information Center

Finney, Sara J.; Smith, Russell W.; Wise, Steven L.

Two operational item pools were used to investigate the performance of stratum computerized adaptive tests (CATs) when items were assigned to strata based on empirical estimates of item difficulty or human judgments of item difficulty. Items from the first data set consisted of 54 5-option multiple choice items from a form of the ACT mathematics…
Repeated retrieval practice and item difficulty: does criterion learning eliminate item difficulty effects?

PubMed

Vaughn, Kalif E; Rawson, Katherine A; Pyc, Mary A

2013-12-01

A wealth of previous research has established that retrieval practice promotes memory, particularly when retrieval is successful. Although successful retrieval promotes memory, it remains unclear whether successful retrieval promotes memory equally well for items of varying difficulty. Will easy items still outperform difficult items on a final test if all items have been correctly recalled equal numbers of times during practice? In two experiments, normatively difficult and easy Lithuanian-English word pairs were learned via test-restudy practice until each item had been correctly recalled a preassigned number of times (from 1 to 11 correct recalls). Despite equating the numbers of successful recalls during practice, performance on a delayed final cued-recall test was lower for difficult than for easy items. Experiment 2 was designed to diagnose whether the disadvantage for difficult items was due to deficits in cue memory, target memory, and/or associative memory. The results revealed a disadvantage for the difficult versus the easy items only on the associative recognition test, with no differences on cue recognition, and even an advantage on target recognition. Although successful retrieval enhanced memory for both difficult and easy items, equating retrieval success during practice did not eliminate normative item difficulty differences.
Development of a Comprehensive Heart Disease Knowledge Questionnaire

PubMed Central

Bergman, Hannah E.; Reeve, Bryce B.; Moser, Richard P.; Scholl, Sarah; Klein, William M. P.

2011-01-01

Background Heart disease is the number one killer of both men and women in the United States, yet a comprehensive and evidence-based heart disease knowledge assessment is currently not available. Purpose This paper describes the two-phase development of a novel heart disease knowledge questionnaire. Methods After review and critique of the existing literature, a questionnaire addressing 5 central domains of heart disease knowledge was constructed. In Phase I, 606 undergraduates completed an 82-item questionnaire. In Phase II, 248 undergraduates completed a revised 74-item questionnaire. In both phases, item clarity and difficulty were evaluated, along with the overall factor structure of the scale. Results Exploratory and confirmatory factor analyses were used to reduce the scale to 30 items with fit statistics CFI = .82, TLI = .88, and RMSEA = .03. Scores were moderately positively correlated with an existing scale and weakly positively correlated with a measure of health literacy, thereby establishing both convergent and divergent validity. Discussion The finalized 30-item questionnaire is a concise, yet discriminating instrument that reliably measures participants' heart disease knowledge levels. Translation to Health Education Practice Health professionals can use this scale to assess their patients' heart disease knowledge so that they can create a tailored program to help their patients reduce their heart disease risk. PMID:21720571
The revised Stress Measurement of Female Marriage Immigrants in Korea: Evaluation of the psychometric properties.

PubMed

Park, Min Hee; Yang, Sook Ja; Chee, Yeon Kyung

2016-01-01

The twenty-one-item Stress Measurement of Female Marriage Immigrants (SMFMI) was developed to assess stress of female marriage immigrants in Korea. This study reports the psychometric properties of a revised SMFMI (SMFMI-R) for application with female marriage immigrants to Korea who were raising children. Participants were 190 female marriage immigrants from China, Vietnam, the Philippines, and other Asian countries, who were recruited using convenience sampling between November 2013 and December 2013. Survey questionnaires were translated into study participants' native languages (Chinese, Vietnamese, and English). Principal component analysis yielded nineteen items in four factors (family, parenting, cultural, and economic stress), explaining 63.5% of the variance, which was slightly better than the original scale. Confirmatory factor analysis indicated adequate fit for the four-factor model. Based on classical test theory and item response theory, strong support was provided for item discrimination, item difficulty, and internal consistency (Cronbach's alpha = 0.923). SMFMI-R scores were negatively associated with Korean proficiency and subjective economic status. The SMFMI-R is a valid, reliable, and comprehensive measure of stress for female marriage immigrants and can provide useful information to develop intervention programs for those who may be at risk for emotional stress.
Difficulties with Pitch Discrimination Influences Pitch Memory Performance: Evidence from Congenital Amusia

PubMed Central

Jiang, Cunmei; Lim, Vanessa K.; Wang, Hang; Hamm, Jeff P.

2013-01-01

Music processing is influenced by pitch perception and memory. Additionally, these features interact, with pitch memory performance decreasing as the perceived distance between two pitches decreases. This study examined whether the difficulty of pitch discrimination influences pitch retention by testing individuals with congenital amusia. Pitch discrimination difficulty was equated by determining each individual's threshold with a two-down one-up staircase procedure and using this threshold to create conditions in which the two pitches (the standard and the comparison tones) differed by 1x, 2x, and 3x the threshold setting. For comparison with the literature, a condition that employed a constant pitch difference of four semitones was also included. The results showed that pitch memory performance improved as the discrimination between the standard and the comparison tones was made easier for both amusic and control groups, and, more importantly, that amusics did not show any pitch retention deficits when the discrimination difficulty was equated. In contrast, consistent with previous literature, amusics performed worse than controls when the physical pitch distance was held constant at four semitones. This impaired performance has been interpreted in the past as evidence for pitch memory impairment. However, employing a constant pitch distance always makes the difference closer to the discrimination threshold for the amusic group than for the control group. Therefore, reduced performance in this condition may simply reflect differences in the perceptual difficulty of the discrimination. The findings indicate the importance of equating the discrimination difficulty when investigating memory. PMID:24205375
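The following is a minimal sketch of a two-down one-up staircase with threshold-scaled conditions. It assumes a fixed step size, a threshold estimated as the mean of the reversal levels, and a noiseless hypothetical listener. The abstract does not give the authors' step sizes, stopping rule or threshold formula. With real, noisy listeners this rule is usually described as converging near the 70.7%-correct point. The function name and the "units" of pitch difference are hypothetical.

```python
from typing import Callable, List, Tuple


def two_down_one_up(respond: Callable[[float], bool], start: float, step: float,
                    n_trials: int) -> Tuple[List[float], List[float]]:
    """Run a fixed-step two-down one-up staircase.

    The pitch difference shrinks after two consecutive correct responses and
    grows after every error. Returns the level presented on each trial and the
    levels at which the staircase reversed direction.
    """
    level = start
    correct_run = 0
    last_direction = 0  # -1 = made harder, +1 = made easier, 0 = no move yet
    levels: List[float] = []
    reversals: List[float] = []
    for _ in range(n_trials):
        levels.append(level)
        if respond(level):
            correct_run += 1
            if correct_run < 2:
                continue  # need a second consecutive correct before stepping down
            correct_run = 0
            direction = -1
        else:
            correct_run = 0
            direction = +1
        if last_direction != 0 and direction != last_direction:
            reversals.append(level)
        last_direction = direction
        level = max(0.0, level + direction * step)
    return levels, reversals


# Hypothetical noiseless listener: correct whenever the difference is >= 3 units.
levels, reversals = two_down_one_up(lambda diff: diff >= 3, start=6.0, step=1.0, n_trials=20)
threshold = sum(reversals) / len(reversals)
# Conditions scaled to the individual's threshold, mirroring the 1x/2x/3x design.
conditions = [m * threshold for m in (1, 2, 3)]
```

Scaling every condition to each listener's own threshold is what equates perceptual difficulty across groups. A fixed four-semitone difference does not do this.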
The Effect of the Position of an Item within a Test on the Item Difficulty Value.

ERIC Educational Resources Information Center

Rubin, Lois S.; Mott, David E. W.

An investigation was made of the effect of an item's position within a test on its difficulty value. Using a 60-item operational test composed of 5 subtests, 60 items were placed as experimental items on a number of spiralled test forms in three different positions (first, middle, last) within the subtest composed of like items.…
Psychometrics of the Fitness-to-Drive Screening Measure.

PubMed

Classen, Sherrilene; Velozo, Craig A; Winter, Sandra M; Bédard, Michel; Wang, Yanning

2015-01-01

We employed item response theory (IRT), specifically Rasch modeling, to determine the measurement precision of the Fitness-to-Drive Screening Measure (FTDS), a tool that can be used by caregivers and occupational therapists to help detect at-risk drivers. We examined unidimensionality through the factor structure (how items contribute to the central construct of fitness to drive), as well as the rating scale (use of the categories of the rating scale), item- and person-level separation (distinguishing between items with different difficulty levels or persons with different ability levels) and reliability, item hierarchy (easier driving items advancing to more difficult driving items), rater reliability, rater effects (severity vs. leniency of a rater), and criterion validity of the FTDS against an on-road assessment, via three rater groups (n = 200 older drivers; n = 200 caregivers; n = 2 evaluators). The FTDS is unidimensional, its rating scale performed well, and it has good person (> 3.07) and item (> 5.43) separation, good person (> 0.90) and item (> 0.97) reliability, and < 10% misfitting items for two rater groups (caregivers and drivers). The intraclass correlation (ICC) coefficient among the three rater groups was significant (.253, p < .001), and the evaluators were the most severe raters. When comparing the caregivers' FTDS ratings with the drivers' on-road assessment, the area under the curve (index of discriminability; caregivers .726, p < .001) suggested concurrent validity between the FTDS and the on-road assessment. Despite limitations, the FTDS is a reliable and accurate screening measure for caregivers to help identify at-risk older drivers and for occupational therapy practitioners to start conversations about driving.
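The area under the ROC curve used here as an index of discriminability equals the probability that a randomly chosen at-risk case scores higher than a randomly chosen not-at-risk case. The following is a minimal sketch of that pairwise (Mann-Whitney) computation. It assumes that higher FTDS scores indicate higher risk, and the grouping by on-road outcome is hypothetical. The abstract does not say how the authors computed the AUC or its p-value.

```python
from typing import Sequence


def auc_mann_whitney(positive: Sequence[float], negative: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparisons (Mann-Whitney form).

    positive: screening scores for cases the measure should flag (assumed here:
    drivers who failed the on-road test); negative: scores for all other cases.
    Higher scores are assumed to indicate higher risk. Ties count as 0.5.
    """
    if not positive or not negative:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for p in positive:
        for n in negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive) * len(negative))


# Hypothetical caregiver ratings for failed vs. passed on-road assessments.
example_auc = auc_mann_whitney([3, 4, 5], [1, 2, 3])
```

An AUC of 0.5 means chance-level discrimination and 1.0 means perfect separation. A value such as .726 falls in between.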
Development and Validation of a Multimedia-based Assessment of Scientific Inquiry Abilities

NASA Astrophysics Data System (ADS)

Kuo, Che-Yu; Wu, Hsin-Kai; Jen, Tsung-Hau; Hsu, Ying-Shao

2015-09-01

The potential of computer-based assessments for capturing complex learning outcomes has been discussed; however, relatively little is understood about how to leverage such potential for summative and accountability purposes. The aim of this study is to leverage this potential by developing and validating a multimedia-based assessment of scientific inquiry abilities (MASIA) that covers a more comprehensive construct of inquiry abilities and targets secondary school students in different grades. We implemented five steps derived from the construct modeling approach to design MASIA. During the implementation, multiple sources of evidence were collected in the pilot testing and Rasch modeling steps to support the validity of MASIA. In particular, with the participation of 1,066 8th and 11th graders, MASIA showed satisfactory psychometric properties for discriminating students with different levels of inquiry abilities across 101 items in 29 tasks when Rasch models were applied. Additionally, the Wright map indicated that MASIA offered accurate information about students' inquiry abilities because the distributions of student abilities and item difficulties were comparable. The analysis results also suggested that MASIA offered precise measures of inquiry abilities when the components (questioning, experimenting, analyzing, and explaining) were regarded as a coherent construct. Finally, mean difficulty thresholds of item responses that increased along the three performance levels across all sub-abilities supported the alignment between our scoring rubrics and our inquiry framework. Together with other sources of validity evidence from the pilot testing, the results supported the validity of MASIA.
Item Response Theory Modeling of the Philadelphia Naming Test.

PubMed

Fergadiotis, Gerasimos; Kellough, Stacey; Hula, William D

2015-06-01

In this study, we investigated the fit of the Philadelphia Naming Test (PNT; Roach, Schwartz, Martin, Grewal, & Brecher, 1996) to an item-response-theory measurement model, estimated the precision of the resulting scores and item parameters, and provided a theoretical rationale for the interpretation of PNT overall scores by relating explanatory variables to item difficulty. This article describes the statistical model underlying the computer adaptive PNT presented in a companion article (Hula, Kellough, & Fergadiotis, 2015). Using archival data, we evaluated the fit of the PNT to 1- and 2-parameter logistic models and examined the precision of the resulting parameter estimates. We regressed the item difficulty estimates on three predictor variables: word length, age of acquisition, and contextual diversity. The 2-parameter logistic model demonstrated marginally better fit, but the fit of the 1-parameter logistic model was adequate. Precision was excellent for both person ability and item difficulty estimates. Word length, age of acquisition, and contextual diversity all independently contributed to variance in item difficulty. Item-response-theory methods can be productively used to analyze and quantify anomia severity in aphasia. Regression of item difficulty on lexical variables supported the validity of the PNT and interpretation of anomia severity scores in the context of current word-finding models.
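The final analytic step regressed item difficulty estimates on word length, age of acquisition and contextual diversity. The following is a minimal ordinary-least-squares sketch of that kind of regression. The predictor values and weights are synthetic and chosen only to check that the fit recovers known coefficients; they are not the study's estimates. The abstract does not state how the authors specified or scaled their model.

```python
import numpy as np


def regress_difficulty(difficulty, predictors):
    """Ordinary least squares of item difficulty on item-level predictors.

    difficulty: shape (n_items,) IRT difficulty estimates.
    predictors: shape (n_items, n_predictors), e.g. word length,
    age of acquisition, contextual diversity (synthetic values below).
    Returns (coefficients with the intercept first, R squared).
    """
    y = np.asarray(difficulty, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(predictors, dtype=float)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    r_squared = 1.0 - (residuals @ residuals) / np.sum((y - y.mean()) ** 2)
    return coef, r_squared


# Synthetic check: difficulties generated from known weights plus small noise.
rng = np.random.default_rng(0)
lexical = rng.normal(size=(100, 3))  # standardized length, AoA, diversity (made up)
true_weights = np.array([0.2, 0.5, 0.3, -0.4])  # illustrative, not from the study
difficulties = true_weights[0] + lexical @ true_weights[1:] + rng.normal(scale=0.1, size=100)
coef, r2 = regress_difficulty(difficulties, lexical)
```

A predictor "independently contributes" in this setup when its coefficient stays clearly non-zero with the other predictors in the model.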
The Effects of Clinically Relevant Multiple-Choice Items on the Statistical Discrimination of Physician Clinical Competence.

ERIC Educational Resources Information Center

Downing, Steven M.; Maatsch, Jack L.

To test the effect of clinically relevant multiple-choice item content on the validity of statistical discriminations of physicians' clinical competence, data were collected from a field test of the Emergency Medicine Examination, test items for the certification of specialists in emergency medicine. Two 91-item multiple-choice subscales were…
Using the Nudge and Shove Methods to Adjust Item Difficulty Values.

PubMed

Royal, Kenneth D

2015-01-01

In any examination, it is important that a sufficient mix of items with varying degrees of difficulty be present to produce desirable psychometric properties and increase instructors' ability to make appropriate and accurate inferences about what a student knows and/or can do. The purpose of this "teaching tip" is to demonstrate how examination items can be affected by the quality of distractors, and to present a simple method for adjusting items to meet difficulty specifications.
Component Identification and Item Difficulty of Raven's Matrices Items.

ERIC Educational Resources Information Center

Green, Kathy E.; Kluever, Raymond C.

Item components that might contribute to the difficulty of items on the Raven Colored Progressive Matrices (CPM) and the Standard Progressive Matrices (SPM) were studied. Subjects providing responses to CPM items were 269 children aged 2 years 9 months to 11 years 8 months, most of whom were referred for testing as potentially gifted. A second…
Rasch Measurement and Item Banking: Theory and Practice.

ERIC Educational Resources Information Center

Nakamura, Yuji

The Rasch model is a one-parameter item response theory model which states that the probability of a correct response on a test is a function of the difficulty of the item and the ability of the candidate. Item banking is useful for language testing. The Rasch model provides estimates of item difficulties that are meaningful,…
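Under the Rasch model, the probability of a correct response depends only on the difference between ability θ and item difficulty b: P = exp(θ − b) / (1 + exp(θ − b)). The following is a minimal sketch of that function. It also checks a property often cited in support of item banking: the log-odds difference between two candidates is the same for every item. The function names are hypothetical, and estimation of θ and b from data is not shown.

```python
import math


def rasch_p(theta: float, b: float) -> float:
    """Rasch (1PL) probability of a correct response.

    theta: candidate ability; b: item difficulty; both on the same logit scale.
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def log_odds(p: float) -> float:
    """Logit of a probability."""
    return math.log(p / (1.0 - p))


# When ability equals difficulty, the success probability is 0.5.
p_matched = rasch_p(0.0, 0.0)
# The log-odds gap between two candidates (1.0 vs -0.5) is 1.5 for every item.
gaps = [log_odds(rasch_p(1.0, b)) - log_odds(rasch_p(-0.5, b)) for b in (-2.0, 0.0, 1.5)]
```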
When Listening Is Better Than Reading: Performance Gains on Cardiac Auscultation Test Questions.

PubMed

Short, Kathleen; Bucak, S Deniz; Rosenthal, Francine; Raymond, Mark R

2018-05-01

In 2007, the United States Medical Licensing Examination embedded multimedia simulations of heart sounds into multiple-choice questions. This study investigated changes in item difficulty as determined by examinee performance over time. The data reflect outcomes obtained following initial use of multimedia items from 2007 through 2012, after which an interface change occurred. A total of 233,157 examinees responded to 1,306 cardiology test items over the six-year period; 138 items included multimedia simulations of heart sounds, while 1,168 text-based items without multimedia served as controls. The authors compared changes in difficulty of multimedia items over time with changes in difficulty of text-based cardiology items over time. Further, they compared changes in item difficulty for both groups of items between graduates of Liaison Committee on Medical Education (LCME)-accredited and non-LCME-accredited (i.e., international) medical schools. Examinee performance on cardiology test items with multimedia heart sounds improved by 12.4% over the six-year period, while performance on text-based cardiology items improved by approximately 1.4%. These results were similar for graduates of LCME-accredited and non-LCME-accredited medical schools. Examinees' ability to interpret auscultation findings in test items that include multimedia presentations increased from 2007 to 2012.
Enhancing the Equating of Item Difficulty Metrics: Estimation of Reference Distribution. Research Report. ETS RR-14-07

ERIC Educational Resources Information Center

Ali, Usama S.; Walker, Michael E.

2014-01-01

Two methods are currently in use at Educational Testing Service (ETS) for equating observed item difficulty statistics. The first method involves the linear equating of item statistics in an observed sample to reference statistics on the same items. The second method, or the item response curve (IRC) method, involves the summation of conditional…
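The first method, linear equating of observed item statistics to reference statistics on the same items, can be illustrated with the common mean-sigma form. This version chooses a line so that the transformed statistics match the mean and standard deviation of the reference values. The following is a minimal sketch under that assumption. The abstract is truncated and does not confirm that ETS uses exactly this variant. All names and numbers are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def linear_equating(observed: Sequence[float], reference: Sequence[float]) -> Tuple[float, float]:
    """Slope and intercept that map observed item statistics onto the reference scale.

    Both sequences hold statistics for the same common items. The line matches
    the mean and standard deviation of the reference values (mean-sigma form).
    """
    slope = pstdev(reference) / pstdev(observed)
    intercept = mean(reference) - slope * mean(observed)
    return slope, intercept


def apply_equating(values: Sequence[float], slope: float, intercept: float) -> List[float]:
    """Place new items' observed statistics on the reference scale."""
    return [slope * v + intercept for v in values]


observed = [10.0, 12.0, 14.0, 16.0]   # hypothetical difficulty statistics, current sample
reference = [11.0, 13.5, 16.0, 18.5]  # same items, reference statistics
slope, intercept = linear_equating(observed, reference)
new_items = apply_equating([13.0, 15.0], slope, intercept)
```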
Ubiquitous testing using tablets: its impact on medical student perceptions of and engagement in learning.

PubMed

Kim, Kyong-Jee; Hwang, Jee-Young

2016-03-01

Ubiquitous testing has the potential to affect medical education by enhancing the authenticity of the assessment using multimedia items. This study explored medical students' experience with ubiquitous testing and its impact on student learning. A cohort (n=48) of third-year students at a medical school in South Korea participated in this study. The students were divided into two groups and were given different versions of 10 content-matched items: one in text version (the text group) and the other in multimedia version (the multimedia group). Multimedia items were delivered using tablets. Item response analyses were performed to compare item characteristics between the two versions. Additionally, focus group interviews were held to investigate the students' experiences of ubiquitous testing. The mean test score was significantly higher in the text group. Item difficulty and discrimination did not differ between text and multimedia items. The participants generally showed positive responses on ubiquitous testing. Still, they felt that the lectures that they had taken in preclinical years did not prepare them enough for this type of assessment and clinical encounters during clerkships were more helpful. To be better prepared, the participants felt that they needed to engage more actively in learning in clinical clerkships and have more access to multimedia learning resources. Ubiquitous testing can positively affect student learning by reinforcing the importance of being able to understand and apply knowledge in clinical contexts, which drives students to engage more actively in learning in clinical settings.

Psychometric properties of the Chinese version of resilience scale specific to cancer: an item response theory analysis.

PubMed

Ye, Zeng Jie; Liang, Mu Zi; Zhang, Hao Wei; Li, Peng Fei; Ouyang, Xue Ren; Yu, Yuan Liang; Liu, Mei Ling; Qiu, Hong Zhong

2018-06-01

Classical test theory has been used to develop and validate the 25-item Resilience Scale Specific to Cancer (RS-SC) in Chinese patients with cancer. This study was designed to provide additional information about the discriminative value of the individual items tested with an item response theory analysis. A two-parameter graded response model was used to examine whether any of the items of the RS-SC exhibited problems with the ordering and steps of thresholds, as well as the ability of items to discriminate between patients with different resilience levels, using item characteristic curves. A sample of 214 Chinese patients with a cancer diagnosis was analyzed. The established three-dimension structure of the RS-SC was confirmed. Several items showed problematic thresholds or discrimination ability and require further revision. Some problematic items should be refined, and a short form of the RS-SC may be feasible in clinical settings to reduce the burden on patients. However, the generalizability of these findings warrants further investigation.
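In Samejima's graded response model, each item has one discrimination a and ordered thresholds b_1 < ... < b_m. The cumulative probability of responding in category k or higher is 1 / (1 + exp(−a(θ − b_k))), and category probabilities are differences of adjacent cumulative probabilities. The following is a minimal sketch of that calculation. It rejects unordered thresholds because the cumulative form is only valid when they increase. How the authors judged thresholds "problematic" is not described in the abstract, and the parameter values below are hypothetical.

```python
import math
from typing import List, Sequence


def grm_category_probs(theta: float, a: float, thresholds: Sequence[float]) -> List[float]:
    """Graded response model: probability of each of the m + 1 response categories.

    theta: latent trait (e.g., resilience); a: item discrimination;
    thresholds: between-category locations, strictly increasing.
    """
    if any(lo >= hi for lo, hi in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # Cumulative P(X >= k), padded with P(X >= 0) = 1 and P(X >= m + 1) = 0.
    cumulative = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


# Hypothetical 4-category item evaluated at an average trait level.
probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-1.0, 0.0, 1.0])
```

A larger a makes the category curves steeper, which is what "ability of items to discriminate" refers to in this model.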
The Discriminating Power of Items that Measure More than One Dimension.

ERIC Educational Resources Information Center

Reckase, Mark D.

The work presented in this paper defined conceptually the concepts of multidimensional discrimination and information, derived mathematical expressions for the concepts for a particular multidimensional item response theory (IRT) model, and applied the concepts to actual test data. Multidimensional discrimination was defined as a function of the…
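The abstract is truncated before the definition is given. For a compensatory multidimensional two-parameter logistic item, the form widely attributed to Reckase is MDISC = sqrt(Σ a_k²). The item's direction of best measurement is then given by cos α_k = a_k / MDISC. The following is a minimal sketch of those two quantities. It has not been checked against the exact expressions in this paper, and the discrimination values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def multidimensional_discrimination(a: Sequence[float]) -> Tuple[float, List[float]]:
    """MDISC and direction (degrees from each axis) for a compensatory MIRT item.

    a: the item's discrimination parameters, one per dimension.
    """
    mdisc = math.sqrt(sum(ak * ak for ak in a))
    if mdisc == 0:
        raise ValueError("item has no discrimination on any dimension")
    # Angle between the item's direction of maximum discrimination and each axis.
    angles = [math.degrees(math.acos(ak / mdisc)) for ak in a]
    return mdisc, angles


mdisc, angles = multidimensional_discrimination([0.9, 1.2])
```

An item measuring mostly one dimension has a small angle to that axis. An item at roughly 45 degrees in two dimensions measures a blend of both.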
A Comparison of Three Test Formats to Assess Word Difficulty

ERIC Educational Resources Information Center

Culligan, Brent

2015-01-01

This study compared three common vocabulary test formats, the Yes/No test, the Vocabulary Knowledge Scale (VKS), and the Vocabulary Levels Test (VLT), as measures of vocabulary difficulty. Vocabulary difficulty was defined as the item difficulty estimated through Item Response Theory (IRT) analysis. Three tests were given to 165 Japanese students,…
A Differential Item Functional Analysis by Age of Perceived Interpersonal Discrimination in a Multi-racial/ethnic Sample of Adults.

PubMed

Owens, Sherry; Kristjansson, Alfgeir L; Hunte, Haslyn E R

2015-11-05

We investigated whether individual items on the nine-item Williams' Perceived Everyday Discrimination Scale (EDS) functioned differently by age (<45 vs ≥ 45) within five racial groups in the United States: Asians (n=2,017); Hispanics (n=2,688); Black Caribbeans (n=1,377); African Americans (n=3,434); and Whites (n=854). We used data from the 2001-2003 National Survey of American Lives and the 2001-2003 National Latino and Asian Studies. Multiple-indicator, multiple-cause models (MIMIC) were used to examine differential item functioning (DIF) on the EDS by age within each racial/ethnic group. Overall, Asian and Hispanic respondents reported less discrimination than Whites; on the other hand, African Americans and Black Caribbeans reported more discrimination than Whites. Regardless of race/ethnicity, the younger respondents (aged <45 years) reported less discrimination than the older respondents (aged ≥ 45 years). In terms of age by race/ethnicity, the results were mixed for 19 out of 45 tests of DIF (40%). No differences in item function were observed among Black Caribbeans. "Being called names or insulted" and others acting as "if they are afraid" of the respondents were the only two items that did not exhibit differential item functioning by age across all racial/ethnic groups. Overall, our findings suggest that the EDS scale should be used with caution in multi-age multi-racial/ethnic samples.
The Relationship between Older Adults’ Risk for a Future Fall and Difficulty Performing Activities of Daily Living

PubMed Central

Mamikonian-Zarpas, Ani; Laganá, Luciana

2016-01-01

Functional status is often defined by cumulative scores across indices of independence in performing basic and instrumental activities of daily living (ADL/IADL), but little is known about the unique relationship of each daily activity item with the fall outcome. The purpose of this retrospective study was to examine the level of relative risk for a future fall associated with difficulty performing various tasks of normal daily functioning among older adults who had fallen at least once in the past 12 months. The sample consisted of community-dwelling individuals 70 years and older from the 1984–1990 Longitudinal Study of Aging by Kovar, Fitti, and Chyba (1992). Risk analysis was performed on individual items quantifying 6 ADLs and 7 IADLs, as well as 10 items related to mobility limitations. Within a subsample of 1,675 older adults with a history of at least one fall within the past year, the responses of individuals who reported multiple falls were compared to the responses of participants who had a single fall and reported 1) difficulty with walking and/or balance (FRAIL group, n = 413) vs. 2) no difficulty with walking or dizziness (NDW+ND group, n = 415). The items that had the strongest relationships and highest risk ratios for the FRAIL group (which had the highest probabilities for a future fall) included difficulty with: eating (73%); managing money (70%); biting or chewing food (66%); walking a quarter of a mile (65%); using fingers to grasp (65%); and dressing without help (65%). For the NDW+ND group, the most noteworthy items included difficulty with: bathing or showering (79%); managing money (77%); shopping for personal items (75%); walking up 10 steps without rest (72%); walking a quarter of a mile (72%); and stooping/crouching/kneeling (70%). 
These findings suggest that individual items quantifying specific ADLs and IADLs have substantive relationships with the fall outcome among older adults who have difficulty with walking and balance, as well as among older individuals without dizziness or difficulty with walking. Furthermore, the examination of the relationships between items that are related to more challenging activities and the fall outcome revealed that higher functioning older adults who reported difficulty with the 6 items that yielded the highest risk ratios may also be at elevated risk for a fall. PMID:27200366
Psychometric Properties of Difficulties of Working with Patients with Personality Disorders and Attitudes Towards Patients with Personality Disorders Scales.

PubMed

Eren, Nurhan

2014-12-01

In this study, we aimed to develop two reliable and valid assessment instruments for investigating the level of difficulty mental health workers experience while working with patients with personality disorders and the attitudes they develop toward these patients. The research was carried out based on the general screening model. The study sample consisted of 332 mental health workers from several mental health clinics in Turkey, with a certain amount of experience in working with personality disorders, who were selected with a random assignment method. To collect data, the Personal Information Questionnaire, the Difficulty of Working with Personality Disorders Scale (PD-DWS), and the Attitudes Towards Patients with Personality Disorders Scale (PD-APS), which were being examined for reliability and validity, were applied. To determine construct validity, the Adjective Check List, Maslach Burnout Inventory, and State and Trait Anxiety Inventory were used. Exploratory factor analysis was used to investigate structural validity, and Cronbach alpha, Spearman-Brown, and Guttman split-half reliability analyses were used to examine reliability. Item reliability and validity were also computed by investigating the corrected item-total correlations and discrimination indexes of the items in the scales. For the PD-DWS, the KMO value was .946, and the Bartlett sphericity test was significant (p<.001). The computed test-retest reliability coefficient was .702; the Cronbach alpha value of the total test score was .952. For the PD-APS, the KMO value was .925; the Bartlett sphericity test was significant (p<.001); the computed reliability coefficient based on continuity was .806; and the Cronbach alpha value of the total test score was .913. Analyses on both scales were based on total scores. 
It was found that PD-DWS and PD-APS have good psychometric properties, measuring the structure that is being investigated, are compatible with other scales, have high levels of internal reliability between their items, and are consistent across time. Therefore, it was concluded that both scales are valid and reliable instruments.
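Two of the statistics reported for the PD-DWS and PD-APS are Cronbach's alpha and corrected item-total correlations. Alpha is computed as α = k/(k − 1) · (1 − Σ var(item) / var(total)). A corrected item-total correlation correlates each item with the sum of the other items. The following is a minimal sketch of both. It assumes a complete respondents × items matrix with no missing data and uses sample variances (ddof=1). The ratings are hypothetical.

```python
import numpy as np


def cronbach_alpha(scores) -> float:
    """Cronbach's alpha for a respondents x items score matrix."""
    x = np.asarray(scores, dtype=float)
    k = x.shape[1]
    item_variances = x.var(axis=0, ddof=1)
    total_variance = x.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_variances.sum() / total_variance)


def corrected_item_total(scores) -> np.ndarray:
    """Correlation of each item with the total of the remaining items."""
    x = np.asarray(scores, dtype=float)
    total = x.sum(axis=1)
    return np.array([np.corrcoef(x[:, j], total - x[:, j])[0, 1] for j in range(x.shape[1])])


ratings = np.array([  # hypothetical 5-point responses, 6 respondents x 4 items
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 5, 5, 4],
    [3, 3, 2, 3],
    [1, 2, 1, 2],
    [4, 4, 5, 5],
])
alpha = cronbach_alpha(ratings)
item_total = corrected_item_total(ratings)
```

Removing the item from the total before correlating avoids inflating the correlation with the item's own variance.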
Discrimination and Romani health: a validation study of discrimination scales among Romani women in Macedonia and Serbia.

PubMed

Janevic, T; Gundersen, D; Stojanovski, K; Jankovic, J; Nikolic, Z; Kasapinov, B

2015-09-01

Scales used to assess discrimination in public health research have rarely been validated outside of high income countries. Our objective was to validate the Experiences of Discrimination (EOD) scale and the Everyday Discrimination Scale (EDS) among 410 Romani women in Macedonia and Serbia. Romani female interviewers conducted interviews in 2012-2013. We used a multiple indicator multiple cause approach to test a one-factor model for each scale and to assess differential item functioning (DIF) by age, wealth, country, and education. We also measured associations between the EOD and EDS with smoking in the past year and psychological distress. Three items of the EOD were conceptually irrelevant. Two items of the EDS were not conditionally independent. DIF was found by country for one item in each scale. After excluding these items, all scales exhibited good model fit and were associated with smoking (EOD beta = 0.40, 95% CI = 0.18, 0.63; EDS beta = 0.33, 95% CI = 0.12, 0.54) and psychological distress (EOD beta = 0.26, 95% CI = 0.15, 0.37; EDS beta = 0.26, 95% CI = 0.04, 0.47). Discrimination scales can be adapted for use among Romani women and are associated with both smoking and psychological distress.
Development and validation of a vision-specific quality-of-life questionnaire for Timor-Leste.

PubMed

du Toit, Rènée; Palagyi, Anna; Ramke, Jacqueline; Brian, Garry; Lamoureux, Ecosse L

2008-10-01

To develop and determine the reliability and validity of a vision-specific quality-of-life instrument (TL-VSQOL) designed to assess the impact of distance and near vision impairment in adults living in Timor-Leste. A vision-specific quality-of-life questionnaire was developed, piloted, and administered to 704 Timorese aged ≥40 years during a population-based eye health rapid assessment. Rasch analysis was performed on the data of 457 participants with presenting near vision worse than N8 (78.5%) and/or distance vision worse than 6/18 (69.8%). Unidimensionality, item fit to the model, response category performance, differential item functioning, and targeting of items to participants were assessed. Initially, the questionnaire lacked fit to the Rasch model. Removal of two items concerning emotional well-being resulted in the data fitting the model (overall item-trait interaction: χ² (df) = 81 (51); mean (SD) person and item fit residual values: -0.30 (1.02) and -0.32 (1.46)), and good targeting of person ability and item difficulty was evident. Poorer distance and near visual acuities were significantly associated with worse quality-of-life scores (P < 0.001). Person separation reliability was substantial (0.93), indicating that the instrument can discriminate between groups with normal and impaired vision. All 17 items were free of differential item functioning, and there was no evidence of multidimensionality. This 17-item TL-VSQOL has high reliability, construct validity, and criterion validity, and effective targeting. It can effectively assess the impact on quality of life of adult Timorese with distance and near vision impairment. The TL-VSQOL could be adapted for use in other low-resource settings.
Beyond Stigmatization of Children with Difficulties in Learning

ERIC Educational Resources Information Center

Hido, Margarita; Shehu, Irena

2010-01-01

In Albanian school settings there is no religious or gender discrimination, but there is unfair discrimination against children labelled as having "difficulties". The children who drop out of school are far fewer than those who start school but are not properly treated, so that they can…
Comparing the Personality Disorder Interview for DSM-IV (PDI-IV) and SCID-II borderline personality disorder scales: an item-response theory analysis.

PubMed

Huprich, Steven K; Paggeot, Amy V; Samuel, Douglas B

2015-01-01

One hundred sixty-nine psychiatric outpatients and 171 undergraduate students were assessed with the Personality Disorder Interview-IV (PDI-IV; Widiger, Mangine, Corbitt, Ellis, & Thomas, 1995) and the Structured Clinical Interview for DSM-IV Axis II disorders (SCID-II; First, Gibbon, Spitzer, Williams, & Benjamin, 1997) for borderline personality disorder (BPD). Eighty individuals met PDI-IV BPD criteria, whereas 34 met SCID-II BPD criteria. Dimensional ratings of both measures were highly intercorrelated (rs = .78, .75), and item-level interrater reliability fell in the good to excellent range. An item-response theory analysis was performed to investigate whether properties of the items from each interview could help explain these differences. The limited agreement seemed to be explained by differences in the response options across the two interviews. We found that suicidal behavior was among the most discriminating criteria on both instruments, whereas dissociation and difficulty controlling anger had the two lowest alpha parameter values. Finally, those meeting BPD criteria on both interviews had higher levels of anxiety, depression, and more impairments in object relations than those meeting criteria on just the PDI-IV. These findings suggest that the choice of measure has a notable effect on the obtained diagnostic prevalence and the level of BPD severity that is detected.
An enhanced functional ability questionnaire (faVIQ) to measure the impact of rehabilitation services on the visually impaired

PubMed Central

Wolffsohn, James Stuart; Jackson, Jonathan; Hunt, Olivia Anne; Cottriall, Charles; Lindsay, Jennifer; Gilmour, Richard; Sinclair, Anne; Harper, Robert

2014-01-01

AIM To develop a short, enhanced functional ability Quality of Vision (faVIQ) instrument, based on previous questionnaires and using comprehensive modern statistical techniques, to ensure an appropriate response scale, item set, and scoring of the vision-related difficulties experienced by patients with visual impairment. METHODS Items in current quality-of-life questionnaires for the visually impaired were refined by a multi-professional group and visually impaired focus groups. The resulting 76 items were completed by 293 visually impaired patients with stable vision on two occasions separated by a month. The faVIQ scores of 75 patients with no ocular pathology were compared with those of 75 age- and gender-matched patients with visual impairment. RESULTS Rasch analysis reduced the faVIQ items to 27. Correlation with standard visual metrics was moderate (r=0.32-0.46), and correlation with the NEI-VFQ was 0.48. The faVIQ clearly discriminated between age- and gender-matched populations with no ocular pathology and with visual impairment, with an index of 0.983 and 95% sensitivity and 95% specificity at a cut-off of 29. CONCLUSION The faVIQ allows sensitive assessment of quality-of-life in the visually impaired and should support studies that evaluate the effectiveness of low vision rehabilitation services. PMID:24634868
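The reported 95% sensitivity and 95% specificity at a cut-off of 29 come from classifying each respondent by whether their faVIQ score passes the cut-off. A minimal sketch of that calculation follows. It assumes, hypothetically, that a score at or above the cut-off indicates visual impairment, because the abstract does not state the scoring direction. The score lists are invented for illustration.

```python
def sensitivity_specificity(impaired_scores, control_scores, cutoff):
    """Return (sensitivity, specificity) for a fixed cut-off.

    Assumption for illustration: a score >= cutoff is classed as "impaired".
    """
    # True positives: impaired patients correctly flagged by the cut-off.
    tp = sum(1 for s in impaired_scores if s >= cutoff)
    # True negatives: controls correctly left below the cut-off.
    tn = sum(1 for s in control_scores if s < cutoff)
    return tp / len(impaired_scores), tn / len(control_scores)


# Hypothetical scores, not data from the study.
impaired = [35, 41, 28, 52, 30, 47, 33, 39, 44, 31]
controls = [12, 18, 25, 9, 30, 15, 20, 22, 14, 11]
print(sensitivity_specificity(impaired, controls, cutoff=29))  # (0.9, 0.9)
```

Moving the cut-off up or down trades sensitivity against specificity. The reported pair describes only the chosen cut-off of 29.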
The development of a knowledge test of depression and its treatment for patients suffering from non-psychotic depression: a psychometric assessment

PubMed Central

Gabriel, Adel; Violato, Claudio

2009-01-01

Background To develop and psychometrically assess a multiple choice question (MCQ) instrument to test knowledge of depression and its treatments in patients suffering from depression. Methods A total of 63 depressed patients and twelve psychiatric experts participated. Based on empirical evidence from an extensive review, theoretical knowledge, and consultations with experts, a 27-item MCQ test of knowledge of depression and its treatment was constructed. Data collected from the psychiatry experts were used to assess evidence of content validity for the instrument. Results Cronbach's alpha of the instrument was 0.68, and there was an overall 87.8% agreement (items are highly relevant) between experts about the relevance of the MCQs to test patient knowledge on depression and its treatments. Overall, patients performed satisfactorily on the MCQs, with 78.7% correct answers. Results of an item analysis indicated that most items had adequate difficulties and discriminations. Conclusion There was adequate reliability and evidence for content and convergent validity for the instrument. Future research should employ a larger and more heterogeneous sample, drawn from both psychiatric and community populations, than the present study did. Meanwhile, the present study has produced a psychometrically tested instrument for measuring depressed patients' knowledge of depression and its treatment. PMID:19754944
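The abstract reports Cronbach's alpha and item difficulties and discriminations but not the formulas used. The sketch below shows one common classical test theory approach for 0/1-scored MCQs:

- difficulty is the proportion correct;
- discrimination is an upper-minus-lower index based on the top and bottom 27% of examinees;
- alpha is k/(k-1) times (1 minus the sum of item variances over the total-score variance).

The 27% split is a convention assumed here; the authors may have used another index, such as a point-biserial correlation. The response matrix is hypothetical.

```python
from statistics import pvariance


def item_analysis(responses):
    """Classical item statistics for a 0/1 response matrix (rows = examinees).

    Returns (difficulty, discrimination, alpha):
    - difficulty: proportion of examinees answering each item correctly;
    - discrimination: upper-minus-lower index using the top and bottom 27%
      of examinees ranked by total score;
    - alpha: Cronbach's alpha for the whole test (needs at least 2 items
      and non-zero total-score variance).
    """
    n_people = len(responses)
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]

    difficulty = [sum(row[j] for row in responses) / n_people for j in range(n_items)]

    # Rank examinees by total score and take the two extreme 27% groups.
    order = sorted(range(n_people), key=lambda i: totals[i])
    g = max(1, round(0.27 * n_people))
    lower, upper = order[:g], order[-g:]
    discrimination = [
        (sum(responses[i][j] for i in upper) - sum(responses[i][j] for i in lower)) / g
        for j in range(n_items)
    ]

    # Cronbach's alpha: k / (k - 1) * (1 - sum of item variances / total variance).
    item_vars = [pvariance([row[j] for row in responses]) for j in range(n_items)]
    alpha = n_items / (n_items - 1) * (1 - sum(item_vars) / pvariance(totals))
    return difficulty, discrimination, alpha


# Hypothetical responses of six examinees to four items.
demo = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
]
print(item_analysis(demo))
```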
ABILHAND-Kids: a measure of manual ability in children with cerebral palsy.

PubMed

Arnould, Carlyne; Penta, Massimo; Renders, Anne; Thonnard, Jean-Louis

2004-09-28

To develop a clinical tool for measuring manual ability (ABILHAND-Kids) in children with cerebral palsy (CP) using the Rasch measurement model. The authors developed a 74-item questionnaire based on existing scales and experts' advice. The questionnaire was submitted to 113 children with CP (59% boys; mean age, 10 years) without major intellectual deficits (IQ > 60) and to their parents, and resubmitted to both groups after 1 month. The children's and parents' responses were analyzed separately with the WINSTEPS Rasch software to select items presenting an ordered rating scale, sharing the same discrimination, and fitting a unidimensional scale. The final ABILHAND-Kids scale consisted of 21 mostly bimanual items rated by the parents. The parents reported a finer perception of their children's ability than the children themselves, leading to a wider range of measurement, a higher reliability (R = 0.94), and a good reproducibility over time (R = 0.91). The item difficulty hierarchy was consistent between the parents and the experts. The ABILHAND-Kids measures are significantly related to school education, type of CP, and gross motor function. ABILHAND-Kids is a functional scale specifically developed to measure manual ability in children with CP, providing guidelines for goal setting in treatment planning. Its range and measurement precision are appropriate for clinical practice.
Fostering a student's skill for analyzing test items through an authentic task

NASA Astrophysics Data System (ADS)

Setiawan, Beni; Sabtiawan, Wahyu Budi

2017-08-01

Analyzing test items is a skill that prospective teachers must master in order to judge the quality of the test questions they write. The main aim of this research was to describe the effectiveness of an authentic task in fostering students' skill in analyzing test items, covering validity, reliability, item discrimination index, difficulty level, and distractor functioning. The participants were students of the science education study program in the science and mathematics faculty of Universitas Negeri Surabaya, enrolled in an assessment course. The research used a one-group posttest design. As the treatment, students were given an authentic task in which they developed test items and then analyzed them as a professional assessor would, using Microsoft Excel and Anates Software. The data were analyzed descriptively: students' skill scores were presented and then related to theories and previous empirical studies. The results showed that the task helped the students acquire the skills. Thirty-one students achieved a perfect score on the analysis, five achieved 97% mastery, two achieved 92% mastery, and two others achieved 89% and 79% mastery. The findings imply that when authentic tasks require students to perform like professionals, students are more likely to have acquired the professional skills by the end of learning.
The Contribution of Prospective Memory Performance to the Neuropsychological Assessment of Mild Cognitive Impairment.

PubMed

Lee, Stephen; Ong, Ben; Pike, Kerryn E; Mullaly, Elizabeth; Rand, Elizabeth; Storey, Elsdon; Ames, David; Saling, Michael; Clare, Linda; Kinsella, Glynda J

2016-01-01

Prospective memory difficulties are a feature of the amnestic form of mild cognitive impairment (aMCI). Although comprehensive test batteries of prospective memory are suitable for clinical practice, they are lengthy, which has detracted from their widespread clinical use. Our aim was to investigate the utility of a brief screening measure of prospective memory that can be incorporated into a clinical neuropsychological assessment. Seventy-seven healthy older adults (HOA) and 77 participants with aMCI were administered a neuropsychological test battery. It included a prospective memory screening measure (Envelope Task), a retrospective memory measure (CVLT-II), a multi-item subjective memory questionnaire (Prospective and Retrospective Memory Questionnaire; PRMQ), and a single-item subjective memory scale. Compared with HOA participants, participants with aMCI performed more poorly on the Envelope Task (η² = .38), which provided good discrimination of the aMCI and HOA groups (AUC = .83). In the aMCI group, there was a small but significant relationship between the Envelope Task and the single-item subjective rating of memory. The Envelope Task accounted for 5-6% of the variance in subjective memory after accounting for emotional status. This relationship between prospective and subjective memory was not significant for the multi-item questionnaire (PRMQ). Retrospective memory did not significantly predict either single-item or multi-item self-rated memory. A brief screening measure of prospective memory, the Envelope Task, provides useful support to traditional memory measures in detecting aMCI.
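The reported AUC of .83 summarizes how well Envelope Task scores separate the aMCI and HOA groups. One standard way to compute an AUC is its Mann-Whitney form: the probability that a randomly chosen patient scores worse than a randomly chosen control, with ties counted as one half. The sketch below assumes, hypothetically, that lower scores mean poorer prospective memory. The scores are invented and the function name is illustrative.

```python
def auc(patient_scores, control_scores):
    """AUC as the probability that a randomly chosen patient scores below a
    randomly chosen control (ties count one half), i.e. the Mann-Whitney
    form of the area under the ROC curve.

    Assumes, for illustration, that lower scores mean poorer performance.
    """
    wins = 0.0
    for p in patient_scores:
        for c in control_scores:
            if p < c:
                wins += 1.0
            elif p == c:
                wins += 0.5
    return wins / (len(patient_scores) * len(control_scores))


# Hypothetical Envelope Task scores.
amci = [2, 3, 3, 5, 6, 4]
hoa = [5, 7, 8, 6, 9, 4]
print(round(auc(amci, hoa), 3))
```

An AUC of 0.5 means the scores do no better than chance at separating the groups, and 1.0 means perfect separation.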
Validity and reliability of portfolio assessment of competency in a baccalaureate dental hygiene program

NASA Astrophysics Data System (ADS)

Gadbury-Amyot, Cynthia C.

This study examined the validity and reliability of portfolio assessment using Messick's (1996, 1995) unified framework of construct validity. Theoretical and empirical evidence was sought for six aspects of construct validity. The sample included twenty student portfolios. Each portfolio was evaluated by seven faculty raters using a primary trait analysis scoring rubric. There was a significant relationship (r = .81-.95; p < .01) between the seven subscales in the scoring rubric, demonstrating measurement of a common construct. Item analysis was conducted to examine convergent and discriminant empirical relationships of the 35 items in the scoring rubric. There was a significant relationship between all items (p < .01), and all but one item was more strongly correlated with its own subscale than with other subscales. However, correlations of items across subscales were predominantly moderate in strength, indicating that items did not strongly discriminate between subscales. A fully crossed, two-facet generalizability (G) study design was used to examine reliability. Analysis of variance demonstrated that the greatest source of variance was the scoring rubric itself, accounting for 78% of the total variance. The smallest source of variance was the interaction between portfolio and rubric (1.15%). This indicates that while the seven subscales varied in difficulty level, the relative standing of individual portfolios was maintained across subscales. Faculty rater variance accounted for only 1.28% of total variance. A phi coefficient of .86, analogous to a reliability coefficient in classical test theory, was obtained in the decision (D) study by increasing the subscales to fourteen and decreasing faculty raters to three. There was a significant relationship between portfolios and grade point average (r = .70; p < .01), and between portfolios and the National Dental Hygiene Board Examination (r = .60; p < .01).
The relationship between portfolios and the Central Regional Dental Testing Service examination was both weak and nonsignificant (r = .19; p > .05). An open-ended survey was used to elicit student feedback on portfolio development. A majority of the students (76%) perceived value in the development of programmatic portfolios. In conclusion, the pattern of findings from this study suggests that portfolios can serve as a valid and reliable measure for assessing student competency.
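In a decision study, the phi coefficient is the portfolio (universe-score) variance divided by itself plus the absolute error variance. For a fully crossed portfolio × rater × subscale design, each error component is divided by the number of raters, subscales, or both that it is averaged over. The sketch below applies that generalizability-theory formula. The variance components are made up for illustration and do not reproduce the study's .86. The dictionary keys are hypothetical labels.

```python
def phi_coefficient(var, n_raters, n_subscales):
    """Phi (absolute-decision) coefficient for a fully crossed p x r x s design.

    var: variance components keyed by 'p' (portfolio), 'r' (rater),
    's' (subscale), 'pr', 'ps', 'rs', and 'prs' (the three-way interaction,
    confounded with residual error).
    """
    abs_error = (
        var["r"] / n_raters
        + var["s"] / n_subscales
        + var["pr"] / n_raters
        + var["ps"] / n_subscales
        + var["rs"] / (n_raters * n_subscales)
        + var["prs"] / (n_raters * n_subscales)
    )
    return var["p"] / (var["p"] + abs_error)


# Hypothetical components in which subscale variance dominates.
components = {"p": 0.30, "r": 0.02, "s": 1.20, "pr": 0.02,
              "ps": 0.05, "rs": 0.03, "prs": 0.10}
print(round(phi_coefficient(components, n_raters=7, n_subscales=7), 3))
print(round(phi_coefficient(components, n_raters=3, n_subscales=14), 3))
```

With these made-up components, moving from 7 raters and 7 subscales to 3 raters and 14 subscales raises phi. Subscale variance is the largest error source, so averaging over more subscales does more than adding raters. This mirrors the logic of the D study described above.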
Screening for Moral Injury: The Moral Injury Symptom Scale - Military Version Short Form.

PubMed

Koenig, Harold G; Ames, Donna; Youssef, Nagy A; Oliver, John P; Volk, Fred; Teng, Ellen J; Haynes, Kerry; Erickson, Zachary D; Arnold, Irina; O'Garo, Keisha; Pearce, Michelle

2018-03-26

To develop a short form (SF) of the 45-item multidimensional Moral Injury Symptom Scale - Military Version (MISS-M) to use when screening for moral injury and monitoring treatment response in veterans and active duty military with PTSD. A total of 427 veterans and active duty military with PTSD symptoms were recruited from VA Medical Centers in Augusta, GA; Los Angeles, CA; Durham, NC; Houston, TX; and San Antonio, TX; and from Liberty University, Lynchburg, VA. The sample was randomly split in two. In the first half (n = 214), exploratory factor analysis identified the highest-loading item on each of the 10 MISS scales (guilt, shame, moral concerns, loss of meaning, difficulty forgiving, loss of trust, self-condemnation, religious struggle, and loss of religious faith) to form the 10-item MISS-M-SF. Confirmatory factor analysis was then performed to replicate the results in the second half of the sample (n = 213). Internal reliability, test-retest reliability, and convergent, discriminant, and concurrent validity were examined in the overall sample. The study was approved by the institutional review boards and the Research & Development (R&D) Committees at Veterans Administration medical centers in Durham, Los Angeles, Augusta, Houston, and San Antonio, and by the Liberty University and Duke University Medical Center institutional review boards. The 10-item MISS-M-SF had a median of 50 and a range of 12-91 (possible range 10-100). Over 70% scored a 9 or 10 (highest possible) on at least one item. Cronbach's alpha was 0.73 (95% CI 0.69-0.76), and test-retest reliability was 0.87 (95% CI 0.79-0.92). Convergent validity with the 45-item MISS-M was r = 0.92. Discriminant validity was demonstrated by relatively weak correlations with social, religious, and physical health constructs (r = 0.21-0.35). Concurrent validity was indicated by strong correlations with PTSD, depression, and anxiety symptoms (r = 0.54-0.58).
The MISS-M-SF is a reliable and valid measure of moral injury (MI) symptoms that can be used to screen for MI and monitor response to treatment in veterans and active duty military with PTSD.
Varying levels of difficulty index of skills-test items randomly selected by examinees on the Korean emergency medical technician licensing examination.

PubMed

Koh, Bongyeun; Hong, Sunggi; Kim, Soon-Sim; Hyun, Jin-Sook; Baek, Milye; Moon, Jundong; Kwon, Hayran; Kim, Gyoungyong; Min, Seonggi; Kang, Gu-Hyun

2016-01-01

The goal of this study was to characterize the difficulty index of the items in the skills test components of the class I and II Korean emergency medical technician licensing examination (KEMTLE), which requires examinees to select items randomly. The results of 1,309 class I KEMTLE examinations and 1,801 class II KEMTLE examinations in 2013 were subjected to analysis. Items from the basic and advanced skills test sections of the KEMTLE were compared to determine whether some were significantly more difficult than others. In the class I KEMTLE, all 4 of the items on the basic skills test showed significant variation in difficulty index (P<0.01), as well as 4 of the 5 items on the advanced skills test (P<0.05). In the class II KEMTLE, 4 of the 5 items on the basic skills test showed significantly different difficulty index (P<0.01), as well as all 3 of the advanced skills test items (P<0.01). In the skills test components of the class I and II KEMTLE, the procedure in which examinees randomly select questions should be revised to require examinees to respond to a set of fixed items in order to improve the reliability of the national licensing examination.
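The abstract reports that difficulty indices (proportions correct) differed significantly across items but does not name the test used. One plausible approach is a Pearson chi-square test of homogeneity of proportions across the items, sketched below with hypothetical counts. The function computes only the statistic and its degrees of freedom. A p-value could then be taken from the chi-square distribution, for example with `scipy.stats.chi2.sf(stat, df)`, which the sketch does not call.

```python
def chi_square_homogeneity(correct, attempted):
    """Pearson chi-square statistic testing whether several items share a
    single proportion correct (difficulty index).

    correct[i] / attempted[i] is item i's observed difficulty index.
    Assumes the pooled proportion is strictly between 0 and 1.
    Returns (statistic, degrees_of_freedom).
    """
    pooled = sum(correct) / sum(attempted)
    stat = 0.0
    for c, n in zip(correct, attempted):
        expected_right = n * pooled
        expected_wrong = n * (1 - pooled)
        stat += (c - expected_right) ** 2 / expected_right
        stat += ((n - c) - expected_wrong) ** 2 / expected_wrong
    return stat, len(correct) - 1


# Hypothetical pass counts for four skills-test items, each chosen by a
# different group of examinees.
print(chi_square_homogeneity([280, 310, 250, 330], [330, 340, 320, 345]))
```

Because examinees in this design choose items at random, each item is answered by a different group. A significant statistic therefore cannot tell whether the items differ in difficulty or the groups who chose them differ in ability. This ambiguity fits the authors' recommendation to use a fixed item set.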
A comparison of top and middle level women administrators in social work, nursing, and education: career supports and barriers.

PubMed

Collins, S K

1984-01-01

Presented in this article are findings from a descriptive study of a national stratified random sample of 259 women administrators in social work, nursing, and education. Subjects responded to a 54-item mailed questionnaire about their career experiences. The author explores career support from parents and from persons within their work organizations. Barriers, including sex discrimination and difficulties balancing career and family responsibilities, are also examined. Finally, the extent to which the women administrators provided career assistance to other women is discussed. Comparisons between the three fields and between top and middle management levels are emphasized. Recommendations are made for increasing women's opportunities for career achievement within the fields studied.

High time for a change: psychometric analysis of multiple-choice questions in nursing.

PubMed

Redmond, Sandra P; Hartigan-Rogers, Jackie A; Cobbett, Shelley

2012-11-26

Nurse educators teach students to develop an informed nursing practice, but can educators claim the same grounding in the available evidence when formulating multiple-choice assessment tools to evaluate student learning? Multiple-choice questions are a popular assessment format within nursing education. They are widely accepted as a credible format for assessing student knowledge across disciplines. Even so, educators debate how many options are needed to adequately test cognitive reasoning and to discriminate optimally between student abilities. The purpose of this quasi-experimental between-groups study was to examine the psychometric properties of three-option multiple-choice questions compared with the more traditional four-option questions. Data analysis revealed no statistically significant differences in item discrimination, item difficulty, or mean examination scores when multiple-choice test questions were administered with three versus four answer options. This study provides additional guidance for nurse educators to assist in improving multiple-choice question writing and test design.
Psychometric properties of the Chinese version of the Menopause-Specific Quality-of-Life questionnaire.

PubMed

Nie, Guangning; Yang, Hongyan; Liu, Jian; Zhao, ChunMei; Wang, Xiaoyun

2017-05-01

The Menopause-Specific Quality-of-Life (MENQOL) questionnaire was developed as a specific tool to measure the health-related quality-of-life of postmenopausal women. Thus far, the Chinese version of the questionnaire has not been subjected to psychometric assessment with a large sample. This study aims to evaluate the validity and reliability of the Chinese version of the MENQOL specific to postmenopausal women in China. A total of 1,137 menopausal symptomatic and 491 menopausal asymptomatic women from eight cities in China were recruited using a convenience sampling method. Psychometric properties were evaluated through descriptive statistics and analyses of validity and reliability. Reliability was assessed for each subscale of the MENQOL through internal consistency reliability with Cronbach's α and intersubscale correlations. Item-domain correlations, principal components analysis (PCA), and confirmatory factor analysis were performed to determine construct validity. t tests were used to compare menopausal symptomatic and asymptomatic women and thereby evaluate discriminant validity. Pearson correlation coefficients were calculated between MENQOL scores and the Kupperman index to assess criterion-related validity. The most common symptoms in Chinese menopausal symptomatic women were "experiencing poor memory" (94.4%), "feeling tired or worn out" (93.8%), "aching in muscle and joints" (89.4%), "low backache" (86.9%), "decrease in physical strength" (86.6%), "aches in back of neck or head" (86.2%), "difficulty sleeping" (83.6%), "accomplishing less than I used to" (83.4%), "feeling a lack of energy" (83.3%), "change in your sexual desire" (81%), and "hot flash" (80.7%), among others. The symptom of "increased facial hair" was rarely seen (9.9%). The vasomotor, psychosocial, physical, and sexual domains showed high reliability (Cronbach's α 0.84, 0.87, 0.89, and 0.86, respectively).
Item-domain correlation analysis showed that all items correlated more strongly with their own domains than with other domains. In the PCA, after deleting the "increased facial hair" item, items in the vasomotor, sexual, and psychosocial subscales loaded on their respective domains by and large, and items in the physical subscale divided into two factors. The PCA revealed a latent structure of the Chinese version of MENQOL nearly identical to the original MENQOL domains. The confirmatory factor analysis demonstrated that the questionnaire fits well with a four-domain model. The MENQOL discriminated between symptomatic and asymptomatic menopausal women, indicating good discriminant validity. Criterion-related validity was confirmed by a significant correlation between MENQOL scores and the Kupperman index. This study showed that the Chinese version of MENQOL has good psychometric properties and would be suitable for measuring the health-related quality-of-life of Chinese menopausal women, except for item 21 (increased facial hair).
Ubiquitous testing using tablets: its impact on medical student perceptions of and engagement in learning

PubMed Central

Kim, Kyong-Jee; Hwang, Jee-Young

2016-01-01

Purpose: Ubiquitous testing has the potential to affect medical education by enhancing the authenticity of the assessment using multimedia items. This study explored medical students’ experience with ubiquitous testing and its impact on student learning. Methods: A cohort (n=48) of third-year students at a medical school in South Korea participated in this study. The students were divided into two groups and were given different versions of 10 content-matched items: one in text version (the text group) and the other in multimedia version (the multimedia group). Multimedia items were delivered using tablets. Item response analyses were performed to compare item characteristics between the two versions. Additionally, focus group interviews were held to investigate the students’ experiences of ubiquitous testing. Results: The mean test score was significantly higher in the text group. Item difficulty and discrimination did not differ between text and multimedia items. The participants generally showed positive responses on ubiquitous testing. Still, they felt that the lectures that they had taken in preclinical years did not prepare them enough for this type of assessment and clinical encounters during clerkships were more helpful. To be better prepared, the participants felt that they needed to engage more actively in learning in clinical clerkships and have more access to multimedia learning resources. Conclusion: Ubiquitous testing can positively affect student learning by reinforcing the importance of being able to understand and apply knowledge in clinical contexts, which drives students to engage more actively in learning in clinical settings. PMID:26838569
Development and psychometric evaluation of a cardiovascular risk and disease management knowledge assessment tool.

PubMed

Rosneck, James S; Hughes, Joel; Gunstad, John; Josephson, Richard; Noe, Donald A; Waechter, Donna

2014-01-01

This article describes the systematic construction and psychometric analysis of a knowledge assessment instrument for phase II cardiac rehabilitation (CR) patients. The instrument measures risk modification and disease management knowledge and behavioral outcomes derived from national standards relevant to secondary prevention and management of cardiovascular disease. First, using an adult curriculum based on disease-specific learning outcomes and competencies, clinical staff completed a systematic test item development process. Second, a panel of educational and clinical experts used an iterative process to identify the test content domain and reach consensus in selecting items that met criteria. Third, the resulting 31-question instrument, the Cardiac Knowledge Assessment Tool (CKAT), was piloted in CR patients to confirm its applicability. Validity and reliability analyses were performed on pretreatment test administrations from 3638 adults. Additional focused analyses covered 1999 individuals who completed both pretreatment and posttreatment administrations within 6 months. Evidence of CKAT content validity was substantiated, with 85% agreement among content experts. Evidence of construct validity was demonstrated via factor analysis identifying key underlying factors. Internal consistency estimates on pretesting (Cronbach's α = .852; Spearman-Brown split-half reliability = 0.817) support test reliability. Item analysis using point-biserial correlations showed significant relationships between performance on single items and total score (P < .01). Analyses using item difficulty and item discrimination indices further verified item stability and validity of the CKAT. A knowledge instrument specifically designed for an adult CR population was systematically developed and tested in a large representative patient population, satisfying psychometric parameters, including validity and reliability.
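The abstract names two classical statistics: the point-biserial correlation between a 0/1 item and the total score, and a Spearman-Brown split-half reliability. Both are sketched below.

- The point-biserial correlation is numerically the Pearson r computed with a dichotomous variable.
- The split-half reliability correlates two half-test scores and steps the result up with 2r / (1 + r).

The odd-even split and the uncorrected item-total form (the item is left inside the total) are assumptions; the abstract does not say which variants were used. The data are hypothetical.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation using population (n) moments.

    Raises ZeroDivisionError if either variable has zero variance.
    """
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))


def point_biserial(item, totals):
    """Point-biserial r between a 0/1 item and the total score.

    Numerically this is the Pearson r with a dichotomous variable; here the
    item is left inside the total (an uncorrected item-total correlation).
    """
    return pearson(item, totals)


def split_half_spearman_brown(responses):
    """Odd-even split-half reliability, stepped up with Spearman-Brown."""
    odd = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, ...
    r = pearson(odd, even)
    return 2 * r / (1 + r)


# Hypothetical 0/1 answers of five patients to four items.
answers = [[1, 1, 1, 0], [1, 0, 1, 1], [0, 0, 1, 0], [1, 1, 1, 1], [0, 0, 0, 0]]
totals = [sum(row) for row in answers]
print(round(point_biserial([row[0] for row in answers], totals), 3))
print(round(split_half_spearman_brown(answers), 3))
```

The corrected item-total variant, which subtracts the item from the total before correlating, gives somewhat lower values. It is often preferred for short tests.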
Validation of the Erlangen Test of Activities of Daily Living in Persons with Mild Dementia or Mild Cognitive Impairment (ETAM).

PubMed

Luttenberger, Katharina; Reppermund, Simone; Schmiedeberg-Sohn, Anke; Book, Stephanie; Graessel, Elmar

2016-05-26

There are currently no valid, fast, and easy-to-administer performance tests that are designed to assess the capacities to perform activities of daily living in persons with mild dementia and mild cognitive impairment (MCI). However, such measures are urgently needed for determining individual support needs as well as the efficacy of interventions. The aim of the present study was therefore to validate the Erlangen Test of Activities of Daily Living in Persons with Mild Dementia and Mild Cognitive Impairment (ETAM). The ETAM is a performance test based on the International Classification of Functioning, Disability and Health (ICF) that assesses the relevant domains of living in older adults with MCI and mild dementia who live independently. The 10 ICF-based items on the research version of the ETAM were tested in a final sample of 81 persons with MCI or mild dementia. The items were selected for the final version in accordance with 6 criteria: 1) all domains must be represented and have equal weight, 2) all items must load on the same factor, 3) item difficulty and item discriminatory power, 4) convergent validity (Bayer Activities of Daily Living Scale [B-ADL]) and discriminant validity (Mini Mental State Examination [MMSE], Geriatric Depression Scale 15 [GDS-15]), 5) inter-rater reliabilities of the individual items, 6) as little material as possible. Retest reliability was also examined. Cohen's ds were calculated to determine the magnitudes of the differences in ETAM scores between participants diagnosed with different grades of severity of cognitive impairment. The final version of the ETAM consists of 6 items that load on a single factor. They cover the five ICF domains of communication, mobility, self-care, domestic life (assessed by two 3-point items), and major life areas (specifically, the economic life sub-category). The maximum achievable score is 30 points (6 points per domain).
The average administration time was 35 min, 19 of which were needed for pure item performance. The internal consistency was α = .71. The three-week test-retest reliability was r = .78, and the inter-rater reliability was r = .97. The ETAM also provided satisfactory discrimination between healthy individuals and persons with MCI or mild dementia as well as between persons with mild and moderate dementia. The 6-item final version of the ETAM shows satisfactory psychometric characteristics and can be administered quickly. It is therefore suitable for use in both clinical practice and research.
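The abstract reports Cohen's ds for differences in ETAM scores between severity groups without giving the formula. A common variant divides the mean difference by the pooled standard deviation, using n - 1 variances. The sketch below assumes that variant; the authors may have used another. The group scores are hypothetical.

```python
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d: mean difference divided by the pooled (n - 1) standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5


# Hypothetical ETAM totals (maximum 30) for two diagnostic groups.
mci = [26, 24, 27, 22, 25, 28]
mild_dementia = [19, 21, 17, 23, 20, 18]
print(round(cohens_d(mci, mild_dementia), 2))
```

By Cohen's conventional benchmarks, values around 0.2, 0.5, and 0.8 are read as small, medium, and large differences.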
Comparison of university students' understanding of graphs in different contexts

NASA Astrophysics Data System (ADS)

Planinic, Maja; Ivanjek, Lana; Susac, Ana; Milin-Sipus, Zeljka

2013-12-01

This study investigates university students’ understanding of graphs in three different domains: mathematics, physics (kinematics), and contexts other than physics. Eight sets of parallel mathematics, physics, and other context questions about graphs were developed. A test consisting of these eight sets of questions (24 questions in all) was administered to 385 first year students at University of Zagreb who were either prospective physics or mathematics teachers or prospective physicists or mathematicians. Rasch analysis of data was conducted and linear measures for item difficulties were obtained. Average difficulties of items in three domains (mathematics, physics, and other contexts) and over two concepts (graph slope, area under the graph) were computed and compared. Analysis suggests that the variation of average difficulty among the three domains is much smaller for the concept of graph slope than for the concept of area under the graph. Most of the slope items are very close in difficulty, suggesting that students who have developed sufficient understanding of graph slope in mathematics are generally able to transfer it almost equally successfully to other contexts. A large difference was found between the difficulty of the concept of area under the graph in physics and other contexts on one side and mathematics on the other side. Comparison of average difficulty of the three domains suggests that mathematics without context is the easiest domain for students. Adding either physics or other context to mathematical items generally seems to increase item difficulty. No significant difference was found between the average item difficulty in physics and contexts other than physics, suggesting that physics (kinematics) remains a difficult context for most students despite the received instruction on kinematics in high school.
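Rasch analysis places item difficulties on a linear logit scale. A crude approximation converts each item's proportion correct p to ln((1 - p) / p) and centres the results so that the mean item difficulty is zero, the usual Rasch identification constraint. This resembles the starting values of iterative estimation procedures, but it is not a substitute for proper Rasch estimation, which also accounts for the spread of person abilities. The proportions below are hypothetical.

```python
import math


def logit_difficulties(p_correct):
    """Approximate item difficulties in logits from proportions correct.

    Each raw difficulty is ln((1 - p) / p), so harder items (lower p) get
    higher values; results are centred so their mean is 0.
    Assumes every p is strictly between 0 and 1.
    """
    raw = [math.log((1 - p) / p) for p in p_correct]
    centre = sum(raw) / len(raw)
    return [d - centre for d in raw]


# Hypothetical proportions correct for a mathematics, a physics, and an
# other-context version of one graph question.
print([round(d, 2) for d in logit_difficulties([0.80, 0.55, 0.45])])
```

Averaging such logit values within a domain or concept gives the kind of average difficulty comparison described above. Unlike raw percentages, equal logit differences represent comparable gaps across the whole scale.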
Applicability of the Newtonian gravity concept inventory to introductory college physics classes

NASA Astrophysics Data System (ADS)

Williamson, Kathryn; Prather, Edward E.; Willoughby, Shannon

2016-06-01

The study described here extends the applicability of the Newtonian Gravity Concept Inventory (NGCI) to college algebra-based physics classes, beyond the general education astronomy courses for which it was originally developed. The four conceptual domains probed by the NGCI (Directionality, Force Law, Independence of Other Forces, and Threshold) are well suited for investigating students' reasoning about gravity in both populations, making the NGCI a highly versatile instrument. Classical test theory statistical analysis with physics student responses pre-instruction (N = 1,392) and post-instruction (N = 929) from eight colleges and universities across the United States indicate that the NGCI is composed of items with appropriate difficulty and discrimination and is reliable for this population. Also, expert review and student interviews support the NGCI's validity for the physics population. Emergent similarities and differences in how physics students reason about gravity compared to astronomy students are discussed, as well as future directions for analyzing the instrument's item parameters across both populations.
Applying the Rule Space Model to Develop a Learning Progression for Thermochemistry

NASA Astrophysics Data System (ADS)

Chen, Fu; Zhang, Shanshan; Guo, Yanfang; Xin, Tao

2017-12-01

We used the Rule Space Model, a cognitive diagnostic model, to measure the learning progression for thermochemistry of senior high school students. We extracted five attributes and proposed their hierarchical relationships to model the construct of thermochemistry at four levels using a hypothesized learning progression. For this study, we developed 24 test items addressing the attributes of exothermic and endothermic reactions, chemical bonds and heat quantity change, reaction heat and enthalpy, thermochemical equations, and Hess's law. The test was administered to 694 senior high school students taught in 3 schools across 2 cities. Results based on the Rule Space Model analysis indicated that (1) the test items developed under the Rule Space Model were of high psychometric quality in terms of difficulty, discrimination, reliability, and validity; (2) the Rule Space Model analysis classified the students into seven different attribute mastery patterns; and (3) the initial hypothesized learning progression was refined, on the basis of the attribute mastery patterns and learning paths, to be more precise and detailed.
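In rule-space approaches, an attribute hierarchy restricts which mastery patterns are possible. A Q-matrix then maps each pattern to an "ideal" item-response pattern. The sketch below assumes a conjunctive rule: an item is answered correctly only if all of its required attributes are mastered. The three-attribute hierarchy and Q-matrix are hypothetical, not the study's five attributes. The later step of classifying observed responses in rule space is not shown.

```python
from itertools import product


def permissible_patterns(n_attributes, prerequisites):
    """Attribute-mastery patterns consistent with a prerequisite hierarchy.

    prerequisites maps an attribute index to the indices that must be
    mastered before it (an illustrative hierarchy, not the study's).
    """
    return [
        pattern
        for pattern in product((0, 1), repeat=n_attributes)
        if all(pattern[p] for a, pres in prerequisites.items() if pattern[a] for p in pres)
    ]


def ideal_responses(pattern, q_matrix):
    """Conjunctive rule: an item is correct only if every attribute it requires is mastered."""
    return [int(all(m >= q for m, q in zip(pattern, q_row))) for q_row in q_matrix]


# Hypothetical linear hierarchy A0 -> A1 -> A2 and a 3-item Q-matrix.
hierarchy = {1: [0], 2: [1]}
q_matrix = [[1, 0, 0], [1, 1, 0], [0, 1, 1]]
for pat in permissible_patterns(3, hierarchy):
    print(pat, ideal_responses(pat, q_matrix))
```

A linear hierarchy leaves only 4 of the 8 possible patterns. This is why a hypothesized hierarchy can be checked against the mastery patterns students actually show.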
Item Discrimination and Type I Error in the Detection of Differential Item Functioning

ERIC Educational Resources Information Center

Li, Yanju; Brooks, Gordon P.; Johanson, George A.

2012-01-01

In 2009, DeMars stated that when impact exists there will be Type I error inflation, especially with larger sample sizes and larger discrimination parameters for items. One purpose of this study is to present the patterns of Type I error rates using Mantel-Haenszel (MH) and logistic regression (LR) procedures when the mean ability between the…
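The Mantel-Haenszel (MH) DIF procedure mentioned above compares reference- and focal-group odds of success within matched total-score strata. The following is a minimal sketch of the common odds ratio and the ETS MH D-DIF transform (-2.35 ln α). The tuple layout is an assumption. The significance test and the logistic regression procedure the study also used are not shown.

```python
import math


def mantel_haenszel_dif(strata):
    """Mantel-Haenszel common odds ratio and ETS MH D-DIF for one item.

    strata: one (ref_right, ref_wrong, focal_right, focal_wrong) tuple per
    matched total-score level. alpha > 1 (D-DIF < 0) means the item favours
    the reference group after matching on ability.
    """
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty score level contributes nothing
        num += a * d / n
        den += b * c / n
    alpha = num / den
    return alpha, -2.35 * math.log(alpha)


print(mantel_haenszel_dif([(30, 10, 30, 10), (20, 20, 20, 20)]))  # no DIF: alpha = 1
print(mantel_haenszel_dif([(40, 10, 20, 20)]))  # favours the reference group
```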
Retrieval monitoring and anosognosia in Alzheimer's disease.

PubMed

Gallo, David A; Chen, Jennifer M; Wiseman, Amy L; Schacter, Daniel L; Budson, Andrew E

2007-09-01

This study explored the relationship between episodic memory and anosognosia (a lack of deficit awareness) among patients with mild Alzheimer's disease (AD). Participants studied words and pictures for subsequent memory tests. Healthy older adults made fewer false recognition errors when trying to remember pictures compared with words, suggesting that the perceptual distinctiveness of picture memories enhanced retrieval monitoring (the distinctiveness heuristic). In contrast, although participants with AD could discriminate between studied and nonstudied items, they had difficulty recollecting the specific presentation formats (words or pictures), and they had limited use of the distinctiveness heuristic. Critically, the demands of the memory test modulated the relationship between memory accuracy and anosognosia. Greater anosognosia was associated with impaired memory accuracy when participants with AD tried to remember words but not when they tried to remember pictures. These data further delineate the retrieval monitoring difficulties among individuals with AD and suggest that anosognosia measures are most likely to correlate with memory tests that require the effortful retrieval of nondistinctive information. (PsycINFO Database Record (c) 2007 APA, all rights reserved).
Validation of the Spanish Short Self-Regulation Questionnaire (SSSRQ) through Rasch Analysis

PubMed Central

Garzón Umerenkova, Angélica; de la Fuente Arias, Jesús; Martínez-Vicente, José Manuel; Zapata Sevillano, Lucía; Pichardo, Mari Carmen; García-Berbén, Ana Belén

2017-01-01

Background: The aim of the study was to psychometrically characterize the Spanish Short Self-Regulation Questionnaire (SSSRQ) through Rasch analysis. Materials and Methods: 831 Spanish university students (262 men), aged 17 to 39 years and ranging from the first to the fifth year of study, completed the SSSRQ. Confirmatory factor analysis (CFA) was carried out to establish structural adequacy. Each subscale was then analyzed with the Rasch model to test dimensionality, item fit, the functioning of the response categories, and reliability, and to estimate Differential Item Functioning (DIF) by gender and course. Results: The four subscales meet the unidimensionality criteria, the items fit the model, the response categories function properly, and sample reliability is acceptable. Nonetheless, the test could benefit from additional items of both high and low difficulty to increase construct validity, discrimination, and reliability for the respondents. Several items showing differences by gender and course were also identified. Discussion: The results demonstrate the need for, and adequacy of, this psychometric analysis as a complement to CFA for improving the instrument. PMID:28298898
Anger and postcombat mental health: validation of a brief anger measure with U.S. soldiers postdeployed from Iraq and Afghanistan.

PubMed

Novaco, Raymond W; Swanson, Rob D; Gonzalez, Oscar I; Gahm, Gregory A; Reger, Mark D

2012-09-01

The involvement of anger in the psychological adjustment of current war veterans, particularly in conjunction with combat-related posttraumatic stress disorder (PTSD), warrants greater research focus than it has received. The present study concerns a brief anger measure, Dimensions of Anger Reactions (DAR), intended for use in large sample studies and as a screening tool. The concurrent validity, discriminant validity, and incremental validity of the instrument were examined in conjunction with behavioral health data for 3,528 treatment-seeking soldiers who had been in combat in Iraq and Afghanistan. Criterion indices included multiple self-rated measures of psychological distress (including PTSD, depression, and anxiety), functional difficulties (relationships, daily activities, work problems, and substance use), and violence risk. Concurrent validity was established by strong correlations with single anger items on 4 other scales, and discriminant validity was found against anxiety and depression measures. Pertinent to the construct of anger, the DAR was significantly associated with psychosocial functional difficulties and with several indices of harm to self and to others. Hierarchical regression performed on a self/others harm index found incremental validity for the DAR, controlling for age, education, military component, officer rank, combat exposure, PTSD, and depression. The ability to efficiently assess anger in at-risk military populations can provide an indicator of many undesirable behavioral health outcomes. PsycINFO Database Record (c) 2012 APA, all rights reserved.
Validity of Computer Adaptive Tests of Daily Routines for Youth with Spinal Cord Injury

PubMed Central

Haley, Stephen M.

2013-01-01

Objective: To evaluate the accuracy of computer adaptive tests (CATs) of daily routines for child- and parent-reported outcomes following pediatric spinal cord injury (SCI) and to evaluate the validity of the scales. Methods: One hundred ninety-six daily routine items were administered to 381 youths and 322 parents. Pearson correlations, intraclass correlation coefficients (ICC), and 95% confidence intervals (CI) were calculated to evaluate the accuracy of simulated 5-item, 10-item, and 15-item CATs against the full-item banks and to evaluate concurrent validity. Independent samples t tests and analysis of variance were used to evaluate the ability of the daily routine scales to discriminate between children with tetraplegia and paraplegia and among 5 motor groups. Results: ICC and 95% CI demonstrated that simulated 5-, 10-, and 15-item CATs accurately represented the full-item banks for both child- and parent-report scales. The daily routine scales demonstrated discriminative validity, except between 2 motor groups of children with paraplegia. Concurrent validity of the daily routine scales was demonstrated through significant relationships with the FIM scores. Conclusion: Child- and parent-reported outcomes of daily routines can be obtained using CATs with the same relative precision as a full-item bank. Five-item, 10-item, and 15-item CATs have discriminative and concurrent validity. PMID:23671380
A Study of Inference in Standardized Reading Test Items and Its Relationship to Difficulty.

ERIC Educational Resources Information Center

Marzano, Robert J.

To study the relationship between inferences made on standardized reading tests and item difficulty, 50 items on the reading comprehension section of the Metropolitan Achievement Test were analyzed independently in this study by two raters using four general categories of inferences: (1) reference inferences, (2) between proposition inferences,…
An opportunity in difficulty: Japan-Korea-Taiwan expert Delphi consensus on surgical difficulty during laparoscopic cholecystectomy.

PubMed

Iwashita, Yukio; Hibi, Taizo; Ohyama, Tetsuji; Honda, Goro; Yoshida, Masahiro; Miura, Fumihiko; Takada, Tadahiro; Han, Ho-Seong; Hwang, Tsann-Long; Shinya, Satoshi; Suzuki, Kenji; Umezawa, Akiko; Yoon, Yoo-Seok; Choi, In-Seok; Huang, Wayne Shih-Wei; Chen, Kuo-Hsin; Watanabe, Manabu; Abe, Yuta; Misawa, Takeyuki; Nagakawa, Yuichi; Yoon, Dong-Sup; Jang, Jin-Young; Yu, Hee Chul; Ahn, Keun Soo; Kim, Song Cheol; Song, In Sang; Kim, Ji Hoon; Yun, Sung Su; Choi, Seong Ho; Jan, Yi-Yin; Shan, Yan-Shen; Ker, Chen-Guo; Chan, De-Chuan; Wu, Cheng-Chung; Lee, King-Teh; Toyota, Naoyuki; Higuchi, Ryota; Nakamura, Yoshiharu; Mizuguchi, Yoshiaki; Takeda, Yutaka; Ito, Masahiro; Norimizu, Shinji; Yamada, Shigetoshi; Matsumura, Naoki; Shindoh, Junichi; Sunagawa, Hiroki; Gocho, Takeshi; Hasegawa, Hiroshi; Rikiyama, Toshiki; Sata, Naohiro; Kano, Nobuyasu; Kitano, Seigo; Tokumura, Hiromi; Yamashita, Yuichi; Watanabe, Goro; Nakagawa, Kunitoshi; Kimura, Taizo; Yamakawa, Tatsuo; Wakabayashi, Go; Mori, Rintaro; Endo, Itaru; Miyazaki, Masaru; Yamamoto, Masakazu

2017-04-01

We previously identified 25 intraoperative findings during laparoscopic cholecystectomy (LC) as potential indicators of surgical difficulty using the nominal group technique. This study aimed to build a consensus among expert LC surgeons on the impact of each item on surgical difficulty. Surgeons from Japan, Korea, and Taiwan (n = 554) participated in a Delphi process and graded the 25 items on a seven-stage scale (range, 0-6). Consensus was defined as (1) an interquartile range (IQR) of overall responses ≤2 and (2) ≥66% of the responses concentrated within the median ± 1 after stratification by workplace and LC experience level. Response rates for the first- and second-round Delphi were 92.6% and 90.3%, respectively. Final consensus was reached for all 25 items. 'Diffuse scarring in the Calot's triangle area' in the 'Factors related to inflammation of the gallbladder' category had the strongest impact on surgical difficulty (median, 5; IQR, 1). Surgeons agreed that the surgical difficulty increases as more fibrotic change and scarring develop. The median point for each item was set as the difficulty score. A Delphi consensus was reached among expert LC surgeons on the impact of intraoperative findings on surgical difficulty. © 2017 Japanese Society of Hepato-Biliary-Pancreatic Surgery.
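The two-part consensus rule above can be expressed as a small check per item. The sketch makes two assumptions. First, "median ± 1" refers to each stratum's own median. Second, quartiles follow Python's default method. The abstract fixes neither, and other quartile conventions can shift the IQR slightly on small samples.

```python
from statistics import median, quantiles


def iqr(scores):
    # Quartiles via Python's default "exclusive" method; other quartile
    # conventions can give slightly different IQRs on small samples.
    q1, _, q3 = quantiles(scores, n=4)
    return q3 - q1


def delphi_consensus(overall, strata, max_iqr=2, min_share=0.66):
    """Two-part consensus rule for one item rated on the 0-6 scale.

    overall: all ratings; strata: lists of ratings per workplace/experience group.
    Assumes "median ± 1" refers to each stratum's own median.
    """
    if iqr(overall) > max_iqr:
        return False
    for group in strata:
        m = median(group)
        share = sum(abs(s - m) <= 1 for s in group) / len(group)
        if share < min_share:
            return False
    return True


g1, g2 = [4, 5, 5, 5], [6, 5, 4, 5]
print(delphi_consensus(g1 + g2, [g1, g2]))  # True
```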
Item analysis of the Spanish version of the Boston Naming Test with a Spanish speaking adult population from Colombia.

PubMed

Kim, Stella H; Strutt, Adriana M; Olabarrieta-Landa, Laiene; Lequerica, Anthony H; Rivera, Diego; De Los Reyes Aragon, Carlos Jose; Utria, Oscar; Arango-Lasprilla, Juan Carlos

2018-02-23

The Boston Naming Test (BNT) is a widely used measure of confrontation naming ability that has been criticized for its questionable construct validity for non-English speakers. This study investigated item difficulty and construct validity of the Spanish version of the BNT to assess cultural and linguistic impact on performance. Subjects were 1298 healthy Spanish speaking adults from Colombia. They were administered the 60- and 15-item Spanish version of the BNT. A Rasch analysis was computed to assess dimensionality, item hierarchy, targeting, reliability, and item fit. Both versions of the BNT satisfied requirements for unidimensionality. Although internal consistency was excellent for the 60-item BNT, order of difficulty did not increase consistently with item number and there were a number of items that did not fit the Rasch model. For the 15-item BNT, a total of 5 items changed position on the item hierarchy with 7 poor fitting items. Internal consistency was acceptable. Construct validity of the BNT remains a concern when it is administered to non-English speaking populations. Similar to previous findings, the order of item presentation did not correspond with increasing item difficulty, and both versions were inadequate at assessing high naming ability.
Varying levels of difficulty index of skills-test items randomly selected by examinees on the Korean emergency medical technician licensing examination

PubMed Central

2016-01-01

Purpose: The goal of this study was to characterize the difficulty index of the items in the skills test components of the class I and II Korean emergency medical technician licensing examination (KEMTLE), which requires examinees to select items randomly. Methods: The results of 1,309 class I KEMTLE examinations and 1,801 class II KEMTLE examinations in 2013 were subjected to analysis. Items from the basic and advanced skills test sections of the KEMTLE were compared to determine whether some were significantly more difficult than others. Results: In the class I KEMTLE, all 4 of the items on the basic skills test showed significant variation in difficulty index (P<0.01), as well as 4 of the 5 items on the advanced skills test (P<0.05). In the class II KEMTLE, 4 of the 5 items on the basic skills test showed significantly different difficulty index (P<0.01), as well as all 3 of the advanced skills test items (P<0.01). Conclusion: In the skills test components of the class I and II KEMTLE, the procedure in which examinees randomly select questions should be revised to require examinees to respond to a set of fixed items in order to improve the reliability of the national licensing examination. PMID:26883810
Measuring cancer-specific child adjustment difficulties: Development and validation of the Children's Oncology Child Adjustment Scale (ChOCs).

PubMed

Burke, Kylie; McCarthy, Maria; Lowe, Cherie; Sanders, Matthew R; Lloyd, Erin; Bowden, Madeleine; Williams, Lauren

2017-03-01

Childhood cancer is associated with child adjustment difficulties, including eating and sleep disturbance and emotional and other behavioral difficulties. However, there is a lack of validated instruments to measure the specific child adjustment issues associated with pediatric cancer treatments. The aim of this study was to develop and evaluate the reliability and validity of a parent-reported, child adjustment scale. One hundred thirty-two parents from two pediatric oncology centers who had children (aged 2-10 years) diagnosed with cancer completed the newly developed measure and additional measures of child behavior, sleep, diet, and quality of life. Children were more than 4 weeks postdiagnosis and less than 12 months postactive treatment. Factor structure, internal consistency, and construct (convergent) validity analyses were conducted. Principal component analysis revealed five distinct and theoretically coherent factors: Sleep Difficulties, Impact of Child's Illness, Eating Difficulties, Hospital-Related Behavior Difficulties, and General Behavior Difficulties. The final 25-item measure, the Children's Oncology Child Adjustment Scale (ChOCs), demonstrated good internal consistency (α = 0.79-0.91). Validity of the ChOCs was demonstrated by significant correlations between the subscales and measures of corresponding constructs. The ChOCs provides a new measure of child adjustment difficulties designed specifically for pediatric oncology. Preliminary analyses indicate strong theoretical and psychometric properties. Future studies are required to further examine reliability and validity of the scale, including test-retest reliability, discriminant validity, as well as change sensitivity and generalizability across different oncology samples and ages of children. The ChOCs shows promise as a measure of child adjustment relevant for oncology clinical settings and research purposes. © 2016 Wiley Periodicals, Inc.
Sequential Objective Structured Clinical Examination based on item response theory in Iran.

PubMed

Hejri, Sara Mortaz; Jalili, Mohammad

2017-01-01

In a sequential objective structured clinical examination (OSCE), all students initially take a short screening OSCE. Examinees who pass are excused from further testing, but an additional OSCE is administered to the remaining examinees. Previous investigations of sequential OSCE were based on classical test theory. We aimed to design and evaluate screening OSCEs based on item response theory (IRT). We carried out a retrospective observational study. At each station of a 10-station OSCE, the students' performance was graded on a Likert-type scale. Since the data were polytomous, the difficulty parameters, discrimination parameters, and students' ability were calculated using a graded response model. To design several screening OSCEs, we identified the 5 most difficult stations and the 5 most discriminative ones. For each test, 5, 4, or 3 stations were selected. Normal and stringent cut-scores were defined for each test. We compared the results of each of the 12 screening OSCEs to the main OSCE and calculated the positive and negative predictive values (PPV and NPV), as well as the exam cost. A total of 253 students (95.1%) passed the main OSCE, while 72.6% to 94.4% of examinees passed the screening tests. The PPV values ranged from 0.98 to 1.00, and the NPV values ranged from 0.18 to 0.59. Two tests effectively predicted the results of the main exam, resulting in financial savings of 34% to 40%. If stations with the highest IRT-based discrimination values and stringent cut-scores are utilized in the screening test, sequential OSCE can be an efficient and convenient way to conduct an OSCE.
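The PPV and NPV figures above compare each screening OSCE with the main OSCE. The sketch below assumes that "passed the screening test" counts as a positive result, which fits the high PPVs and low NPVs reported. In a sequential design, examinees who fail the screen go on to further testing. A low NPV therefore mainly costs extra stations rather than producing wrong final decisions.

```python
def predictive_values(screen_pass, main_pass):
    """PPV and NPV of a screening OSCE against the full (main) OSCE.

    Treats 'passed the screening test' as a positive result, so
    PPV = P(pass main | pass screen) and NPV = P(fail main | fail screen).
    """
    pairs = list(zip(screen_pass, main_pass))
    tp = sum(1 for s, m in pairs if s and m)
    fp = sum(1 for s, m in pairs if s and not m)
    tn = sum(1 for s, m in pairs if not s and not m)
    fn = sum(1 for s, m in pairs if not s and m)
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    npv = tn / (tn + fn) if tn + fn else float("nan")
    return ppv, npv


screen = [True, True, True, False, False]
main = [True, True, False, False, True]
print(predictive_values(screen, main))
```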

Strategic retrieval in a reality monitoring task.

PubMed

Rosburg, Timm; Mecklinger, Axel; Johansson, Mikael

2011-08-01

Strategic recollection refers to control processes that allow the retrieval of information that is relevant for a specific situation. These processes can be studied in memory exclusion tasks, which require the retrieval of particular kinds of episodic information. In the current study, we investigated strategic recollection in reality monitoring using event-related potentials (ERPs). Participants studied object words, each followed either by a picture of the denoted object (perceive condition) or by the instruction to imagine such a picture (imagine condition). At test, subjects had to identify words of one study condition and to reject words of the other study condition together with newly presented items. Data analysis showed that object names were better identified when items of the perceive condition were targeted. In this test condition, a left parietal old/new effect (the ERP correlate of recollection) was observed only in response to targets. In contrast, both targets and nontargets elicited this old/new effect when items of the imagine condition were targeted. The magnitude of the left parietal old/new effect to nontargets in this condition (but no other left parietal old/new effect) correlated positively with the discrimination indices of both test conditions. In addition, ERPs to targets and nontargets differed at right frontal electrode sites at longer latencies (1500-1800 ms), with more positive ERPs for targets. Findings indicate that subjects retrieved nontarget information in the more difficult task condition, while they relied on target information alone in the less difficult task. This kind of strategic retrieval was not mirrored in other old/new effects. The correlation between the left parietal old/new effect for nontargets in the imagined item target condition and the discrimination indices of both conditions may indicate that the ease of nontarget retrieval, rather than the difficulty of target retrieval, increases the likelihood that nontarget information is actually retrieved. Copyright © 2011 Elsevier Ltd. All rights reserved.
Asymmetric effects of emotion on mnemonic interference

PubMed Central

Leal, Stephanie L.; Tighe, Sarah K.; Yassa, Michael A.

2014-01-01

Emotional experiences can strengthen memories so that they can be used to guide future behavior. Emotional arousal, mediated by the amygdala, is thought to modulate storage by the hippocampus, which may encode unique episodic memories via pattern separation – the process by which similar memories are stored using non-overlapping representations. While prior work has examined mnemonic interference due to similarity and emotional modulation of memory independently, examining the mechanisms by which emotion influences mnemonic interference has not been previously accomplished in humans. To this end, we developed an emotional memory task where emotional content and stimulus similarity were varied to examine the effect of emotion on fine mnemonic discrimination (a putative behavioral correlate of hippocampal pattern separation). When tested immediately after encoding, discrimination was reduced for similar emotional items compared to similar neutral items, consistent with a reduced bias towards pattern separation. After 24 h, recognition of emotional target items was preserved compared to neutral items, whereas similar emotional item discrimination was further diminished. This suggests a potential mechanism for the emotional modulation of memory with a selective remembering of gist, as well as a selective forgetting of detail, indicating an emotion-induced reduction in pattern separation. This can potentially increase the effective signal-to-noise ratio in any given situation to promote survival. Furthermore, we found that individuals with depressive symptoms hyper-discriminate negative items, which correlated with their symptom severity. This suggests that utilizing mnemonic discrimination paradigms allows us to tease apart the nuances of disorders with aberrant emotional mnemonic processing. PMID:24607286
What Does a Verbal Test Measure? A New Approach to Understanding Sources of Item Difficulty.

ERIC Educational Resources Information Center

Berk, Eric J. Vanden; Lohman, David F.; Cassata, Jennifer Coyne

Assessing the construct relevance of mental test results continues to present many challenges, and it has proven to be particularly difficult to assess the construct relevance of verbal items. This study was conducted to gain a better understanding of the conceptual sources of verbal item difficulty using a unique approach that integrates…
On Maximizing Item Information and Matching Difficulty with Ability.

ERIC Educational Resources Information Center

Bickel, Peter; Buyske, Steven; Chang, Huahua; Ying, Zhiliang

2001-01-01

Examined the assumption that matching difficulty levels of test items with an examinee's ability makes a test more efficient and challenged this assumption through a class of one-parameter item response theory models. Found the validity of the fundamental assumption to be closely related to the van Zwet tail ordering of symmetric distributions (W.…
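The assumption under examination comes from the logistic one-parameter (Rasch) model. In that model an item's Fisher information is P(1 - P), which peaks at 0.25 exactly where ability equals item difficulty. The sketch below shows only this standard logistic case. According to the abstract, whether the same holds for other one-parameter models depends on distributional tail properties.

```python
import math


def p_correct(theta, b, a=1.0):
    """Logistic IRT response probability; a = 1 gives the Rasch/1PL form."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, b, a=1.0):
    """Fisher information of a logistic item: a^2 * P * (1 - P)."""
    p = p_correct(theta, b, a)
    return a * a * p * (1.0 - p)


b = 0.5
for theta in (-1.5, -0.5, 0.5, 1.5, 2.5):
    print(theta, round(item_information(theta, b), 4))
```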
Detecting a Gender-Related Differential Item Functioning Using Transformed Item Difficulty

ERIC Educational Resources Information Center

Abedalaziz, Nabeel; Leng, Chin Hai; Alahmadi, Ahlam

2014-01-01

The purpose of the study was to examine gender differences in performance on a multiple-choice mathematical ability test, administered within the context of a high school graduation test that was designed to match the eleventh-grade curriculum. The transformed item difficulty (TID) was used to detect gender-related DIF. A random sample of 1400 eleventh…
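The transformed item difficulty method is usually applied as Angoff's delta plot. Each group's proportion correct is converted to a delta value, and a major-axis line is fitted to the paired deltas. Items far from the line are flagged. This is a minimal sketch of that standard procedure. The flagging threshold is left to the user because the abstract does not state one.

```python
import math
from statistics import NormalDist, mean, pvariance


def angoff_delta(p):
    """Transformed item difficulty: delta = 13 + 4z, z the deviate above which proportion p lies.

    p must be strictly between 0 and 1 (inv_cdf rejects 0 and 1).
    """
    return 13 + 4 * NormalDist().inv_cdf(1 - p)


def delta_plot_distances(p_group1, p_group2):
    """Signed perpendicular distance of each item from the major axis of the delta plot.

    Positive values mean the item is relatively easier for group 2
    (its group-2 delta lies below the fitted line).
    """
    x = [angoff_delta(p) for p in p_group1]
    y = [angoff_delta(p) for p in p_group2]
    mx, my = mean(x), mean(y)
    sxx, syy = pvariance(x, mx), pvariance(y, my)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / len(x)
    # Slope and intercept of the major (principal) axis; assumes sxy != 0.
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy**2)) / (2 * sxy)
    intercept = my - slope * mx
    return [(slope * xi - yi + intercept) / math.sqrt(slope**2 + 1) for xi, yi in zip(x, y)]


boys = [0.2, 0.5, 0.8, 0.5]
girls = [0.2, 0.5, 0.8, 0.8]  # hypothetical: the last item is easier for girls
print([round(d, 2) for d in delta_plot_distances(boys, girls)])
```

The group labels and proportions in the usage lines are invented for illustration.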
International Semiotics: Item Difficulty and the Complexity of Science Item Illustrations in the PISA-2009 International Test Comparison

ERIC Educational Resources Information Center

Solano-Flores, Guillermo; Wang, Chao; Shade, Chelsey

2016-01-01

We examined multimodality (the representation of information in multiple semiotic modes) in the context of international test comparisons. Using Program of International Student Assessment (PISA)-2009 data, we examined the correlation of the difficulty of science items and the complexity of their illustrations. We observed statistically…
An Investigation of Gender Differences in the Components Influencing the Difficulty of Spatial Ability Items.

ERIC Educational Resources Information Center

Kramer, Gene A.; Smith, Richard M.

2001-01-01

Examined the role that gender differences play in the determination of the components influencing the difficulty of spatial ability items. Results for 2,245 examinees taking a spatial ability test that is part of the Dental School Admission Battery show that component difficulties show little variation across gender. (SLD)
Social status correlates of reporting gender discrimination and racial discrimination among racially diverse women.

PubMed

Ro, Annie E; Choi, Kyung-Hee

2009-01-01

The growing body of research on discrimination and health indicates a deleterious effect of discrimination on various health outcomes. However, less is known about the sociodemographic correlates of reporting racial discrimination and gender discrimination among racially diverse women. We examined the associations of social status characteristics with lifetime experiences of racial discrimination and gender discrimination using a racially diverse sample of 754 women attending family planning clinics in Northern California (11.4% African American, 16.8% Latina, 10.1% Asian and 61.7% Caucasian). A multivariate analysis revealed that race, financial difficulty and marital status were significantly correlated with higher reports of racial discrimination, while race, education, financial difficulty and nativity were significantly correlated with gender discrimination scores. Our findings suggest that the social patterning of perceiving racial discrimination is somewhat different from that of gender discrimination. This has implications in the realm of discrimination research and applied interventions, as different forms of discrimination may have unique covariates that should be accounted for in research analysis or program design.
Development of the movement domain in the global body examination.

PubMed

Kvåle, Alice; Bunkan, Berit Heir; Opjordsmoen, Stein; Friis, Svein

2012-01-01

The purpose of this study was to develop a new Movement domain, based on 16 items from the Global Physiotherapy Examination-52 (GPE-52) and 18 items from the Comprehensive Body Examination (CBE). Furthermore, we examined how well the new domain and its scales would discriminate between healthy individuals and different groups of patients, compared to the original methods. Two physiotherapists, each using one method, independently examined 132 individuals (34 healthy, 32 with localized pain, 32 with generalized pain, and 34 with psychoses). The number of items was reduced by means of correlational and exploratory factor analysis. Internal consistency was examined with Cronbach's alpha. For examination of discriminative validity, Mann-Whitney U-test and Area under the Curve (AUC) were used. The initial 34 items were reduced to two subscales with 13 items: one for range of movement and balance and one for flexibility. Cronbach's alpha was 0.84 and 0.87 for the two subscales. The new subscales showed very good to excellent discriminating ability between healthy persons and the different patient groups (p < 0.001; AUC 0.82-0.95). Furthermore, patients with localized pain had significantly fewer movement aberrations than the other patient groups. The new Movement domain had fewer items than the GPE-52 and CBE, without losing discriminative validity.
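The Mann-Whitney U statistic and the AUC used above are closely linked. AUC = U / (n₁n₂), the probability that a randomly chosen patient scores higher than a randomly chosen healthy person, with ties counted as one half. The minimal sketch below assumes that higher scores mean more movement aberration. It uses a direct pairwise count, which is fine for groups of about 30.

```python
def auc_mann_whitney(patient_scores, healthy_scores):
    """AUC as the Mann-Whitney probability that a patient scores higher than a healthy person.

    Ties count as one half; the U statistic equals AUC * n_patients * n_healthy.
    """
    wins = 0.0
    for x in patient_scores:
        for y in healthy_scores:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(patient_scores) * len(healthy_scores))


print(auc_mann_whitney([3, 4, 5], [1, 2, 3]))  # 8.5 / 9
```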
Is Using the Strengths and Difficulties Questionnaire in a Community Sample the Optimal Way to Assess Mental Health Functioning?

PubMed

Vaz, Sharmila; Cordier, Reinie; Boyes, Mark; Parsons, Richard; Joosten, Annette; Ciccarelli, Marina; Falkmer, Marita; Falkmer, Torbjorn

2016-01-01

An important characteristic of a screening tool is its discriminant ability, that is, the measure's accuracy in distinguishing between those with and without mental health problems. The current study examined the inter-rater agreement and screening concordance of the parent and teacher versions of the SDQ at scale, subscale and item levels, with a view to identifying the items that have the most informant discrepancies and determining whether the concordance between parent and teacher reports on some items has the potential to influence decision making. Cross-sectional data from parent and teacher reports of the mental health functioning of a community sample of 299 students with and without disabilities from 75 different primary schools in Perth, Western Australia were analysed. The study found that: a) intraclass correlations between parent and teacher SDQ ratings of children's mental health were fair at the individual child level; b) the SDQ only demonstrated clinical utility when there was agreement between teacher and parent reports using the possible or 90% dichotomisation system; and c) three individual items had positive likelihood ratio scores indicating clinical utility. Of note was the finding that the negative likelihood ratio or likelihood of disregarding the absence of a condition when both parents and teachers rate the item as absent was not significant. Taken together, these findings suggest that the SDQ is not optimised for use in community samples and that further psychometric evaluation of the SDQ in this context is clearly warranted.
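Positive and negative likelihood ratios like those reported above come from a 2x2 table of screening result against a reference classification. The abstract does not name the reference standard, so the counts here are purely illustrative. An LR+ well above 1 raises the probability of a problem after a positive screen. An LR- well below 1 lowers it after a negative screen.

```python
import math


def likelihood_ratios(tp, fp, fn, tn):
    """Positive and negative likelihood ratios from a 2x2 table against a reference standard."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    lr_pos = sensitivity / (1 - specificity) if specificity < 1 else math.inf
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


print(likelihood_ratios(tp=40, fp=20, fn=10, tn=80))  # roughly (4.0, 0.25)
```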
Anxiety and fear. Discriminant validity in the child and adolescent practitioner's perspective.

PubMed

Pavuluri, Mani N; Henry, David; Allen, Kathleen

2002-12-01

We assessed the ability of child and adolescent practitioners to discriminate between anxiety items from the Revised Children's Manifest Anxiety Scale (RCMAS) and fear items from the Fear Survey Schedule for Children-Revised (FSSC-R). In addition, we examined the effects of age, gender, nationality, and therapeutic orientation on discrimination ability. Child and adolescent psychiatrists and psychologists from two university hospitals in Australia and the USA completed a questionnaire composed of items randomly chosen from the RCMAS and the FSSC-R. Clinicians rated each item on the extent to which the item represented the construct of anxiety or fear, using a 7-point Likert-type scale. Clinicians were more accurate in their perceptions of anxiety than in their perceptions of fear. Clinicians with a psychodynamic orientation were more likely to perceive an item as describing anxiety, and were less likely to identify fear. There was a significant interaction between age, scale and perception, with the youngest clinicians showing the greatest perceptual differentiation between the fear and anxiety items. The results suggest a need to develop common terminology among researchers and clinicians, to develop scales with items specific to the pathology they intend to measure, and to consider the variables influencing the clinicians who rate them.
An algorithm for calculating exam quality as a basis for performance-based allocation of funds at medical schools.

PubMed

Kirschstein, Timo; Wolters, Alexander; Lenz, Jan-Hendrik; Fröhlich, Susanne; Hakenberg, Oliver; Kundt, Günther; Darmüntzel, Martin; Hecker, Michael; Altiner, Attila; Müller-Hilke, Brigitte

2016-01-01

The amendment of the Medical Licensing Act (ÄAppO) in Germany in 2002 led to the introduction of graded assessments in the clinical part of medical studies. This, in turn, lent new weight to written tests, even though the minimum requirements for exam quality are sometimes difficult to reach. Introducing exam quality as a criterion for the award of performance-based allocation of funds is expected to steer the attention of faculty members towards higher quality and to perpetuate higher standards. However, at present there is a lack of suitable algorithms for calculating exam quality. In the spring of 2014, the students' dean commissioned the "core group" for curricular improvement at the University Medical Center in Rostock to revise the criteria for the allocation of performance-based funds for teaching. In a first approach, we developed an algorithm based on the results of the most common type of exam in medical education, multiple choice tests. It included item difficulty and discrimination, reliability, and the distribution of grades achieved. This algorithm quantitatively describes the quality of multiple choice exams. However, it can also be applied to exams involving short essay questions and the OSCE. It thus allows exam quality to be quantified across subjects and, in analogy to impact factors and third-party grants, faculty to be ranked. Our algorithm can be applied to all test formats in which item difficulty, the discriminatory power of the individual items, reliability of the exam and the distribution of grades are measured. Even though the content validity of an exam is not considered here, we believe that our algorithm is suitable as a general basis for performance-based allocation of funds.
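The abstract does not give the formula the algorithm uses to combine its inputs. The sketch below therefore only illustrates how two of those inputs are commonly computed for multiple-choice items:

- the difficulty index, which is the proportion correct, so higher values mean easier items;
- an upper-lower discrimination index D, which is the proportion correct in the top-scoring group minus that in the bottom-scoring group.

The 27% group size is a conventional choice assumed here, and the response matrix is invented.

```python
def item_statistics(responses, group_fraction=0.27):
    """Difficulty index and upper-lower discrimination index for 0/1-scored items.

    responses: list of examinee rows, each a list of 0/1 item scores.
    Returns one (difficulty, discrimination) tuple per item.
    """
    n_people = len(responses)
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]

    # Rank examinees by total score; ties at the group boundary are broken
    # by input order, which is an arbitrary choice.
    order = sorted(range(n_people), key=lambda i: totals[i])
    k = max(1, round(group_fraction * n_people))
    lower, upper = order[:k], order[-k:]

    stats = []
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_people  # proportion correct (higher = easier)
        p_upper = sum(responses[i][j] for i in upper) / k
        p_lower = sum(responses[i][j] for i in lower) / k
        stats.append((p, p_upper - p_lower))  # (difficulty, D)
    return stats


# Hypothetical responses: 6 examinees x 3 items.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
    [1, 1, 1],
]
for j, (p, d) in enumerate(item_statistics(responses)):
    print(f"item {j}: difficulty = {p:.2f}, D = {d:.2f}")
```

Reliability and the grade distribution, the other inputs named above, would be computed separately. How the published algorithm weights them is not shown here.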
Discrimination between stages of Alzheimer's disease with subsets of Mini-Mental State Examination items. An analysis of Consortium to Establish a Registry for Alzheimer's Disease data.

PubMed

Fillenbaum, G G; Wilkinson, W E; Welsh, K A; Mohs, R C

1994-09-01

To identify minimal sets of Mini-Mental State Examination (MMSE) items that can distinguish normal control subjects from patients with mild Alzheimer's disease (AD), patients with mild from those with moderate AD, and those with moderate from those with severe AD. Two randomly selected equivalent half samples. Results of logistic regression analysis from data from the first half of the sample were confirmed by receiver operating characteristic curves on the second half. Memory disorders clinics at major medical centers in the United States affiliated with the Consortium to Establish a Registry for Alzheimer's Disease (CERAD). White, normal control subjects (n = 412) and patients with AD (n = 621) who met CERAD criteria; nonwhite subjects (n = 165) and persons with missing data (n = 27) were excluded. Three four-item sets of MMSE items that discriminate, respectively, (1) normal controls from patients with mild AD, (2) patients with mild from those with moderate AD, and (3) patients with moderate from those with severe AD. The MMSE items discriminating normal controls from patients with mild AD were day, date, recall of apple, and recall of penny; those discriminating patients with mild from those with moderate AD were month, city, spelling "world" backward, and county; and those discriminating patients with moderate from those with severe AD were floor of building, repeating the word table, naming watch, and folding paper in half. Performance on the first two four-item sets was comparable with that of the full MMSE; the third set distinguished patients with moderate from those with severe AD better than chance. A minimum set of MMSE items can effectively discriminate normal controls from patients with mild AD and between successive levels of severity of AD. Data apply only to white patients with AD. Performance in minorities, more heterogeneous groups, or normal subjects with questionable cognitive status has not been assessed.
Validation of the German version of the Nurse-Work Instability Scale: baseline survey findings of a prospective study of a cohort of geriatric care workers

PubMed Central

2013-01-01

Background A prospective study of a cohort of nursing staff from nursing homes was undertaken to validate the Nurse-Work Instability Scale (Nurse-WIS). Baseline investigation data were used to test reliability, construct validity and criterion validity. Method A survey of nursing staff from nursing homes was conducted using a questionnaire containing the Nurse-WIS along with other survey instruments (including SF-12, WAI, SPE). The self-reported number of days’ sick leave taken and whether a pension for reduced work capacity was drawn were recorded. The reliability of the scale was checked by item difficulty (P), item discrimination (rjt) and internal consistency (Cronbach’s alpha). The hypotheses for checking construct validity were tested on the basis of correlations. Pearson’s chi-square was used to test concurrent criterion validity; discriminant validity was tested by means of binary logistic regression. Results 396 persons answered the questionnaire (21.3% response rate). More than 80% were female, and most worked full-time in a rotating shift pattern. Following the test for item discrimination, two items were removed from the Nurse-WIS. With a Cronbach’s alpha of 0.927, the scale provides a high degree of measuring accuracy. All hypotheses and assumptions used to test validity were confirmed: as the Nurse-WIS risk increases, health-related quality of life, work ability and job satisfaction decline. Depressive symptoms and a poor subjective prognosis of earning capacity are also more frequent, as are musculoskeletal disorders and impairments of psychological well-being. Age also influences the Nurse-WIS result: while 12.0% of those below the age of 35 had an increased risk, the figure for those aged over 55 was 50%. Conclusion This study is the first validation study of the Nurse-WIS to date. The Nurse-WIS shows good reliability, good validity and a good level of measuring accuracy. 
It appears to be suitable for recording prevention and rehabilitation needs among health care workers. If, in the follow-up, the Nurse-WIS likewise proves to be a reliable screening instrument with good predictive validity, it could ensure that suitable action is taken at an early stage, thereby helping to counteract early retirement and the anticipated shortage of health care workers. PMID:24330532
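The item discrimination statistic rjt above is usually the corrected item-total correlation: each item is correlated with the sum of the remaining items. The minimal Python sketch below uses invented Likert responses. The 0.3 cut-off used to flag items is a common rule of thumb assumed for illustration; it is not the criterion the authors used to remove their two items.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def corrected_item_total(scores):
    """Correlate each item with the total of the remaining items."""
    n_items = len(scores[0])
    result = []
    for j in range(n_items):
        item = [row[j] for row in scores]
        rest = [sum(row) - row[j] for row in scores]  # total excluding item j
        result.append(pearson(item, rest))
    return result


# Hypothetical 5-point ratings: 5 respondents x 4 items.
scores = [
    [1, 1, 2, 3],
    [2, 2, 1, 1],
    [3, 3, 3, 4],
    [4, 5, 4, 2],
    [5, 4, 5, 3],
]
r_jt = corrected_item_total(scores)
THRESHOLD = 0.3  # assumed rule-of-thumb cut-off, not the study's criterion
flagged = [j for j, r in enumerate(r_jt) if r < THRESHOLD]
print([round(r, 2) for r in r_jt], flagged)
```

Excluding the item from its own total avoids inflating the correlation, which matters most for short scales.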
The ADHD Concomitant Difficulties Scale (ADHD-CDS), a Brief Scale to Measure Comorbidity Associated to ADHD.

PubMed

Fenollar-Cortés, Javier; Fuentes, Luis J

2016-01-01

Although the critical feature of attention-deficit/hyperactivity disorder (ADHD) is a persistent pattern of inattention and/or hyperactivity/impulsivity behavior, the disorder is clinically heterogeneous, and concomitant difficulties are common. Children with ADHD are at increased risk for experiencing lifelong impairments in multiple domains of daily functioning. In the present study we aimed to build a brief ADHD impairment-related tool, the ADHD concomitant difficulties scale (ADHD-CDS), to assess the presence of some of the most important comorbidities that usually appear associated with ADHD, such as emotional/motivational management, fine motor coordination, problem-solving/management of time, disruptive behavior, sleep habits, academic achievement and quality of life. The two main objectives of the study were (i) to discriminate those profiles with several and important ADHD functional difficulties and (ii) to create a brief clinical tool that fosters a comprehensive evaluation process and can be easily used by clinicians. The total sample included 399 parents of children with ADHD aged 6-18 years (M = 11.65; SD = 3.1; 280 males) and 297 parents of children without a diagnosis of ADHD (M = 10.91; SD = 3.2; 149 males). The scale construction followed an item improved sequential process. Factor analysis showed a 13-item single factor model with good fit indices. Higher scores on inattention predicted higher scores on ADHD-CDS for both the clinical sample (β = 0.50; p < 0.001) and the whole sample (β = 0.85; p < 0.001). The ROC curve for the ADHD-CDS (against the ADHD diagnostic status) gave an area under the curve (AUC) of 0.979 (95% CI = [0.969, 0.990]). The ADHD-CDS has shown preliminary adequate psychometric properties, with high convergent validity and good sensitivity for different ADHD profiles, which makes it a potentially appropriate and brief instrument that may be easily used by clinicians, researchers, and health professionals in dealing with ADHD.
The ADHD Concomitant Difficulties Scale (ADHD-CDS), a Brief Scale to Measure Comorbidity Associated to ADHD

PubMed Central

Fenollar-Cortés, Javier; Fuentes, Luis J.

2016-01-01

Introduction: Although the critical feature of attention-deficit/hyperactivity disorder (ADHD) is a persistent pattern of inattention and/or hyperactivity/impulsivity behavior, the disorder is clinically heterogeneous, and concomitant difficulties are common. Children with ADHD are at increased risk for experiencing lifelong impairments in multiple domains of daily functioning. In the present study we aimed to build a brief ADHD impairment-related tool, the ADHD concomitant difficulties scale (ADHD-CDS), to assess the presence of some of the most important comorbidities that usually appear associated with ADHD, such as emotional/motivational management, fine motor coordination, problem-solving/management of time, disruptive behavior, sleep habits, academic achievement and quality of life. The two main objectives of the study were (i) to discriminate those profiles with several and important ADHD functional difficulties and (ii) to create a brief clinical tool that fosters a comprehensive evaluation process and can be easily used by clinicians. Methods: The total sample included 399 parents of children with ADHD aged 6–18 years (M = 11.65; SD = 3.1; 280 males) and 297 parents of children without a diagnosis of ADHD (M = 10.91; SD = 3.2; 149 males). The scale construction followed an item improved sequential process. Results: Factor analysis showed a 13-item single factor model with good fit indices. Higher scores on inattention predicted higher scores on ADHD-CDS for both the clinical sample (β = 0.50; p < 0.001) and the whole sample (β = 0.85; p < 0.001). The ROC curve for the ADHD-CDS (against the ADHD diagnostic status) gave an area under the curve (AUC) of 0.979 (95% CI = [0.969, 0.990]). 
Discussion: The ADHD-CDS has shown preliminary adequate psychometric properties, with high convergent validity and good sensitivity for different ADHD profiles, which makes it a potentially appropriate and brief instrument that may be easily used by clinicians, researchers, and health professionals in dealing with ADHD. PMID:27378972
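The AUC reported above can be read as the probability that a randomly chosen child with ADHD scores higher on the scale than a randomly chosen child without the diagnosis. The sketch below computes the AUC from that pairwise (Mann-Whitney) interpretation, counting ties as one half. The scores are hypothetical, and no confidence interval is computed.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve via the Mann-Whitney pairwise comparison.

    Equals the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case, counting ties as 0.5.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical scale totals (not study data).
adhd = [30, 25, 28, 22, 35]
no_adhd = [10, 15, 22, 12, 8]
print(auc(adhd, no_adhd))  # 0.98
```

The double loop is O(n·m). This is fine for illustration, but a rank-based computation scales better for samples the size of this study's.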
Task difficulty modulates brain activation in the emotional oddball task.

PubMed

Siciliano, Rachel E; Madden, David J; Tallman, Catherine W; Boylan, Maria A; Kirste, Imke; Monge, Zachary A; Packard, Lauren E; Potter, Guy G; Wang, Lihong

2017-06-01

Previous functional magnetic resonance imaging (fMRI) studies have reported that task-irrelevant, emotionally salient events can disrupt target discrimination, particularly when attentional demands are low, while others demonstrate alterations in the distracting effects of emotion in behavior and neural activation in the context of attention-demanding tasks. We used fMRI, in conjunction with an emotional oddball task, at different levels of target discrimination difficulty, to investigate the effects of emotional distractors on the detection of subsequent targets. In addition, we distinguished different behavioral components of target detection representing decisional, nondecisional, and response criterion processes. Results indicated that increasing target discrimination difficulty led to increased time required for both the decisional and nondecisional components of the detection response, as well as to increased target-related neural activation in frontoparietal regions. The emotional distractors were associated with activation in ventral occipital and frontal regions and dorsal frontal regions, but this activation was attenuated with increased difficulty. Emotional distraction did not alter the behavioral measures of target detection, but did lead to increased target-related frontoparietal activation for targets following emotional images as compared to those following neutral images. This latter effect varied with target discrimination difficulty, with an increased influence of the emotional distractors on subsequent target-related frontoparietal activation in the more difficult discrimination condition. This influence of emotional distraction was in addition associated specifically with the decisional component of target detection. These findings indicate that emotion-cognition interactions, in the emotional oddball task, vary depending on the difficulty of the target discrimination and the associated limitations on processing resources. 
Copyright © 2017 Elsevier B.V. All rights reserved.
Why Are the Mathematics National Examination Items Difficult and What Is Teachers' Strategy to Overcome It?

ERIC Educational Resources Information Center

Retnawati, Heri; Kartowagiran, Badrun; Arlinwibowo, Janu; Sulistyaningsih, Eny

2017-01-01

The quality of national examination items plays an enormous role in identifying students' competencies mastery and their difficulties. This study aims to identify the difficult items in the Junior High School Mathematics National Examination, to find the factors that cause students' difficulty and to reveal the strategies that the teachers and the…
Faster on Easy Items, More Accurate on Difficult Ones: Cognitive Ability and Performance on a Task of Varying Difficulty

ERIC Educational Resources Information Center

Dodonova, Yulia A.; Dodonov, Yury S.

2013-01-01

Using more complex items than those commonly employed within the information-processing approach, but still easier than those used in intelligence tests, this study analyzed how the association between processing speed and accuracy level changes as the difficulty of the items increases. The study involved measuring cognitive ability using Raven's…
Predicting Item Difficulty in a Reading Comprehension Test with an Artificial Neural Network.

ERIC Educational Resources Information Center

Perkins, Kyle; And Others

This paper reports the results of using a three-layer backpropagation artificial neural network to predict item difficulty in a reading comprehension test. Two network structures were developed, one with and one without a sigmoid function in the output processing unit. The data set, which consisted of a table of coded test items and corresponding…

Perceptual discrimination difficulty and familiarity in the Uncanny Valley: more like a "Happy Valley".

PubMed

Cheetham, Marcus; Suter, Pascal; Jancke, Lutz

2014-01-01

The Uncanny Valley Hypothesis (UVH) predicts that greater difficulty perceptually discriminating between categorically ambiguous human and humanlike characters (e.g., a highly realistic robot) evokes negatively valenced (i.e., uncanny) affect. An ABX perceptual discrimination task and signal detection analysis were used to examine the profile of perceptual discrimination (PD) difficulty along the UVH's dimension of human likeness (DHL). This was represented using avatar-to-human morph continua. Rejecting the implicitly assumed profile of PD difficulty underlying the UVH's prediction, Experiment 1 showed that PD difficulty was reduced for categorically ambiguous faces but, notably, enhanced for human faces. Rejecting the UVH's predicted relationship between PD difficulty and negative affect (assessed in terms of the UVH's familiarity dimension), Experiment 2 demonstrated that greater PD difficulty correlates with more positively valenced affect. Critically, this effect was strongest for the ambiguous faces, suggesting a correlative relationship between PD difficulty and feelings of familiarity more consistent with the metaphor of a "happy valley". This relationship is also consistent with a fluency amplification account rather than the hitherto proposed hedonic fluency account of affect along the DHL. Experiment 3 found no evidence that the asymmetry in the profile of PD along the DHL is attributable to a differential processing bias (cf. other-race effect), i.e., processing avatars at a category level but human faces at an individual level. In conclusion, the present data for static faces show clear effects that, however, strongly challenge the UVH's implicitly assumed profile of PD difficulty along the DHL and the predicted relationship between this and feelings of familiarity.
Perceptual discrimination difficulty and familiarity in the Uncanny Valley: more like a “Happy Valley”

PubMed Central

Cheetham, Marcus; Suter, Pascal; Jancke, Lutz

2014-01-01

The Uncanny Valley Hypothesis (UVH) predicts that greater difficulty perceptually discriminating between categorically ambiguous human and humanlike characters (e.g., a highly realistic robot) evokes negatively valenced (i.e., uncanny) affect. An ABX perceptual discrimination task and signal detection analysis were used to examine the profile of perceptual discrimination (PD) difficulty along the UVH's dimension of human likeness (DHL). This was represented using avatar-to-human morph continua. Rejecting the implicitly assumed profile of PD difficulty underlying the UVH's prediction, Experiment 1 showed that PD difficulty was reduced for categorically ambiguous faces but, notably, enhanced for human faces. Rejecting the UVH's predicted relationship between PD difficulty and negative affect (assessed in terms of the UVH's familiarity dimension), Experiment 2 demonstrated that greater PD difficulty correlates with more positively valenced affect. Critically, this effect was strongest for the ambiguous faces, suggesting a correlative relationship between PD difficulty and feelings of familiarity more consistent with the metaphor of a "happy valley". This relationship is also consistent with a fluency amplification account rather than the hitherto proposed hedonic fluency account of affect along the DHL. Experiment 3 found no evidence that the asymmetry in the profile of PD along the DHL is attributable to a differential processing bias (cf. other-race effect), i.e., processing avatars at a category level but human faces at an individual level. In conclusion, the present data for static faces show clear effects that, however, strongly challenge the UVH's implicitly assumed profile of PD difficulty along the DHL and the predicted relationship between this and feelings of familiarity. PMID:25477829
A new look at the WHOQOL as health-related quality of life instrument among visually impaired people using Rasch analysis.

PubMed

Gothwal, Vijaya K; Srinivas, Marmamula; Rao, Gullapalli N

2013-05-01

To examine the psychometric characteristics of the World Health Organization quality of life instrument-modified Indian version (modified WHOQOL) and its subscales in adults with visual impairment (VI) using Rasch analysis. Cross-sectional data were from people aged ≥40 years with VI (n = 1,333) who responded to the modified WHOQOL in the Andhra Pradesh Eye Disease Study, India. Rasch analysis was used to explore the instrument and its subscales for key indices. These were measurement precision by person separation reliability, PSR (i.e., discrimination between strata of participants' health-related QOL [HRQOL], recommended minimum value 0.8), unidimensionality (i.e., measurement of a single construct), and targeting (i.e., matching of item difficulty to participants' HRQOL). A Rasch-guided iterative approach, including category re-organization to enable threshold ordering and item deletion to overcome multidimensionality, resulted in a unidimensional 9-item WHOQOL and a 6-item level of independence (LOI) subscale with adequate PSR (0.81 and 0.82, respectively). Targeting was sub-optimal for both (-1.58 logits for WHOQOL and -2.55 logits for the subscale). Remaining subscales were dysfunctional. The WHOQOL and LOI subscale can be improved and shortened, and the Rasch-revised versions are likely to assess the HRQOL of VI patients best because of their brevity, reliability, and unidimensionality.
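Person separation reliability and targeting, as used above, can both be computed from Rasch output. The minimal sketch below assumes that person measures and their standard errors (in logits) are already available from a Rasch program. It makes two further assumptions:

- PSR is taken as (observed variance − mean error variance) / observed variance;
- targeting is taken as mean person measure minus mean item measure.

All numbers are hypothetical. Rasch software may differ in details such as using sample rather than population variance.

```python
from statistics import mean, pvariance


def person_separation_reliability(measures, std_errors):
    """Share of observed person variance not attributable to measurement error."""
    observed_var = pvariance(measures)
    error_var = mean(se ** 2 for se in std_errors)  # mean squared standard error
    return (observed_var - error_var) / observed_var


def targeting(person_measures, item_measures):
    """Mean person location minus mean item location, in logits.

    Negative values mean persons sit, on average, below the items' locations.
    """
    return mean(person_measures) - mean(item_measures)


# Hypothetical Rasch output (logits).
persons = [-3.0, -2.0, -1.0, 0.0, 1.0]
person_se = [0.5, 0.5, 0.5, 0.5, 0.5]
items = [-0.5, 0.0, 0.5]
print(person_separation_reliability(persons, person_se))  # 0.875
print(targeting(persons, items))  # -1.0
```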
Tournament Validity: Testing Golfer Competence

ERIC Educational Resources Information Center

Sachau, Daniel; Andrews, Lance; Gibson, Bryan; DeNeui, Daniel

2009-01-01

The concept of tournament validity was explored in three studies. In the first study, measures of tournament validity, difficulty, and discrimination were introduced. These measures were illustrated with data from the 2003 Professional Golf Association (PGA) Tour. In the second study, the relationship between difficulty and discrimination was…
Paintings discrimination by mice: Different strategies for different paintings.

PubMed

Watanabe, Shigeru

2017-09-01

C57BL/6 mice were trained on simultaneous discrimination of paintings with multiple exemplars, using an operant chamber with a touch screen. The number of exemplars was successively increased up to six. Those mice trained in Kandinsky/Mondrian discrimination showed improved learning and generalization, whereas those trained in Picasso/Renoir discrimination showed no improvements in learning or generalization. These results suggest category-like discrimination in the Kandinsky/Mondrian task, but item-to-item discrimination in the Picasso/Renoir task. Mice maintained their discriminative behavior in a pixelization test with various paintings; however, mice in the Picasso/Renoir task showed poor performance in a test that employed scrambling processing. These results do not indicate that discrimination strategy for any Kandinsky/Mondrian combinations differed from that for any Picasso/Monet combinations but suggest the mice employed different strategies of discrimination tasks depending upon stimuli. Copyright © 2017 Elsevier B.V. All rights reserved.
Measuring cancer caregiver health literacy: Validation of the Health Literacy of Caregivers Scale-Cancer (HLCS-C) in an Australian population.

PubMed

Yuen, Eva; Knight, Tess; Dodson, Sarity; Chirgwin, Jacqueline; Busija, Lucy; Ricciardelli, Lina A; Burney, Susan; Parente, Phillip; Livingston, Patricia M

2018-05-01

Caregivers have been largely neglected in health literacy measurement. We assess the construct validity and internal consistency of the Health Literacy of Caregivers Scale-Cancer (HLCS-C), and present a revised, psychometrically robust scale. Using data from 297 cancer caregivers (12.4% response rate) recruited from Melbourne, Australia between January-July 2014, confirmatory factor analysis (CFA) was conducted to evaluate the HLCS-C's proposed factor structure. Items were evaluated for item difficulty, unidimensionality and overall item fit within their domain. Item-threshold ordering was examined through one-parameter Item Response Theory models. Internal consistency was assessed using Raykov's reliability coefficient. CFA results identified 42 poorly performing/redundant items, which were subsequently removed. A 10-factor model was fitted to 46 acceptable items with no correlated residuals or factor cross-loadings accepted. Adequate fit was revealed (χ²WLSMV = 1463.807 [df = 944], p < .001, RMSEA = 0.043, CFI = 0.980, TLI = 0.978, WRMR = 1.00). Ten domains were identified: Proactivity and determination to seek information; Adequate information about cancer and cancer management; Supported by healthcare providers (HCP) to understand information; Social support; Cancer-related communication with the care recipient (CR); Understanding CR needs and preferences; Self-care; Understanding the healthcare system; Capacity to process health information; and Active engagement with HCP. Internal consistency was adequate across domains (0.78-0.92). The revised HLCS-C demonstrated good structural, convergent, and discriminant validity, and high internal consistency. The scale may be useful for the development and evaluation of caregiver interventions. © 2017 John Wiley & Sons Ltd.
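Raykov's reliability coefficient is a composite (congeneric) reliability computed from factor loadings. The minimal sketch below assumes a single-factor model with standardized loadings and uncorrelated errors, so that each item's error variance is 1 − λ². The loadings are invented, and the study's own computation may differ in detail.

```python
def composite_reliability(loadings):
    """Raykov-style composite reliability for a one-factor congeneric model.

    Assumes standardized loadings and uncorrelated errors, so each item's
    error variance is 1 - loading**2.
    """
    sum_loadings = sum(loadings)
    error_var = sum(1.0 - lam ** 2 for lam in loadings)
    return sum_loadings ** 2 / (sum_loadings ** 2 + error_var)


# Hypothetical standardized loadings for one domain.
print(round(composite_reliability([0.7, 0.8, 0.6]), 3))  # 0.745
```

Unlike Cronbach's alpha, this coefficient does not assume equal loadings across items. That is the usual reason for preferring it after a CFA.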
Detecting unexpected variables in the MMPI 2 Social Introversion scale.

PubMed

Chang, C H; Wright, B D

2001-01-01

The standard scoring structure of the revised Minnesota Multiphasic Personality Inventory (MMPI-2) Social Introversion (Si) scale was reexamined with Rasch Measurement. The 69-item Si scale split into two distinct dimensions when their standardized residuals were factor analyzed. Items keyed "true" to Si defined one dimension and items keyed "false" defined another. Relationships between Lexile values (an index of reading difficulty and comprehension) and item difficulties were also explored. The article shows how to use Rasch Measurement to understand and improve personality assessment.
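The residual factor analysis described above starts from standardized residuals. For each person-item response, the residual is the observed score minus the Rasch-expected score, divided by the model standard deviation. The minimal sketch below is for the dichotomous Rasch model and assumes that person measures θ and item difficulties b (in logits) have already been estimated. The factor analysis of the resulting residual matrix is not shown.

```python
from math import exp, sqrt


def rasch_probability(theta, b):
    """P(keyed response) under the dichotomous Rasch model."""
    return 1.0 / (1.0 + exp(-(theta - b)))


def standardized_residual(x, theta, b):
    """(observed - expected) / model SD for a single 0/1 response."""
    p = rasch_probability(theta, b)
    return (x - p) / sqrt(p * (1.0 - p))


# Hypothetical person at theta = 1.0 logit responding to an item at b = 0.0.
print(standardized_residual(1, 1.0, 0.0))  # expected response: small positive residual
print(standardized_residual(0, 1.0, 0.0))  # unexpected response: larger negative residual
```

If the scale were unidimensional, these residuals would look like noise. Shared structure among them, such as items keyed "true" versus "false" clustering together, points to a second dimension.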
Day-to-day discrimination and health among Asian Indians: a population-based study of Gujarati men and women in Metropolitan Detroit.

PubMed

Yoshihama, Mieko; Bybee, Deborah; Blazevski, Juliane

2012-10-01

This study examined the relationship between experiences of day-to-day discrimination and two measures of health among Gujaratis, one of the largest ethnic groups of Asian Indians in the U.S. Data were collected via computer-assisted telephone interviews with a random sample of Gujarati men and women aged 18-64 in Metropolitan Detroit (N = 423). Using structural equation modeling, we tested two gender-moderated models of the relationship between day-to-day discrimination and health, one using the single-item general health status and the other using the 4-item emotional wellbeing measure. For both women and men, controlling for socio-demographic and other relevant characteristics, the experience of day-to-day discrimination was associated with worse emotional wellbeing. However, day-to-day discrimination was associated with the single-item self-rated general health status only for men. This study identified not only gender differences in discrimination-health associations but also the importance of using multiple questions in assessing perceived health status.
An Alternate Definition of the ETS Delta Scale of Item Difficulty. Program Statistics Research.

ERIC Educational Resources Information Center

Holland, Paul W.; Thayer, Dorothy T.

An alternative definition has been developed of the delta scale of item difficulty used at Educational Testing Service. The traditional delta scale uses an inverse normal transformation based on normal ogive models developed years ago. However, no use is made of this fact in typical uses of item deltas. It is simply one way to make the probability…
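The record is truncated before the alternative definition is given. The sketch below therefore shows only the traditional delta it refers to. The proportion correct is converted to a normal deviate and rescaled to mean 13 and standard deviation 4, so harder items receive higher deltas. This is a minimal illustration using Python's `statistics.NormalDist`, not the alternative definition proposed in the report.

```python
from statistics import NormalDist


def ets_delta(p_correct):
    """Traditional ETS delta: 13 + 4 * z, where z is the standard normal
    deviate with a proportion p_correct of the distribution above it."""
    z = NormalDist().inv_cdf(1.0 - p_correct)
    return 13.0 + 4.0 * z


for p in (0.9, 0.5, 0.1):
    print(p, round(ets_delta(p), 2))  # lower p (harder item) -> higher delta
```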
Discriminant content validity: a quantitative methodology for assessing content of theory-based measures, with illustrative applications.

PubMed

Johnston, Marie; Dixon, Diane; Hart, Jo; Glidewell, Liz; Schröder, Carin; Pollard, Beth

2014-05-01

In studies involving theoretical constructs, it is important that measures have good content validity and are not contaminated by content from other constructs. While reliability and construct validity are routinely reported, to date, there has not been a satisfactory, transparent, and systematic method of assessing and reporting content validity. In this paper, we describe a methodology of discriminant content validity (DCV) and illustrate its application in three studies. Discriminant content validity involves six steps: construct definition, item selection, judge identification, judgement format, single-sample test of content validity, and assessment of discriminant items. In three studies, these steps were applied to a measure of illness perceptions (IPQ-R) and control cognitions. The IPQ-R performed well, with most items relating purely to their target construct, although timeline and consequences had small problems. By contrast, the study of control cognitions identified problems in measuring constructs independently. In the final study, direct estimation response formats for theory of planned behaviour constructs were found to have DCV as good as the Likert format. The DCV method allowed quantitative assessment of each item and can therefore inform the content validity of the measures assessed. The methods can be applied to assess content validity before or after collecting data to select the appropriate items to measure theoretical constructs. Further, the data reported for each item in Appendix S1 can be used in item or measure selection. Statement of contribution What is already known on this subject? There are agreed methods of assessing and reporting construct validity of measures of theoretical constructs, but not their content validity. Content validity is rarely reported in a systematic and transparent manner. What does this study add? 
The paper proposes discriminant content validity (DCV), a systematic and transparent method of assessing and reporting whether items assess the intended theoretical construct and only that construct. In three studies, DCV was applied to measures of illness perceptions, control cognitions, and theory of planned behaviour response formats. Appendix S1 gives content validity indices for each item of each questionnaire investigated. Discriminant content validity is ideally applied while the measure is being developed, before it is used to measure the construct(s), but can also be applied after a measure has been used. © 2014 The British Psychological Society.
Psychometric properties of the Chinese version of the Menopause-Specific Quality-of-Life questionnaire

PubMed Central

Nie, Guangning; Yang, Hongyan; Liu, Jian; Zhao, ChunMei; Wang, Xiaoyun

2017-01-01

Abstract Objective: The Menopause-Specific Quality-of-Life (MENQOL) questionnaire was developed as a specific tool to measure the health-related quality-of-life of postmenopausal women. Thus far, the Chinese version of the questionnaire has not been subjected to psychometric assessment with a large sample. This study aims to evaluate the validity and reliability of the Chinese version of the MENQOL specific to postmenopausal women in China. Methods: A total of 1,137 menopausal symptomatic and 491 menopausal asymptomatic women from eight cities in China were recruited using a convenience sampling method. Psychometric properties were evaluated by descriptive statistics, validity, and reliability. Reliability was assessed for each subscale of the MENQOL through internal consistency reliability with Cronbach's α and intersubscale correlations. Item-domain correlations, principal components analysis (PCA), and confirmatory factor analysis were performed to determine construct validity. t tests were used to compare the differences between the menopausal symptomatic and asymptomatic women and to evaluate the discriminant validity. Pearson correlation coefficients were calculated between MENQOL scores and the Kupperman index to assess criterion-related validity. Results: The most common symptoms in Chinese menopausal symptomatic women were “experiencing poor memory” (94.4%), “feeling tired or worn out” (93.8%), “aching in muscle and joints” (89.4%), “low backache” (86.9%), “decrease in physical strength” (86.6%), “aches in back of neck or head” (86.2%), “difficulty sleeping” (83.6%), “accomplishing less than I used to” (83.4%), “feeling a lack of energy” (83.3%), “change in your sexual desire” (81%), and “hot flash” (80.7%) among others. The symptom of “increased facial hair” was rarely seen (9.9%). The vasomotor domain, as well as psychosocial, physical, and sexual domains showed high reliability (Cronbach's α 0.84, 0.87, 0.89, and 0.86, respectively). 
Item-domain correlation analysis showed that all items correlated more strongly with their own domains than with other domains. In the PCA, after the “increased facial hair” item was deleted, items in the vasomotor, sexual, and psychosocial subscales largely loaded on their respective domains, and items in the physical subscale split into two factors. The PCA revealed a latent structure of the Chinese version of MENQOL nearly identical to the original MENQOL domains. Confirmatory factor analysis showed that a four-domain model fit the questionnaire well. The MENQOL discriminated between menopausal symptomatic and asymptomatic women, indicating good discriminant validity. Criterion-related validity was confirmed by a significant correlation between MENQOL scores and the Kupperman index. Conclusions: This study showed that the Chinese version of MENQOL has good psychometric properties and is suitable for measuring the health-related quality-of-life of Chinese menopausal women, except for item 21 (increased facial hair). PMID:27922934
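The study reports a Cronbach's α for each MENQOL domain. The sketch below shows the standard formula for one subscale. It takes a hypothetical respondents-by-items score matrix and is not the authors' software. Population variances are used throughout, and the n versus n−1 choice cancels in the ratio.

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents x items matrix (list of rows).

    alpha = k/(k-1) * (1 - sum(item variances) / variance(total score))
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])  # must be > 0
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical 3-item subscale answered by three respondents.
print(cronbach_alpha([[1, 2, 2], [3, 3, 4], [5, 4, 5]]))
```

Perfectly parallel items (every respondent gives the same answer to every item) yield α = 1.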
The role of difficulty and gender in numbers, algebra, geometry and mathematics achievement

NASA Astrophysics Data System (ADS)

Rabab'h, Belal Sadiq Hamed; Veloo, Arsaythamby; Perumal, Selvan

2015-05-01

This study aims to identify the role of difficulty and gender in numbers, algebra, geometry and mathematics achievement among secondary school students in Jordan. The respondents were 337 students from eight public secondary schools in the Alkoura district, selected by stratified random sampling. The sample comprised 179 (53%) male and 158 (47%) female students. The mathematics test comprised 30 items: eight on numbers, 14 on algebra and eight on geometry. Regarding item difficulty for male and female students, in numbers item 4 (fractions, 0.34) was the most difficult for male students and item 6 (square roots, 0.39) for female students. In algebra, item 11 (inequality, 0.23) was the most difficult for male students and item 6 (algebraic expressions, 0.35) for female students. In geometry, item 3 (reflection, 0.34) was the most difficult for male students and item 8 (volume, 0.33) for female students. Regarding gender differences, female students showed higher achievement in numbers and algebra than male students, whereas there was no difference between male and female students' achievement on the geometry test. The study suggests that teachers should give more attention to numbers and algebra when teaching mathematics.
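The figures in parentheses (for example 0.34 for the fractions item) appear to be classical difficulty indices, meaning the proportion of a group answering the item correctly, where lower values mean harder items. Assuming that reading, the sketch below finds the hardest item within one group. The response data and function names are hypothetical. To reproduce the male/female comparison, run it once for each group.

```python
def difficulty_index(responses):
    """Classical p-value per item: proportion of examinees scoring 1.

    responses: list of rows of 0/1 scores, one row per examinee.
    """
    n = len(responses)
    k = len(responses[0])
    return [sum(row[j] for row in responses) / n for j in range(k)]

def hardest_item(responses):
    """Return (1-based item number, p-value) of the item with the lowest p."""
    p = difficulty_index(responses)
    j = min(range(len(p)), key=p.__getitem__)
    return j + 1, p[j]

# Hypothetical 3-item subtest for one group of four students.
group = [[1, 0, 1], [1, 0, 0], [0, 1, 1], [1, 0, 1]]
print(hardest_item(group))
```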
What do Demand-Control and Effort-Reward work stress questionnaires really measure? A discriminant content validity study of relevance and representativeness of measures.

PubMed

Bell, Cheryl; Johnston, Derek; Allan, Julia; Pollard, Beth; Johnston, Marie

2017-05-01

The Demand-Control (DC) and Effort-Reward Imbalance (ERI) models predict health in a work context. Self-report measures of the four key constructs (demand, control, effort, and reward) have been developed, and it is important that these measures have good content validity uncontaminated by content from other constructs. We assessed relevance (whether items reflect the constructs) and representativeness (whether all aspects of the construct are assessed, and all items contribute to that assessment) across the instruments and items. Two studies examined fourteen demand/control items from the Job Content Questionnaire and seventeen effort/reward items from the Effort-Reward Imbalance measure using discriminant content validation, and a third study developed new methods to assess instrument representativeness. Both methods use judges' ratings and construct definitions to obtain transparent quantitative estimates of construct validity. Study 1 used dictionary definitions, while studies 2 and 3 used published phrases to define constructs. Overall, 3/5 demand items, 4/9 control items, 1/6 effort items, and 7/11 reward items were uniquely classified to the appropriate theoretical construct and were therefore 'pure' items with discriminant content validity (DCV). All pure items measured a defining phrase. However, both the DC and ERI assessment instruments failed to assess all defining aspects. Finding good discriminant content validity for demand and reward measures means these measures are usable, and our quantitative results can guide item selection. By contrast, effort and control measures had limitations (in relevance and representativeness), presenting a challenge to the implementation of the theories. Statement of contribution. What is already known on this subject?
While the reliability and construct validity of Demand-Control and Effort-Reward-Imbalance (DC and ERI) work stress measures are routinely reported, there has not been adequate investigation of their content validity. This paper investigates their content validity in terms of both relevance and representativeness and provides a model for the investigation of content validity of measures in health psychology more generally. What does this study add? A new application of an existing method, discriminant content validity, and a new method of assessing instrument representativeness. 'Pure' DC and ERI items are identified, as are constructs that are not fully represented by their assessment instruments. The findings are important for studies attempting to distinguish between the main DC and ERI work stress constructs. The quantitative results can be used to guide item selection for future studies. © 2017 The British Psychological Society.
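An item counts as 'pure' when judges' ratings classify it to exactly one construct. The abstract does not give the statistical rule, so the sketch below makes two assumptions:

- Judges rate each item against each construct on a hypothetical scale centred on 0.
- An item is classified to a construct when a one-sample t statistic exceeds an analyst-chosen critical value.

This is a simplified illustration of the unique-classification idea, not the authors' exact procedure. All names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(ratings, mu0=0.0):
    """One-sample t statistic of judges' ratings against mu0 (needs >= 2 ratings)."""
    m, s = mean(ratings), stdev(ratings)
    if s == 0:  # all judges gave the identical rating
        if m == mu0:
            return 0.0
        return float("inf") if m > mu0 else float("-inf")
    return (m - mu0) / (s / sqrt(len(ratings)))

def classify_item(ratings_by_construct, t_crit):
    """Constructs whose ratings for this item are significantly above zero.

    ratings_by_construct: construct name -> judges' ratings for one item
    (hypothetical scale centred on 0; negative = does not measure it).
    t_crit: critical t value chosen by the analyst (an assumption here).
    """
    return sorted(c for c, r in ratings_by_construct.items()
                  if t_statistic(r) > t_crit)

def is_pure(ratings_by_construct, t_crit):
    """Treat an item as 'pure' if it is classified to exactly one construct."""
    return len(classify_item(ratings_by_construct, t_crit)) == 1

# Hypothetical ratings from five judges for one item.
item = {"demand": [8, 7, 9, 6, 8], "control": [0, -1, 1, 0, -2]}
print(classify_item(item, t_crit=2.132), is_pure(item, t_crit=2.132))
```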
Social Status Correlates of Reporting Racial Discrimination and Gender Discrimination among Racially Diverse Women

PubMed Central

Ro, Annie E.; Choi, Kyung-Hee

2009-01-01

The growing body of research on discrimination and health indicates a deleterious effect of discrimination on various health outcomes. However, less is known about the sociodemographic correlates of reporting racial discrimination and gender discrimination among racially diverse women. We examined the associations of social status characteristics with lifetime experiences of racial discrimination and gender discrimination using a racially diverse sample of 754 women attending family planning clinics in Northern California (11.4% African American, 16.8% Latina, 10.1% Asian and 61.7% Caucasian). A multivariate analysis revealed that race, financial difficulty and marital status were significantly correlated with higher reports of racial discrimination, while race, education, financial difficulty and nativity were significantly correlated with gender discrimination scores. Our findings suggest that the social patterning of perceiving racial discrimination is somewhat different from that of gender discrimination. This has implications in the realm of discrimination research and applied interventions, as different forms of discrimination may have unique covariates that should be accounted for in research analysis or program design. PMID:19485231
Stereotype threat in classroom settings: the interactive effect of domain identification, task difficulty and stereotype threat on female students' maths performance.

PubMed

Keller, Johannes

2007-06-01

Stereotype threat research has revealed that negative stereotypes can disrupt the performance of persons targeted by such stereotypes. This paper contributes to stereotype threat research by providing evidence that domain identification and the difficulty level of test items moderate stereotype threat effects on female students' maths performance. The study was designed to test theoretical ideas derived from stereotype threat theory and assumptions outlined in the Yerkes-Dodson law, which proposes a nonlinear relationship between arousal, task difficulty and performance. Participants were 108 high school students. Participants worked on a test comprising maths problems of different difficulty levels. Half of the participants learned that the test had been shown to produce gender differences (stereotype threat). The other half learned that the test had been shown not to produce gender differences (no threat). The degree to which participants identify with the domain of maths was included as a quasi-experimental factor. Maths-identified female students showed performance decrements under conditions of stereotype threat. Moreover, the stereotype threat manipulation had different effects on low and high domain identifiers' performance depending on test item difficulty. On difficult items, low identifiers showed higher performance under threat (vs. no threat) whereas the reverse was true in high identifiers. This interaction effect did not emerge on easy items. Domain identification and test item difficulty are two important factors that need to be considered in the attempt to understand the impact of stereotype threat on performance.
Parent-reported and clinician-observed autism spectrum disorder (ASD) symptoms in children with attention deficit/hyperactivity disorder (ADHD): implications for practice under DSM-5.

PubMed

Grzadzinski, Rebecca; Dick, Catherine; Lord, Catherine; Bishop, Somer

2016-01-01

Children with attention deficit/hyperactivity disorder (ADHD) often present with social difficulties, though the extent to which these clearly overlap with symptoms of autism spectrum disorder (ASD) is not well understood. We explored parent-reported and directly-observed ASD symptoms on the Autism Diagnostic Interview-Revised (ADI-R) and the Autism Diagnostic Observation Schedule (ADOS) in children referred to ASD-specialty clinics who received diagnoses of either ADHD (n = 48) or ASD (n = 164). Of the ADHD sample, 21% met ASD cut-offs on the ADOS and 30% met ASD cut-offs on all domains of the ADI-R. Four social communication ADOS items (Quality of Social Overtures, Unusual Eye Contact, Facial Expressions Directed to Examiner, and Amount of Reciprocal Social Communication) adequately differentiated the groups while none of the items on the ADI-R met the criteria for adequate discrimination. Results of this work highlight the challenges that clinicians and researchers face when distinguishing ASD from other disorders in verbally fluent, school-age children.
Unlawful Discrimination DEOCS 4.1 Construct Validity Summary

DTIC Science & Technology

2017-08-01

Included is a review of the 4.0 description and items, followed by the proposed modifications to the factor. The current DEOCS (4.0) contains multiple ... Officer (E7 – E9): 586, 10.8%; Junior Officer (O1 – O3): 474, 9%; Senior Officer (O4 and above): 391, 6.1%. Descriptive Statistics and Reliability: This section ... displays descriptive statistics for the items on the Unlawful Discrimination scale. All items had a range from 1 to 7 (strongly disagree to strongly
Speech Perception Deficits in Poor Readers: A Reply to Denenberg's Critique.

ERIC Educational Resources Information Center

Studdert-Kennedy, Michael; Mody, Maria; Brady, Susan

2000-01-01

This rejoinder to a critique of the authors' research on speech perception deficits in poor readers answers the specific criticisms and reaffirms their conclusion that the difficulty some poor readers have with rapid /ba/-/da/ discrimination does not stem from difficulty in discriminating the rapid spectral transitions at stop-vowel syllable…
Selecting Items for Criterion-Referenced Tests.

ERIC Educational Resources Information Center

Mellenbergh, Gideon J.; van der Linden, Wim J.

1982-01-01

Three item selection methods for criterion-referenced tests are examined: the classical theory of item difficulty and item-test correlation; the latent trait theory of item characteristic curves; and a decision-theoretic approach for optimal item selection. Item contribution to the standardized expected utility of mastery testing is discussed.
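The first approach relies on two classical item statistics: difficulty (proportion correct) and the item-test correlation. The sketch below computes both. It uses the *corrected* item-total correlation, which removes the item from the total before correlating. That is a common choice, but not necessarily the one the authors use. The data are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson r; raises ZeroDivisionError if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def item_analysis(responses):
    """(difficulty p, corrected item-total r) for each 0/1-scored item."""
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    results = []
    for j in range(k):
        item = [row[j] for row in responses]
        rest = [t - x for t, x in zip(totals, item)]  # total without item j
        p = sum(item) / len(item)
        results.append((p, pearson(item, rest)))
    return results

# Hypothetical four examinees, three items.
for p, r in item_analysis([[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]):
    print(f"p={p:.2f}  r_it={r:.2f}")
```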
Psychometric evaluation of the PainCAS Interference with Daily Activities, Psychological/Emotional Distress, and Pain scales.

PubMed

McCaffrey, Stacey A; Black, Ryan A; Butler, Stephen F

2018-03-01

The PainCAS is a web-based clinical tool for assessing and tracking pain and opioid risk in chronic pain patients. Despite evidence for its utility within the clinical setting, the PainCAS scales have never been subject to psychometric evaluation. The current study is the first to evaluate the psychometric properties of the PainCAS Interference with Daily Activities, Psychological/Emotional Distress, and Pain scales. Patients (N = 4797) from treatment centers and hospitals in 16 different states completed the PainCAS as part of routine clinical assessment. A subsample (n = 73) from two hospital-based treatment centers also completed comparator measures. Rasch Rating Scale Models were employed to evaluate the Interference with Daily Activities and Psychological/Emotional Distress scales, and empirical evaluation included assessment of dimensionality, discrimination, item fit, reliability, information, and person-to-item targeting. Additionally, convergent and discriminant validity were evaluated through classical test theory approaches. Convergent validity of the Pain scales was evaluated through correlations with corresponding comparator items. One Interference with Daily Activities item was removed due to poor functioning and discrimination. The retained items from the Interference with Daily Activities and Psychological/Emotional Distress scales conformed to unidimensional Rasch measurement models, yielding satisfactory item fit, reliability, precision, and coverage. Further, results provided support for the convergent and discriminant validity of these two scales. Convergent validity between the PainCAS Pain and BPI Pain items was also strong. Taken together, results provide strong psychometric support for these PainCAS Pain scales. Strengths and limitations of the current study are discussed.
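The scales were evaluated with Rasch Rating Scale Models. In this model, every item has a location δ and all items in a scale share a set of category thresholds τ₁..τ_M. The sketch below computes category probabilities and the expected item score from already-estimated parameters; estimation itself is not shown. All parameter values are hypothetical.

```python
from math import exp

def rsm_probabilities(theta, delta, taus):
    """Category probabilities under the Andrich rating scale model.

    P(X = k) is proportional to exp(sum_{j<=k} (theta - delta - tau_j)),
    with an empty sum (0) for category 0.
    """
    logits = [0.0]
    for tau in taus:
        logits.append(logits[-1] + (theta - delta - tau))
    m = max(logits)  # subtract the max for numerical stability
    weights = [exp(l - m) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]

def expected_score(theta, delta, taus):
    """Model-expected rating for a person at theta on an item at delta."""
    probs = rsm_probabilities(theta, delta, taus)
    return sum(k * p for k, p in enumerate(probs))

# Hypothetical 3-category item (scores 0, 1, 2) with thresholds -1 and +1.
print(rsm_probabilities(0.0, 0.0, [-1.0, 1.0]))
print(expected_score(1.5, 0.0, [-1.0, 1.0]))
```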

Item Difficulty Modeling of Paragraph Comprehension Items

ERIC Educational Resources Information Center

Gorin, Joanna S.; Embretson, Susan E.

2006-01-01

Recent assessment research joining cognitive psychology and psychometric theory has introduced a new technology, item generation. In algorithmic item generation, items are systematically created based on specific combinations of features that underlie the processing required to correctly solve a problem. Reading comprehension items have been more…
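In algorithmic item generation, items are built from combinations of processing features. A common way to link such features to difficulty is a linear logistic test model (LLTM)-style decomposition: predicted difficulty = intercept + Σ (feature value × feature weight). The truncated abstract does not say which model the authors fit, so the LLTM form is an assumption here. The feature names and weights below are hypothetical.

```python
from itertools import product

def predicted_difficulty(features, weights, intercept=0.0):
    """LLTM-style difficulty (logits): intercept + sum of weighted features."""
    return intercept + sum(weights[f] * q for f, q in features.items())

def generate_item_models(levels, weights, intercept=0.0):
    """Enumerate every feature combination, easiest predicted item first.

    levels: feature name -> list of allowed values (hypothetical design).
    """
    names = sorted(levels)
    models = []
    for combo in product(*(levels[n] for n in names)):
        feats = dict(zip(names, combo))
        models.append((feats, predicted_difficulty(feats, weights, intercept)))
    return sorted(models, key=lambda m: m[1])

levels = {"has_negation": [0, 1], "n_propositions": [1, 2, 3]}
weights = {"has_negation": 0.8, "n_propositions": 0.3}
for feats, b in generate_item_models(levels, weights, intercept=-1.0):
    print(feats, round(b, 2))
```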
Study protocol of psychometric properties of the Spanish translation of a competence test in evidence based practice: the Fresno test.

PubMed

Argimon-Pallàs, Josep M; Flores-Mateo, Gemma; Jiménez-Villa, Josep; Pujol-Ribera, Enriqueta; Foz, Gonçal; Bundó-Vidiella, Magda; Juncosa, Sebastià; Fuentes-Bellido, Cruz M; Pérez-Rodríguez, Belén; Margalef-Pallarès, Francesc; Villafafila-Ferrero, Rosa; Forès-Garcia, Dolors; Roman-Martínez, Josep; Vilert-Garroga, Esther

2009-02-24

There are few high-quality instruments for evaluating the effectiveness of Evidence-Based Practice (EBP) curricula with objective outcome measures. The Fresno test is an instrument that evaluates most EBP steps and has high reliability and validity in the original English version. The present study aims to translate the Fresno questionnaire into Spanish and then validate it to ensure the equivalence of the Spanish version with the English original. The questionnaire will be translated using the back-translation technique and tested in Primary Care Teaching Units in Catalonia (PCTU). Participants will be: (a) tutors of Family Medicine residents (expert group); (b) Family Medicine residents in their second year of the Family Medicine training program (novice group); and (c) Family Medicine physicians (intermediate group). The questionnaire will be administered before and after an educational intervention consisting of four interactive half-day sessions designed to develop the knowledge and skills required for EBP. Responsiveness statistics used in the analysis will be the effect size, the standardised response mean and Guyatt's method. For internal consistency reliability, two measures will be used: corrected item-total correlations and Cronbach's alpha. Inter-rater reliability will be tested using the Kappa coefficient for qualitative items and the intra-class correlation coefficient for quantitative items and the overall score. Construct validity, item difficulty, item discrimination and feasibility will be determined. Validating the Fresno questionnaire in different languages will enable wider use of the questionnaire, as well as allowing comparison between countries and the evaluation of different teaching models.
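The protocol names three responsiveness statistics. The sketch below uses their usual textbook definitions, which is an assumption since the protocol does not give formulas:

- **Effect size:** mean change divided by the SD of baseline scores.
- **Standardised response mean:** mean change divided by the SD of the change scores.
- **Guyatt's index:** mean change divided by the SD of change in a group assumed to be stable.

All scores below are hypothetical.

```python
from statistics import mean, stdev

def _change(pre, post):
    return [b - a for a, b in zip(pre, post)]

def effect_size(pre, post):
    """Mean change / SD of baseline scores."""
    return mean(_change(pre, post)) / stdev(pre)

def standardised_response_mean(pre, post):
    """Mean change / SD of the change scores."""
    change = _change(pre, post)
    return mean(change) / stdev(change)

def guyatt_index(pre, post, stable_pre, stable_post):
    """Mean change in the intervention group / SD of change in a stable group."""
    return mean(_change(pre, post)) / stdev(_change(stable_pre, stable_post))

pre, post = [10, 12, 14, 16], [14, 15, 19, 20]  # hypothetical Fresno totals
print(effect_size(pre, post), standardised_response_mean(pre, post))
print(guyatt_index(pre, post, [10, 11, 12], [11, 11, 11]))
```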
Item difficulty and item validity for the Children's Group Embedded Figures Test.

PubMed

Rusch, R R; Trigg, C L; Brogan, R; Petriquin, S

1994-02-01

The validity and reliability of the Children's Group Embedded Figures Test were reported for students in Grade 2 by Cromack and Stone in 1980; however, a search of the literature found no evidence for internal consistency or item analysis. Hence the purpose of this study was to examine the item difficulty and item validity of the test with children in Grades 1 and 2. Confusion in the literature over development and use of this test was seemingly resolved through analysis of these descriptions and through an interview with the test developer. One early-appearing item was unreasonably difficult. Two or three other items were quite difficult and made little contribution to the total score. Caution is recommended, however, in any reordering or elimination of items based on these findings, given the limited number of subjects (n = 84).
North American Veterinary Licensing Examination pacing study.

PubMed

Subhiyah, Raja G; Boyce, John R

2010-01-01

The National Board of Veterinary Medical Examiners was interested in the possible effects of word count on the outcomes of the North American Veterinary Licensing Examination. In this study, the authors investigated the effects of increasing word count on the pacing of examinees during each section of the examination and on the performance of examinees on the items. Specifically, the authors analyzed the effect of item word count on the average time spent on each item within a section of the examination, the average number of items omitted at the end of a section, and the average difficulty of items as a function of presentation order. The average word count per item increased from 2001 to 2008. As expected, there was a relationship between word count and time spent on the item. No significant relationship was found between word count and item difficulty, and an analysis of omitted items and pacing patterns showed no indication of overall pacing problems.
Measuring student learning using initial and final concept test in an STEM course

NASA Astrophysics Data System (ADS)

Kaw, Autar; Yalcin, Ali

2012-06-01

Effective assessment is a cornerstone in measuring student learning in higher education. For a course in Numerical Methods, a concept test was used as an assessment tool to measure student learning and its improvement during the course. The concept test comprised 16 multiple-choice questions and was given at the beginning and end of the course for three semesters. Hake's gain index values (a measure of learning gains from pre- to post-test) of 0.36 to 0.41 were recorded. The validity and reliability of the concept test were checked via standard measures such as Cronbach's alpha, content and criterion-related validity, item characteristic curves, and difficulty and discrimination indices. The performance of various subgroups, defined by prerequisite grades, transfer status, gender and age, was also studied.
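Hake's normalized gain is usually computed from class-average percentages as ⟨g⟩ = (post − pre) / (100 − pre). That is the fraction of the possible improvement that was actually achieved. The sketch below assumes this class-average definition. It converts raw scores on the 16-item test to percentages first. The scores are hypothetical.

```python
def hake_gain(pre_pct, post_pct):
    """Hake's normalized gain <g> = (post - pre) / (100 - pre), in percent."""
    if pre_pct >= 100:
        raise ValueError("gain is undefined when the pre-test average is 100%")
    return (post_pct - pre_pct) / (100 - pre_pct)

def class_gain(pre_raw, post_raw, n_items=16):
    """Class-average gain from raw scores on an n_items concept test."""
    pre_pct = 100 * sum(pre_raw) / (len(pre_raw) * n_items)
    post_pct = 100 * sum(post_raw) / (len(post_raw) * n_items)
    return hake_gain(pre_pct, post_pct)

# Hypothetical: class moves from 37.5% to 68.75% correct.
print(class_gain([4, 8], [10, 12]))
```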
Working memory capacity and fluid abilities: the more difficult the item, the more more is better.

PubMed

Little, Daniel R; Lewandowsky, Stephan; Craig, Stewart

2014-01-01

The relationship between fluid intelligence and working memory is of fundamental importance to understanding how capacity-limited structures such as working memory interact with inference abilities to determine intelligent behavior. Recent evidence has suggested that the relationship between a fluid abilities test, Raven's Progressive Matrices, and working memory capacity (WMC) may be invariant across difficulty levels of the Raven's items. We show that this invariance can only be observed if the overall correlation between Raven's and WMC is low. Simulations of Raven's performance revealed that as the overall correlation between Raven's and WMC increases, the item-wise point-biserial correlations involving WMC are no longer constant but increase considerably with item difficulty. The simulation results were confirmed by two studies that used a composite measure of WMC, which yielded a higher correlation between WMC and Raven's than reported in previous studies. As expected, with the higher overall correlation, there was a significant positive relationship between Raven's item difficulty and the extent of the item-wise correlation with WMC.
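The analysis rests on item-wise point-biserial correlations between each 0/1-scored Raven's item and a continuous WMC score. The sketch below computes them with the identity r = (M₁ − M₀) / s · √(pq), where s is the population SD of the criterion. This is equivalent to Pearson r with the item coded 0/1. The data are hypothetical, and this is not the authors' simulation.

```python
from math import sqrt
from statistics import mean, pstdev

def point_biserial(item, criterion):
    """Point-biserial r between a 0/1 item and a continuous criterion.

    Requires both 0s and 1s on the item and a non-constant criterion.
    """
    ones = [c for x, c in zip(item, criterion) if x == 1]
    zeros = [c for x, c in zip(item, criterion) if x == 0]
    p = len(ones) / len(item)
    return (mean(ones) - mean(zeros)) / pstdev(criterion) * sqrt(p * (1 - p))

def itemwise_point_biserials(responses, criterion):
    """Point-biserial r of every item (column) with an external score such as WMC."""
    k = len(responses[0])
    return [point_biserial([row[j] for row in responses], criterion)
            for j in range(k)]

# Hypothetical: four people, two items, composite WMC scores.
print(itemwise_point_biserials([[1, 1], [1, 0], [0, 1], [0, 0]], [4, 3, 2, 1]))
```

Plotting these values against each item's difficulty (1 − p) is one way to inspect the trend the study reports.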
Rise time and formant transition duration in the discrimination of speech sounds: the Ba-Wa distinction in developmental dyslexia.

PubMed

Goswami, Usha; Fosker, Tim; Huss, Martina; Mead, Natasha; Szucs, Dénes

2011-01-01

Across languages, children with developmental dyslexia have a specific difficulty with the neural representation of the sound structure (phonological structure) of speech. One likely cause of their difficulties with phonology is a perceptual difficulty in auditory temporal processing (Tallal, 1980). Tallal (1980) proposed that basic auditory processing of brief, rapidly successive acoustic changes is compromised in dyslexia, thereby affecting phonetic discrimination (e.g. discriminating /b/ from /d/) via impaired discrimination of formant transitions (rapid acoustic changes in frequency and intensity). However, an alternative auditory temporal hypothesis is that the basic auditory processing of the slower amplitude modulation cues in speech is compromised (Goswami et al., 2002). Here, we contrast children's perception of a synthetic speech contrast (ba/wa) when it is based on the speed of the rate of change of frequency information (formant transition duration) versus the speed of the rate of change of amplitude modulation (rise time). We show that children with dyslexia have excellent phonetic discrimination based on formant transition duration, but poor phonetic discrimination based on envelope cues. The results explain why phonetic discrimination may be allophonic in developmental dyslexia (Serniclaes et al., 2004), and suggest new avenues for the remediation of developmental dyslexia. © 2010 Blackwell Publishing Ltd.
Bilingual health literacy assessment using the Talking Touchscreen/la Pantalla Parlanchina: Development and pilot testing.

PubMed

Yost, Kathleen J; Webster, Kimberly; Baker, David W; Choi, Seung W; Bode, Rita K; Hahn, Elizabeth A

2009-06-01

Current health literacy measures are too long, imprecise, or have questionable equivalence of English and Spanish versions. The purpose of this paper is to describe the development and pilot testing of a new bilingual computer-based health literacy assessment tool. We analyzed literacy data from three large studies. Using a working definition of health literacy, we developed new prose, document and quantitative items in English and Spanish. Items were pilot tested on 97 English- and 134 Spanish-speaking participants to assess item difficulty. Items covered topics relevant to primary care patients and providers. English- and Spanish-speaking participants understood the tasks involved in answering each type of question. The English Talking Touchscreen was easy to use and the English and Spanish items provided good coverage of the difficulty continuum. Qualitative and quantitative results provided useful information on computer acceptability and initial item difficulty. After the items have been administered on the Talking Touchscreen (la Pantalla Parlanchina) to 600 English-speaking (and 600 Spanish-speaking) primary care patients, we will develop a computer adaptive test. This health literacy tool will enable clinicians and researchers to more precisely determine the level at which low health literacy adversely affects health and healthcare utilization.
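The team plans to build a computer adaptive test once the items are calibrated. A common selection rule is to administer next the unused item with maximum Fisher information at the current ability estimate. The sketch assumes a Rasch model, where information is p(1 − p) and peaks when item difficulty equals θ. The ability-update step is omitted, and the item bank is hypothetical.

```python
from math import exp

def rasch_p(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + exp(-(theta - b)))

def item_information(theta, b):
    """Fisher information of a Rasch item at ability theta."""
    p = rasch_p(theta, b)
    return p * (1.0 - p)

def next_item(theta, bank, administered):
    """Unused item with maximum information at theta, or None if exhausted.

    bank: item id -> calibrated difficulty b (hypothetical values).
    """
    candidates = [i for i in bank if i not in administered]
    if not candidates:
        return None
    return max(candidates, key=lambda i: item_information(theta, bank[i]))

bank = {"prose_01": -2.0, "doc_07": 0.1, "quant_03": 1.5}
print(next_item(0.0, bank, administered=set()))
```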
Developing an Interpretation of Item Parameters for Personality Items: Content Correlates of Parameter Estimates.

ERIC Educational Resources Information Center

Zickar, Michael J.; Ury, Karen L.

2002-01-01

Attempted to relate content features of personality items to item parameter estimates from the partial credit model of E. Muraki (1990) by administering the Adjective Checklist (L. Goldberg, 1992) to 329 undergraduates. As predicted, the discrimination parameter was related to the item subtlety ratings of personality items but the level of word…
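The abstract relates item content to a discrimination parameter from "the partial credit model of E. Muraki (1990)" but gives no equations. The sketch assumes the model takes the generalized partial credit form, which is Masters' partial credit model with an item slope *a* added. It shows how a larger *a* makes the category probabilities change more sharply with θ. The step values are hypothetical.

```python
from math import exp

def gpcm_probabilities(theta, a, steps):
    """Category probabilities under a generalized partial credit model.

    P(X = k) is proportional to exp(sum_{v<=k} a * (theta - b_v)),
    with category 0 having an empty sum; a = 1 gives the partial credit model.
    """
    logits = [0.0]
    for b in steps:
        logits.append(logits[-1] + a * (theta - b))
    m = max(logits)  # numerical stability
    w = [exp(l - m) for l in logits]
    s = sum(w)
    return [x / s for x in w]

# Same hypothetical steps, low versus high discrimination.
print(gpcm_probabilities(2.0, 0.5, [0.0, 1.0]))
print(gpcm_probabilities(2.0, 2.0, [0.0, 1.0]))
```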
A Note on Item-Restscore Association in Rasch Models

ERIC Educational Resources Information Center

Kreiner, Svend

2011-01-01

To rule out the need for a two-parameter item response theory (IRT) model during item analysis by Rasch models, it is important to check the Rasch model's assumption that all items have the same item discrimination. Biserial and polyserial correlation coefficients measuring the association between items and restscores are often used in an informal…
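The abstract notes that item-restscore associations are often checked informally. A rank-based measure of that association is the Goodman–Kruskal gamma between each item and the restscore (total minus the item). The sketch computes only the *observed* gammas. A formal check of the equal-discrimination assumption would also need the values expected under the Rasch model, which are not computed here. The data are hypothetical.

```python
def gk_gamma(x, y):
    """Goodman-Kruskal gamma: (concordant - discordant) / (concordant + discordant).

    Pairs tied on either variable are ignored. O(n^2) over pairs of persons.
    """
    c = d = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                c += 1
            elif s < 0:
                d += 1
    if c + d == 0:
        raise ValueError("gamma undefined: all pairs tied")
    return (c - d) / (c + d)

def item_restscore_gammas(responses):
    """Observed gamma between each item and its restscore."""
    totals = [sum(r) for r in responses]
    return [gk_gamma([r[j] for r in responses],
                     [t - r[j] for r, t in zip(responses, totals)])
            for j in range(len(responses[0]))]

# A perfect Guttman pattern gives gamma = 1 for every item.
print(item_restscore_gammas([[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]))
```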
Introduction to Psychology and Leadership. Rank-Biserial Correlation as an Item Discrimination.

ERIC Educational Resources Information Center

Westinghouse Learning Corp., Annapolis, MD.

Written as a technical report for the leadership course of the United States Naval Academy (see the final reports which summarize the course development project, EM 010 418, EM 010 419, and EM 010 484), this paper examines the use and interpretation of the rank-biserial correlation as an index of item discrimination. The advantages and…
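The report treats the rank-biserial correlation as an item discrimination index, but the abstract is truncated before any formula. One common formulation, assumed here, is Glass's r = 2(R̄₁ − R̄₀)/n. Here R̄₁ and R̄₀ are the mean ranks of the total score for examinees who passed and failed the item, and tied scores receive averaged ranks. The data are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_biserial(item, criterion):
    """Glass-style rank-biserial r = 2 * (mean rank of 1s - mean rank of 0s) / n."""
    ranks = average_ranks(criterion)
    r1 = [r for x, r in zip(item, ranks) if x == 1]
    r0 = [r for x, r in zip(item, ranks) if x == 0]
    return 2 * (sum(r1) / len(r1) - sum(r0) / len(r0)) / len(item)

print(rank_biserial([1, 0, 1, 0, 0], [17, 12, 15, 15, 9]))
```

The index reaches +1 when every examinee who passed the item outranks every examinee who failed it.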
Vineland-II adaptive behavior profile of children with attention-deficit/hyperactivity disorder or specific learning disorders.

PubMed

Balboni, Giulia; Incognito, Oriana; Belacchi, Carmen; Bonichini, Sabrina; Cubelli, Roberto

2017-02-01

The evaluation of adaptive behavior is informative in children with attention-deficit/hyperactivity disorder (ADHD) or specific learning disorders (SLD). However, the few investigations available have focused only on the gross level of domains of adaptive behavior. This study investigated which item subsets of the Vineland-II can discriminate children with ADHD or SLD from peers with typical development. Student's t-tests, ROC analysis, logistic regression, and linear discriminant function analysis were used to compare 24 children with ADHD, 61 elementary students with SLD, and controls matched on age, sex, school level attended, and both parents' education level. Several item subsets that address not only ADHD core symptoms, but also understanding in social context and development of interpersonal relationships, allowed discrimination of children with ADHD from controls. The combination of four item subsets (Listening and attending, Expressing complex ideas, Social communication, and Following instructions) classified children with ADHD with both sensitivity and specificity of 87.5%. Only Reading skills, Writing skills, and Time and dates discriminated children with SLD from controls. Evaluation of Vineland-II scores at the level of item content categories is a useful procedure for an efficient clinical description. Copyright © 2016 Elsevier Ltd. All rights reserved.
Mokken scaling of the Myocardial Infarction Dimensional Assessment Scale (MIDAS).

PubMed

Thompson, David R; Watson, Roger

2011-02-01

The purpose of this study was to examine the hierarchical and cumulative nature of the 35 items of the Myocardial Infarction Dimensional Assessment Scale (MIDAS), a disease-specific health-related quality of life measure. Data from 668 participants who completed the MIDAS were analysed using the Mokken Scaling Procedure, which is a computer program that searches polychotomous data for hierarchical and cumulative scales on the basis of a range of diagnostic criteria. Fourteen MIDAS items were retained in a Mokken scale and these items included physical activity, insecurity, emotional reaction and dependency items but excluded items related to diet, medication or side-effects. Item difficulty, in item response theory terms, ran from physical activity items (low difficulty) to insecurity, suggesting that the most severe quality of life effect of myocardial infarction is loneliness and isolation. Items from the MIDAS form a strong and reliable Mokken scale, which provides new insight into the relationship between items in the MIDAS and the measurement of quality of life after myocardial infarction. © 2010 Blackwell Publishing Ltd.
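The Mokken Scaling Procedure assesses scales with Loevinger's scalability coefficient H. Conventionally, H ≥ 0.3 is the minimum for a scale and H ≥ 0.5 indicates a strong scale. For simplicity, the sketch below computes scale-level H for *dichotomous* items as 1 − (observed Guttman errors / expected Guttman errors). A Guttman error is failing the easier item of a pair while passing the harder one. The MIDAS items are polytomous, and the polytomous version used in practice differs. The data are hypothetical.

```python
from itertools import combinations

def loevinger_h(responses):
    """Scale-level Loevinger H for 0/1 items via Guttman errors.

    Expected errors assume independent items: n * (1 - p_easy) * p_hard.
    """
    n = len(responses)
    k = len(responses[0])
    p = [sum(r[j] for r in responses) / n for j in range(k)]
    observed = expected = 0.0
    for i, j in combinations(range(k), 2):
        easy, hard = (i, j) if p[i] >= p[j] else (j, i)
        # Guttman error: fail the easier item but pass the harder one.
        observed += sum(1 for r in responses if r[easy] == 0 and r[hard] == 1)
        expected += n * (1 - p[easy]) * p[hard]
    return 1 - observed / expected

# A perfect Guttman (cumulative) pattern has no errors, so H = 1.
print(loevinger_h([[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]))
```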
Can health care providers recognise a fibromyalgia personality?

PubMed

Da Silva, José A P; Jacobs, Johannes W G; Branco, Jaime C; Canaipa, Rita; Gaspar, M Filomena; Griep, Ed N; van Helmond, Toon; Oliveira, Paula J; Zijlstra, Theo J; Geenen, Rinie

2017-01-01

To determine if experienced health care providers (HCPs) can recognise patients with fibromyalgia (FM) based on a limited set of personality items, exploring the existence of a FM personality. From the 240-item NEO-PI-R personality questionnaire, 8 HCPs from two different countries each selected 20 items they considered most discriminative of FM personality. Then, evaluating the scores on these items of 129 female patients with FM and 127 female controls, each HCP rated the probability of FM for each individual on a 0-10 scale. Personality characteristics (domains and facets) of selected items were determined. Scores of patients with FM and controls on the eight 20-item sets, and HCPs' estimates of each individual's probability of FM were analysed for their discriminative value. The eight 20-item sets discriminated for FM, with areas under the receiver operating characteristic curve ranging from 0.71-0.81. The estimated probabilities for FM showed, in general, percentages of correct classifications above 50%, with rising correct percentages for higher estimated probabilities. The most often chosen and discriminatory items were predominantly of the domain neuroticism (all with higher scores in FM), followed by some items of the facet trust (lower scores in FM). HCPs can, based on a limited set of items from a personality questionnaire, distinguish patients with FM from controls with a statistically significant probability. The HCPs' expectation that personality in FM patients is associated with higher levels for aspects of neuroticism (proneness to psychological distress) and lower scores for aspects of trust, proved to be correct.
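Each 20-item set is summarised by an area under the ROC curve (0.71 to 0.81). The AUC equals the probability that a randomly chosen patient scores higher than a randomly chosen control, counting ties as one half, which is the Mann–Whitney formulation. The sketch assumes higher set scores indicate fibromyalgia. The scores are hypothetical.

```python
def auc(patient_scores, control_scores):
    """ROC AUC via pairwise comparison (Mann-Whitney); ties count 1/2."""
    wins = 0.0
    for a in patient_scores:
        for b in control_scores:
            if a > b:
                wins += 1
            elif a == b:
                wins += 0.5
    return wins / (len(patient_scores) * len(control_scores))

# Hypothetical 20-item set scores.
print(auc([62, 55, 70, 48], [41, 50, 55, 39]))
```

The pairwise loop is O(n·m), which is fine at the study's scale of about 130 per group.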
Item-specific processing reduces false memories.

PubMed

McCabe, David P; Presmanes, Alison G; Robertson, Chuck L; Smith, Anderson D

2004-12-01

We examined the effect of item-specific and relational encoding instructions on false recognition in two experiments in which the DRM paradigm was used (Deese, 1959; Roediger & McDermott, 1995). Type of encoding (item-specific or relational) was manipulated between subjects in Experiment 1 and within subjects in Experiment 2. Decision-based explanations (e.g., the distinctiveness heuristic) predict reductions in false recognition in between-subjects designs, but not in within-subjects designs, because they are conceptualized as global shifts in decision criteria. Memory-based explanations predict reductions in false recognition in both designs, resulting from enhanced recollection of item-specific details. False recognition was reduced following item-specific encoding instructions in both experiments, favoring a memory-based explanation. These results suggest that providing unique cues for the retrieval of individual studied items results in enhanced discrimination between those studied items and critical lures. Conversely, enhancing the similarity of studied items results in poor discrimination among items within a particular list theme. These results are discussed in terms of the item-specific/relational framework (Hunt & McDaniel, 1993).
Interpretation of the Rasch Ability and Difficulty Scales for Educational Purposes.

ERIC Educational Resources Information Center

Woodcock, Richard W.

Though many test developers have utilized item response theory in their work, few have taken advantage of the potential of item response theory for providing new interpretation procedures that accentuate the educational implications to be drawn from test scores. This paper describes several features, based upon the Rasch difficulty and ability…
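For readers unfamiliar with the Rasch scales mentioned above, a minimal sketch of the dichotomous Rasch model follows: ability (theta) and item difficulty (b) sit on the same logit scale, so the log-odds of success equal theta minus b. The function name is illustrative, and this is the standard model rather than the specific interpretive features described in the paper.

```python
import math


def rasch_probability(theta: float, b: float) -> float:
    """P(correct) under the dichotomous Rasch model: logistic of (ability - difficulty)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# A person whose ability equals the item difficulty has a 50% chance of success;
# one logit above the difficulty raises it to about 73%.
print(rasch_probability(0.0, 0.0))
print(rasch_probability(1.0, 0.0))
```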
The Effect of Anchor Test Construction on Scale Drift

ERIC Educational Resources Information Center

Antal, Judit; Proctor, Thomas P.; Melican, Gerald J.

2014-01-01

In common-item equating the anchor block is generally built to represent a miniature form of the total test in terms of content and statistical specifications. The statistical properties frequently reflect equal mean and spread of item difficulty. Sinharay and Holland (2007) suggested that the requirement for equal spread of difficulty may be too…
Reducing the number of options on multiple-choice questions: response time, psychometrics and standard setting.

PubMed

Schneid, Stephen D; Armour, Chris; Park, Yoon Soo; Yudkowsky, Rachel; Bordage, Georges

2014-10-01

Despite significant evidence supporting the use of three-option multiple-choice questions (MCQs), these are rarely used in written examinations for health professions students. The purpose of this study was to examine the effects of reducing four- and five-option MCQs to three-option MCQs on response times, psychometric characteristics, and absolute standard setting judgements in a pharmacology examination administered to health professions students. We administered two versions of a computerised examination containing 98 MCQs to 38 Year 2 medical students and 39 Year 3 pharmacy students. Four- and five-option MCQs were converted into three-option MCQs to create two versions of the examination. Differences in response time, item difficulty and discrimination, and reliability were evaluated. Medical and pharmacy faculty judges provided three-level Angoff (TLA) ratings for all MCQs for both versions of the examination to allow the assessment of differences in cut scores. Students answered three-option MCQs an average of 5 seconds faster than they answered four- and five-option MCQs (36 seconds versus 41 seconds; p = 0.008). There were no significant differences in item difficulty and discrimination, or test reliability. Overall, the cut scores generated for three-option MCQs using the TLA ratings were 8 percentage points higher (p = 0.04). The use of three-option MCQs in a health professions examination resulted in a time saving equivalent to the completion of 16% more MCQs per 1-hour testing period, which may increase content validity and test score reliability, and minimise construct under-representation. The higher cut scores may result in higher failure rates if an absolute standard setting method, such as the TLA method, is used. 
The results from this study provide a cautious indication to health professions educators that using three-option MCQs does not threaten validity and may strengthen it by allowing additional MCQs to be tested in a fixed amount of testing time with no deleterious effect on the reliability of the test scores. © 2014 John Wiley & Sons Ltd.
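The Angoff procedure above turns judges' item ratings into a cut score. A minimal sketch, assuming each rating is a judge's estimate (0 to 1) that a borderline examinee answers the item correctly and that the cut score is the mean over judges of each judge's average rating. The three-level variant restricts ratings to three values; the values 0, 0.5 and 1 used here are an assumption for illustration, not the study's scale.

```python
from statistics import mean


def angoff_cut_percent(ratings):
    """ratings[j][i]: judge j's estimate that a borderline examinee gets item i right.

    Returns the cut score as a percentage of items (mean over judges).
    """
    per_judge = [100.0 * sum(r) / len(r) for r in ratings]
    return mean(per_judge)


# Hypothetical ratings from two judges on four items
print(angoff_cut_percent([[1, 0.5, 0, 1], [0.5, 0.5, 1, 1]]))
```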
Simple mental addition in children with and without mild mental retardation.

PubMed

Janssen, R; De Boeck, P; Viaene, M; Vallaeys, L

1999-11-01

The speeded performance on simple mental addition problems of 6- and 7-year-old children with and without mild mental retardation is modeled from a person perspective and an item perspective. On the person side, it was found that a single cognitive dimension spanned the performance differences between the two ability groups. However, a discontinuity, or "jump," was observed in the performance of the normal ability group on the easier items. On the item side, the addition problems were almost perfectly ordered in difficulty according to their problem size. Differences in difficulty were explained by factors related to the difficulty of executing nonretrieval strategies. All findings were interpreted within the framework of Siegler's (e.g., R. S. Siegler & C. Shipley, 1995) model of children's strategy choices in arithmetic. Models from item response theory were used to test the hypotheses. Copyright 1999 Academic Press.
Discriminating Children with Autism from Children with Learning Difficulties with an Adaptation of the Short Sensory Profile

ERIC Educational Resources Information Center

O'Brien, Justin; Tsermentseli, Stella; Cummins, Omar; Happe, Francesca; Heaton, Pamela; Spencer, Janine

2009-01-01

In this article, we examine the extent to which children with autism and children with learning difficulties can be discriminated from their responses to different patterns of sensory stimuli. Using an adapted version of the Short Sensory Profile (SSP), sensory processing was compared in 34 children with autism to 33 children with typical…

Readability Level of Standardized Test Items and Student Performance: The Forgotten Validity Variable

ERIC Educational Resources Information Center

Hewitt, Margaret A.; Homan, Susan P.

2004-01-01

Test validity issues considered by test developers and school districts rarely include individual item readability levels. In this study, items from a major standardized test were examined for individual item readability level and item difficulty. The Homan-Hewitt Readability Formula was applied to items across three grade levels. Results of…
Assisting Australians with mental health problems and financial difficulties: a Delphi study to develop guidelines for financial counsellors, financial institution staff, mental health professionals and carers.

PubMed

Bond, Kathy S; Chalmers, Kathryn J; Jorm, Anthony F; Kitchener, Betty A; Reavley, Nicola J

2015-06-03

There is a strong association between mental health problems and financial difficulties. Therefore, people who work with those who have financial difficulties (financial counsellors and financial institution staff) need to have knowledge and helping skills relevant to mental health problems. Conversely, people who support those with mental health problems (mental health professionals and carers) may need to have knowledge and helping skills relevant to financial difficulties. The Delphi expert consensus method was used to develop guidelines for people who work with or support those with mental health problems and financial difficulties. A systematic review of websites, books and journal articles was conducted to develop a questionnaire containing items about the knowledge, skills and actions relevant to working with or supporting someone with mental health problems and financial difficulties. These items were rated over three rounds by five Australian expert panels comprising financial counsellors (n = 33), financial institution staff (n = 54), mental health professionals (n = 31), consumers (n = 20) and carers (n = 24). A total of 897 items were rated, with 462 items endorsed by at least 80 % of members of each of the expert panels. These endorsed statements were used to develop a set of guidelines for financial counsellors, financial institution staff, mental health professionals and carers about how to assist someone with mental health problems and financial difficulties. A diverse group of expert panel members was able to reach substantial consensus on the knowledge, skills and actions needed to work with and support people with mental health problems and financial difficulties. These guidelines can be used to inform policy and practice in the financial and mental health sectors.
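The endorsement rule above (an item is kept only if at least 80 % of members of each panel endorse it) can be sketched as follows. The data layout, a mapping from panel name to a list of yes/no votes, is a hypothetical simplification of a Delphi rating round.

```python
def endorsed(panel_votes, threshold=0.80):
    """panel_votes: panel name -> list of booleans (True = member endorsed the item).

    The item is endorsed only if every panel reaches the threshold proportion.
    """
    return all(sum(votes) / len(votes) >= threshold for votes in panel_votes.values())


# Hypothetical votes: 80% and 90% endorsement, so the item passes
item = {"counsellors": [True] * 8 + [False] * 2, "carers": [True] * 9 + [False]}
print(endorsed(item))
```

Requiring every panel to pass, rather than pooling all raters, keeps one large panel from outvoting a small one.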
Sexual orientation and boyhood gender conformity: development of the Boyhood Gender Conformity Scale (BGCS)

PubMed

Hockenberry, S L; Billingham, R E

1987-12-01

Two hundred twenty-five [corrected] respondents (109 [corrected] heterosexuals and 116 [corrected] homosexuals) completed a survey containing a 20-item Boyhood Gender Conformity Scale (BGCS). This scale was largely composed of edited and abridged gender items from Part A of Freund et al.'s Feminine Gender Identity Scale (FGIS-A) and Whitam's "childhood indicators." The combined scale was developed in an attempt to obtain a reliable, valid, and potent discriminating instrument for accurately classifying adult male respondents for sexual orientation on the basis of their reported boyhood gender conformity or nonconforming behavior and identity. In addition, 33% of these respondents were administered the original FGIS-A and Whitam inventory during a 2-week test-retest analysis conducted to determine the validity and reliability of the new instrument. All the original items significantly discriminated between heterosexual and homosexual respondents. From these a 13-item function and a 5-item function proved to be the most powerful discriminators between the two groups. Significant correlations between each of the three scales and a very high test-retest correlation coefficient supported the reliability and validity assumption for the BGCS. The conclusion was made that the five-item function (playing with boys, preferring [corrected] boys' games, imagining self as sports figure, reading adventure and sports stories, considered a "sissy") was the most potent and parsimonious discriminator among adult males for sexual orientation. It was similarly noted that the absence of masculine behaviors and traits appeared to be a more powerful predictor of later homosexual orientation than the traditionally feminine or cross-sexed traits and behaviors.
Brief Report: Best Discriminators for Identifying Children with Autism Spectrum Disorder at an 18-Month Health Check-Up in Japan

ERIC Educational Resources Information Center

Kamio, Yoko; Haraguchi, Hideyuki; Stickley, Andrew; Ogino, Kazuo; Ishitobi, Makoto; Takahashi, Hidetoshi

2015-01-01

To determine the best discriminative items for identifying young children with autism spectrum disorders (ASD), we conducted a secondary analysis using longitudinal cohort data that included the Japanese version of the 23-item modified checklist for autism in toddlers (M-CHAT-JV). M-CHAT-JV data at 18 months of age and diagnostic information…
Psychometric properties of the Sexual Excitation/Sexual Inhibition Inventory for Women and Men (SESII-W/M) and the Sexual Excitation Scales/Sexual Inhibition Scales short form (SIS/SES-SF) in a population-based sample in Germany

PubMed Central

Scholten, Saskia; Margraf, Jürgen

2018-01-01

The Sexual Excitation/Sexual Inhibition Inventory for Women and Men (SESII-W/M) and the Sexual Excitation Scales/Sexual Inhibition Scales short form (SIS/SES-SF) are two self-report questionnaires for assessing sexual excitation (SE) and sexual inhibition (SI). According to the dual control model of sexual response, SE and SI differ between individuals and influence the occurrence of sexual arousal in given situations. Extreme levels of SE and SI are postulated to be associated with sexual difficulties or risky sexual behaviors. The present study was designed to assess the psychometric properties of the German versions of both questionnaires utilizing a large population-based sample of 2,708 participants (mean age = 51.19 years, SD = 14.03). Overall, psychometric evaluation of the two instruments yielded good convergent and discriminant validity and mediocre to good internal consistency. The original 30-item version of the SESII-W/M did not show a sufficient model fit. For a 24-item version of the SESII-W/M, partial strong measurement invariance across gender, and strong measurement invariance across relationship status, age, and educational levels were established. The original structure (14 items, 3 factors) of the SIS/SES-SF was not replicated. However, a 4-factor model including 13 items showed a good model fit and strong measurement invariance across the aforementioned participant groups. For both questionnaires, partial strong measurement invariance with the original American versions of the scales was found. As some factors showed unsatisfactory internal consistency and the factor structure of the original scales could not be replicated, scores on several SE- and SI-factors should be interpreted with caution. However, most analyses indicated sufficient psychometric quality of the German SESII-W/M and SIS/SES-SF and their use can be recommended in German-speaking samples. 
More research with diverse samples (i.e., different sexual orientations, individuals with sexual difficulties) is needed to ensure the replicability of the factor solutions presented in this study. PMID:29529045
Validity and Reliability of the 8-Item Work Limitations Questionnaire.

PubMed

Walker, Timothy J; Tullar, Jessica M; Diamond, Pamela M; Kohl, Harold W; Amick, Benjamin C

2017-12-01

Purpose To evaluate factorial validity, scale reliability, test-retest reliability, convergent validity, and discriminant validity of the 8-item Work Limitations Questionnaire (WLQ) among employees from a public university system. Methods A secondary analysis of de-identified data from employees who completed an annual Health Assessment between 2009 and 2015 was used to test the research aims. Confirmatory factor analysis (CFA) (n = 10,165) tested the latent structure of the 8-item WLQ. Scale reliability was determined using a CFA-based approach while test-retest reliability was determined using the intraclass correlation coefficient. Convergent/discriminant validity was tested by evaluating relations between the 8-item WLQ with health/performance variables for convergent validity (health-related work performance, number of chronic conditions, and general health) and demographic variables for discriminant validity (gender and institution type). Results A 1-factor model with three correlated residuals demonstrated excellent model fit (CFI = 0.99, TLI = 0.99, RMSEA = 0.03, and SRMR = 0.01). The scale reliability was acceptable (0.69, 95% CI 0.68-0.70) and the test-retest reliability was very good (ICC = 0.78). Low-to-moderate associations were observed between the 8-item WLQ and the health/performance variables, while weak associations were observed between the 8-item WLQ and the demographic variables. Conclusions The 8-item WLQ demonstrated sufficient reliability and validity among employees from a public university system. Results suggest the 8-item WLQ is a usable alternative for studies when the more comprehensive 25-item WLQ is not available.
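The test-retest ICC above can be computed from a two-way ANOVA decomposition. The abstract does not say which ICC form was used; the sketch below assumes ICC(2,1) (two-way random effects, absolute agreement, single measurement, in Shrout and Fleiss notation), and the function name is illustrative.

```python
from statistics import mean


def icc_2_1(data):
    """ICC(2,1) for data[i][j] = score of subject i on occasion j."""
    n, k = len(data), len(data[0])
    grand = mean(x for row in data for x in row)
    row_means = [mean(row) for row in data]
    col_means = [mean(row[j] for row in data) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between occasions
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols                  # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical scores: identical on both occasions, then shifted by one point
print(icc_2_1([[1, 1], [2, 2], [3, 3]]))
print(icc_2_1([[1, 2], [2, 3], [3, 4]]))
```

Because this form measures absolute agreement, a constant shift between occasions lowers the ICC even though the rank order is unchanged.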
A Monte Carlo Study Investigating the Influence of Item Discrimination, Category Intersection Parameters, and Differential Item Functioning Patterns on the Detection of Differential Item Functioning in Polytomous Items

ERIC Educational Resources Information Center

Thurman, Carol

2009-01-01

The increased use of polytomous item formats has led assessment developers to pay greater attention to the detection of differential item functioning (DIF) in these items. DIF occurs when an item performs differently for two contrasting groups of respondents (e.g., males versus females) after controlling for differences in the abilities of the…
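To make the DIF definition above concrete, the sketch below uses the Mantel-Haenszel procedure for a dichotomous item, stratifying examinees by total score so that groups are compared at matched ability. This is a swapped-in, simpler technique: the study concerns polytomous items, which need polytomous extensions. The -2.35 rescaling to the ETS delta metric is the common convention; the table layout is an assumption.

```python
import math


def mantel_haenszel(strata):
    """strata: list of (A, B, C, D) counts, one 2x2 table per total-score level.

    A/B = reference group correct/incorrect, C/D = focal group correct/incorrect.
    Returns (common odds ratio, MH D-DIF on the delta scale).
    Assumes at least one stratum has B*C > 0.
    """
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # empty score level contributes nothing
        num += a * d / n
        den += b * c / n
    alpha = num / den
    # Negative D-DIF means the item favours the reference group
    return alpha, -2.35 * math.log(alpha)
```

An odds ratio near 1 (D-DIF near 0) indicates no DIF at matched ability.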
Exploring autistic traits in anorexia: a clinical study.

PubMed

Tchanturia, Kate; Smith, Emma; Weineck, Felicitas; Fidanboylu, Eliz; Kern, Nikola; Treasure, Janet; Baron Cohen, Simon

2013-11-12

The objectives of this study were to explore associations between autistic traits and self-reported clinical symptoms in a population with anorexia nervosa (AN). Experimental and self-report evidence reveals similarities between AN and autism spectrum condition (ASC) populations in socio-emotional and cognitive domains; this includes difficulties with empathy, set-shifting and global processing. Focusing on these similarities may lead to better tailored interventions for both conditions. A cross-sectional independent-groups design was employed. Participants with AN (n = 66) and typical controls (n = 66) completed self-report questionnaires including the Short (10-Item) Version Autism Spectrum Quotient (AQ-10) questionnaire (the first time this has been implemented in this population), the Eating Disorder Examination Questionnaire, the Hospital Anxiety and Depression Scale and the Work and Social Adjustment Scale. Group differences and the relationship between autistic traits and other questionnaire measures were investigated. The AN group had a significantly higher AQ-10 total score and a greater proportion scored above the clinical cut-off than the control group. Seven out of ten AQ-10 items significantly discriminated between groups. In the AN group, levels of autistic traits correlated with a greater self-reported anxiety and depression and a lower ability to maintain close relationships; however, eating disorder symptoms were not associated with autistic traits. Women with anorexia possess a greater number of autistic traits than typical women. AQ-10 items that discriminated between groups related to 'bigger picture' (global) thinking, inflexibility of thinking and problems with social interactions, suggesting that autistic traits may exacerbate factors that maintain the eating disorder rather than cause the eating disorder directly. Using screening instruments may improve understanding of patients' problems, leading to better tailoring of intervention. 
We conclude that further investigation of autistic traits in AN could inform new intervention approaches based on joint working between ASC and eating disorder services.
Two Methods for Teaching Simple Visual Discriminations to Learners with Severe Disabilities

ERIC Educational Resources Information Center

Graff, Richard B.; Green, Gina

2004-01-01

Simple discriminations are involved in many functional skills; additionally, they are components of conditional discriminations (identity and arbitrary matching-to-sample), which are involved in a wide array of other important performances. Many individuals with severe disabilities have difficulty acquiring simple discriminations with standard…
Discriminability and Sensitivity to Reinforcer Magnitude in a Detection Task

ERIC Educational Resources Information Center

Alsop, Brent; Porritt, Melissa

2006-01-01

Three pigeons discriminated between two sample stimuli (intensities of red light). The difficulty of the discrimination was varied over four levels. At each level, the relative reinforcer magnitude for the two correct responses was varied across conditions, and the reinforcer rates were equal. Within levels, discriminability between the sample…
Using Student Ability and Item Difficulty for Making Defensible Pass/Fail Decisions for Borderline Grades

ERIC Educational Resources Information Center

Shulruf, Boaz; Jones, Phil; Turner, Rolf

2015-01-01

The determination of Pass/Fail decisions over Borderline grades, (i.e., grades which do not clearly distinguish between the competent and incompetent examinees) has been an ongoing challenge for academic institutions. This study utilises the Objective Borderline Method (OBM) to determine examinee ability and item difficulty, and from that…
Psychometric Properties of the Chinese Version of the Beck Depression Inventory-II Using the Rasch Model

ERIC Educational Resources Information Center

Wu, Pei-Chen; Chang, Lily

2008-01-01

The authors investigated the Chinese version of the Beck Depression Inventory-II (BDI-II-C; Chinese Behavioral Science Corporation, 2000) within the Rasch framework in terms of dimensionality, item difficulty, and category functioning. Two underlying scale dimensions, relatively high item difficulties, and a need for collapsing 2 response…
A Comparison of Three Types of Test Development Procedures Using Classical and Latent Trait Methods.

ERIC Educational Resources Information Center

Benson, Jeri; Wilson, Michael

Three methods of item selection were used to select sets of 38 items from a 50-item verbal analogies test and the resulting item sets were compared for internal consistency, standard errors of measurement, item difficulty, biserial item-test correlations, and relative efficiency. Three groups of 1,500 cases each were used for item selection. First…
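The biserial item-test correlations mentioned above can be sketched as follows, alongside the closely related point-biserial. The biserial assumes a normally distributed latent trait underlying the right/wrong split and rescales the point-biserial by the normal ordinate at that split. The function name is illustrative, and the item is not removed from the total here (a corrected version would subtract it first).

```python
from statistics import NormalDist, mean, pstdev


def item_test_correlations(item, total):
    """item: 0/1 scores; total: test scores. Returns (point_biserial, biserial).

    Requires both correct and incorrect responses to be present.
    """
    p = mean(item)
    q = 1.0 - p
    m1 = mean(t for i, t in zip(item, total) if i == 1)  # mean total, correct group
    m0 = mean(t for i, t in zip(item, total) if i == 0)  # mean total, incorrect group
    sd = pstdev(total)
    r_pb = (m1 - m0) / sd * (p * q) ** 0.5
    # Standard normal ordinate at the point that leaves proportion p above it
    h = NormalDist().pdf(NormalDist().inv_cdf(q))
    r_bis = r_pb * (p * q) ** 0.5 / h
    return r_pb, r_bis
```

The biserial is always larger in magnitude than the point-biserial and can exceed 1 when the normality assumption fails.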
Effects of Anchor Item Methods on the Detection of Differential Item Functioning within the Family of Rasch Models

ERIC Educational Resources Information Center

Wang, Wen-Chung

2004-01-01

Scale indeterminacy in analysis of differential item functioning (DIF) within the framework of item response theory can be resolved by imposing 3 anchor item methods: the equal-mean-difficulty method, the all-other anchor item method, and the constant anchor item method. In this article, applicability and limitations of these 3 methods are…
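A simplified sketch of the equal-mean-difficulty idea above: Rasch difficulties estimated separately in each group are centred so that their means match, and each item's focal-minus-reference difference is taken as its DIF estimate. The function name is hypothetical and this ignores estimation error.

```python
from statistics import mean


def equal_mean_difficulty_dif(b_ref, b_focal):
    """Centre each group's difficulties to equal means, then return focal - reference."""
    mr, mf = mean(b_ref), mean(b_focal)
    return [(f - mf) - (r - mr) for r, f in zip(b_ref, b_focal)]


# Hypothetical difficulties: only the third item is truly 1 logit harder for the focal group
print(equal_mean_difficulty_dif([-1.0, 0.0, 1.0], [-1.0, 0.0, 2.0]))
```

The DIF estimates sum to zero by construction, so the true DIF in one item shows up as about 0.67 for that item and about -0.33 for each of the others. This contamination is one limitation of the method.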
The influence of strategic encoding on false memory in patients with mild cognitive impairment and Alzheimer's disease dementia.

PubMed

Tat, Michelle J; Soonsawat, Anothai; Nagle, Corinne B; Deason, Rebecca G; O'Connor, Maureen K; Budson, Andrew E

2016-11-01

Patients with Alzheimer's disease (AD) dementia exhibit high rates of memory distortions in addition to their impairments in episodic memory. Several investigations have demonstrated that when healthy individuals (young and old) engaged in an encoding strategy that emphasized the uniqueness of study items (an item-specific encoding strategy), they were able to improve their discrimination between old items and unstudied critical lure items in a false memory task. In the present study we examined if patients with AD could also improve their memory discrimination when engaging in an item-specific encoding strategy. Healthy older adult controls, patients with mild cognitive impairment (MCI) due to AD, and patients with mild AD dementia were asked to study lists of categorized words. In the Item-Specific condition, participants were asked to provide a unique detail or personal experience with each study item. In the Relational condition, they were asked to determine how each item in the list was related to the others. To assess the influence of both strategies, recall and recognition memory tests were administered. Overall, both patient groups exhibited poorer memory in both recall and recognition tests compared to controls. In terms of recognition, healthy older controls and patients with MCI due to AD exhibited improved memory discrimination in the Item-Specific condition compared to the Relational condition, whereas patients with AD dementia did not. We speculate that patients with MCI due to AD use intact frontal networks to effectively engage in this strategy. Published by Elsevier Inc.
The influence of strategic encoding on false memory in patients with mild cognitive impairment and Alzheimer’s disease dementia

PubMed Central

Tat, Michelle J.; Soonsawat, Anothai; Nagle, Corinne B.; Deason, Rebecca G.; O’Connor, Maureen K.; Budson, Andrew E.

2018-01-01

Patients with Alzheimer’s disease (AD) dementia exhibit high rates of memory distortions in addition to their impairments in episodic memory. Several investigations have demonstrated that when healthy individuals (young and old) engaged in an encoding strategy that emphasized the uniqueness of study items (an item-specific encoding strategy), they were able to improve their discrimination between old items and unstudied critical lure items in a false memory task. In the present study we examined if patients with AD could also improve their memory discrimination when engaging in an item-specific encoding strategy. Healthy older adult controls, patients with mild cognitive impairment (MCI) due to AD, and patients with mild AD dementia were asked to study lists of categorized words. In the Item-Specific condition, participants were asked to provide a unique detail or personal experience with each study item. In the Relational condition, they were asked to determine how each item in the list was related to the others. To assess the influence of both strategies, recall and recognition memory tests were administered. Overall, both patient groups exhibited poorer memory in both recall and recognition tests compared to controls. In terms of recognition, healthy older controls and patients with MCI due to AD exhibited improved memory discrimination in the Item-Specific condition compared to the Relational condition, whereas patients with AD dementia did not. We speculate that patients with MCI due to AD use intact frontal networks to effectively engage in this strategy. PMID:27643951
Transfer in motion perceptual learning depends on the difficulty of the training task.

PubMed

Wang, Xiaoxiao; Zhou, Yifeng; Liu, Zili

2013-06-07

One hypothesis in visual perceptual learning is that the amount of transfer depends on the difficulty of the training and transfer tasks (Ahissar & Hochstein, 1997; Liu, 1995, 1999). Jeter, Dosher, Petrov, and Lu (2009), using an orientation discrimination task, challenged this hypothesis by arguing that the amount of transfer depends only on the transfer task but not on the training task. Here we show in a motion direction discrimination task that the amount of transfer indeed depends on the difficulty of the training task. Specifically, participants were first trained with either 4° or 8° direction discrimination along one average direction. Their transfer performance was then tested along an average direction 90° away from the trained direction. A variety of transfer measures consistently demonstrated that transfer performance depended on whether the participants were trained on 4° or 8° directional difference. The results contradicted the prediction that transfer was independent of the training task difficulty.
Psychometric properties of the medical outcomes study sleep scale in Spanish postmenopausal women.

PubMed

Zagalaz-Anula, Noelia; Hita-Contreras, Fidel; Martínez-Amat, Antonio; Cruz-Díaz, David; Lomas-Vega, Rafael

2017-07-01

This study aimed to analyze the reliability and validity of the Spanish version of the Medical Outcomes Study Sleep Scale (MOS-SS), and its ability to discriminate between poor and good sleepers among a Spanish population with vestibular disorders. In all, 121 women (50-76 years old) completed the Spanish version of the MOS-SS. Internal consistency, test-retest reliability, and construct validity (exploratory factor analysis) were analyzed. Concurrent validity was evaluated using the Pittsburgh Sleep Quality Index and the 36-item Short Form Health Survey. To analyze the ability of the MOS-SS scores to discriminate between poor and good sleepers, a receiver-operating characteristic curve analysis was performed. The Spanish version of the MOS-SS showed excellent and substantial reliability in Sleep Problems Index I (two sleep disturbance items, one somnolence item, two sleep adequacy items, and awaken short of breath or with headache) and Sleep Problems Index II (four sleep disturbance items, two somnolence items, two sleep adequacy items, and awaken short of breath or with headache), respectively, and good internal consistency with optimal Cronbach's alpha values in all domains and indexes (0.70-0.90). Factor analysis suggested a coherent four-factor structure (explained variance 70%). In concurrent validity analysis, MOS-SS indexes showed significant and strong correlations with the Pittsburgh Sleep Quality Index total score, and moderate correlations with the 36-item Short Form Health Survey component summaries. Several domains and both indexes discriminated significantly between poor and good sleepers (P < 0.05). Optimal cut-off points were above 20 for the "sleep disturbance" domain, and above 22.22 and above 33.33 for Sleep Problems Index I and II, respectively. The Spanish version of the MOS-SS is a valid and reliable instrument, suitable to assess sleep quality in Spanish postmenopausal women, with satisfactory general psychometric properties. 
It discriminates well between good and poor sleepers.
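Optimal cut-off points like those above are often chosen from the ROC curve by maximising Youden's J (sensitivity + specificity - 1). The abstract does not state which criterion was used, so the sketch below is an assumption; it treats scores above the cut-off as "poor sleeper", matching the "above 20" phrasing.

```python
def youden_cutoff(scores_pos, scores_neg):
    """Return (cutoff, J) maximising sensitivity + specificity - 1.

    A score strictly above the cutoff is classified as positive (e.g., poor sleeper).
    """
    best = (None, -1.0)
    for c in sorted(set(scores_pos) | set(scores_neg)):
        sens = sum(s > c for s in scores_pos) / len(scores_pos)
        spec = sum(s <= c for s in scores_neg) / len(scores_neg)
        j = sens + spec - 1.0
        if j > best[1]:
            best = (c, j)
    return best


# Hypothetical domain scores for poor and good sleepers
print(youden_cutoff([25, 30, 35], [10, 15, 20]))
```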
Talent identification model for sprinter using discriminant factor

NASA Astrophysics Data System (ADS)

Kusnanik, N. W.; Hariyanto, A.; Herdyanto, Y.; Satia, A.

2018-01-01

The main purpose of this study was to identify young talented sprinters using discriminant analysis. The research was conducted in three steps: developing an item pool, screening the item pool, and trialling the instruments in small and large samples. A total of 315 male elementary school students aged 11-13 years participated in this study. Data were collected by measuring anthropometry (standing height, sitting height, body mass, and leg length) and by testing physical fitness (40 m sprint for speed, shuttle run for agility, standing broad jump for power, and the multistage fitness test (MFT) for endurance). Data were analyzed using discriminant analysis. Five items were selected as the instrument to identify young talented sprinters: sitting height, body mass, leg length, 40 m sprint, and the multistage fitness test. The discriminant model for talent identification in sprinters was D = -24.497 + 0.155 (sitting height) + 0.080 (body mass) + 0.148 (leg length) - 1.225 (40 m sprint) + 0.563 (MFT). The selected test items and the derived discriminant model can be applied to identify young talented sprinters.
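The reported discriminant function can be evaluated as a weighted sum plus an intercept. This sketch reads the commas in the published coefficients (e.g. "-24,497") as decimal separators. The abstract states neither the measurement units nor the threshold on D used for classification, so the variable names are hypothetical and no classification rule is implied.

```python
# Coefficients as reported, reading commas as decimal points (assumption)
COEFFICIENTS = {
    "sitting_height": 0.155,
    "body_mass": 0.080,
    "leg_length": 0.148,
    "sprint_40m": -1.225,  # faster (lower) sprint times raise D
    "mft": 0.563,          # multistage fitness test result
}
INTERCEPT = -24.497


def discriminant_score(measures):
    """Linear discriminant score D for one athlete; measures maps names to values."""
    return INTERCEPT + sum(w * measures[name] for name, w in COEFFICIENTS.items())
```

Units must match those used when the model was fitted; otherwise D is not comparable to the authors' results.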
Classical Item Analysis Using Latent Variable Modeling: A Note on a Direct Evaluation Procedure

ERIC Educational Resources Information Center

Raykov, Tenko; Marcoulides, George A.

2011-01-01

A directly applicable latent variable modeling procedure for classical item analysis is outlined. The method allows one to point and interval estimate item difficulty, item correlations, and item-total correlations for composites consisting of categorical items. The approach is readily employed in empirical research and as a by-product permits…

Item Estimates under Low-Stakes Conditions: How Should Omits Be Treated?

ERIC Educational Resources Information Center

DeMars, Christine

Using data from a pilot test of science and math from students in 30 high schools, item difficulties were estimated with a one-parameter model (partial-credit model for the multi-point items). Some items were multiple-choice items, and others were constructed-response items (open-ended). Four sets of estimates were obtained: estimates for males…
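The question of how omits should be treated can be illustrated with the simplest difficulty statistic, the proportion correct. This classical sketch is an illustration only; the study itself estimated difficulties with a one-parameter IRT model. It scores omitted responses either as incorrect or as not administered. The function name and response coding are hypothetical.

```python
def proportion_correct(responses, omit_as_wrong):
    """responses: 1 = correct, 0 = incorrect, None = omitted."""
    if omit_as_wrong:
        scored = [0 if r is None else r for r in responses]
    else:
        scored = [r for r in responses if r is not None]  # treat omits as missing
    return sum(scored) / len(scored)


answers = [1, 1, 0, None, None]
print(proportion_correct(answers, omit_as_wrong=True))
print(proportion_correct(answers, omit_as_wrong=False))
```

When omits are frequent, as they can be in low-stakes testing, scoring them as wrong makes items look harder than treating them as missing.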
Analysis of the Difficulty and Discrimination Indices of Multiple-Choice Questions According to Cognitive Levels in an Open and Distance Learning Context

ERIC Educational Resources Information Center

Koçdar, Serpil; Karadag, Nejdet; Sahin, Murat Dogan

2016-01-01

This is a descriptive study which intends to determine whether the difficulty and discrimination indices of the multiple-choice questions show differences according to cognitive levels of the Bloom's Taxonomy, which are used in the exams of the courses in a business administration bachelor's degree program offered through open and distance…
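The difficulty and discrimination indices discussed above are commonly computed as follows. Difficulty p is the proportion of examinees answering correctly, and discrimination D is the proportion correct in the top-scoring group minus that in the bottom-scoring group. The 27% group size is a widespread convention, assumed here rather than taken from the study. Ties in total score are broken arbitrarily by sort order.

```python
def item_indices(item, total, fraction=0.27):
    """Return (difficulty p, discrimination D) for one 0/1-scored item."""
    n = len(item)
    p = sum(item) / n
    order = sorted(range(n), key=lambda i: total[i])  # ascending total score
    g = max(1, round(fraction * n))                   # size of each extreme group
    lower, upper = order[:g], order[-g:]
    d = sum(item[i] for i in upper) / g - sum(item[i] for i in lower) / g
    return p, d


# Hypothetical responses of ten examinees to one item, with their total scores
print(item_indices([1, 1, 1, 1, 0, 0, 0, 0, 1, 0], [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]))
```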
The Dominance Concept Inventory: A Tool for Assessing Undergraduate Student Alternative Conceptions about Dominance in Mendelian and Population Genetics

PubMed Central

Perez, Kathryn E.; Price, Rebecca M.

2014-01-01

Despite the impact of genetics on daily life, biology undergraduates understand some key genetics concepts poorly. One concept requiring attention is dominance, which many students understand as a fixed property of an allele or trait and regularly conflate with frequency in a population or selective advantage. We present the Dominance Concept Inventory (DCI), an instrument to gather data on selected alternative conceptions about dominance. During development of the 16-item test, we used expert surveys (n = 12), student interviews (n = 42), and field tests (n = 1763) from introductory and advanced biology undergraduates at public and private, majority- and minority-serving, 2- and 4-yr institutions in the United States. In the final field test across all subject populations (n = 709), item difficulty ranged from 0.08 to 0.84 (0.51 ± 0.049 SEM), while item discrimination ranged from 0.11 to 0.82 (0.50 ± 0.048 SEM). Internal reliability (Cronbach's alpha) was 0.77, while test–retest reliability values were 0.74 (product moment correlation) and 0.77 (intraclass correlation). The prevalence of alternative conceptions in the field tests shows that introductory and advanced students retain confusion about dominance after instruction. All measures support the DCI as a useful instrument for measuring undergraduate biology student understanding and alternative conceptions about dominance. PMID:26086665
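Internal reliability above is reported as Cronbach's alpha, alpha = k/(k - 1) * (1 - sum of item variances / variance of total scores). A minimal sketch with a hypothetical person-by-item score matrix follows. Population variances are used throughout; the ratio is the same with sample variances as long as they are used consistently.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """scores[p][i] = score of person p on item i (k >= 2 items, non-constant totals)."""
    k = len(scores[0])
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 0/1 responses of four students to three items
print(cronbach_alpha([[1, 1, 1], [0, 1, 0], [1, 0, 1], [0, 0, 0]]))
```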
Development and psychometric analysis of the Brief DSM-5 Alcohol Use Disorder Diagnostic Assessment: Towards effective diagnosis in college students.

PubMed

Hagman, Brett T

2017-11-01

The Diagnostic and Statistical Manual of Mental Disorders (5th edition) Alcohol Use Disorder (DSM-5 AUD) criteria have been modified to reflect a single, continuous disorder. It is critical that we develop brief assessment measures that can accurately assess DSM-5 AUD criteria in college students to assist in screening, referral, and brief intervention services implemented on college campuses. The present study sought to develop a brief 13-item measure designed to capture the full spectrum of the DSM-5 AUD criteria in a sample of college students, and to assess its psychometric properties. Participants were past-year drinkers (N = 923) between the ages of 18 and 30 enrolled at 3 universities. Respondents completed a 30-min anonymous battery of questionnaires online. The Brief DSM-5 AUD Assessment consisted of 13 items designed to reflect the DSM-5 AUD criteria. Results indicated a high degree of internal consistency reliability with high item-to-scale correlations. Confirmatory factor analyses indicated that a dominant single factor emerged with good model fit. The Item Response Theory (IRT) analyses indicated that the difficulty parameters for each criterion were intermixed along the upper portion of the underlying AUD severity continuum, and the discrimination parameters were all high. Additional analysis indicated that those with a DSM-5 AUD had greater levels of alcohol and other drug use and problem severity in comparison to those without a DSM-5 AUD. Study findings provide empirical support for the reliability and validity of the Brief 13-item DSM-5 Assessment. It should be routinely included in research and clinical practice efforts. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
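In IRT, an item's difficulty parameter locates it on the latent severity continuum. Its discrimination parameter sets how sharply endorsement rises around that point. The abstract does not name the model. For binary criteria a two-parameter logistic (2PL) model is a common choice, so the sketch below assumes it. The parameter values are hypothetical and are not estimates from the study.

```python
import math


def p_2pl(theta: float, a: float, b: float) -> float:
    """Probability of endorsing a criterion under the two-parameter logistic (2PL) model.

    theta: latent severity; a: discrimination (slope); b: difficulty (location).
    No 1.7 scaling constant is applied (an assumption; some software includes it).
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def info_2pl(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item: a^2 * P * (1 - P), which peaks at theta = b."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


# Hypothetical criterion located in the upper part of the severity continuum.
a, b = 2.5, 1.2
for theta in (-1.0, 0.0, 1.2, 2.0):
    print(f"theta={theta:+.1f}  P={p_2pl(theta, a, b):.3f}  info={info_2pl(theta, a, b):.3f}")
```

A high discrimination parameter concentrates the item's information near its difficulty. This is why a set of high-a items located at the upper end of the continuum measures most precisely among heavier drinkers.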
Visual Motion Prediction and Verbal False Memory Performance in Autistic Children.

PubMed

Tewolde, Furtuna G; Bishop, Dorothy V M; Manning, Catherine

2018-03-01

Recent theoretical accounts propose that atypical predictive processing can explain the diverse cognitive and behavioral features associated with autism, and that difficulties in making predictions may be related to reduced contextual processing. In this pre-registered study, 30 autistic children aged 6-14 years and 30 typically developing children matched in age and non-verbal IQ completed visual extrapolation and false memory tasks to assess predictive abilities and contextual processing, respectively. In the visual extrapolation tasks, children were asked to predict when an occluded car would reach the end of a road and when an occluded set of lights would fill up a grid. Autistic children made predictions that were just as precise as those made by typically developing children, across a range of occlusion durations. In the false memory task, autistic and typically developing children did not differ significantly in their discrimination between items presented in a list and semantically related, non-presented items, although the data were insensitive, suggesting the need for larger samples. Our findings help to refine theoretical accounts by challenging the notion that autism is caused by pervasively disordered prediction abilities. Further studies will be required to assess the relationship between predictive processing and context use in autism, and to establish the conditions under which predictive processing may be impaired. Autism Res 2018, 11: 509-518. © 2017 The Authors Autism Research published by International Society for Autism Research and Wiley Periodicals, Inc. It has been suggested that autistic individuals have difficulties making predictions and perceiving the overall gist of things. Yet, here we found that autistic children made similar predictions about hidden objects as non-autistic children. In a memory task, autistic children were slightly less confused about whether they had heard a word before, when words were closely related in meaning. 
We conclude that autistic children do not show difficulties with this type of prediction.
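Discrimination between studied items and related, unstudied items is often summarized with the signal-detection sensitivity index d' = z(hit rate) - z(false-alarm rate). The abstract does not specify which measure was used. This is only a minimal sketch that assumes d' with a log-linear correction (adding 0.5 to each count). The counts are hypothetical.

```python
from statistics import NormalDist


def d_prime(hits: int, misses: int, false_alarms: int, correct_rejections: int) -> float:
    """Sensitivity d' = z(H) - z(FA), with a log-linear correction so rates of 0 or 1 stay finite."""
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts: 20 studied words, 20 semantically related lures.
print(round(d_prime(hits=16, misses=4, false_alarms=9, correct_rejections=11), 2))
```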
Conflict and metacognitive control: the mismatch-monitoring hypothesis of how others' knowledge states affect recall.

PubMed

Fraundorf, Scott H; Benjamin, Aaron S

2016-09-01

Information about others' success in remembering is frequently available. For example, students taking an exam may assess its difficulty by monitoring when others turn in their exams. In two experiments, we investigated how rememberers use this information to guide recall. Participants studied paired associates, some semantically related (and thus easier to retrieve) and some unrelated (and thus harder). During a subsequent cued recall test, participants viewed fictive information about an opponent's accuracy on each item. In Experiment 1, participants responded to each cue once before seeing the opponent's performance and once afterwards. Participants reconsidered their responses least often when the opponent's accuracy matched the item difficulty (easy items the opponent recalled, hard items the opponent forgot) and most often when the opponent's accuracy and the item difficulty mismatched. When participants responded only after seeing the opponent's performance (Experiment 2), the same mismatch conditions that led to reconsideration even produced superior recall. These results suggest that rememberers monitor whether others' knowledge states accord or conflict with their own experience, and that this information shifts how they interrogate their memory and what they recall.
Intervention for children with word-finding difficulties: a parallel group randomised control trial.

PubMed

Best, Wendy; Hughes, Lucy Mari; Masterson, Jackie; Thomas, Michael; Fedor, Anna; Roncoli, Silvia; Fern-Pollak, Liory; Shepherd, Donna-Lynn; Howard, David; Shobbrook, Kate; Kapikian, Anna

2017-07-31

The study investigated the outcome of a word-web intervention for children diagnosed with word-finding difficulties (WFDs). Twenty children aged 6-8 years, with WFDs confirmed by a discrepancy between comprehension and production on the Test of Word Finding-2, were randomly assigned to intervention (n = 11) and waiting control (n = 9) groups. The intervention group had six sessions of intervention, which used word-webs and targeted children's meta-cognitive awareness and word retrieval. On the treated experimental set (n = 25 items), the intervention group gained on average four times as many items as the waiting control group (d = 2.30). The intervention group also made gains on personally chosen items. There was little change on untreated items for either group. The study is the first randomised control trial to demonstrate an effect of word-finding therapy with children with language difficulties in mainstream school. The improvement in word finding for treated items was obtained following a clinically realistic intervention in terms of approach, intensity and duration.
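The effect size d = 2.30 is a standardized mean difference. Below is a minimal sketch of Cohen's d with a pooled standard deviation. The abstract does not state which variant of d was computed, and the gain scores are invented for illustration only.

```python
import math
from statistics import mean, variance
from typing import Sequence


def cohens_d(treated: Sequence[float], control: Sequence[float]) -> float:
    """Standardized mean difference using the pooled sample standard deviation."""
    n1, n2 = len(treated), len(control)
    pooled_var = ((n1 - 1) * variance(treated) + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    return (mean(treated) - mean(control)) / math.sqrt(pooled_var)


# Hypothetical item gains on a treated set (not the study's data).
intervention_gains = [8, 10, 7, 9, 11, 8]
control_gains = [2, 3, 1, 2, 4, 2]
print(round(cohens_d(intervention_gains, control_gains), 2))
```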
Identification of technical item flaws leads to improvement of the quality of single best Multiple Choice Questions.

PubMed

Fayyaz Khan, Humaira; Farooq Danish, Khalid; Saeed Awan, Azra; Anwar, Masood

2013-05-01

The purpose of the study was to identify technical item flaws in the multiple choice questions submitted for the final exams for the years 2009, 2010 and 2011. This descriptive analytical study was carried out in Islamic International Medical College (IIMC). Data were collected from the MCQs submitted by the faculty for the final exams for 2009, 2010 and 2011, and were compiled and evaluated by a three-member assessment committee. The data were analyzed for frequencies and percentages, and the categorical data were analyzed with the chi-square test. The overall percentage of flawed items was 67% for 2009, of which 21% were for testwiseness and 40% were for irrelevant difficulty. In 2010, total item flaws were 36%, with 11% for testwiseness and 22% for irrelevant difficulty. The 2011 data showed a decrease in overall flaws to 21%: flaws of testwiseness were 7% and of irrelevant difficulty 11%. Technical item flaws are frequently encountered during MCQ construction, and the identification of flaws leads to improved quality of single best MCQs.
Final Report: Resolving and Discriminating Overlapping Anomalies from Multiple Objects in Cluttered Environments

DTIC Science & Technology

2015-12-15

...Distinguishing an object of interest from innocuous items is the main problem that the UXO community is facing currently. This inverse problem demands fast and accurate representation of...
Dipole Models for UXO Discrimination at Live Sites

DTIC Science & Technology

2017-05-01

...fraction of the anomalies as arising from non-hazardous items that could be safely left in the ground. Of particular note, the contractor EM-61-MK2 cart...use of classification metrics applied to production-quality EM-61 data, it was possible to significantly reduce the number of clutter items excavated
Detecting a Gender-Related DIF Using Logistic Regression and Transformed Item Difficulty

ERIC Educational Resources Information Center

Abedlaziz, Nabeel; Ismail, Wail; Hussin, Zaharah

2011-01-01

Test items are designed to provide information about the examinees. Difficult items are designed to be more demanding and easy items are less so. However, sometimes, test items carry with their demands other than those intended by the test developer (Scheuneman & Gerritz, 1990). When personal attributes such as gender systematically affect…
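Transformed item difficulty is the Angoff delta-plot method. It converts each group's proportion correct p into a delta on a normal-deviate scale and plots the two groups' deltas against each other. Items lying far from the major axis of the scatter are flagged. The sketch below assumes the conventional transform delta = 13 + 4z, where z is the inverse normal CDF of 1 - p. It is not the authors' code, and the proportions are hypothetical.

```python
import math
from statistics import NormalDist, mean
from typing import List, Sequence


def delta(p: float) -> float:
    """Angoff delta: 13 + 4 * z(1 - p). Harder items (lower p) get larger deltas."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return 13.0 + 4.0 * NormalDist().inv_cdf(1.0 - p)


def delta_plot_distances(p_ref: Sequence[float], p_focal: Sequence[float]) -> List[float]:
    """Signed perpendicular distance of each item from the major axis of the delta plot."""
    x = [delta(p) for p in p_ref]
    y = [delta(p) for p in p_focal]
    mx, my = mean(x), mean(y)
    n = len(x)
    sxx = sum((xi - mx) ** 2 for xi in x) / (n - 1)
    syy = sum((yi - my) ** 2 for yi in y) / (n - 1)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)  # must be nonzero
    # Slope and intercept of the major (principal) axis of the scatter.
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4.0 * sxy ** 2)) / (2.0 * sxy)
    intercept = my - slope * mx
    return [(slope * xi + intercept - yi) / math.sqrt(slope ** 2 + 1.0) for xi, yi in zip(x, y)]


# Hypothetical proportions correct for males (reference) and females (focal).
p_male = [0.85, 0.70, 0.55, 0.40, 0.30]
p_female = [0.82, 0.68, 0.35, 0.38, 0.27]
for i, d in enumerate(delta_plot_distances(p_male, p_female)):
    print(f"item {i}: distance = {d:+.2f}")
```

Items whose absolute distance exceeds a chosen cutoff (1.5 delta units is often cited) would be flagged for review. Logistic-regression DIF is a separate, model-based check.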
Adaptable Learning Assistant for Item Bank Management

ERIC Educational Resources Information Center

Nuntiyagul, Atorn; Naruedomkul, Kanlaya; Cercone, Nick; Wongsawang, Damras

2008-01-01

We present PKIP, an adaptable learning assistant tool for managing question items in item banks. PKIP is not only able to automatically assist educational users to categorize the question items into predefined categories by their contents but also to correctly retrieve the items by specifying the category and/or the difficulty level. PKIP adapts…
Developing a Placement Exam for Spanish Heritage Language Learners: Item Analysis and Learner Characteristics

ERIC Educational Resources Information Center

Wilson, Damian Vergara

2012-01-01

This paper illustrates a method of item analysis used to identify discriminating multiple-choice items in placement data. The data come from two rounds of pilots given to both SHL students and Spanish as a Second Language (SSL) students. In the first round, 104 items were administered to 507 students. After discarding poor items, the second round…
Aligning Items and Achievement Levels: A Study Comparing Expert Judgments

ERIC Educational Resources Information Center

Kaliski, Pamela; Huff, Kristen; Barry, Carol

2011-01-01

For educational achievement tests that employ multiple-choice (MC) items and aim to reliably classify students into performance categories, it is critical to design MC items that are capable of discriminating student performance according to the stated achievement levels. This is accomplished, in part, by clearly understanding how item design…
A Study of Developing an Environmental Attitude Scale for Primary School Students

ERIC Educational Resources Information Center

Artvinli, Eyup; Demir, Zulfiye Melis

2018-01-01

The aim of this research is to develop an instrument that measures environmental attitudes of third grade students. The study was completed in six stages: creating scale items, content validity study, item total and remaining item correlation study, determining item discrimination, determining construct validity study and examining the internal…
Measuring HIV- and AIDS-related stigma and discrimination in Nicaragua: results from a community-based study.

PubMed

Ugarte, William J; Högberg, Ulf; Valladares, Eliette C; Essén, Birgitta

2013-04-01

Psychometric properties of external HIV-related stigma and discrimination scales and their predictors were investigated. A cross-sectional community-based study was carried out among 520 participants using an ongoing health and demographic surveillance system in León, Nicaragua. Participants completed an 18-item HIV stigma scale and 19 HIV and AIDS discrimination-related statements. A factor analysis found that 15 of the 18 items in the stigma scale and 18 of the 19 items in the discrimination scale loaded clearly onto five- and four-factor structures, respectively. Overall Cronbach's alpha of .81 for the HIV stigma scale and .91 for the HIV discrimination scale provided evidence of internal consistency. Hierarchical multiple linear regression analysis identified that females, rural residents, people with insufficient HIV-related transmission knowledge, those not tested for HIV, those reporting an elevated self-perception of HIV risk, and those unwilling to disclose their HIV status were associated with higher stigmatizing attitudes and higher discriminatory actions towards HIV-positive people. This is the first community-based study in Nicaragua to demonstrate that overall HIV stigma and discrimination scales were reliable and valid in a community-based sample composed of men and women of reproductive age. Reported levels of stigma and discrimination were high in the general population, especially among sub-groups. The findings in the current study suggest that community-based strategies, including the monitoring of stigma and discrimination and the design and implementation of stigma reduction interventions, are greatly needed to reduce inequities and increase acceptance of persons with HIV.
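Cronbach's alpha, reported here as .81 and .91, compares the sum of the item variances with the variance of the total score. Below is a minimal sketch that assumes complete data, with respondents as rows and items as columns. The toy matrix is hypothetical.

```python
from statistics import pvariance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_variances = sum(pvariance(column) for column in zip(*rows))
    total_variance = pvariance([sum(row) for row in rows])
    return k / (k - 1) * (1.0 - item_variances / total_variance)


# Hypothetical 4 respondents x 2 items.
print(cronbach_alpha([[1, 0], [1, 1], [0, 0], [1, 1]]))
```

Population and sample variances give the same alpha, provided both terms use the same kind.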
Development and preliminary validation of a self-report measure of psychopathic personality traits in noncriminal populations.

PubMed

Lilienfeld, S O; Andrews, B P

1996-06-01

Research on psychopathy has been hindered by persisting difficulties and controversies regarding its assessment. The primary goals of this set of studies were to (a) develop, and initiate the construct validation of, a self-report measure that assesses the major personality traits of psychopathy in noncriminal populations and (b) clarify the nature of these traits via an exploratory approach to test construction. This measure, the Psychopathic Personality Inventory (PPI), was developed by writing items to assess a large number of personality domains relevant to psychopathy and performing successive item-level factor analyses and revisions on three undergraduate samples. The PPI total score and its eight subscales were found to possess satisfactory internal consistency and test-retest reliability. In four studies with undergraduates, the PPI and its subscales exhibited a promising pattern of convergent and discriminant validity with self-report, psychiatric interview, observer rating, and family history data. In addition, the PPI total score demonstrated incremental validity relative to several commonly used self-report psychopathy-related measures. Future construct validation studies, unresolved conceptual issues regarding the assessment of psychopathy, and potential research uses of the PPI are outlined.
A Diagnostic Assessment for Introductory Molecular and Cell Biology

PubMed Central

Wood, William B.; Martin, Jennifer M.; Guild, Nancy A.; Vicens, Quentin; Knight, Jennifer K.

2010-01-01

We have developed and validated a tool for assessing understanding of a selection of fundamental concepts and basic knowledge in undergraduate introductory molecular and cell biology, focusing on areas in which students often have misconceptions. This multiple-choice Introductory Molecular and Cell Biology Assessment (IMCA) instrument is designed for use as a pre- and posttest to measure student learning gains. To develop the assessment, we first worked with faculty to create a set of learning goals that targeted important concepts in the field and seemed likely to be emphasized by most instructors teaching these subjects. We interviewed students using open-ended questions to identify commonly held misconceptions, formulated multiple-choice questions that included these ideas as distracters, and reinterviewed students to establish validity of the instrument. The assessment was then evaluated by 25 biology experts and modified based on their suggestions. The complete revised assessment was administered to more than 1300 students at three institutions. Analysis of statistical parameters including item difficulty, item discrimination, and reliability provides evidence that the IMCA is a valid and reliable instrument with several potential uses in gauging student learning of key concepts in molecular and cell biology. PMID:21123692
Load-sensitive impairment of working memory for biological motion in schizophrenia.

PubMed

Lee, Hannah; Kim, Jejoong

2017-01-01

Impaired working memory (WM) is a core cognitive deficit in schizophrenia. Nevertheless, past studies have reported that patients may also benefit from increasing the salience of memory stimuli. Such efficient encoding largely depends upon precise perception. Thus, an investigation into the relationship between perceptual processing and WM would be worthwhile. Here, we used biological motion (BM), a socially relevant stimulus that patients with schizophrenia have difficulty discriminating from similar meaningless motions, in a delayed-response task. Non-BM stimuli and static polygons were also used for comparison. In each trial, one of the three types of stimuli was presented, followed by two probes, with a short delay in between. Participants were asked to indicate whether one of the probes was identical to the memory item or both were novel. The number of memory items was one or two. Healthy controls were more accurate in recognizing BM than non-BM regardless of memory load. Patients with schizophrenia exhibited accuracy patterns similar to those of controls in the Load 1 condition only. These results suggest that information contained in BM could facilitate WM encoding in general, but the effect is vulnerable to increases in cognitive load in schizophrenia, implying inefficient encoding driven by imprecise perception.
Hierarchy and Psychometric Properties of ADHD Symptoms in Spanish Children: An Application of the Graded Response Model

PubMed Central

Arias, Victor B.; Nuñez, Daniel E.; Martínez-Molina, Agustín; Ponce, Fernando P.; Arias, Benito

2016-01-01

The Diagnostic and Statistical Manual of Mental Disorders (DSM) diagnostic criteria assume that the 18 symptoms carry the same weight in an Attention Deficit with Hyperactivity Disorder (ADHD) diagnosis and have the same discriminatory capacity. However, it is reasonable to think that symptoms may differ in severity and even in the reliability with which they represent the disorder. To test this hypothesis, this study aimed to calibrate, in a sample of Spanish children (age 4–7; n = 784), a scale for assessing the ADHD symptoms proposed by the Diagnostic and Statistical Manual of Mental Disorders, IV-TR, within the framework of Item Response Theory. Samejima’s Graded Response Model was used to estimate the item difficulty and discrimination parameters. The results showed that the ADHD subscales (Attention Deficit and Hyperactivity / Impulsivity) had good psychometric properties and also a good fit to the model. However, relevant differences between symptoms were observed in severity, informativeness and reliability for the assessment of ADHD. This finding suggests that it would be useful to identify which symptoms are more important than others for diagnosing ADHD. PMID:27736911
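Under Samejima's Graded Response Model, each item has one discrimination parameter a and ordered thresholds b_1 < ... < b_{m-1}. The probability of responding in category k or higher is a logistic curve. Each category probability is the difference between adjacent cumulative curves. Below is a minimal sketch with hypothetical parameters. It assumes no scaling constant.

```python
import math
from typing import List, Sequence


def grm_category_probs(theta: float, a: float, thresholds: Sequence[float]) -> List[float]:
    """Category probabilities P(X = k | theta), k = 0..m-1, under the Graded Response Model.

    thresholds must be strictly increasing; P(X >= 0) = 1 and P(X >= m) = 0 by definition.
    """
    if any(b2 <= b1 for b1, b2 in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    cumulative = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


# Hypothetical symptom rated on a 4-point scale (0-3).
for theta in (-2.0, 0.0, 2.0):
    probs = grm_category_probs(theta, a=1.8, thresholds=[-1.0, 0.3, 1.5])
    print(theta, [round(p, 3) for p in probs])
```

The thresholds carry the "severity" of a symptom, and a carries its discrimination. Comparing these across symptoms is what allows some symptoms to be singled out as more informative than others.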

Hierarchy and Psychometric Properties of ADHD Symptoms in Spanish Children: An Application of the Graded Response Model.

PubMed

Arias, Victor B; Nuñez, Daniel E; Martínez-Molina, Agustín; Ponce, Fernando P; Arias, Benito

2016-01-01

The Diagnostic and Statistical Manual of Mental Disorders (DSM) diagnostic criteria assume that the 18 symptoms carry the same weight in an Attention Deficit with Hyperactivity Disorder (ADHD) diagnosis and have the same discriminatory capacity. However, it is reasonable to think that symptoms may differ in severity and even in the reliability with which they represent the disorder. To test this hypothesis, this study aimed to calibrate, in a sample of Spanish children (age 4-7; n = 784), a scale for assessing the ADHD symptoms proposed by the Diagnostic and Statistical Manual of Mental Disorders, IV-TR, within the framework of Item Response Theory. Samejima's Graded Response Model was used to estimate the item difficulty and discrimination parameters. The results showed that the ADHD subscales (Attention Deficit and Hyperactivity / Impulsivity) had good psychometric properties and also a good fit to the model. However, relevant differences between symptoms were observed in severity, informativeness and reliability for the assessment of ADHD. This finding suggests that it would be useful to identify which symptoms are more important than others for diagnosing ADHD.
Development and Validation of the Spanish Numeracy Understanding in Medicine Instrument.

PubMed

Jacobs, Elizabeth A; Walker, Cindy M; Miller, Tamara; Fletcher, Kathlyn E; Ganschow, Pamela S; Imbert, Diana; O'Connell, Maria; Neuner, Joan M; Schapira, Marilyn M

2016-11-01

The Spanish-speaking population in the U.S. is large and growing and is known to have lower health literacy than the English-speaking population. Less is known about the health numeracy of this population due to a lack of health numeracy measures in Spanish. We aimed to develop and validate a short and easy-to-use measure of health numeracy for Spanish-speaking adults: the Spanish Numeracy Understanding in Medicine Instrument (Spanish-NUMi). Items were generated based on qualitative studies in English- and Spanish-speaking adults and translated into Spanish using a group translation and consensus process. Candidate items for the Spanish NUMi were selected from an eight-item validated English Short NUMi. Differential Item Functioning (DIF) analysis was conducted to evaluate equivalence between English and Spanish items. Cronbach's alpha was computed as a measure of reliability, and Pearson's correlation was used to evaluate the association between test scores and both the Spanish Test of Functional Health Literacy (S-TOFHLA) and education level. Two hundred thirty-two Spanish-speaking Chicago residents were included in the study. The study population was diverse in age, gender, and level of education, and 70% reported Mexico as their country of origin. Two items of the English eight-item Short NUMi demonstrated DIF and were dropped. The resulting six-item test had a Cronbach's alpha of 0.72, a range of difficulty using classical test statistics (percent correct: 0.48 to 0.86), and adequate discrimination (item-total score correlation: 0.34-0.49). Scores were positively correlated with print literacy as measured by the S-TOFHLA (r = 0.67; p < 0.001) and varied as predicted across grade level; mean scores for up to eighth grade, ninth through twelfth grade, and some college experience or more were 2.48 (SD ± 1.64), 4.15 (SD ± 1.45), and 4.82 (SD ± 0.37), respectively.
The Spanish NUMi is a reliable and valid measure of important numerical concepts used in communicating health information.
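The NUMi reports difficulty as percent correct and discrimination as an item-total score correlation. Below is a minimal sketch of both, assuming 0/1 scoring. The `corrected` flag is an addition not mentioned in the abstract. It removes the item from the total so the correlation is not inflated. The response matrix is hypothetical.

```python
import math
from typing import Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation (point-biserial when x is 0/1)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def item_stats(responses: Sequence[Sequence[int]], j: int, corrected: bool = False) -> Tuple[float, float]:
    """Return (percent correct, item-total correlation) for item j."""
    item = [row[j] for row in responses]
    totals = [sum(row) - (row[j] if corrected else 0) for row in responses]
    return sum(item) / len(item), pearson(item, totals)


# Hypothetical 4 respondents x 3 items.
responses = [[1, 1, 1], [1, 1, 0], [0, 1, 0], [0, 0, 0]]
print(item_stats(responses, 0))
print(item_stats(responses, 0, corrected=True))
```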
An Item Response Analysis of the Motor and Behavioral Subscales of the Unified Huntington's Disease Rating Scale in Huntington Disease Gene Expansion Carriers

PubMed Central

Vaccarino, Anthony L.; Anderson, Karen; Borowsky, Beth; Duff, Kevin; Giuliano, Joseph; Guttman, Mark; Ho, Aileen K.; Orth, Michael; Paulsen, Jane S.; Sills, Terrence; van Kammen, Daniel P.; Evans, Kenneth R.

2011-01-01

Although the Unified Huntington's Disease Rating Scale (UHDRS) is widely used in the assessment of Huntington disease (HD), the ability of individual items to discriminate individual differences in motor or behavioral manifestations has not been extensively studied in HD gene expansion carriers without a motor-defined clinical diagnosis (i.e., prodromal-HD or prHD). To elucidate the relationship between scores on individual motor and behavioral UHDRS items and total score for each subscale, a non-parametric item response analysis was performed on retrospective data from two multicentre, longitudinal studies. Motor and behavioral assessments were supplied for 737 prHD individuals with data from 2114 visits (PREDICT-HD) and 686 HD individuals with data from 1482 visits (REGISTRY). Option characteristic curves were generated for UHDRS subscale items in relation to their subscale score. In prHD, overall severity of motor signs was low, and participants had scores of 2 or above on very few items. In HD, motor items that assessed ocular pursuit, saccade initiation, finger tapping, tandem walking, and to a lesser extent, saccade velocity, dysarthria, tongue protrusion, pronation/supination, Luria, bradykinesia, choreas, gait, and balance on the retropulsion test were found to discriminate individual differences across a broad range of motor severity. In prHD, depressed mood, anxiety, and irritable behavior demonstrated good discriminative properties. In HD, depressed mood demonstrated a good relationship with the overall behavioral score. These data suggest that at least some UHDRS items appear to have utility across a broad range of severity, although many items demonstrate problematic features. PMID:21370269
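Option characteristic curves show how the probability of choosing each response option changes with overall severity. Nonparametric analyses typically smooth these curves with kernel methods. The sketch below instead uses simple equal-width binning of the subscale score, which is a deliberately cruder approximation. The scores are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def option_curves(item_scores: Sequence[int], total_scores: Sequence[float],
                  n_bins: int = 5) -> Dict[int, List[float]]:
    """Proportion of respondents choosing each option within equal-width bins of the total score.

    Empty bins are reported as NaN.
    """
    lo, hi = min(total_scores), max(total_scores)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant total score
    bins: Dict[int, List[int]] = defaultdict(list)
    for score, total in zip(item_scores, total_scores):
        b = min(int((total - lo) / width), n_bins - 1)
        bins[b].append(score)
    options = sorted(set(item_scores))
    curves: Dict[int, List[float]] = {opt: [] for opt in options}
    for b in range(n_bins):
        group = bins.get(b, [])
        for opt in options:
            curves[opt].append(group.count(opt) / len(group) if group else float("nan"))
    return curves


# Hypothetical item ratings (0-2) and subscale totals for six visits.
print(option_curves([0, 0, 1, 1, 2, 2], [0, 1, 2, 3, 4, 5], n_bins=3))
```

A well-behaved item shows each higher rating becoming the most likely option in successively higher score bins. Flat or non-monotone curves correspond to the "problematic features" noted above.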
An item response analysis of the motor and behavioral subscales of the unified Huntington's disease rating scale in huntington disease gene expansion carriers.

PubMed

Vaccarino, Anthony L; Anderson, Karen; Borowsky, Beth; Duff, Kevin; Giuliano, Joseph; Guttman, Mark; Ho, Aileen K; Orth, Michael; Paulsen, Jane S; Sills, Terrence; van Kammen, Daniel P; Evans, Kenneth R

2011-04-01

Although the Unified Huntington's Disease Rating Scale (UHDRS) is widely used in the assessment of Huntington disease (HD), the ability of individual items to discriminate individual differences in motor or behavioral manifestations has not been extensively studied in HD gene expansion carriers without a motor-defined clinical diagnosis (ie, prodromal-HD or prHD). To elucidate the relationship between scores on individual motor and behavioral UHDRS items and total score for each subscale, a nonparametric item response analysis was performed on retrospective data from 2 multicenter longitudinal studies. Motor and behavioral assessments were supplied for 737 prHD individuals with data from 2114 visits (PREDICT-HD) and 686 HD individuals with data from 1482 visits (REGISTRY). Option characteristic curves were generated for UHDRS subscale items in relation to their subscale score. In prHD, overall severity of motor signs was low, and participants had scores of 2 or above on very few items. In HD, motor items that assessed ocular pursuit, saccade initiation, finger tapping, tandem walking, and to a lesser extent, saccade velocity, dysarthria, tongue protrusion, pronation/supination, Luria, bradykinesia, choreas, gait, and balance on the retropulsion test were found to discriminate individual differences across a broad range of motor severity. In prHD, depressed mood, anxiety, and irritable behavior demonstrated good discriminative properties. In HD, depressed mood demonstrated a good relationship with the overall behavioral score. These data suggest that at least some UHDRS items appear to have utility across a broad range of severity, although many items demonstrate problematic features. Copyright © 2011 Movement Disorder Society.
Item analysis of university-wide multiple choice objective examinations: the experience of a Nigerian private university.

PubMed

Odukoya, Jonathan A; Adekeye, Olajide; Igbinoba, Angie O; Afolabi, A

2018-01-01

Teachers and students worldwide often dance to the tune of tests and examinations. Assessments are powerful tools for catalyzing the achievement of educational goals, especially if done rightly. One of the tools for 'doing it rightly' is item analysis. The core objective of this study, therefore, was to ascertain the item difficulty and distractive indices of the university-wide courses. Between 112 and 1956 undergraduate students participated in this study. Using secondary data, the study adopted an ex-post facto design. In virtually all cases, the majority of the items (between 65% and 97% of the 70 items fielded in each course) did not meet psychometric standards in terms of difficulty and distractive indices and consequently needed to be moderated or deleted. Considering the importance of these courses, the need to apply item analysis when developing these tests was emphasized.
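Distractor analysis tabulates how often each incorrect option is chosen. Below is a minimal sketch with hypothetical responses. It assumes the common rule of thumb that a distractor chosen by fewer than 5% of examinees is non-functional. That threshold is an assumption, not a figure from the abstract.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def distractor_report(choices: Sequence[str], key: str, options: str = "ABCD",
                      threshold: float = 0.05) -> Tuple[Dict[str, float], List[str]]:
    """Return the share of examinees choosing each option and the non-functional distractors."""
    n = len(choices)
    counts = Counter(choices)
    share = {opt: counts[opt] / n for opt in options}
    nonfunctional = [opt for opt in options if opt != key and share[opt] < threshold]
    return share, nonfunctional


# Hypothetical responses of 20 examinees to one item keyed "A".
choices = list("AAAAAAABBC" * 2)
share, weak = distractor_report(choices, key="A")
print(share)  # proportion choosing each option
print(weak)   # distractors below the threshold
```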
The discrimination of discrete and continuous amounts in African grey parrots (Psittacus erithacus).

PubMed

Aïn, Syrina Al; Giret, Nicolas; Grand, Marion; Kreutzer, Michel; Bovet, Dalila

2009-01-01

A wealth of research in infants and animals demonstrates discrimination of quantities, in some cases nonverbal numerical perception, and even elementary calculation capacities. We investigated the ability of three African grey parrots (Psittacus erithacus) to select the larger of two amounts of food, presented either as discrete food items (experiment 1) or as a volume of a food substance (experiment 2). The two amounts were presented simultaneously and were visible at the time of choice. Parrots were tested several times on all possible combinations of 1 to 5 seeds or 0.2 to 1 ml of food substance. In both conditions, subjects performed above chance for almost all combinations. Accuracy was negatively correlated with the ratio; that is, performance improved with greater differences between amounts. Therefore, these results with both individual items and volume discrimination suggest that parrots use an analogue of magnitude, rather than object-file mechanisms, to quantify items and substances.
New feature extraction method for classification of agricultural products from x-ray images

NASA Astrophysics Data System (ADS)

Talukder, Ashit; Casasent, David P.; Lee, Ha-Woon; Keagy, Pamela M.; Schatzki, Thomas F.

1999-01-01

Classification of real-time x-ray images of randomly oriented touching pistachio nuts is discussed. The ultimate objective is the development of a system for automated non-invasive detection of defective product items on a conveyor belt. We discuss the extraction of new features that allow better discrimination between damaged and clean items. This feature extraction and classification stage is the new aspect of this paper; our new maximum representation and discriminating feature (MRDF) extraction method computes nonlinear features that are used as inputs to a new modified k nearest neighbor classifier. In this work the MRDF is applied to standard features. The MRDF is robust to various probability distributions of the input class and is shown to provide good classification and new ROC data.
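The classifier stage builds on k-nearest-neighbour voting. The abstract does not describe the MRDF feature extraction or the authors' modification of the classifier in enough detail to reproduce them. The sketch below therefore shows only a plain k-NN majority vote on already-extracted feature vectors. The feature values and labels are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def knn_predict(train_x: Sequence[Tuple[float, ...]], train_y: Sequence[str],
                query: Tuple[float, ...], k: int = 3) -> str:
    """Label the query by majority vote among its k nearest training points (Euclidean distance)."""
    neighbours = sorted(zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D feature vectors for clean and damaged nuts.
train_x = [(0.1, 0.2), (0.0, 0.4), (0.3, 0.1), (0.9, 0.8), (1.0, 0.7), (0.8, 1.0)]
train_y = ["clean", "clean", "clean", "damaged", "damaged", "damaged"]
print(knn_predict(train_x, train_y, (0.2, 0.3)))
print(knn_predict(train_x, train_y, (0.85, 0.9)))
```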
Effects of Item Exposure for Conventional Examinations in a Continuous Testing Environment.

ERIC Educational Resources Information Center

Hertz, Norman R.; Chinn, Roberta N.

This study explored the effect of item exposure on two conventional examinations administered as computer-based tests. A principal hypothesis was that item exposure would have little or no effect on average difficulty of the items over the course of an administrative cycle. This hypothesis was tested by exploring conventional item statistics and…
Efforts Toward the Development of Unbiased Selection and Assessment Instruments.

ERIC Educational Resources Information Center

Rudner, Lawrence M.

Investigations into item bias provide an empirical basis for the identification and elimination of test items which appear to measure different traits across populations or cultural groups. The psychometric rationales for six approaches to the identification of biased test items are reviewed: (1) Transformed item difficulties: within-group…
Automatic Item Generation of Probability Word Problems

ERIC Educational Resources Information Center

Holling, Heinz; Bertling, Jonas P.; Zeuch, Nina

2009-01-01

Mathematical word problems represent a common item format for assessing student competencies. Automatic item generation (AIG) is an effective way of constructing many items with predictable difficulties, based on a set of predefined task parameters. The current study presents a framework for the automatic generation of probability word problems…
The Strengths and Difficulties Questionnaire (SDQ) Revisited in a French-Speaking Population: Proposition of a Reduced Version of the Parent SDQ

ERIC Educational Resources Information Center

Chauvin, Bruno; Leonova, Tamara

2016-01-01

Key concerns about the psychometric properties of the 25-item version of the Strengths and Difficulties Questionnaire (SDQ) have consistently been raised in the literature. The present study aimed at examining the meaningfulness of an alternative model to the SDQ in which 7 problematic items are excluded. French-speaking parents of 262 boys and…
Gender Discrimination in Jessica's Career.

ERIC Educational Resources Information Center

Cook, Ellen Piel

1997-01-01

Focuses on the sexual harassment and other gender-related difficulties faced by a Chinese-American woman. Profiles her encounters with gender discrimination and how it hindered career advancement and led to professional isolation. Relates how this case study can be used to sensitize workers to gender discrimination. (RJM)
The quadratic relationship between difficulty of intelligence test items and their correlations with working memory.

PubMed

Smolen, Tomasz; Chuderski, Adam

2015-01-01

Fluid intelligence (Gf) is a crucial cognitive ability that involves abstract reasoning in order to solve novel problems. Recent research demonstrated that Gf strongly depends on the individual effectiveness of working memory (WM). We investigated a popular claim that if storage capacity underlay the WM-Gf correlation, then such a correlation should increase with an increasing number of items or rules (load) in a Gf-test. Because such a link is often not observed, the storage-capacity account is rejected on that basis, and alternative accounts of Gf (e.g., related to executive control or processing speed) are proposed. Using both analytical inference and numerical simulations, we demonstrated that the load-dependent change in correlation is primarily a function of the amount of floor/ceiling effect for particular items. Thus, the item-wise WM correlation of a Gf-test depends on its overall difficulty and on the difficulty distribution across its items. When the early test items show a strong ceiling effect but the late items do not approach floor, that correlation will increase throughout the test. If the early items fall between ceiling and floor but the late items approach floor, the respective correlation will decrease. For a hallmark Gf-test, the Raven-test, whose items span from ceiling to floor, a quadratic relationship is expected, and this was shown empirically using a large sample and two types of WMC tasks. Consequently, neither changes in correlation due to varying WM/Gf load nor the lack of them can yield an argument for or against any theory of WM/Gf. Moreover, as the mathematical properties of the correlation formula make it relatively immune to ceiling/floor effects for overall moderate correlations, only minor changes (if any) in the WM-Gf correlation should be expected for many psychological tests.
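The ceiling/floor argument can be illustrated with a small simulation. This Python/NumPy sketch is ours, not the authors' simulation, and it assumes the following:

- a hypothetical latent WM-Gf correlation of 0.6;
- a simple threshold model in which an item is solved when Gf plus item-specific noise exceeds the item's threshold.

Items near ceiling or floor then correlate weakly with WM, while items of middling difficulty correlate most strongly. This produces the inverted-U (quadratic) pattern across a test whose items span from ceiling to floor.

```python
import numpy as np

def itemwise_correlations(thresholds, rho=0.6, n=20000, noise_sd=0.5, seed=0):
    """Simulate binary items of increasing difficulty that depend on Gf and return
    (proportion correct, point-biserial correlation with WM) for each item.
    rho, noise_sd and the threshold model are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    wm = rng.standard_normal(n)
    gf = rho * wm + np.sqrt(1.0 - rho ** 2) * rng.standard_normal(n)
    p_correct, r_wm = [], []
    for t in thresholds:
        # An item is solved when Gf plus item-specific noise exceeds its threshold t.
        solved = (gf + noise_sd * rng.standard_normal(n) > t).astype(float)
        p_correct.append(solved.mean())
        r_wm.append(np.corrcoef(solved, wm)[0, 1])
    return np.array(p_correct), np.array(r_wm)

thresholds = np.linspace(-3.0, 3.0, 13)  # from very easy (ceiling) to very hard (floor)
p, r = itemwise_correlations(thresholds)
for t, pc, rc in zip(thresholds, p, r):
    print(f"threshold {t:+.1f}  P(correct) = {pc:.3f}  r(item, WM) = {rc:.3f}")
```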
Differential Effects of Personal-Level vs Group-Level Racial Discrimination on Health among Black Americans.

PubMed

Hagiwara, Nao; Alderson, Courtney J; Mezuk, Briana

2016-07-21

Racial/ethnic minorities in the United States not only experience discrimination personally but also witness or hear about fellow in-group members experiencing discrimination (ie, group-level discrimination). The objective of our study was to examine whether the effects of group-level discrimination on mental and physical health are different from those of personal-level discrimination among Black Americans by drawing upon social psychology research of the Personal/Group Discrimination Discrepancy. We conducted a secondary analysis of cross-sectional survey data from a larger study. One hundred and twenty participants, who self-identified as Black/African Americans during the laboratory sessions (57.5% women, mean age = 48.97, standard deviation = 8.58) in the parent study, were included in our analyses. Perceived personal-level discrimination was assessed with five items that were taken from two existing measures, and group-level racial discrimination was assessed with three items. Self-reported physical and mental health were assessed with a modified version of SF-8. Perceived personal-level racial discrimination was associated with worse mental health. In contrast, perceived group-level racial discrimination was associated with better mental as well as physical health. Perceived group-level racial discrimination may serve as one of several health protective factors even when individuals perceive personal-level racial discrimination. The present findings demonstrate the importance of examining both personal- and group-level experiences of racial discrimination as they independently relate to health outcomes for Black Americans.
Visual speech alters the discrimination and identification of non-intact auditory speech in children with hearing loss.

PubMed

Jerger, Susan; Damian, Markus F; McAlpine, Rachel P; Abdi, Hervé

2017-03-01

Understanding spoken language is an audiovisual event that depends critically on the ability to discriminate and identify phonemes yet we have little evidence about the role of early auditory experience and visual speech on the development of these fundamental perceptual skills. Objectives of this research were to determine 1) how visual speech influences phoneme discrimination and identification; 2) whether visual speech influences these two processes in a like manner, such that discrimination predicts identification; and 3) how the degree of hearing loss affects this relationship. Such evidence is crucial for developing effective intervention strategies to mitigate the effects of hearing loss on language development. Participants were 58 children with early-onset sensorineural hearing loss (CHL, 53% girls, M = 9;4 yrs) and 58 children with normal hearing (CNH, 53% girls, M = 9;4 yrs). Test items were consonant-vowel (CV) syllables and nonwords with intact visual speech coupled to non-intact auditory speech (excised onsets) as, for example, an intact consonant/rhyme in the visual track (Baa or Baz) coupled to non-intact onset/rhyme in the auditory track (/-B/aa or /-B/az). The items started with an easy-to-speechread /B/ or difficult-to-speechread /G/ onset and were presented in the auditory (static face) vs. audiovisual (dynamic face) modes. We assessed discrimination for intact vs. non-intact different pairs (e.g., Baa:/-B/aa). We predicted that visual speech would cause the non-intact onset to be perceived as intact and would therefore generate more "same" (as opposed to "different") responses in the audiovisual than auditory mode. We assessed identification by repetition of nonwords with non-intact onsets (e.g., /-B/az). We predicted that visual speech would cause the non-intact onset to be perceived as intact and would therefore generate more Baz (as opposed to az) responses in the audiovisual than auditory mode. 
Performance in the audiovisual mode showed more "same" responses for the intact vs. non-intact different pairs (e.g., Baa:/-B/aa) and more intact onset responses for nonword repetition (Baz for /-B/az). Thus visual speech altered both discrimination and identification in the CHL, to a large extent for the /B/ onsets but only minimally for the /G/ onsets. The CHL identified the stimuli similarly to the CNH but did not discriminate the stimuli similarly. A bias-free measure of the children's discrimination skills (i.e., d' analysis) revealed that the CHL had greater difficulty discriminating intact from non-intact speech in both modes. As the degree of HL worsened, the ability to discriminate the intact vs. non-intact onsets in the auditory mode worsened. Discrimination ability in CHL significantly predicted their identification of the onsets, even after variation due to the other variables was controlled. These results clearly established that visual speech can fill in non-intact auditory speech, and this effect, in turn, made the non-intact onsets more difficult to discriminate from intact speech and more likely to be perceived as intact. Such results 1) demonstrate the value of visual speech at multiple levels of linguistic processing and 2) support intervention programs that view visual speech as a powerful asset for developing spoken language in CHL. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
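The bias-free d' measure mentioned above separates sensitivity from a tendency to answer "same" or "different". The abstract does not say which d' model the authors used for their same/different task. The Python sketch below is therefore an assumption: it applies the simple yes/no formula d' = z(H) - z(FA), where a "hit" is a "different" response to an intact vs non-intact pair and a "false alarm" is a "different" response to an identical pair. It adds 0.5 to each count (a log-linear correction) so that perfect rates stay finite. The counts in the usage line are hypothetical.

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Yes/no sensitivity index d' = z(hit rate) - z(false-alarm rate),
    with 0.5 added to each count so rates of 0 or 1 do not give infinite z."""
    z = NormalDist().inv_cdf
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    return z(hit_rate) - z(fa_rate)

# Hypothetical counts: "different" responses to 20 different pairs and to 20 same pairs.
print(round(d_prime(hits=15, misses=5, false_alarms=4, correct_rejections=16), 2))
```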
Visual Speech Alters the Discrimination and Identification of Non-Intact Auditory Speech in Children with Hearing Loss

PubMed Central

Jerger, Susan; Damian, Markus F.; McAlpine, Rachel P.; Abdi, Hervé

2017-01-01

Objectives Understanding spoken language is an audiovisual event that depends critically on the ability to discriminate and identify phonemes yet we have little evidence about the role of early auditory experience and visual speech on the development of these fundamental perceptual skills. Objectives of this research were to determine 1) how visual speech influences phoneme discrimination and identification; 2) whether visual speech influences these two processes in a like manner, such that discrimination predicts identification; and 3) how the degree of hearing loss affects this relationship. Such evidence is crucial for developing effective intervention strategies to mitigate the effects of hearing loss on language development. Methods Participants were 58 children with early-onset sensorineural hearing loss (CHL, 53% girls, M = 9;4 yrs) and 58 children with normal hearing (CNH, 53% girls, M = 9;4 yrs). Test items were consonant-vowel (CV) syllables and nonwords with intact visual speech coupled to non-intact auditory speech (excised onsets) as, for example, an intact consonant/rhyme in the visual track (Baa or Baz) coupled to non-intact onset/rhyme in the auditory track (/–B/aa or /–B/az). The items started with an easy-to-speechread /B/ or difficult-to-speechread /G/ onset and were presented in the auditory (static face) vs. audiovisual (dynamic face) modes. We assessed discrimination for intact vs. non-intact different pairs (e.g., Baa:/–B/aa). We predicted that visual speech would cause the non-intact onset to be perceived as intact and would therefore generate more "same" (as opposed to "different") responses in the audiovisual than auditory mode. We assessed identification by repetition of nonwords with non-intact onsets (e.g., /–B/az). We predicted that visual speech would cause the non-intact onset to be perceived as intact and would therefore generate more Baz (as opposed to az) responses in the audiovisual than auditory mode. 
Results Performance in the audiovisual mode showed more "same" responses for the intact vs. non-intact different pairs (e.g., Baa:/–B/aa) and more intact onset responses for nonword repetition (Baz for /–B/az). Thus visual speech altered both discrimination and identification in the CHL, to a large extent for the /B/ onsets but only minimally for the /G/ onsets. The CHL identified the stimuli similarly to the CNH but did not discriminate the stimuli similarly. A bias-free measure of the children’s discrimination skills (i.e., d’ analysis) revealed that the CHL had greater difficulty discriminating intact from non-intact speech in both modes. As the degree of HL worsened, the ability to discriminate the intact vs. non-intact onsets in the auditory mode worsened. Discrimination ability in CHL significantly predicted their identification of the onsets, even after variation due to the other variables was controlled. Conclusions These results clearly established that visual speech can fill in non-intact auditory speech, and this effect, in turn, made the non-intact onsets more difficult to discriminate from intact speech and more likely to be perceived as intact. Such results 1) demonstrate the value of visual speech at multiple levels of linguistic processing and 2) support intervention programs that view visual speech as a powerful asset for developing spoken language in CHL. PMID:28167003
[Frequency and variables associated with perceived devaluation-discrimination in victims of the armed conflict in Colombia].

PubMed

Campo-Arias, Adalberto; Ospino, Anyelly C; Sanabria, Adriana R; Guerra, Valeria M; Caamaño, Beatriz H; Herazo, Edwin

2017-11-21

There is no information on the frequency of perceived devaluation-discrimination in victims of the armed conflict in Colombia. The aim of this study was thus to determine the frequency of perceived devaluation-discrimination and associated variables among victims of the armed conflict in municipalities in the Department of Magdalena, Colombia. A cross-sectional study was conducted among victims enrolled in the Program for Psychosocial Care and Comprehensive Healthcare for Victims. Depressive symptoms were quantified with four dichotomous items (three or more were classified as a high level of depressive symptoms), and perceived devaluation-discrimination was quantified with six dichotomous items (two or more were classified as high perceived devaluation-discrimination). A total of 943 adults participated (mean age = 47.9 years, SD = 14.2; 67.4% women); 109 (11.6%) reported a high level of depressive symptoms and 217 (23%) showed high perceived devaluation-discrimination. High perceived devaluation-discrimination was associated with a high level of depressive symptoms (OR = 6.47; 95% CI: 4.23-9.88). In conclusion, one-fourth of the victims of the armed conflict in Magdalena reported high perceived devaluation-discrimination, which was significantly associated with a high level of depressive symptoms.
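An odds ratio and its 95% confidence interval, like those reported above, can be computed from a 2x2 table. The abstract does not give the cell counts or say whether the estimate was adjusted for covariates, so this Python sketch shows only a crude OR with Woolf's log-scale interval on hypothetical counts. It then checks that the published interval is roughly symmetric around the OR on the log scale.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Crude odds ratio with Woolf (log-scale) confidence interval for a 2x2 table:
    a = exposed with outcome, b = exposed without outcome,
    c = unexposed with outcome, d = unexposed without outcome."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi

# Hypothetical counts, not the study's data.
print(odds_ratio_ci(10, 20, 5, 40))

# Geometric midpoint of the published CI (4.23 to 9.88) recovers the OR up to rounding.
print(round(math.exp((math.log(4.23) + math.log(9.88)) / 2), 2))
```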
Psychometric properties of a short form of the Center for Epidemiologic Studies Depression (CES-D-10) scale for screening depressive symptoms in healthy community dwelling older adults.

PubMed

Mohebbi, Mohammadreza; Nguyen, Van; McNeil, John J; Woods, Robyn L; Nelson, Mark R; Shah, Raj C; Storey, Elsdon; Murray, Anne M; Reid, Christopher M; Kirpach, Brenda; Wolfe, Rory; Lockery, Jessica E; Berk, Michael

The 10-item Center for the Epidemiological Studies of Depression Short Form (CES-D-10) is a widely used self-report measure of depression symptomatology. The aim of this study is to investigate the psychometric properties of the CES-D-10 in healthy community dwelling older adults. The sample consists of 19,114 community-based individuals residing in Australia and the United States who participated in the ASPREE trial baseline assessment. All individuals were free of any major illness at the time. We evaluated construct validity by performing confirmatory factor analysis, examined measurement invariance across country and gender followed by evaluating item discrimination bias in age, gender, race, ethnicity and education level, and assessed internal consistency. High item-total correlations and Cronbach's alpha indicated high internal consistency. The factor analyses suggested a unidimensional factor structure. Construct validity was supported in the overall sample, and by country and gender sub-groups. The CES-D-10 was invariant across countries, and although evidence of marginal gender non-invariance was observed there was no evidence of notable gender-specific item discrimination bias. No notable differences in discrimination parameters or group membership measurement non-invariance were detected by gender, age, race, ethnicity, and education level. These findings suggest the CES-D-10 is a reliable and valid measure of depression in a volunteer sample. No noteworthy evidence of non-invariance and/or item discrimination bias is observed across gender, age, race, language and ethnic groups. Copyright © 2017 Elsevier Inc. All rights reserved.
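The two internal-consistency indices named above can be computed directly from a respondents-by-items score matrix. This is a minimal Python/NumPy sketch using the standard formulas, not the study's code:

- Cronbach's alpha: alpha = k/(k-1) * (1 - sum of item variances / variance of total score).
- Corrected item-total correlation: each item correlated with the sum of the *other* items.

```python
import numpy as np

def cronbach_alpha(scores):
    """scores: (n_respondents, k_items) array of item scores."""
    x = np.asarray(scores, dtype=float)
    k = x.shape[1]
    item_vars = x.var(axis=0, ddof=1)          # sample variance of each item
    total_var = x.sum(axis=1).var(ddof=1)      # sample variance of the total score
    return k / (k - 1) * (1.0 - item_vars.sum() / total_var)

def corrected_item_total(scores):
    """Correlation of each item with the total of the remaining items."""
    x = np.asarray(scores, dtype=float)
    total = x.sum(axis=1)
    return np.array([np.corrcoef(x[:, j], total - x[:, j])[0, 1]
                     for j in range(x.shape[1])])

# Toy data: items that move together give alpha = 1; unrelated items give alpha = 0.
print(cronbach_alpha([[0, 0, 0], [1, 1, 1], [2, 2, 2], [3, 3, 3]]))
print(cronbach_alpha([[0, 0], [0, 1], [1, 0], [1, 1]]))
```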
Convergent and Discriminant Validity of the Five Factor Form and the Sliderbar Inventory.

PubMed

Rojas, Stephanie L; Widiger, Thomas A

2018-03-01

Existing measures of the five factor model (FFM) of personality are generally, if not exclusively, unipolar in their assessment of maladaptive variants of the FFM domains. However, two recently developed measures, the Five Factor Form (FFF) and the Sliderbar Inventory (SI), include items that assess for maladaptive variants at both poles of each item. This structure is unique among existing measures of personality and personality disorder, although there is a historical, infrequently used Stone Personality Trait Schema (SPTS) that had also included this item structure. To facilitate an exploration of their convergent and discriminant validity, the SI and SPTS items were reorganized into FFM scales. The convergent and discriminant validity of the FFF, SI-FFM, and SPTS-FFM scales was considered in a sample of 450 adults with current or a history of mental health treatment. The FFF, SI-FFM, and SPTS-FFM were also compared with respect to their relationship with FFM domains. Finally, the FFF items and SI-FFM scales were tested with respect to their relationship with measures of maladaptive variants of both high and low agreeableness and conscientiousness. The implications of the results are discussed with respect to the assessment of maladaptive personality functioning, and suggestions for future research are provided.
Conflict and metacognitive control: The mismatch-monitoring hypothesis of how others’ knowledge states affect recall

PubMed Central

Fraundorf, Scott H.; Benjamin, Aaron S.

2015-01-01

Information about others’ success in remembering is frequently available. For example, students taking an exam may assess its difficulty by monitoring when others turn in their exams. In two experiments, we investigated how rememberers use this information to guide recall. Participants studied paired associates, some semantically related (and thus easier to retrieve) and some unrelated (and thus harder). During a subsequent cued recall test, participants viewed fictive information about an opponent’s accuracy on each item. In Experiment 1, participants responded to each cue once before seeing the opponent’s performance and once afterwards. Participants reconsidered their responses least often when the opponent’s accuracy matched the item difficulty (easy items the opponent recalled, hard items the opponent forgot) and most often when the opponent’s accuracy and the item difficulty mismatched. When participants responded only after seeing the opponent’s performance (Experiment 2), the same mismatch conditions that led to reconsideration even produced superior recall. These results suggest that rememberers monitor whether others’ knowledge states accord or conflict with their own experience, and that this information shifts how they interrogate their memory and what they recall. PMID:26247369

Discrimination of Mirror-Image Shapes by Young Children

ERIC Educational Resources Information Center

Thompson, G. Brian

1975-01-01

Conducted two experiments which employed discrimination learning methods to test predictions related to the difficulty of discrimination of lateral reversals and of inversions when shapes are presented: (1) successively, (2) simultaneously in lateral alignment, and (3) simultaneously in vertical alignment. Subjects were 6-year-old children. (SDH)
Use of Questionnaire-Based Measures in the Assessment of Listening Difficulties in School-Aged Children

PubMed Central

Tomlin, Danielle; Moore, David R.; Dillon, Harvey

2015-01-01

Objectives: In this study, the authors assessed the potential utility of a recently developed questionnaire (Evaluation of Children’s Listening and Processing Skills [ECLiPS]) for supporting the clinical assessment of children referred for auditory processing disorder (APD). Design: A total of 49 children (35 referred for APD assessment and 14 from mainstream schools) were assessed for auditory processing (AP) abilities, cognitive abilities, and symptoms of listening difficulty. Four questionnaires were used to capture the symptoms of listening difficulty from the perspective of parents (ECLiPS and Fisher’s auditory problem checklist), teachers (Teacher’s Evaluation of Auditory Performance), and children, that is, self-report (Listening Inventory for Education). Correlation analyses tested for convergence between the questionnaires and both cognitive and AP measures. Discriminant analyses were performed to determine the best combination of tests for discriminating between typically developing children and children referred for APD. Results: All questionnaires were sensitive to the presence of difficulty, that is, children referred for assessment had significantly more symptoms of listening difficulty than typically developing children. There was, however, no evidence of more listening difficulty in children meeting the diagnostic criteria for APD. Some AP tests were significantly correlated with ECLiPS factors measuring related abilities, providing evidence for construct validity. All questionnaires correlated to a greater or lesser extent with the cognitive measures in the study. Discriminant analysis suggested that the best discrimination between groups was achieved using a combination of ECLiPS factors, together with nonverbal Intelligence Quotient (cognitive) and AP measures (i.e., dichotic digits test and frequency pattern test). 
Conclusions: The ECLiPS was particularly sensitive to cognitive difficulties, an important aspect of many children referred for APD, as well as correlating with some AP measures. It can potentially support the preliminary assessment of children referred for APD. PMID:26002277
Comparing Methods for Item Analysis: The Impact of Different Item-Selection Statistics on Test Difficulty

ERIC Educational Resources Information Center

Jones, Andrew T.

2011-01-01

Practitioners often depend on item analysis to select items for exam forms and have a variety of options available to them. These include the point-biserial correlation, the agreement statistic, the B index, and the phi coefficient. Although research has demonstrated that these statistics can be useful for item selection, no research as of yet has…
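Three of the item-selection statistics named above have short textbook definitions. This Python/NumPy sketch uses those common definitions, which are assumptions here; the study's exact variants may differ, and the agreement statistic is left out because its definition is not given in this abstract.

- Point-biserial: correlation of the 0/1 item score with the total score.
- B index: proportion correct among examinees at or above a cut score minus the proportion among those below it.
- Phi: correlation between the 0/1 item score and the 0/1 pass/fail classification.

```python
import numpy as np

def point_biserial(item, total):
    """Pearson correlation between a 0/1 item score and the total test score."""
    return np.corrcoef(np.asarray(item, float), np.asarray(total, float))[0, 1]

def b_index(item, total, cut):
    """Proportion correct among passers (total >= cut) minus proportion among failers."""
    item = np.asarray(item, float)
    passed = np.asarray(total) >= cut
    return item[passed].mean() - item[~passed].mean()

def phi(item, total, cut):
    """Phi coefficient between the item (0/1) and the pass/fail decision (0/1)."""
    passed = (np.asarray(total) >= cut).astype(float)
    return np.corrcoef(np.asarray(item, float), passed)[0, 1]

# Hypothetical six examinees: the item is answered correctly only by the three passers.
item, total = [1, 1, 1, 0, 0, 0], [10, 9, 8, 3, 2, 1]
print(point_biserial(item, total), b_index(item, total, cut=5), phi(item, total, cut=5))
```

An item-excluded (corrected) total is also commonly used for the point-biserial to avoid inflating it with the item's own score.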
Rasch Analysis of the Power as Knowing Participation in Change Tool--the Brazilian version.

PubMed

Guedes, Erika de Souza; Orozco-Vargas, Luiz Carlos; Turrini, Ruth Natália Teresa; de Sousa, Regina Márcia Cardoso; dos Santos, Mariana Alvina; da Cruz, Diná de Almeida Lopes Monteiro

2013-01-01

The objective of this study was to evaluate the items contained in the Brazilian version of the Power as Knowing Participation in Change Tool (PKPCT). The psychometric properties of the questionnaire were investigated through Rasch analysis. The data from 952 nursing assistants and 627 baccalaureate nurses were analyzed (mean age 44.1 years, SD = 9.5; 13.0% men). The subscales Choices, Awareness, Freedom and Involvement were tested separately and each presented unidimensionality; the response categories of the items were collapsed from 7 to 3 levels, and the items fit the model well, except for the following/leading item, whose infit and outfit values were above 1.4; this item also presented Differential Item Functioning (DIF) according to the participant's role. Item reliability was 0.99, and person reliability ranged from 0.80 to 0.84 across the subscales. Items with extremely high levels of difficulty were not identified. The PKPCT should not be viewed as unidimensional, items with extremely high levels of difficulty need to be created for the scale, and the differential functioning of some items has to be further investigated.
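The infit and outfit criterion above (values above 1.4 flag misfit) comes from comparing observed responses with Rasch-model expectations. The PKPCT items are polytomous, so this Python/NumPy sketch is a simplification: it uses the dichotomous Rasch model and takes person measures and the item difficulty as already estimated. It computes the two mean-square statistics for one item:

- Outfit: the unweighted mean of squared standardized residuals.
- Infit: the squared residuals weighted by model variance.

```python
import numpy as np

def rasch_prob(theta, b):
    """Dichotomous Rasch model: P(correct) for person measures theta, item difficulty b (logits)."""
    return 1.0 / (1.0 + np.exp(-(np.asarray(theta, float) - b)))

def item_fit(responses, theta, b):
    """Infit and outfit mean squares for one item, given estimated theta and b."""
    x = np.asarray(responses, float)
    p = rasch_prob(theta, b)
    variance = p * (1.0 - p)                 # model variance of each response
    sq_resid = (x - p) ** 2
    outfit = np.mean(sq_resid / variance)    # unweighted mean of squared standardized residuals
    infit = sq_resid.sum() / variance.sum()  # information-weighted mean square
    return infit, outfit

theta = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
print(item_fit([0, 0, 1, 1, 1], theta, b=0.0))  # pattern ordered as the model expects
print(item_fit([1, 1, 1, 0, 0], theta, b=0.0))  # reversed pattern: both statistics exceed 1.4
```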
Development and validation of the knowledge and attitudes regarding antibiotics and resistance (KAAR-11) questionnaire for primary care physicians.

PubMed

López-Vázquez, Paula; Vázquez-Lago, Juan Manuel; Gonzalez-Gonzalez, Cristian; Piñeiro-Lamas, María; López-Durán, Ana; Herdeiro, Maria Teresa; Figueiras, Adolfo

2016-10-01

The aim of this study was to develop a novel, self-administered questionnaire to identify primary-care physicians' knowledge and attitudes regarding antibiotics and resistance (KAAR). The study population comprised primary care physicians. The study was conducted in five phases. Phase I consisted of a systematic review and qualitative focus-group study (n = 33 physicians), in which items were formulated so as to be measured on a continuous, visual analogue scale (VAS); in Phase II, content validation and face validity were evaluated by a panel of experts, which reformulated, added and deleted items; Phase III consisted of a pilot study on a population possessing similar characteristics (n = 15); in Phase IV, we analysed reliability by means of a test-retest study (n = 91) and calculated the intraclass correlation coefficients (ICCs); and in Phase V, we assessed construct validity by applying the known-groups technique, measuring the differences between contrasting groups of physicians formed according to antibiotic prescription quality indicators (group 1, n = 156 versus group 2, n = 191). Following Phases I and II, the questionnaire contained 16 knowledge and attitude items. Participants in the pilot study (Phase III) reported no difficulty. The test-retest study (Phase IV) showed that 11 of the 16 initial knowledge and attitude items yielded an ICC > 0.5, while analysis of known-groups validity (Phase V) showed that 13 of the 16 initial items which assessed knowledge and attitudes discriminated between physicians with good and bad indicators of antibiotics prescription. The final 11-item KAAR questionnaire appears to be valid, reliable and responsive. © The Author 2016. Published by Oxford University Press on behalf of the British Society for Antimicrobial Chemotherapy. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
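The test-retest criterion above (ICC > 0.5) depends on which ICC form is used, and the abstract does not say. This Python/NumPy sketch assumes the two-way random-effects, absolute-agreement, single-measure form, ICC(2,1) in Shrout and Fleiss's notation, computed from the ANOVA mean squares of a subjects-by-occasions matrix. The toy data show why this form matters: it penalizes a systematic shift between test and retest.

```python
import numpy as np

def icc_2_1(data):
    """ICC(2,1), two-way random effects, absolute agreement, single measure.
    data: (n_subjects, k_occasions) array, e.g. VAS scores at test and retest."""
    x = np.asarray(data, float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()   # between subjects
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()   # between occasions
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols  # residual
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

print(icc_2_1([[1, 1], [2, 2], [3, 3], [4, 4]]))  # identical test and retest
print(icc_2_1([[1, 2], [2, 3], [3, 4], [4, 5]]))  # retest shifted up by one point
```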
Establishing Reliability and Validity of the Criterion Referenced Exam of GeoloGy Standards EGGS

NASA Astrophysics Data System (ADS)

Guffey, S. K.; Slater, S. J.; Slater, T. F.; Schleigh, S.; Burrows, A. C.

2016-12-01

Discipline-based geoscience education researchers have considerable need for a criterion-referenced, easy-to-administer and easy-to-score conceptual diagnostic survey for undergraduates taking introductory science survey courses, so that faculty can better monitor the learning impacts of various interactive teaching approaches. To support ongoing education research across the geosciences, we are continuing to work rigorously and systematically to establish the reliability and validity of the recently released Exam of GeoloGy Standards (EGGS). In educational testing, reliability refers to the consistency or stability of test scores, whereas validity refers to the accuracy of the inferences or interpretations one makes from test scores. Several types of reliability measures are being applied to the iterative refinement of the EGGS survey, including test-retest, alternate-form, split-half, internal-consistency, and interrater reliability measures. EGGS rates strongly on most measures of reliability. For one, Cronbach's alpha provides a quantitative index of the extent to which students answer items consistently throughout the test, and it reflects inter-item correlations. Traditional item analysis methods, including item difficulty and item discrimination, further quantify the degree to which a particular item reliably assesses students. Validity, on the other hand, is perhaps best described by the word accuracy. For example, content validity is the extent to which a measurement reflects the specific intended domain of content, stemming from the judgments of people who are either experts in testing that particular content area or content experts. 
Perhaps more importantly, face validity is a judgement of how well an instrument reflects the science "at face value" and refers to the extent to which a test appears to measure the targeted scientific domain as viewed by laypersons, examinees, test users, the public, and other invested stakeholders.
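The two classical item statistics mentioned above are simple to compute from a students-by-items matrix of 0/1 scores. In this Python/NumPy sketch, item difficulty is the proportion answering correctly, so higher values mean easier items. Item discrimination is the upper-lower index D: the proportion correct in the top-scoring group minus that in the bottom-scoring group. The 27% group size is a common convention and an assumption here; the EGGS analysis may define these groups differently.

```python
import numpy as np

def item_analysis(responses, frac=0.27):
    """responses: (n_students, n_items) 0/1 matrix.
    Returns (difficulty, discrimination) arrays, one value per item."""
    x = np.asarray(responses, float)
    n = x.shape[0]
    difficulty = x.mean(axis=0)                       # proportion correct per item
    order = np.argsort(x.sum(axis=1), kind="stable")  # students from lowest to highest total
    g = max(1, int(round(frac * n)))                  # size of the upper and lower groups
    lower, upper = x[order[:g]], x[order[-g:]]
    discrimination = upper.mean(axis=0) - lower.mean(axis=0)
    return difficulty, discrimination

# Hypothetical four students and two items; frac=0.5 splits them into halves.
diff, disc = item_analysis([[0, 0], [0, 1], [1, 1], [1, 1]], frac=0.5)
print(diff, disc)
```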
Do Reading Experts Agree with MCAT Verbal Reasoning Item Classifications?

ERIC Educational Resources Information Center

Jackson, Evelyn W.; And Others

1994-01-01

Examined whether expert raters (n=5) could agree about classification of Medical College Admission Test (MCAT) items and whether they agreed with MCAT student manual in labeling skill being measured by each test item. Results revealed difficulties in replicating authors' labeling of skills for reading items on practice test provided with 1991 MCAT…
Combining the Best of Two Standard Setting Methods: The Ordered Item Booklet Angoff

ERIC Educational Resources Information Center

Smith, Russell W.; Davis-Becker, Susan L.; O'Leary, Lisa S.

2014-01-01

This article describes a hybrid standard setting method that combines characteristics of the Angoff (1971) and Bookmark (Mitzel, Lewis, Patz & Green, 2001) methods. The proposed approach utilizes strengths of each method while addressing weaknesses. An ordered item booklet, with items sorted based on item difficulty, is used in combination…
Comparative Racial Analysis of Enlisted Advancement Exams: Item- Difficulty.

DTIC Science & Technology

1975-07-01

Item analysis; promotion; racial comparison; equal opportunity. ...improving equal opportunity in career growth for minority groups. The study of exam item-difficulty levels is the first of a series of technical reports...under Exploratory Development Task Area PF55.521.032 (Contemporary Social Issues). J. J. Clarkin, Commanding Officer. Summary. Purpose: A number of
Item Selection and Pre-equating with Empirical Item Characteristic Curves.

ERIC Educational Resources Information Center

Livingston, Samuel A.

An empirical item characteristic curve shows the probability of a correct response as a function of the student's total test score. These curves can be estimated from large-scale pretest data. They enable test developers to select items that discriminate well in the score region where decisions are made. A similar set of curves can be used to…
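An unsmoothed empirical item characteristic curve, as described above, is just the proportion of examinees answering the item correctly at each total score. This Python/NumPy sketch computes that raw curve. The smoothing a test developer would apply to large-scale pretest data is not shown, and the method used in the paper is not specified here.

```python
import numpy as np

def empirical_icc(item, total):
    """Proportion answering the item correctly at each observed total score."""
    item = np.asarray(item, float)
    total = np.asarray(total)
    scores = np.unique(total)  # sorted distinct total scores
    return scores, np.array([item[total == s].mean() for s in scores])

# Hypothetical five examinees: item scores and their total test scores.
scores, props = empirical_icc([0, 0, 1, 1, 1], [1, 1, 2, 3, 3])
print(dict(zip(scores.tolist(), props.tolist())))
```

Comparing the curve's values just below and just above a decision cut score gives a rough sense of how well the item separates examinees in that score region.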
The Nursing Home Physical Performance Test: A Secondary Data Analysis of Women in Long-Term Care Using Item Response Theory.

PubMed

Perera, Subashan; Nace, David A; Resnick, Neil M; Greenspan, Susan L

2017-04-11

The Nursing Home Physical Performance Test (NHPPT) was developed to measure function among nursing home residents using sit-to-stand, scooping applesauce, face washing, dialing phone, putting on sweater, and ambulating tasks. Using item response theory, we explore its measurement characteristics at item level and opportunities for improvements. We used data from women in long-term care. We fitted a graded response model, estimated parameters, and constructed probability and information curves. We identified items to be targeted toward lower and higher functioning persons to increase the range of abilities to which the instrument is applicable. We revised the scoring by making sit-to-stand and sweater items harder and dialing phone easier. We examined changes to concurrent validity with activities of daily living (ADL), frailty, and cognitive function. Participants were 86 years old, had more than three comorbidities, and had an NHPPT score of 19.4. All items had high discrimination and were targeted toward the lower middle range of the performance continuum. After revision, sit-to-stand and sweater items demonstrated greater discrimination among the higher functioning and/or greater spread of thresholds for response categories. The overall test showed discrimination over a wider range of individuals. Concurrent validity correlation improved from 0.60 to 0.68 for instrumental ADL, and explained variability (R²) improved from 22% to 36% for frailty. NHPPT has good measurement characteristics at the item level. NHPPT can be improved, implemented in computerized adaptive testing, and combined with self-report for greater utility, but a definitive study is needed. © The Author 2017. Published by Oxford University Press on behalf of The Gerontological Society of America. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
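The probability and information curves mentioned above come from Samejima's graded response model. In that model, each boundary between ordered score categories has a logistic curve with a shared discrimination a and its own threshold b_k. This Python/NumPy sketch computes category probabilities and item information at a given ability theta. The item parameters in the usage lines are hypothetical, not the NHPPT estimates.

```python
import numpy as np

def _cumulative(theta, a, thresholds):
    """P(X >= k | theta) for k = 0..m+1, padded with 1 (k = 0) and 0 (k = m+1)."""
    b = np.asarray(thresholds, float)  # increasing b_1 < ... < b_m
    p_star = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return np.concatenate(([1.0], p_star, [0.0]))

def grm_category_probs(theta, a, thresholds):
    """P(X = k | theta) for ordered categories k = 0..m."""
    cum = _cumulative(theta, a, thresholds)
    return cum[:-1] - cum[1:]

def grm_information(theta, a, thresholds):
    """Item information: sum over categories of (P*'_k - P*'_{k+1})^2 / P_k."""
    cum = _cumulative(theta, a, thresholds)
    deriv = a * cum * (1.0 - cum)  # slope of each cumulative curve (0 at the padded ends)
    probs = cum[:-1] - cum[1:]
    return np.sum((deriv[:-1] - deriv[1:]) ** 2 / probs)

# Hypothetical 4-category item: discrimination 1.8, thresholds -1.0, 0.0, 1.2.
print(grm_category_probs(0.3, 1.8, [-1.0, 0.0, 1.2]))
print(grm_information(0.3, 1.8, [-1.0, 0.0, 1.2]))
```

With a single threshold, the information reduces to the familiar two-parameter logistic value a²P(1 - P), which is a convenient check.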
Capuchin monkeys (Cebus apella) treat small and large numbers of items similarly during a relative quantity judgment task.

PubMed

Beran, Michael J; Parrish, Audrey E

2016-08-01

A key issue in understanding the evolutionary and developmental emergence of numerical cognition is to learn what mechanism(s) support perception and representation of quantitative information. Two such systems have been proposed, one for dealing with approximate representation of sets of items across an extended numerical range and another for highly precise representation of only small numbers of items. Evidence for the first system is abundant across species and in many tests with human adults and children, whereas the second system is primarily evident in research with children and in some tests with non-human animals. A recent paper (Choo & Franconeri, Psychonomic Bulletin & Review, 21, 93-99, 2014) with adult humans also reported "superprecise" representation of small sets of items in comparison to large sets of items, which would provide more support for the presence of a second system in human adults. We first presented capuchin monkeys with a test similar to that of Choo and Franconeri in which small or large sets with the same ratios had to be discriminated. We then presented the same monkeys with an expanded range of comparisons in the small number range (all comparisons of 1-9 items) and the large number range (all comparisons of 10-90 items in 10-item increments). Capuchin monkeys showed no increased precision for small over large sets in making these discriminations in either experiment. These data indicate a difference between the performance of monkeys and that of adult humans; specifically, monkeys do not show improved discrimination performance for small sets relative to large sets when the relative numerical differences are held constant.
Understanding Orgasmic Difficulty in Women.

PubMed

Rowland, David L; Kolba, Tiffany N

2016-08-01

Women's primary issue with the orgasmic phase is usually difficulty reaching orgasm. The aims were to identify predictors of orgasmic difficulty in women within the context of a partnered sexual experience; to assess the relation between orgasmic difficulty and self-reported levels of sexual desire or interest and arousal in women; and to assess the interrelations among three dimensions of orgasmic response during partnered sex: self-reported time to reach orgasm, general difficulty or ease of reaching orgasm, and level of distress or concern. Drawing from a community-based sample recruited via the Internet, 866 women were queried with a 26-item survey regarding their difficulty reaching orgasm during partnered sex. The 416 women who indicated difficulty also responded to items assessing arousal and desire difficulties, level of distress about their condition, and their estimated time to reach orgasm. The main outcome measures were answers to the 26-item survey on difficulty reaching orgasm during partnered sex. Age, arousal difficulty, and lubrication difficulty predicted difficulty reaching orgasm in the overall sample. In the subsample of women reporting difficulty, approximately half reported issues with arousal. Women with arousal problems reported greater difficulty reaching orgasm but did not differ from those without arousal problems on measurements of orgasm latency or levels of distress. Slightly more than half the women experiencing difficulty reaching orgasm were distressed by their condition; distressed women reported greater difficulty reaching orgasm and longer latencies to orgasm than non-distressed counterparts. They also reported lower satisfaction with their sexual relationship. This study indicates the importance of assessing multiple parameters when investigating orgasmic problems in women, including arousal issues, levels of distress, and latency to orgasm. Results also clarify that women with arousal problems do not differ substantially from those without arousal problems; in contrast, women distressed by their condition differ from non-distressed women along some critical dimensions. Although orgasmic problems decreased with age, the overall relation of orgasmic difficulty to distress, arousal, and latency to orgasm was essentially unchanged across age groups. Copyright © 2016 International Society for Sexual Medicine. Published by Elsevier Inc. All rights reserved.
Assertive Behavior and Cognitive Performance in Preschool Children

ERIC Educational Resources Information Center

Dorman, Lynn

1973-01-01

Assertive behaviors were related to each other and to intelligence test scores. An item analysis revealed that more assertive children did better on certain intelligence test items: comprehension, verbal, and discrimination. (ST)
Picture-Word Differences in Discrimination Learning: II. Effects of Conceptual Categories

ERIC Educational Resources Information Center

Bourne, Lyle E.; And Others

1976-01-01

Investigates the prediction that the usual superiority of pictures over words for repetitions of the same items would disappear for items that were different instances of repeated categories. (Author/RK)
Psychometric Evaluation of a Cultural Competency Assessment Instrument for Health Professionals

PubMed Central

Haywood, Sonja H.; Goode, Tawara; Gao, Yong; Smith, Kristyn; Bronheim, Suzanne; Flocke, Susan A; Zyzanski, Steve

2012-01-01

Background Few valid and reliable measures exist for health care professionals interested in determining their levels of cultural and linguistic competence. Objective To evaluate the measurement properties of the Cultural Competence Health Practitioner Assessment (CCHPA-129). Methods The CCHPA-129 is a 129-item web-based instrument developed by the National Center for Cultural Competence (NCCC). Responses on the CCHPA-129 were examined using factor analysis, Rasch modeling, and Differential Item Functioning (DIF) analysis across race, ethnicity, gender, and profession. Subjects 2504 practitioners, including 1864 nurses (RN/LPN/BSN), 341 clinicians (PA/NP), and 299 physicians (MD/DO), who completed the CCHPA-129 online between 2005 and 2008. Results Three factors representing domains of knowledge, adapting practice, and promoting health for culturally and linguistically diverse populations accounted for 46% of the variance. Among Knowledge factor items, 53% (23/43) fit the Rasch model, with item difficulties ranging from −1.01 logits (least difficult) to +1.11 logits (most difficult), a separation index (SI) of 13.82, and Cronbach’s α of 0.92. Forty-seven percent (21/44) of Adapting Practice factor items fit the model (item difficulties −0.07 to +1.11 logits, SI 11.59, Cronbach’s α 0.88), and 58% (23/39) of Promoting Health factor items fit the model (item difficulties −1.01 to +1.38 logits, SI 22.64, Cronbach’s α 0.92). Early evidence of validity was established by known groups having statistically different scores. Conclusion The 67-item CCHPA-67 is psychometrically sound. This shortened instrument can be used to establish associations between practitioners’ cultural and linguistic competence and health outcomes, as well as to evaluate interventions to increase practitioners’ cultural and linguistic competence. PMID:22437625
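The item difficulties in this abstract are expressed in logits. As a hedged illustration, assuming the dichotomous Rasch model (the abstract does not say which Rasch variant or response format the CCHPA-129 analysis used), a difficulty in logits translates into a response probability as sketched here. The respondent location of 0 logits is hypothetical.

```python
import math


def rasch_prob(theta, b):
    """Dichotomous Rasch model: probability of a positive response.

    theta -- person location in logits
    b     -- item difficulty in logits
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def log_odds(p):
    return math.log(p / (1.0 - p))


# Range endpoints reported for the Knowledge factor items.
easiest, hardest = -1.01, 1.11
theta = 0.0  # hypothetical respondent at the centre of the logit scale
print(round(rasch_prob(theta, easiest), 3), round(rasch_prob(theta, hardest), 3))
# Under the Rasch model the log-odds of a positive response is exactly theta - b.
print(round(log_odds(rasch_prob(theta, hardest)), 2))
```

Because the log-odds is simply `theta - b`, a one-logit gap between person and item means the same change in odds anywhere on the scale, which is what makes logit difficulties comparable across subscales.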
Classification of product inspection items using nonlinear features

NASA Astrophysics Data System (ADS)

Talukder, Ashit; Casasent, David P.; Lee, H.-W.

1998-03-01

Automated processing and classification of real-time x-ray images of randomly oriented touching pistachio nuts is discussed. The ultimate objective is the development of a system for automated non-invasive detection of defective product items on a conveyor belt. This approach involves two main steps: preprocessing and classification. Preprocessing locates individual items and segments ones that touch using a modified watershed algorithm. The second stage involves extraction of features that allow discrimination between damaged and clean items (pistachio nuts). This feature extraction and classification stage is the new aspect of this paper. We use a new nonlinear feature extraction scheme called the maximum representation and discriminating feature (MRDF) extraction method to compute nonlinear features that are used as inputs to a classifier. The MRDF is shown to provide better classification and a better ROC (receiver operating characteristic) curve than other methods.
Measurement in Sensory Modulation: The Sensory Processing Scale Assessment

PubMed Central

Miller, Lucy J.; Sullivan, Jillian C.

2014-01-01

OBJECTIVE. Sensory modulation issues have a significant impact on participation in daily life. Moreover, understanding phenotypic variation in sensory modulation dysfunction is crucial for research related to defining homogeneous groups and for clinical work in guiding treatment planning. We thus evaluated the new Sensory Processing Scale (SPS) Assessment. METHOD. Research included item development, behavioral scoring system development, test administration, and item analyses to evaluate reliability and validity across sensory domains. RESULTS. Items with adequate reliability (internal reliability >.4) and discriminant validity (p < .01) were retained. Feedback from the expert panel also contributed to decisions about retaining items in the scale. CONCLUSION. The SPS Assessment appears to be a reliable and valid measure of sensory modulation (scale reliability >.90; effect sizes for discrimination between groups >1.00). This scale has the potential to aid in differential diagnosis of sensory modulation issues. PMID:25184464
A Longitudinal Examination of First Term Attrition and Reenlistment among FY1999 Enlisted Accessions

DTIC Science & Technology

2005-11-01

… Army mission … High … Very low … Possibility of being subjected to sexual or racial discrimination … Moderate … None of the above … 60. Have you… racial discrimination). The third factor, Problems Adjusting, included three items (57b, 57h, and 57i) about failing to adjust to Army life (e.g.… leaving because of racial or gender discrimination. The second factor was called Medical Reasons and consisted of Items 35h and 35o, which asked about…
The measurement of cyberbullying: dimensional structure and relative item severity and discrimination.

PubMed

Menesini, Ersilia; Nocentini, Annalaura; Calussi, Pamela

2011-05-01

In relation to a sample of 1,092 Italian adolescents (50.9% females), the present study aims to: (a) analyze the most parsimonious structure of the cyberbullying and cybervictimization construct in male and female Italian adolescents through confirmatory factor analysis; and (b) analyze the severity and the discrimination parameters of each act using item response theory. Results showed that the structure of the cyberbullying scale for perpetrated and received behaviors in both genders could best be represented by a unidimensional model in which each item lies on a continuum of severity of aggressive acts. For both genders, the least severe acts are silent/prank calls and insults on instant messaging, and the most severe acts are unpleasant pictures/photos on Web sites, phone pictures/photos/videos of intimate scenes, and phone pictures/photos/videos of violent scenes. The items for nasty text messages, nasty or rude e-mails, insults on Web sites, insults in chatrooms, and insults on blogs range from moderate to high levels of severity. Regarding the discrimination level of the acts, several items emerged as good indicators at various levels of cyberbullying and cybervictimization severity, with the exception of silent/prank calls. Furthermore, gender-specific results showed that the visual items can be considered good indicators of severe cyberbullies and cybervictims only in males. This information can help to better understand the nature of the phenomenon and its severity in a given population, and to plan more specific prevention and intervention strategies.
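In IRT terms, an act's "severity" is its location parameter and its "discrimination" is the slope of its item characteristic curve. The following sketch assumes dichotomously scored acts and a two-parameter logistic (2PL) model; the abstract does not state the exact model used, and all item parameters here are hypothetical.

```python
import math


def icc_2pl(theta, a, b):
    """Two-parameter logistic item characteristic curve.

    a -- discrimination (slope); b -- severity/difficulty (location on theta)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def slope_at(theta, a, b, h=1e-6):
    """Central-difference slope of the ICC; at theta == b it equals a / 4."""
    return (icc_2pl(theta + h, a, b) - icc_2pl(theta - h, a, b)) / (2 * h)


# Hypothetical parameters: a mild act and a severe act with equal discrimination,
# and a weakly discriminating act at the same severity as the mild one.
items = {"mild": (1.8, -0.5), "severe": (1.8, 2.0), "weak": (0.4, -0.5)}
for name, (a, b) in items.items():
    curve = [round(icc_2pl(t, a, b), 2) for t in (-2, -1, 0, 1, 2, 3)]
    print(name, curve)
```

A severe act is endorsed only by respondents high on the latent trait, while a weakly discriminating act (like the flat "weak" curve) changes little across trait levels and so says little about where a respondent lies.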

Estimating the Number of Examinees Who Did Not Reach the Last Item of a Section.

ERIC Educational Resources Information Center

Wainer, Howard

It is important to estimate the number of examinees who reached a test item, because item difficulty is defined by the number who answered correctly divided by the number who reached the item. A new method is presented and compared to the previously used definition of three categories of response to an item: (1) answered; (2) omitted--a…
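The definition quoted here (number correct divided by number who reached the item) can be sketched in Python. The rule for deciding who "reached" an item is an assumption: a common convention treats consecutive blanks at the end of a section as not reached and interior blanks as omitted. This is not the new estimation method the abstract introduces, and the response codes are hypothetical.

```python
def classify(responses):
    """Recode one examinee's answers: trailing blanks -> 'NR' (not reached),
    interior blanks -> 'O' (omitted); other codes ('C'/'W') pass through."""
    answered = [i for i, r in enumerate(responses) if r is not None]
    last = answered[-1] if answered else -1
    return ["NR" if i > last else ("O" if r is None else r)
            for i, r in enumerate(responses)]


def item_difficulty(codes):
    """Proportion correct among examinees who reached the item."""
    reached = [c for c in codes if c != "NR"]
    if not reached:
        raise ValueError("no examinee reached this item")
    return sum(c == "C" for c in reached) / len(reached)


# Hypothetical section of four items; None marks a blank response.
raw = [["C", "W", "C", "C"],
       ["C", None, "W", None],
       ["W", "C", None, None]]
coded = [classify(r) for r in raw]
for j in range(4):
    print(j, item_difficulty([row[j] for row in coded]))
```

In this toy data, the last item would have a difficulty of 1/3 if not-reached examinees were counted as wrong, but 1.0 when only the examinee who reached it is counted, which is why the count of examinees reaching an item matters.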
Does item overlap render measured relationships between pain and challenging behaviour trivial? Results from a multicentre cross-sectional study in 13 German nursing homes.

PubMed

Kutschar, Patrick; Bauer, Zsuzsa; Gnass, Irmela; Osterbrink, Jürgen

2017-07-01

Several studies suggest that pain is a trigger for challenging behaviour in older adults with cognitive impairment. However, such measured relationships might be confounded due to item overlap as instruments share similar or identical items. The purpose of this study was to examine whether the frequently observed association between pain and challenging behaviour might be traced back to item overlap. This multicentre cross-sectional study was conducted in 13 nursing homes and examined pain (measure: Pain Assessment in Advanced Dementia Scale) and challenging behaviour (measure: Cohen-Mansfield Agitation Inventory) in 150 residents with severe cognitive impairment. The extent of item overlap was determined by juxtaposition of both measures' original items. As expected, comparison between these instruments revealed an extensive item overlap. The statistical relationship between the two phenomena can be traced back mainly to the contribution of the overlapping items, which renders the frequently stated relationship between pain and challenging behaviour trivial. The status quo of measuring such associations must be contested: constructs' discrimination and instruments' discrimination have to be discussed critically as item overlap may lead to biased conclusions and assumptions in research as well as to inadequate care measures in nursing practice. © 2017 John Wiley & Sons Ltd.
Assessing Psychopathy Among Justice Involved Adolescents with the PCL:YV: An Item Response Theory Examination Across Gender

PubMed Central

Tsang, Siny; Schmidt, Karen M.; Vincent, Gina M.; Salekin, Randall T.; Moretti, Marlene M.; Odgers, Candice L.

2014-01-01

This study used an item response theory (IRT) model and a large sample of justice-involved youth (N = 1,007, 38% female) to examine the item functioning of the Psychopathy Checklist – Youth Version (PCL:YV). Items that were most discriminating (or most sensitive to changes) of the latent trait (thought to be psychopathy) among adolescents included “Glibness/superficial charm”, “Lack of remorse”, and “Need for stimulation”, whereas items that were least discriminating included “Pathological lying”, “Failure to accept responsibility”, and “Lacks goals.” The items “Impulsivity” and “Irresponsibility” were the most likely to be rated high among adolescents, whereas “Parasitic lifestyle” and “Glibness/superficial charm” were the most likely to be rated low. Evidence of differential item functioning (DIF) on four of the 13 items was found between boys and girls. “Failure to accept responsibility” and “Impulsivity” were endorsed more frequently to describe adolescent girls than boys at similar levels of the latent trait, and vice versa for “Grandiose sense of self-worth” and “Lacks goals.” The DIF findings suggest that four PCL:YV items function differently between boys and girls. PMID:25580672
HoNOSCA-D As a Measure of the Severity of Diagnosed Mental Disorders in Children and Adolescents-Psychometric Properties of the German Translation.

PubMed

von Wyl, Agnes; Toggweiler, Stephan; Zollinger, Ruedi

2017-01-01

The Health of the Nation Outcome Scales for Children and Adolescents (HoNOSCA), in use worldwide, is a 13-item measure assessing the biopsychosocial severity of mental health problems in children and adolescents. This article introduces the authorized German-language version of HoNOSCA, the HoNOSCA-D, and examines and discusses its psychometric properties based on a clinical sample of 1,533 children and adolescents aged 4;0 to 17;11 years. For the HoNOSCA-D total score (severity of mental health problems), internal consistency (Cronbach's alpha) was 0.63. The discriminative power of the items ranged from 0.07 to 0.44; the average interitem correlation was 0.11. Given this stochastic independence of the items, calculating a total severity index is acceptable. Factor analysis using principal axis factoring and varimax rotation yielded a four-factor structure that explained 30.62% of the total variance (Kaiser-Meyer-Olkin measure of sampling adequacy: 0.684). The convergent correlations with the German-language parent-report version of the Strengths and Difficulties Questionnaire were as expected and showed a medium effect size. Gender and age differences in the HoNOSCA-D total score were small; for the individual items, gender and age differences ranged from negligible to medium. The highest severity was found for schizophrenia and psychotic disorders, followed by affective disorders and social behavior disorders. Overall, the validity of HoNOSCA-D was clearly supported.
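Internal consistency and item discrimination of the kind reported here can be computed from a respondents-by-items score matrix. The sketch assumes that "discriminative power" means the corrected item-total correlation, which is its common meaning in German-language test construction; the abstract does not define it. The standardized-alpha helper uses only the number of items and their average inter-item correlation.

```python
from statistics import mean, variance


def cronbach_alpha(rows):
    """Cronbach's alpha; rows are respondents, columns are k item scores."""
    k = len(rows[0])
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def corrected_item_total(rows, j):
    """Correlation of item j with the sum of all *other* items."""
    return pearson([r[j] for r in rows], [sum(r) - r[j] for r in rows])


def standardized_alpha(k, mean_r):
    """Alpha implied by k items with average inter-item correlation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)


# Reported values: 13 items, average inter-item correlation 0.11.
print(round(standardized_alpha(13, 0.11), 2))
```

With the reported 13 items and an average inter-item correlation of 0.11, `standardized_alpha` returns about 0.62. This is a rough cross-check only, since standardized alpha and the raw coefficient need not coincide.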
Developing and validating a nutrition knowledge questionnaire: key methods and considerations.

PubMed

Trakman, Gina Louise; Forsyth, Adrienne; Hoye, Russell; Belski, Regina

2017-10-01

To outline key statistical considerations and detailed methodologies for the development and evaluation of a valid and reliable nutrition knowledge questionnaire. Literature on questionnaire development in a range of fields was reviewed and a set of evidence-based guidelines specific to the creation of a nutrition knowledge questionnaire have been developed. The recommendations describe key qualitative methods and statistical considerations, and include relevant examples from previous papers and existing nutrition knowledge questionnaires. Where details have been omitted for the sake of brevity, the reader has been directed to suitable references. We recommend an eight-step methodology for nutrition knowledge questionnaire development as follows: (i) definition of the construct and development of a test plan; (ii) generation of the item pool; (iii) choice of the scoring system and response format; (iv) assessment of content validity; (v) assessment of face validity; (vi) purification of the scale using item analysis, including item characteristics, difficulty and discrimination; (vii) evaluation of the scale including its factor structure and internal reliability, or Rasch analysis, including assessment of dimensionality and internal reliability; and (viii) gathering of data to re-examine the questionnaire's properties, assess temporal stability and confirm construct validity. Several of these methods have previously been overlooked. The measurement of nutrition knowledge is an important consideration for individuals working in the nutrition field. Improved methods in the development of nutrition knowledge questionnaires, such as the use of factor analysis or Rasch analysis, will enable more confidence in reported measures of nutrition knowledge.
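Step (vi) names two classical item statistics. A minimal Python sketch follows, assuming dichotomously scored (0/1) items: the difficulty index is the proportion answering correctly, and discrimination is shown as the point-biserial correlation with the total score, one common choice among several discrimination indices. The response data are hypothetical.

```python
from statistics import mean, pstdev


def difficulty_index(item):
    """Proportion of examinees answering a 0/1-scored item correctly."""
    return sum(item) / len(item)


def point_biserial(item, totals):
    """Point-biserial correlation between a 0/1 item and total scores.

    Pass totals that exclude the item itself for a 'corrected' index.
    """
    p = difficulty_index(item)
    if p in (0.0, 1.0):
        raise ValueError("item has no variance")
    m1 = mean(t for x, t in zip(item, totals) if x == 1)  # mean total, correct group
    m0 = mean(t for x, t in zip(item, totals) if x == 0)  # mean total, incorrect group
    return (m1 - m0) / pstdev(totals) * (p * (1 - p)) ** 0.5


# Hypothetical 0/1 responses of six examinees to one item, with their test totals.
item = [1, 1, 1, 0, 1, 0]
totals = [18, 15, 14, 12, 11, 7]
print(difficulty_index(item), round(point_biserial(item, totals), 2))
```

An item answered correctly mostly by high scorers yields a positive point-biserial; values near zero or negative flag items to revise or drop during scale purification.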
Development of an Inconsistent Responding Scale for the Triarchic Psychopathy Measure.

PubMed

Mowle, Elyse N; Kelley, Shannon E; Edens, John F; Donnellan, M Brent; Smith, Shannon Toney; Wygant, Dustin B; Sellbom, Martin

2017-08-01

Inconsistent or careless responding to self-report measures is estimated to occur in approximately 10% of university research participants and may be even more common among offender populations. Inconsistent responding may be a result of a number of factors including inattentiveness, reading or comprehension difficulties, and cognitive impairment. Many stand-alone personality scales used in applied and research settings, however, do not include validity indicators to help identify inattentive response patterns. Using multiple archival samples, the current study describes the development of an inconsistent responding scale for the Triarchic Psychopathy Measure (TriPM; Patrick, 2010), a widely used self-report measure of psychopathy. We first identified pairs of correlated TriPM items in a derivation sample (N = 2,138) and then created a total score based on the sum of the absolute value of the differences for each item pair. The resulting scale, the Triarchic Assessment Procedure for Inconsistent Responding (TAPIR), strongly differentiated between genuine TriPM protocols and randomly generated TriPM data (N = 1,000), as well as between genuine protocols and those in which 50% of the original data were replaced with random item responses. TAPIR scores demonstrated fairly consistent patterns of association with some theoretically relevant correlates (e.g., inconsistency scales embedded in other personality inventories), although not others (e.g., measures of conscientiousness) across our cross-validation samples. Tentative TAPIR cut scores that may discriminate between attentively and carelessly completed protocols are presented. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
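The scoring rule described for TAPIR (the sum of absolute differences over correlated item pairs) is simple to express. The sketch below assumes positively keyed pairs, so that attentive respondents give similar answers to both items in a pair. The item numbers and the 0-3 response scale are hypothetical, not the published TAPIR pairs or cut scores.

```python
def inconsistency_score(responses, pairs):
    """Sum of absolute differences across pre-selected correlated item pairs.

    responses -- mapping item number -> numeric response
    pairs     -- (item_i, item_j) tuples chosen from a derivation sample
    """
    return sum(abs(responses[i] - responses[j]) for i, j in pairs)


# Hypothetical pairs and a 0-3 response scale; NOT the published TAPIR item pairs.
PAIRS = [(1, 5), (3, 9), (12, 20)]
consistent = {1: 2, 5: 2, 3: 0, 9: 1, 12: 3, 20: 3}
erratic = {1: 0, 5: 3, 3: 3, 9: 0, 12: 0, 20: 3}
print(inconsistency_score(consistent, PAIRS), inconsistency_score(erratic, PAIRS))
```

Random or careless responding breaks the correlation within each pair, so the total grows with inattention; a cut score on this total then separates likely careless protocols from attentive ones.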
[Psychometric properties of Q-DIO, an instrument to measure the quality of documented nursing diagnoses, interventions and outcomes].

PubMed

Müller-Staub, Maria; Lunney, Margaret; Lavin, Mary Ann; Needham, Ian; Odenbreit, Matthias; van Achterberg, Theo

2010-04-01

The Q-DIO instrument was developed in 2005 and 2006 to measure the quality of documented nursing diagnoses, interventions, and nursing-sensitive patient outcomes. The study aimed to test the psychometric properties of Q-DIO (Quality of Nursing Diagnoses, Interventions and Outcomes). Instrument testing included internal consistency, test-retest reliability, interrater reliability, item analyses, and an assessment of objectivity. To ensure variation in scores, a stratified random sample of 60 nursing documentations was drawn: 30 with and 30 without application of theory-based, standardised nursing language. Internal consistency (Cronbach's alpha) was 0.83 [0.78, 0.88] for the subscale nursing diagnoses as process, 0.98 [0.94, 0.99] for nursing diagnoses as product, 0.90 [0.85, 0.94] for nursing interventions, and 0.99 [0.95, 0.99] for nursing-sensitive patient outcomes. With a Cohen's kappa of 0.95, intrarater reliability was good. Interrater reliability showed a kappa of 0.94 [0.90, 0.96]. Item analyses confirmed that the items fulfilled the criteria for degree of difficulty and discriminative validity. In this study, Q-DIO was shown to be a reliable instrument. It allows measurement of the documented quality of nursing diagnoses, interventions and outcomes with and without implementation of theory-based, standardised nursing languages. Further testing of Q-DIO in other settings is recommended. The results implicitly support the use of nursing classifications such as NANDA, NIC and NOC.
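Rater agreement in this abstract is reported as Cohen's kappa, which corrects raw agreement for the agreement expected by chance from each rater's category frequencies. A minimal sketch for two raters and nominal categories follows; the example ratings are hypothetical and unrelated to Q-DIO data.

```python
from collections import Counter


def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters classifying the same items."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    c1, c2 = Counter(rater1), Counter(rater2)
    # Chance agreement from each rater's marginal category proportions.
    expected = sum(c1[k] * c2[k] for k in c1.keys() | c2.keys()) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa undefined: both raters used one identical category")
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of six documentation items by two raters.
r1 = ["met", "met", "not met", "met", "not met", "met"]
r2 = ["met", "met", "not met", "not met", "not met", "met"]
print(round(cohens_kappa(r1, r2), 2))
```

Kappa is 1 for perfect agreement and 0 when agreement is no better than chance, so a value such as 0.94 indicates agreement far above what the marginal frequencies alone would produce.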
Measuring the Instructional Sensitivity of ESL Reading Comprehension Items.

ERIC Educational Resources Information Center

Brutten, Sheila R.; And Others

A study attempted to estimate the instructional sensitivity of items in three reading comprehension tests in English as a second language (ESL). Instructional sensitivity is a test-item construct defined as the tendency for a test item to vary in difficulty as a function of instruction. Similar tasks were given to readers at different proficiency…
Explaining and Controlling for the Psychometric Properties of Computer-Generated Figural Matrix Items

ERIC Educational Resources Information Center

Freund, Philipp Alexander; Hofer, Stefan; Holling, Heinz

2008-01-01

Figural matrix items are a popular task type for assessing general intelligence (Spearman's g). Items of this kind can be constructed rationally, allowing the implementation of computerized generation algorithms. In this study, the influence of different task parameters on the degree of difficulty in matrix items was investigated. A sample of N =…
Item Difficulty in the Evaluation of Computer-Based Instruction: An Example from Neuroanatomy

ERIC Educational Resources Information Center

Chariker, Julia H.; Naaz, Farah; Pani, John R.

2012-01-01

This article reports large item effects in a study of computer-based learning of neuroanatomy. Outcome measures of the efficiency of learning, transfer of learning, and generalization of knowledge diverged by a wide margin across test items, with certain sets of items emerging as particularly difficult to master. In addition, the outcomes of…
Estimation of Item Response Theory Parameters in the Presence of Missing Data

ERIC Educational Resources Information Center

Finch, Holmes

2008-01-01

Missing data are a common problem in a variety of measurement settings, including responses to items on both cognitive and affective assessments. Researchers have shown that such missing data may create problems in the estimation of item difficulty parameters in the Item Response Theory (IRT) context, particularly if they are ignored. At the same…
The consequences of language proficiency and difficulty of lexical access for translation performance and priming.

PubMed

Francis, Wendy S; Tokowicz, Natasha; Kroll, Judith F

2014-01-01

Repetition priming was used to assess how proficiency and the ease or difficulty of lexical access influence bilingual translation. Two experiments, conducted at different universities with different Spanish-English bilingual populations and materials, showed repetition priming in word translation for same-direction and different-direction repetitions. Experiment 1, conducted in an English-dominant environment, revealed an effect of translation direction but not of direction match, whereas Experiment 2, conducted in a more balanced bilingual environment, showed an effect of direction match but not of translation direction. A combined analysis on the items common to both studies revealed that bilingual proficiency was negatively associated with response time (RT), priming, and the degree of translation asymmetry in RTs and priming. An item analysis showed that item difficulty was positively associated with RTs, priming, and the benefit of same-direction over different-direction repetition. Thus, although both participant accuracy and item accuracy are indices of learning, they have distinct effects on translation RTs and on the learning that is captured by the repetition-priming paradigm.
Memory for Multiple Cache Locations and Prey Quantities in a Food-Hoarding Songbird

PubMed Central

Armstrong, Nicola; Garland, Alexis; Burns, K. C.

2012-01-01

Most animals can discriminate between pairs of numbers that are each less than four without training. However, North Island robins (Petroica longipes), a food-hoarding songbird endemic to New Zealand, can discriminate between quantities of items as high as eight without training. Here we investigate whether robins are capable of other complex quantity discrimination tasks. We test whether their ability to discriminate between small quantities declines with (1) the number of cache sites containing prey rewards and (2) the length of time separating cache creation and retrieval (retention interval). Results showed that subjects generally performed above chance. They were equally able to discriminate between different combinations of prey quantities hidden from view in 2, 3, and 4 cache sites after retention intervals of 1, 10, and 60 s. Overall, the results indicate that North Island robins can process complex quantity information involving more than two discrete quantities of items for retention intervals of up to 1 min without training. PMID:23293622
Brief Report: Checklist for Autism Spectrum Disorder--Most Discriminating Items for Diagnosing Autism

ERIC Educational Resources Information Center

Mayes, Susan D.

2018-01-01

The smallest subset of items from the 30-item Checklist for Autism Spectrum Disorder (CASD) that differentiated 607 referred children (3-17 years) with and without autism with 100% accuracy was identified. This 6-item subset (CASD-Short Form) was cross-validated on an independent sample of 397 referred children (1-18 years) with and without autism…
A Two-Parameter Latent Trait Model. Methodology Project.

ERIC Educational Resources Information Center

Choppin, Bruce

On well-constructed multiple-choice tests, the most serious threat to measurement is not variation in item discrimination, but the guessing behavior that may be adopted by some students. Ways of ameliorating the effects of guessing are discussed, especially for problems in latent trait models. A new item response model, including an item parameter…
iBank

ERIC Educational Resources Information Center

Bermundo, Cesar B.; Bermundo, Alex B.; Ballester, Rex C.

2012-01-01

iBank is a project that uses software to create an item bank that stores quality questions, generates tests, and prints exams. The items come from analyzed teacher-constructed test questions; this analysis provides the basis for discussing test results, by determining why a test item is or is not discriminating between the better and poorer students, and by…
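One classical way to quantify whether an item discriminates "between the better and poorer students" is the upper-lower discrimination index D: the proportion correct in a high-scoring group minus the proportion correct in a low-scoring group. The abstract does not say which index iBank computes. The sketch assumes the conventional 27% extreme groups, and ties at a group boundary are broken arbitrarily by sort order.

```python
def discrimination_index(item_scores, totals, fraction=0.27):
    """Upper-lower discrimination index D = p_upper - p_lower.

    item_scores -- 0/1 score on this item for each examinee
    totals      -- total test score for each examinee
    fraction    -- share of examinees in each extreme group
    """
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5]")
    n = len(totals)
    g = max(1, round(n * fraction))
    order = sorted(range(n), key=lambda i: totals[i])  # ascending total score
    lower, upper = order[:g], order[-g:]
    p_upper = sum(item_scores[i] for i in upper) / g
    p_lower = sum(item_scores[i] for i in lower) / g
    return p_upper - p_lower


# Hypothetical class of ten students.
totals = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
item = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
print(discrimination_index(item, totals))
```

D ranges from -1 to 1; a value near zero means strong and weak students answer the item alike, and a negative value suggests a flawed key or misleading item.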
Applying modern psychometric techniques to melodic discrimination testing: Item response theory, computerised adaptive testing, and automatic item generation.

PubMed

Harrison, Peter M C; Collins, Tom; Müllensiefen, Daniel

2017-06-15

Modern psychometric theory provides many useful tools for ability testing, such as item response theory, computerised adaptive testing, and automatic item generation. However, these techniques have yet to be integrated into mainstream psychological practice. This is unfortunate, because modern psychometric techniques can bring many benefits, including sophisticated reliability measures, improved construct validity, avoidance of exposure effects, and improved efficiency. In the present research we therefore use these techniques to develop a new test of a well-studied psychological capacity: melodic discrimination, the ability to detect differences between melodies. We calibrate and validate this test in a series of studies. Studies 1 and 2 respectively calibrate and validate an initial test version, while Studies 3 and 4 calibrate and validate an updated test version incorporating additional easy items. The results support the new test's viability, with evidence for strong reliability and construct validity. We discuss how these modern psychometric techniques may also be profitably applied to other areas of music psychology and psychological science in general.
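Computerised adaptive testing, one of the techniques this study applies, repeatedly re-estimates ability and administers the most informative remaining item. The sketch below assumes a 2PL model, maximum-information item selection, and an expected a posteriori (EAP) ability estimate on a grid with a standard normal prior. These are common textbook choices, not necessarily those used for the melodic discrimination test, and the item bank is hypothetical.

```python
import math


def p_2pl(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def info_2pl(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def eap_estimate(responses, bank, grid=None):
    """EAP ability estimate with a standard normal prior, on a fixed grid.

    responses -- mapping item index -> 1 (correct) or 0 (incorrect)
    """
    grid = grid or [g / 10 for g in range(-40, 41)]
    weights = []
    for t in grid:
        w = math.exp(-0.5 * t * t)  # unnormalized N(0, 1) prior density
        for i, x in responses.items():
            p = p_2pl(t, *bank[i])
            w *= p if x == 1 else 1.0 - p
        weights.append(w)
    return sum(t * w for t, w in zip(grid, weights)) / sum(weights)


def next_item(theta, bank, administered):
    """Pick the unadministered item with maximum information at theta."""
    remaining = [i for i in range(len(bank)) if i not in administered]
    return max(remaining, key=lambda i: info_2pl(theta, *bank[i]))


# Hypothetical bank of (a, b) pairs and a simulated answer pattern.
bank = [(1.2, -2.0), (1.0, -1.0), (1.5, 0.0), (1.1, 1.0), (1.3, 2.0)]
answers = {0: 1, 1: 1, 2: 1, 3: 0, 4: 0}  # what the simulated examinee would answer
theta, given = 0.0, {}
for _ in range(3):
    item = next_item(theta, bank, given)
    given[item] = answers[item]
    theta = eap_estimate(given, bank)
    print(item, round(theta, 2))
```

Selecting items near the current ability estimate is what lets an adaptive test reach a given precision with fewer items than a fixed form, one of the efficiency benefits the abstract mentions.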
Factor structure and psychometric properties of the Fertility Problem Inventory–Short Form

PubMed Central

Zurlo, Maria Clelia; Cattaneo Della Volta, Maria Franscesca; Vallone, Federica

2017-01-01

The study analyses the factor structure and psychometric properties of the Italian version of the Fertility Problem Inventory–Short Form. A sample of 206 infertile couples completed the Italian version of the Fertility Problem Inventory (46 items), together with demographics, the State Anxiety Scale of the State-Trait Anxiety Inventory (Form Y), the Edinburgh Depression Scale and the Dyadic Adjustment Scale, used to assess convergent and discriminant validity. Confirmatory factor analysis was unsatisfactory (comparative fit index = 0.87; Tucker-Lewis Index = 0.83; root mean square error of approximation = 0.17), and Cronbach’s α (0.95) revealed a redundancy of items. Exploratory factor analysis was carried out after deleting cross-loading items, and Mokken scale analysis was applied to verify the homogeneity of items within the reduced subscales of the questionnaire. The Fertility Problem Inventory–Short Form consists of 27 items, tapping four meaningful and reliable factors. Convergent and discriminant validity were confirmed. Findings indicated that the Fertility Problem Inventory–Short Form is a valid and reliable measure to assess infertility-related stress dimensions. PMID:29379625
Item Difficulty in the Evaluation of Computer-Based Instruction: An Example from Neuroanatomy

PubMed Central

Chariker, Julia H.; Naaz, Farah; Pani, John R.

2012-01-01

This article reports large item effects in a study of computer-based learning of neuroanatomy. Outcome measures of the efficiency of learning, transfer of learning, and generalization of knowledge diverged by a wide margin across test items, with certain sets of items emerging as particularly difficult to master. In addition, the outcomes of comparisons between instructional methods changed with the difficulty of the items to be learned. More challenging items better differentiated between instructional methods. This set of results is important for two reasons. First, it suggests that instruction may be more efficient if sets of consistently difficult items are the targets of instructional methods particularly suited to them. Second, there is wide variation in the published literature regarding the outcomes of empirical evaluations of computer-based instruction. As a consequence, many questions arise as to the factors that may affect such evaluations. The present paper demonstrates that the level of challenge in the material that is presented to learners is an important factor to consider in the evaluation of a computer-based instructional system. PMID:22231801

Item difficulty in the evaluation of computer-based instruction: an example from neuroanatomy.

PubMed

Chariker, Julia H; Naaz, Farah; Pani, John R

2012-01-01

This article reports large item effects in a study of computer-based learning of neuroanatomy. Outcome measures of the efficiency of learning, transfer of learning, and generalization of knowledge diverged by a wide margin across test items, with certain sets of items emerging as particularly difficult to master. In addition, the outcomes of comparisons between instructional methods changed with the difficulty of the items to be learned. More challenging items better differentiated between instructional methods. This set of results is important for two reasons. First, it suggests that instruction may be more efficient if sets of consistently difficult items are the targets of instructional methods particularly suited to them. Second, there is wide variation in the published literature regarding the outcomes of empirical evaluations of computer-based instruction. As a consequence, many questions arise as to the factors that may affect such evaluations. The present article demonstrates that the level of challenge in the material that is presented to learners is an important factor to consider in the evaluation of a computer-based instructional system. Copyright © 2011 American Association of Anatomists.
PubMed Central

PANATTO, D.; ARATA, L.; BEVILACQUA, I.; APPRATO, L.; GASPARINI, R.; AMICIZIA, D.

2015-01-01

Summary Introduction. Health-related knowledge is often assessed through multiple-choice tests. Among the different types of formats, researchers may opt to use multiple-mark items, i.e. items with more than one correct answer. Although multiple-mark items have long been used in the academic setting, sometimes with scant or inconclusive results, little is known about the implementation of this format in research on in-field health education and promotion. Methods. A study population of secondary school students completed a survey on nutrition-related knowledge, followed by a single-lecture intervention. Answers were scored by means of eight different scoring algorithms and analyzed from the perspective of classical test theory. The same survey was re-administered to a sample of the students in order to evaluate the short-term change in their knowledge. Results. In all, 286 questionnaires were analyzed. Partial scoring algorithms displayed better psychometric characteristics than the dichotomous rule. In particular, the algorithm proposed by Ripkey and the balanced rule showed greater internal consistency and relative efficiency in scoring multiple-mark items. A penalizing algorithm in which the proportion of marked distracters was subtracted from that of marked correct answers was the only one that highlighted a significant difference in performance between natives and immigrants, probably owing to its slightly better discriminatory ability. This algorithm was also associated with the largest effect size in the pre-/post-intervention score change. Discussion. The choice of an appropriate rule for scoring multiple-mark items in research on health education and promotion should consider not only the psychometric properties of single algorithms but also the study aims and outcomes, since scoring rules differ in terms of bias, reliability, difficulty, sensitivity to guessing and discrimination. PMID:26900331
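The penalizing rule described in this abstract can be written per item as score = (correct options marked / number of correct options) − (distracters marked / number of distracters). A minimal Python sketch follows; the function name and option labels are hypothetical, and the abstract does not say whether negative scores were truncated at zero, so no truncation is applied here.

```python
def penalized_score(marked, correct, options):
    """Penalizing score for one multiple-mark item.

    Proportion of correct options marked minus proportion of distracters marked.
    Ranges from -1 (only distracters marked) to 1 (exactly the correct set marked).
    """
    marked, correct, options = set(marked), set(correct), set(options)
    if not correct or not correct <= options or not marked <= options:
        raise ValueError("correct and marked must be subsets of options; correct must be non-empty")
    distracters = options - correct
    p_correct = len(marked & correct) / len(correct)
    # If every option is correct there are no distracters to penalize.
    p_distracters = len(marked & distracters) / len(distracters) if distracters else 0.0
    return p_correct - p_distracters


# Hypothetical item with options A-E, keys A and C; the student marks A and B.
print(penalized_score("AB", "AC", "ABCDE"))  # about 0.167 (1/2 - 1/3)
```

Marking every option scores 0 under this rule, which is how it discourages indiscriminate marking.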
An 8-item short form of the Eating Disorder Examination-Questionnaire adapted for children (ChEDE-Q8).

PubMed

Kliem, Sören; Schmidt, Ricarda; Vogel, Mandy; Hiemisch, Andreas; Kiess, Wieland; Hilbert, Anja

2017-06-01

Eating disturbances are common in children, placing a vulnerable group of them at risk for full-syndrome eating disorders and adverse health outcomes. To provide a valid self-report assessment of eating disorder psychopathology in children, a short form of the child version of the Eating Disorder Examination (ChEDE-Q) was psychometrically evaluated. Similar to the EDE-Q, the ChEDE-Q provides assessment of eating disorder psychopathology related to anorexia nervosa, bulimia nervosa, and binge-eating disorder; however, the ChEDE-Q does not assess symptoms of avoidant/restrictive food intake disorder, pica, or rumination disorder. In 1,836 participants aged 7 to 18 years, recruited from two independent population-based samples, the factor structure of the recently established 8-item short form EDE-Q8 for adults was examined, including measurement invariance analyses on age, gender, and weight status derived from objectively measured weight and height. For convergent validity, the ChEDE-Q global score, body esteem scale, strengths and difficulties questionnaire, and sociodemographic characteristics were used. Item characteristics and age- and gender-specific norms were calculated. Confirmatory factor analysis revealed good model fit for the 8-item ChEDE-Q. Measurement invariance analyses indicated strict invariance for all analyzed subgroups. Convergent validity was provided through associations with well-established questionnaires and age, gender, and weight status, in expected directions. The newly developed ChEDE-Q8 proved to be a psychometrically sound and economical self-report assessment tool of eating disorder psychopathology in children. Further validation studies are needed, particularly concerning discriminant and predictive validity. © 2017 Wiley Periodicals, Inc.
Development of the Abbreviated Masculine Gender Role Stress Scale

PubMed Central

Swartout, Kevin M.; Parrott, Dominic J.; Cohn, Amy M.; Hagman, Brett T.; Gallagher, Kathryn E.

2014-01-01

Data gathered from six independent samples (n = 1,729) that assessed masculine gender role stress in college and community men were aggregated and used to determine the reliability and validity of an abbreviated version of the Masculine Gender Role Stress Scale (MGRS scale). The 15 items with the highest item-to-total scale correlations were used to create an abbreviated MGRS scale. Psychometric properties of each of the 15 items were examined with Item Response Theory (IRT) analysis, using the discrimination and threshold parameters. IRT results showed that the abbreviated scale may hold promise in capturing the same amount of information as the full 40-item scale. Relative to the 40-item scale, the total score of the abbreviated MGRS scale demonstrated comparable convergent validity using the measurement domains of masculine identity, hyper-masculinity, trait anger, anger expression, and alcohol involvement. An abbreviated MGRS scale may be recommended for use in clinical practice and research settings to reduce cost, time, and patient/participant burden. Additionally, IRT analyses identified items with higher discrimination and threshold parameters that may be used to screen for problematic gender role stress in men who may be seen in routine clinical or medical practice. PMID:25528163
Development of the Abbreviated Masculine Gender Role Stress Scale.

PubMed

Swartout, Kevin M; Parrott, Dominic J; Cohn, Amy M; Hagman, Brett T; Gallagher, Kathryn E

2015-06-01

Data gathered from 6 independent samples (n = 1,729) that assessed masculine gender role stress in college and community men were aggregated and used to determine the reliability and validity of an abbreviated version of the Masculine Gender Role Stress (MGRS) Scale. The 15 items with the highest item-to-total scale correlations were used to create an abbreviated MGRS Scale. Psychometric properties of each of the 15 items were examined with item response theory (IRT) analysis, using the discrimination and threshold parameters. IRT results showed that the abbreviated scale may hold promise in capturing the same amount of information as the full 40-item scale. Relative to the 40-item scale, the total score of the abbreviated MGRS Scale demonstrated comparable convergent validity using the measurement domains of masculine identity, hypermasculinity, trait anger, anger expression, and alcohol involvement. An abbreviated MGRS Scale may be recommended for use in clinical practice and research settings to reduce cost, time, and patient/participant burden. Additionally, IRT analyses identified items with higher discrimination and threshold parameters that may be used to screen for problematic gender role stress in men who may be seen in routine clinical or medical practice. (c) 2015 APA, all rights reserved.
[Development of a scale to measure the self concept of cesarean section mothers].

PubMed

Lee, M L; Cho, J H

1990-08-01

Recently, the rate of cesarean section in Korea has been increasing. The results of several previous studies in foreign countries on the emotional responses of cesarean section mothers showed that they might experience difficulties in the mother-infant interaction due to fatigue, lack of early mother-infant interaction, disappointment, anger, feelings of loss of control, and other factors. Human behavior is said to be determined by one's self concept, and self concept is influenced by both internal and external environmental factors. A scale to measure the self concept of cesarean section mothers was needed in order to identify those who might have difficulties in mother-infant interaction in the future. The purposes of this study were to develop a measuring scale and to test its reliability and validity. The process of this study was as follows. A structured interview was conducted with 50 cesarean section and vaginal delivery mothers to determine their emotional reactions after giving birth. Based on the results of the interviews, a 50-item Likert scale was developed. The self concept of 268 cesarean section and vaginal delivery mothers hospitalized at six hospitals in Seoul was measured between Feb. 1 and April 30. After reviewing the discriminating power of each item by means of cross-tabulation, ten items were selected for the final scale. The reliability and validity of this ten-item scale were tested by Cronbach's alpha and t-test, using the SPSS/PC+ package. The results of this study and recommendations are as follows. 1. The ten selected items were as follows. I feel pains in my breast. (-) I have a good appetite now. (+) I feel pains in my flank. (-) I feel fine now. (+) My body seems to have returned to its prepregnant state. (+) Thinking of the delivery process, I feel sorry. (-) I want to hold my baby in my arms. (+) I want to keep my own life, even if I became a mother. (-) I want to delegate the care of the baby to my mother/mother-in-law. (-) I think baby is my alter ego. (+) 2. The reliability of this scale was tested by Cronbach's alpha, and the coefficient of this scale was .8066. 3. The construct validity of this scale was tested by means of the known-groups method. The value of self concept for cesarean section mothers was significantly lower than for vaginal delivery mothers (t = -5.51, df = 266, p = 0.007). (ABSTRACT TRUNCATED AT 400 WORDS)
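Cronbach's alpha, reported for several scales in these records (.8066 here), is α = k/(k − 1) · (1 − Σσ²ᵢ / σ²_X) for k items with item variances σ²ᵢ and total-score variance σ²_X. A minimal Python sketch of that textbook formula follows; it is not the SPSS routine the study used, and the function name is illustrative.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a respondents x items score matrix.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores)).
    Sample variances are used throughout; the ratio is identical with
    population variances because the n - 1 denominators cancel.
    """
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Two perfectly parallel items give alpha = 1.
print(cronbach_alpha([[1, 1], [2, 2], [3, 3]]))  # → 1.0
```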
Visual search for motion-form conjunctions: is form discriminated within the motion system?

PubMed

von Mühlenen, A; Müller, H J

2001-06-01

Motion-form conjunction search can be more efficient when the target is moving (a moving 45°-tilted line among moving vertical lines and stationary 45°-tilted lines) rather than stationary. This asymmetry may be due to aspects of form being discriminated within a motion system representing only moving items, whereas discrimination of stationary items relies on a static form system (J. Driver & P. McLeod, 1992). Alternatively, it may be due to search exploiting differential motion velocity and direction signals generated by the moving-target and distractor lines. To decide between these alternatives, 4 experiments systematically varied the motion-signal information conveyed by the moving target and distractors while keeping their form difference salient. Moving-target search was found to be facilitated only when differential motion-signal information was available. Thus, there is no need to assume that form is discriminated within the motion system.
The development and validation of the Questionnaire on Anticipated Discrimination (QUAD).

PubMed

Gabbidon, Jheanell; Brohan, Elaine; Clement, Sarah; Henderson, R Claire; Thornicroft, Graham

2013-11-07

The anticipation of mental health-related discrimination is common amongst people with mental health problems and can have serious adverse effects. This study aimed to develop and validate a measure assessing the extent to which people with mental health problems anticipate that they will personally experience discrimination across a range of contexts. The items and format for the Questionnaire on Anticipated Discrimination (QUAD) were developed from previous versions of the Discrimination and Stigma Scale (DISC), focus groups and cognitive debriefing interviews which were used to further refine the content and format. The resulting provisional version of the QUAD was completed by 117 service users in an online survey and reliability, validity, precision and acceptability were assessed. A final version of the scale was agreed and analyses re-run using the online survey data and data from an independent sample to report the psychometric properties of the finalised scale. The provisional version of the QUAD had 17 items, good internal consistency (alpha = 0.86) and adequate convergent validity as supported by the significant positive correlations with the Stigma Scale (SS) (r = 0.40, p < 0.001) and the Internalised Stigma of Mental Illness Scale (ISMI) (r = 0.40, p < 0.001). Three items were removed due to low endorsements, high inter-correlation or conceptual concerns. The finalised 14 item QUAD had good internal consistency (alpha = 0.86), good test re-test reliability (ρ(c) = 0.81) and adequate convergent validity: correlations with the ISMI (r = 0.45, p < 0.001) and with the SS (r = 0.39, p < 0.001). Reading ease scores indicated good acceptability for general adult populations. Cross-replication in an independent sample further indicated good internal consistency (alpha = 0.88), adequate convergent validity and revealed two factors summarised by institutions/services and interpersonal/professional relationships. 
The QUAD expanded upon previous versions of the DISC. It is a reliable, valid and acceptable measure which can be used to identify key life areas in which people may personally anticipate discrimination, and an overall tendency to anticipate discrimination. It may also be useful in planning interventions aimed at reducing the stigma of mental illness.
Redintegration, task difficulty, and immediate serial recall tasks.

PubMed

Ritchie, Gabrielle; Tolan, Georgina Anne; Tehan, Gerald

2015-03-01

While current theoretical models remain somewhat inconclusive in their explanation of short-term memory (STM), many theories suggest at least a contribution of long-term memory (LTM) to the short-term system. A number of researchers refer to this process as redintegration (e.g., Schweickert, 1993). Under short-term recall conditions, the current study investigated the effects of redintegration and task difficulty in order to extend research conducted by Neale and Tehan (2007). Thirty participants in Experiment 1 and 26 participants in Experiment 2 completed a serial recall task in which retention interval, presentation rate, and articulatory suppression were used to modify task difficulty. Redintegration was examined by manipulating the characteristics of the to-be-remembered items; lexicality in Experiment 1 and wordlikeness in Experiment 2. Responses were scored based on correct-in-position recall, item scoring, and order accuracy scoring. In line with the Neale and Tehan results, as the difficulty of the task increased so did the effects of redintegration. This was evident in that the advantage for words in Experiment 1 and wordlikeness in Experiment 2 decreased as task difficulty increased. This relationship was observed for item but not order memory, and findings were discussed in relation to the theory of redintegration. (PsycINFO Database Record (c) 2015 APA, all rights reserved).
Phoneme categorization and discrimination in younger and older adults: a comparative analysis of perceptual, lexical, and attentional factors.

PubMed

Mattys, Sven L; Scharenborg, Odette

2014-03-01

This study investigates the extent to which age-related language processing difficulties are due to a decline in sensory processes or to a deterioration of cognitive factors, specifically, attentional control. Two facets of attentional control were examined: inhibition of irrelevant information and divided attention. Younger and older adults were asked to categorize the initial phoneme of spoken syllables ("Was it m or n?"), trying to ignore the lexical status of the syllables. The phonemes were manipulated to range in eight steps from m to n. Participants also did a discrimination task on syllable pairs ("Were the initial sounds the same or different?"). Categorization and discrimination were performed under either divided attention (concurrent visual-search task) or focused attention (no visual task). The results showed that even when the younger and older adults were matched on their discrimination scores: (1) the older adults had more difficulty inhibiting lexical knowledge than did younger adults, (2) divided attention weakened lexical inhibition in both younger and older adults, and (3) divided attention impaired sound discrimination more in older than younger listeners. The results confirm the independent and combined contribution of sensory decline and deficit in attentional control to language processing difficulties associated with aging. The relative weight of these variables and their mechanisms of action are discussed in the context of theories of aging and language. (c) 2014 APA, all rights reserved.
Work ability as prognostic risk marker of disability pension: single-item work ability score versus multi-item work ability index.

PubMed

Roelen, Corné A M; van Rhenen, Willem; Groothoff, Johan W; van der Klink, Jac J L; Twisk, Jos W R; Heymans, Martijn W

2014-07-01

Work ability predicts future disability pension (DP). A single-item work ability score (WAS) is emerging as a measure for work ability. This study compared single-item WAS with the multi-item work ability index (WAI) in their ability to identify workers at risk of DP. This prospective cohort study comprised 11 537 male construction workers, who completed the WAI at baseline and reported DP after a mean 2.3 years of follow-up. WAS and WAI were calibrated for DP risk predictions with the Hosmer-Lemeshow (H-L) test and their ability to discriminate between high- and low-risk construction workers was investigated with the area under the receiver operating characteristic curve (AUC). At follow-up, 336 (3%) construction workers reported DP. Both WAS [odds ratio (OR) 0.72, 95% confidence interval (95% CI) 0.66-0.78] and WAI (OR 0.57, 95% CI 0.52-0.63) scores were associated with DP at follow-up. The WAS showed miscalibration (H-L model χ²=10.60; df=3; P=0.01) and poorly discriminated between high- and low-risk construction workers (AUC 0.67, 95% CI 0.64-0.70). In contrast, calibration (H-L model χ²=8.20; df=8; P=0.41) and discrimination (AUC 0.78, 95% CI 0.75-0.80) were both adequate for the WAI. Although associated with the risk of future DP, the single-item WAS poorly identified male construction workers at risk of DP. We recommend using the multi-item WAI to screen for risk of DP in occupational health practice.
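The AUC used in this study equals the probability that a randomly chosen worker who reported DP has a higher risk score than a randomly chosen worker who did not, with ties counting one half (the Mann-Whitney formulation). A minimal Python sketch follows; because a lower work ability score implies higher risk, the usage example negates the scores. The example values are invented for illustration and are not study data.

```python
def auc(scores_pos, scores_neg):
    """ROC AUC by pairwise comparison: P(pos score > neg score), ties count 1/2.

    O(len(scores_pos) * len(scores_neg)); adequate for a sketch, but a
    rank-based computation scales better for large cohorts.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one case in each group")
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Invented work ability scores (0-10); lower work ability means higher risk.
was_dp = [2, 5]
was_no_dp = [7, 5, 9]
print(auc([-w for w in was_dp], [-w for w in was_no_dp]))  # 5.5 / 6, about 0.917
```

An AUC of 0.5 means no discrimination and 1.0 perfect separation, which is the scale on which the WAS (0.67) and WAI (0.78) values above are compared.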
Crossing the Divide: Infants Discriminate Small from Large Numerosities

ERIC Educational Resources Information Center

Cordes, Sara; Brannon, Elizabeth M.

2009-01-01

Although young infants have repeatedly demonstrated successful numerosity discrimination across large sets when the number of items in the sets changes twofold (E. M. Brannon, S. Abbott, & D. J. Lutz, 2004; J. N. Wood & E. S. Spelke, 2005; F. Xu & E. S. Spelke, 2000), they consistently fail to discriminate a twofold change in number when one set…
Visual Speech Fills in Both Discrimination and Identification of Non-Intact Auditory Speech in Children

ERIC Educational Resources Information Center

Jerger, Susan; Damian, Markus F.; McAlpine, Rachel P.; Abdi, Herve

2018-01-01

To communicate, children must discriminate and identify speech sounds. Because visual speech plays an important role in this process, we explored how visual speech influences phoneme discrimination and identification by children. Critical items had intact visual speech (e.g. baez) coupled to non-intact (excised onsets) auditory speech (signified…
Item Information in the Rasch Model. Project Psychometric Aspects of Item Banking No. 34. Research Report 88-7.

ERIC Educational Resources Information Center

Engelen, Ron J. H.; And Others

Fisher's information measure for the item difficulty parameter in the Rasch model and its marginal and conditional formulations are investigated. It is shown that expected item information in the unconditional model equals information in the marginal model, provided the assumption of sampling examinees from an ability distribution is made. For the…
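For orientation: under the Rasch model, P(X = 1 | θ, b) = exp(θ − b) / (1 + exp(θ − b)), and differentiating the log-likelihood of one response twice with respect to b gives Fisher information P(1 − P) about the difficulty parameter. The Python sketch below sums this over examinees whose abilities are treated as known (the joint, unconditional case); it does not implement the marginal or conditional formulations the report compares, and the function names are hypothetical.

```python
import math


def rasch_p(theta, b):
    """Rasch probability of a correct response: exp(theta - b) / (1 + exp(theta - b)).

    Written in the equivalent logistic form; intended for moderate theta - b.
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def info_b(b, thetas):
    """Fisher information about difficulty b from one response per examinee.

    Each response contributes P * (1 - P), which peaks at 0.25 when theta == b.
    Abilities are treated as known.
    """
    return sum(rasch_p(t, b) * (1.0 - rasch_p(t, b)) for t in thetas)


print(info_b(0.0, [0.0, 0.0, 0.0, 0.0]))  # → 1.0
```

Because each response contributes at most 0.25, a difficulty estimate is most precise when examinees' abilities lie close to the item's difficulty.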
Physics 30 Program Machine-Scorable Open-Ended Questions: Unit 2: Electric and Magnetic Forces. Diploma Examinations Program.

ERIC Educational Resources Information Center

Alberta Dept. of Education, Edmonton.

This document outlines the use of machine-scorable open-ended questions for the evaluation of Physics 30 in Alberta. Contents include: (1) an introduction to the questions; (2) sample instruction sheet; (3) fifteen sample items; (4) item information including the key, difficulty, and source of each item; (5) solutions to items having multiple…
Using the ICF's environmental factors framework to develop an item bank measuring built and natural environmental features affecting persons with disabilities.

PubMed

Heinemann, Allen W; Lai, Jin-Shei; Wong, Alex; Dashner, Jessica; Magasi, Susan; Hahn, Elizabeth A; Carlozzi, Noelle E; Tulsky, David S; Jerousek, Sara; Semik, Patrick; Miskovic, Ana; Gray, David B

2016-11-01

To develop a measure of natural environment and human-made change features (Chapter 2 of the International Classification of Functioning, Disability and Health) and evaluate the influence of perceived barriers on health-related quality of life. A sample of 570 adults with stroke, spinal cord injury, and traumatic brain injury residing in community settings reported their functioning in home, outdoor, and community settings (mean age = 47.0 years, SD = 16.1). They rated 18 items with a 5-point rating scale to describe the influence of barriers to moving around, seeing objects, hearing sounds, hearing conversations, feeling safe, and regulating temperature and indicated whether any difficulties were due to environmental features. We used Rasch analysis to identify misfitting items and evaluate differential item functioning (DIF) across impairment groups. We computed correlations between barriers and Patient-Reported Outcomes Measurement Information System (PROMIS) social domain measures and Community Participation Indicators (CPI) measures. The 18 items demonstrated person reliability of .70, discriminating nearly three levels of barriers. All items fit the Rasch model; impairment-related DIF was negligible. Ceiling effects were negligible, but 25% of the respondents were at the floor, indicating that they did not experience barriers that they attributed to the built and natural environment. As anticipated, barriers correlated moderately with PROMIS and CPI variables, suggesting that although this new item bank measures a construct that is related to participation and health-related quality of life, it also captures something unique. Known-groups validity was supported by wheelchair users reporting a higher level of barriers than did ambulatory respondents. Preliminary evidence supports the reliability and validity of this new measure of barriers to the built and natural environment. 
This measure allows investigators and clinicians to measure perceptions of the natural environment and human-made changes, providing information that can guide interventions to reduce barriers. Moderate relationships between barriers and PROMIS and CPI variables provide support for the measurement and theory of environmental influences on social health and participation.
Speech perception task with pseudowords.

PubMed

Appezzato, Mariana Martins; Hackerott, Maria Mercedes Saraiva; Avila, Clara Regina Brandão de

2018-01-01

Purpose To prepare a list of pseudowords in Brazilian Portuguese to assess the auditory discrimination ability of schoolchildren, and to investigate the internal consistency of the test items and the effect of school grade on discrimination performance. Methods Study participants were 60 schoolchildren (60% female) enrolled in the 3rd (n=14), 4th (n=24) and 5th (n=22) grades of an elementary school in the city of Sao Paulo, Brazil, aged between eight years and two months and 11 years and eight months (99 to 136 months; mean=120.05; SD=10.26), with an average school performance score of 7.21 (minimum 5.0; maximum 10; SD=1.23). Forty-eight minimal pairs of Brazilian Portuguese pseudowords distinguished by a single phoneme were prepared. The participants' responses (whether the elements of the pairs were the same or different) were noted and analyzed. The data were analyzed using Cronbach's alpha coefficient, Spearman's correlation coefficient, and the Bonferroni post-hoc test at a significance level of 0.05. Results Internal consistency analysis indicated that 20 pairs should be deleted. The remaining 28 items showed good internal consistency (α=0.84). The maximum and minimum scores of correct discrimination responses were 34 and 16, respectively (mean=30.79; SD=3.68). No correlation was observed between age, school performance, and discrimination performance, and no difference between school grades was found. Conclusion Most of the items proposed for assessing the auditory discrimination of speech sounds showed good internal consistency in relation to the task. Auditory discrimination of speech sounds did not improve with age or school grade.
Teacher Perceived Difficulty in Implementing Differentiated Instructional Strategies in Primary School

ERIC Educational Resources Information Center

Gaitas, Sérgio; Alves Martins, Margarida

2017-01-01

This study analyses teacher perceived difficulty in implementing differentiated instructional strategies in regular classes. The participants were 273 Portuguese primary school teachers with teaching experience ranging from 1 to 33 years. A 39-item questionnaire was used to evaluate teacher perceived difficulty in relation to different…
Measuring and Predicting Graded Reader Difficulty

ERIC Educational Resources Information Center

Holster, Trevor A.; Lake, J. W.; Pellowe, William R.

2017-01-01

This study used many-faceted Rasch measurement to investigate the difficulty of graded readers using a 3-item survey. Book difficulty was compared with Kyoto Level, Yomiyasusa Level, Lexile Level, book length, mean sentence length, and mean word frequency. Word frequency and Kyoto Level were found to be ineffective in predicting students'…
Critical success factors in awareness of and choice towards low vision rehabilitation.

PubMed

Fraser, Sarah A; Johnson, Aaron P; Wittich, Walter; Overbury, Olga

2015-01-01

The goal of the current study was to examine the critical factors indicative of an individual's choice to access low vision rehabilitation services. Seven hundred and forty-nine visually impaired individuals, from the Montreal Barriers Study, completed a structured interview and questionnaires (on visual function, coping, depression, satisfaction with life). Seventy-five factors from the interview and questionnaires were entered into a data-driven Classification and Regression Tree Analysis in order to determine the best predictors of awareness group: positive personal choice (I knew and I went), negative personal choice (I knew and did not go), and lack of information (Nobody told me, and I did not know). Having a response of moderate to no difficulty on item 6 (reading signs) of the Visual Function Index 14 (VF-14) indicated that the person had made a positive personal choice to seek rehabilitation, whereas reporting a great deal of difficulty on this item was associated with a lack of information on low vision rehabilitation. In addition to this factor, symptom duration of under nine years, moderate difficulty or less on item 5 (seeing steps or curbs) of the VF-14, and an indication of little difficulty or less on item 3 (reading large print) of the VF-14 further identified those who were more likely to have made a positive personal choice. Individuals in the lack of information group also reported greater difficulty on items 3 and 5 of the VF-14 and were more likely to be male. The duration-of-symptoms factor suggests that, even in the positive choice group, it may be best to offer rehabilitation services early. Being male and reporting moderate or greater difficulty on the VF-14 questions about far, medium-distance and near situations involving vision was associated with membership in the lack of information group. Consequently, these individuals may need additional education about the benefits of low vision services in order to make a positive personal choice. 
© 2014 The Authors Ophthalmic & Physiological Optics © 2014 The College of Optometrists.

A virtual shopping task for the assessment of executive functions: Validity for people with stroke.

PubMed

Nir-Hadad, Shira Yama; Weiss, Patrice L; Waizman, Anna; Schwartz, Natalia; Kizony, Rachel

2017-07-01

The importance of assessing executive functions (EF) using ecologically valid assessments has been discussed extensively. Due to the difficulty of carrying out such assessments in real-world settings on a regular basis, virtual reality has been proposed as a technique to provide complex functional tasks under a variety of differing conditions while measuring various aspects of performance and controlling for stimuli. The main goal of this study was to examine the discriminant, construct-convergent and ecological validity of the Adapted Four-Item Shopping Task, an assessment of the Instrumental Activity of Daily Living (IADL) of shopping. Nineteen people with stroke, aged 50-85 years, and 20 age- and gender-matched healthy participants performed the shopping task in both the SeeMe Virtual Interactive Shopping environment and a real shopping environment (the hospital cafeteria) in a counterbalanced order. The shopping task outcomes were compared to clinical measures of EF. The findings provided good initial support for the validity of the Adapted Four-Item Shopping Task as an IADL assessment that requires the use of EF for people with stroke. Further studies should examine this task with a larger sample of people with stroke as well as with other populations who have deficits in EF.
Some considerations in evaluating spoken word recognition by normal-hearing, noise-masked normal-hearing, and cochlear implant listeners. I: The effects of response format.

PubMed

Sommers, M S; Kirk, K I; Pisoni, D B

1997-04-01

The purpose of the present studies was to assess the validity of using closed-set response formats to measure two cognitive processes essential for recognizing spoken words: perceptual normalization (the ability to accommodate acoustic-phonetic variability) and lexical discrimination (the ability to isolate words in the mental lexicon). In addition, the experiments were designed to examine the effects of response format on evaluation of these two abilities in normal-hearing (NH), noise-masked normal-hearing (NMNH), and cochlear implant (CI) subject populations. The speech recognition performance of NH, NMNH, and CI listeners was measured using both open- and closed-set response formats under a number of experimental conditions. To assess talker normalization abilities, identification scores for words produced by a single talker were compared with recognition performance for items produced by multiple talkers. To examine lexical discrimination, performance for words that are phonetically similar to many other words (hard words) was compared with scores for items with few phonetically similar competitors (easy words). Open-set word identification for all subjects was significantly poorer when stimuli were produced in lists with multiple talkers compared with conditions in which all of the words were spoken by a single talker. Open-set word recognition also was better for lexically easy compared with lexically hard words. Closed-set tests, in contrast, failed to reveal the effects of either talker variability or lexical difficulty even when the response alternatives provided were systematically selected to maximize confusability with target items. These findings suggest that, although closed-set tests may provide important information for clinical assessment of speech perception, they may not adequately evaluate a number of cognitive processes that are necessary for recognizing spoken words.
The parallel results obtained across all subject groups indicate that NH, NMNH, and CI listeners engage similar perceptual operations to identify spoken words. Implications of these findings for the design of new test batteries that can provide comprehensive evaluations of the individual capacities needed for processing spoken language are discussed.
A Study of Reverse-Worded Matched Item Pairs Using the Generalized Partial Credit and Nominal Response Models

ERIC Educational Resources Information Center

Matlock Cole, Ki Lynn; Turner, Ronna C.; Gitchel, W. Dent

2018-01-01

The generalized partial credit model (GPCM) is often used for polytomous data; however, the nominal response model (NRM) allows for the investigation of how adjacent categories may discriminate differently when items are positively or negatively worded. Ten items from three different self-reported scales were used (anxiety, depression, and…
Interest Inventory Items as Reinforcing Stimuli: A Test of the A-R-D Theory.

ERIC Educational Resources Information Center

Staats, Arthur W.; And Others

An experiment was conducted to test the hypothesis that interest inventory items would function as reinforcing stimuli in a visual discrimination task. When previously rated liked and disliked items from the Strong Vocational Interest Blank were differentially presented following one of two responses, subjects learned to respond to the stimulus…
Effect of Clinically Discriminating, Evidence-Based Checklist Items on the Reliability of Scores from an Internal Medicine Residency OSCE

ERIC Educational Resources Information Center

Daniels, Vijay J.; Bordage, Georges; Gierl, Mark J.; Yudkowsky, Rachel

2014-01-01

Objective structured clinical examinations (OSCEs) are used worldwide for summative examinations but often lack acceptable reliability. Research has shown that reliability of scores increases if OSCE checklists for medical students include only clinically relevant items. Also, checklists are often missing evidence-based items that high-achieving…
Deficient relational binding processes in adolescents with psychosis: evidence from impaired memory for source and temporal context.

PubMed

Doré, Marie-Claire; Caza, Nicole; Gingras, Nathalie; Rouleau, Nancie

2007-11-01

Findings from the literature consistently revealed episodic memory deficits in adolescents with psychosis. However, the nature of the dysfunction remains unclear. Based on a cognitive neuropsychological approach, a theoretically driven paradigm was used to generate valid interpretations about the underlying memory processes impaired in these patients. A total of 16 inpatient adolescents with psychosis and 19 individually matched controls were assessed using an experimental task designed to measure memory for source and temporal context of studied words. Retrospective confidence judgements for source and temporal context responses were also assessed. On word recognition, patients had more difficulty than controls discriminating target words from neutral distractors. In addition, patients identified both source and temporal context features of recognised items less often than controls. Confidence judgement analyses revealed that the difference between the proportions of correct and incorrect responses made with high confidence was lower in patients than in controls. In addition, the proportion of high-confidence responses that were errors was higher in patients than in controls. These findings suggest impaired relational binding processes in adolescents with psychosis, resulting in difficulty creating unified memory representations. Our findings on retrospective confidence data point to impaired monitoring of retrieved information that may also impair memory performance in these individuals.
Cross-cultural adaptation and construct validity of the Korean version of a physical activity measure for community-dwelling elderly.

PubMed

Choi, Bongsam

2018-01-01

[Purpose] This study aimed to cross-culturally adapt and validate the Korean version of a physical activity measure (K-PAM) for community-dwelling elderly people. [Subjects and Methods] One hundred and thirty-eight community-dwelling elderly people (32 men and 106 women) participated in the study. All participants were asked to fill out a fifty-one-item questionnaire measuring perceived difficulty in the activities of daily living (ADL) for the elderly. A one-parameter item response theory model (Rasch analysis) was applied to determine the construct validity and to inspect the item-level psychometric properties of the 51 ADL items of the K-PAM. [Results] Person separation reliability (analogous to Cronbach's alpha) for internal consistency ranged from 0.93 to 0.94. A total of 16 items misfit the Rasch model. After deletion of the misfitting items, the remaining 35 ADL items of the K-PAM were placed in an empirically meaningful hierarchy from easy to hard. The item-person map showed that item difficulty was well matched to elderly people with moderate and low ability, except for a ceiling effect among those with high ability. [Conclusion] The cross-culturally adapted K-PAM showed sufficient construct validity and stable psychometric properties, as confirmed by person separation reliability and fit statistics.
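The one-parameter (Rasch) model places persons and items on a single logit scale, which is what makes an easy-to-hard item hierarchy and an item-person map possible. A minimal sketch of the dichotomous Rasch model, assuming 0/1 scoring (the K-PAM items rate perceived difficulty, so the study may have used a polytomous extension); the function name and the item and person values are hypothetical, not estimates from the study.

```python
import math


def rasch_probability(theta: float, b: float) -> float:
    """Dichotomous Rasch (1PL) model: P(X = 1 | theta, b) = 1 / (1 + exp(-(theta - b)))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# Hypothetical item difficulties in logits, ordered easy -> hard.
item_difficulties = {"easy_item": -1.5, "medium_item": 0.0, "hard_item": 1.5}

person_ability = 0.5  # hypothetical person location on the same logit scale
for name, b in item_difficulties.items():
    print(f"{name}: P = {rasch_probability(person_ability, b):.3f}")
```

When a person's ability equals an item's difficulty the probability is exactly 0.5, so an item-person map can show at a glance which items are well targeted to which ability levels.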
Increased susceptibility to proactive interference in adults with dyslexia?

PubMed

Bogaerts, Louisa; Szmalec, Arnaud; Hachmann, Wibke M; Page, Mike P A; Woumans, Evy; Duyck, Wouter

2015-01-01

Recent findings show that people with dyslexia have an impairment in serial-order memory. Based on these findings, the present study aimed to test the hypothesis that people with dyslexia have difficulties dealing with proactive interference (PI) in recognition memory. A group of 25 adults with dyslexia and a group of matched controls were subjected to a 2-back recognition task, which required participants to indicate whether an item (mis)matched the item that had been presented 2 trials before. PI was elicited using lure trials in which the item matched the item in the 3-back position instead of the targeted 2-back position. Our results demonstrate that the introduction of lure trials affected 2-back recognition performance more severely in the dyslexic group than in the control group, suggesting greater difficulty in resisting PI in dyslexia.
What does distractibility in ADHD reveal about mechanisms for top-down attentional control?

PubMed

Friedman-Hill, Stacia R; Wagman, Meryl R; Gex, Saskia E; Pine, Daniel S; Leibenluft, Ellen; Ungerleider, Leslie G

2010-04-01

In this study, we attempted to clarify whether distractibility in ADHD might arise from increased sensory-driven interference or from inefficient top-down control. We employed an attentional filtering paradigm in which discrimination difficulty and distractor salience (amount of image "graying") were parametrically manipulated. Increased discrimination difficulty should add to the load of top-down processes, whereas increased distractor salience should produce stronger sensory interference. We found an unexpected interaction between discrimination difficulty and distractor salience. For difficult discriminations, ADHD children filtered distractors as efficiently as healthy children and adults; as expected, all three groups were slower to respond with high vs. low salience distractors. In contrast, for easy discriminations, robust between-group differences emerged: ADHD children were much slower and made more errors than either healthy children or adults. For easy discriminations, healthy children and adults filtered out high salience distractors as easily as low salience distractors, but ADHD children were slower to respond on trials with low salience distractors than on trials with high salience distractors. These initial results from a small sample of ADHD children have implications for models of attentional control, and ways in which it can malfunction. The fact that ADHD children exhibited efficient attentional filtering when task demands were high, but showed deficient and atypical distractor filtering under low task demands suggests that attention deficits in ADHD may stem from a failure to efficiently engage top-down control rather than an inability to implement filtering in sensory processing regions. Published by Elsevier B.V.
An international measure of awareness and beliefs about cancer: development and testing of the ABC

PubMed Central

Simon, Alice E; Forbes, Lindsay J L; Boniface, David; Warburton, Fiona; Brain, Kate E; Dessaix, Anita; Donnelly, Michael; Haynes, Kerry; Hvidberg, Line; Lagerlund, Magdalena; Petermann, Lisa; Tishelman, Carol; Vedsted, Peter; Vigmostad, Maria Nyre; Wardle, Jane; Ramirez, Amanda J

2012-01-01

Objectives To develop an internationally validated measure of cancer awareness and beliefs: the awareness and beliefs about cancer (ABC) measure. Design and setting Items modified from existing measures were assessed by a working group in six countries (Australia, Canada, Denmark, Norway, Sweden and the UK). Validation studies were completed in the UK, and cross-sectional surveys of the general population were carried out in the six participating countries. Participants Testing in UK English included cognitive interviewing for face validity (N=10), calculation of content validity indexes (six assessors), and assessment of test–retest reliability (N=97). Conceptual and cultural equivalence of modified (Canadian and Australian) and translated (Danish, Norwegian, Swedish and Canadian French) ABC versions were tested quantitatively for equivalence of meaning (≥4 assessors per country) and in bilingual cognitive interviews (three interviews per translation). Response patterns were assessed in surveys of adults aged 50+ years (N≥2000) in each country. Main outcomes Psychometric properties were evaluated through tests of validity and reliability, conceptual and cultural equivalence and systematic item analysis. Test–retest reliability used weighted-κ and intraclass correlations. Aggregate scores were constructed and validated by factor analysis for (1) beliefs about cancer outcomes and (2) beliefs about barriers to symptomatic presentation, and by item summation for (3) awareness of cancer symptoms and (4) awareness of cancer risk factors. Results The English ABC had acceptable test–retest reliability and content validity. International assessments of equivalence identified a small number of items where wording needed adjustment. Survey response patterns showed that items performed well in terms of difficulty and discrimination across countries except for awareness of cancer outcomes in Australia. Aggregate scores had consistent factor structures across countries.
Conclusions The ABC is a reliable and valid international measure of cancer awareness and beliefs. The methods used to validate and harmonise the ABC may serve as a methodological guide in international survey research. PMID:23253874
Analysis Test of Understanding of Vectors with the Three-Parameter Logistic Model of Item Response Theory and Item Response Curves Technique

ERIC Educational Resources Information Center

Rakkapao, Suttida; Prasitpong, Singha; Arayathanitkul, Kwan

2016-01-01

This study investigated the multiple-choice test of understanding of vectors (TUV) by applying item response theory (IRT). The difficulty, discrimination, and guessing parameters of the TUV items were fit with the three-parameter logistic model of IRT, using the PARSCALE program. The TUV ability is an ability parameter, here estimated assuming…
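In the three-parameter logistic (3PL) model, each item has a discrimination (slope) a, a difficulty (location) b, and a pseudo-guessing lower asymptote c. A minimal sketch of the item characteristic curve under the standard 3PL form; the function name and item parameters are hypothetical, and the optional scaling constant D is shown only because some programs report parameters on a D = 1.7 (normal-ogive-approximating) metric, which is an assumption about whatever metric a given analysis used.

```python
import math


def three_pl(theta: float, a: float, b: float, c: float, D: float = 1.0) -> float:
    """Three-parameter logistic IRT model.

    a: discrimination (slope), b: difficulty (location),
    c: pseudo-guessing (lower asymptote). D is an optional scaling constant;
    D = 1.7 approximates the normal-ogive metric.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


# Hypothetical parameters for one multiple-choice item.
a, b, c = 1.2, 0.3, 0.2
for theta in (-3.0, 0.3, 3.0):
    print(f"theta={theta:+.1f}  P(correct)={three_pl(theta, a, b, c):.3f}")
```

Because of the guessing floor, the probability at theta = b is c + (1 - c)/2 rather than 0.5, and very low-ability examinees still answer correctly with probability close to c.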
Psychometric Properties of the Children's Depression Inventory: An Item Response Theory Analysis across Age in a Nonclinical, Longitudinal, Adolescent Sample

ERIC Educational Resources Information Center

Lee, Young-Sun; Krishnan, Anita; Park, Yoon Soo

2012-01-01

The purpose of this study was to investigate psychometric properties of the Children's Depression Inventory within a nonclinical and longitudinal sample (8th and 12th grades). Using the Rasch rating scale, most items represented one dimension. There was adequate separation among items and no overlap between ranges of item difficulties with latent…
Do the Guideline Violations Influence Test Difficulty of High-Stake Test?: An Investigation on University Entrance Examination in Turkey

ERIC Educational Resources Information Center

Atalmis, Erkan Hasan

2016-01-01

Multiple-choice (MC) items are commonly used in high-stakes tests. Thus, each item of such tests should be meticulously constructed to increase the accuracy of decisions based on test results. Haladyna and his colleagues (2002) presented valid item-writing guidelines for constructing high-quality MC items in order to increase test reliability and…
Workplace Discrimination, Prejudice, and Diversity Measurement: A Review of Instrumentation.

ERIC Educational Resources Information Center

Burkard, Alan W.; Boticki, Michael A.; Madson, Michael B.

2002-01-01

Critically reviews diversity measures in terms of item development, psychometric evidence, and utility for counseling and development: Workplace Prejudice/Discrimination Inventory, Attitudes toward Diversity Scale, Organizational Diversity Inventory, Workforce Diversity Questionnaire, Perceived Occupational Opportunity Scale-Form B, and Perceived…
Smoke and mirrors: Testing the scope of chimpanzees’ appearance-reality understanding

PubMed Central

Lurz, Robert; Russell, Jamie L.; Hopkins, William D.

2016-01-01

The ability to make appearance-reality (AR) discriminations is an important higher-order cognitive adaptation in humans but is still poorly understood in our closest primate relatives. Previous research showed that chimpanzees are capable of AR discrimination when choosing between food items that appear, due to the effects of distorting lenses, to be smaller or larger than they actually are (Krachun, Call & Tomasello, 2009). In the current study, we investigated the scope and flexibility of chimpanzees’ AR discrimination abilities by presenting them with a wider range of illusory stimuli. In addition to using lenses to change the apparent size of food items (Experiment 1), we used a mirror to change the apparent number of items (Experiment 2), and tinted filters to change their apparent color (Experiment 3). In all three experiments, some chimpanzees were able to maximize their food rewards by making a choice based on the real properties of the stimuli in contrast to their manifest apparent properties. These results replicate the earlier findings for size illusions and extend them to additional situations involving illusory number and color. Control tests, together with findings from previous studies, ruled out lower-level explanations for the chimpanzees’ performance. The findings thus support the hypothesis that chimpanzees are capable of making AR discriminations with a range of illusory stimuli. PMID:26848736
Reevaluating the Selectivity of Face-Processing Difficulties in Children and Adolescents with Autism

ERIC Educational Resources Information Center

Ewing, Louise; Pellicano, Elizabeth; Rhodes, Gillian

2013-01-01

There are few direct examinations of whether face-processing difficulties in autism are disproportionate to difficulties with other complex non-face stimuli. Here we examined discrimination ability and memory for faces, cars, and inverted faces in children and adolescents with and without autism. Results showed that, relative to typical children,…
Rise Time and Formant Transition Duration in the Discrimination of Speech Sounds: The Ba-Wa Distinction in Developmental Dyslexia

ERIC Educational Resources Information Center

Goswami, Usha; Fosker, Tim; Huss, Martina; Mead, Natasha; Szucs, Denes

2011-01-01

Across languages, children with developmental dyslexia have a specific difficulty with the neural representation of the sound structure (phonological structure) of speech. One likely cause of their difficulties with phonology is a perceptual difficulty in auditory temporal processing (Tallal, 1980). Tallal (1980) proposed that basic auditory…
Differential Gender Effects in the Relationship between Perceived Immune Functioning and Autistic Traits.

PubMed

Mackus, Marlou; Kruijff, Deborah de; Otten, Leila S; Kraneveld, Aletta D; Garssen, Johan; Verster, Joris C

2017-04-12

Altered immune functioning has been demonstrated in individuals with autism spectrum disorder (ASD). The current study explores the relationship between perceived immune functioning and experiencing ASD traits in healthy young adults. N = 410 students from Utrecht University completed a survey on immune functioning and autistic traits. In addition to a 1-item perceived immune functioning rating, the Immune Function Questionnaire (IFQ) was completed to assess perceived immune functioning. The Dutch translation of the Autism-Spectrum Quotient (AQ) was completed to examine variation in autistic traits, including the domains "social insights and behavior", "difficulties with change", "communication", "phantasy and imagination", and "detail orientation". The 1-item perceived immune functioning score did not significantly correlate with the total AQ score. However, a significant negative correlation was found between perceived immune functioning and the AQ subscale "difficulties with change" (r = -0.119, p = 0.019). In women, 1-item perceived immune functioning correlated significantly with the AQ subscales "difficulties with change" (r = -0.149, p = 0.029) and "communication" (r = -0.145, p = 0.032). In men, none of the AQ subscales significantly correlated with 1-item perceived immune functioning. In conclusion, a modest relationship between perceived immune functioning and several autistic traits was found.
Modeling Booklet Effects for Nonequivalent Group Designs in Large-Scale Assessment

ERIC Educational Resources Information Center

Hecht, Martin; Weirich, Sebastian; Siegle, Thilo; Frey, Andreas

2015-01-01

Multiple matrix designs are commonly used in large-scale assessments to distribute test items to students. These designs comprise several booklets, each containing a subset of the complete item pool. Besides reducing the test burden of individual students, using various booklets allows aligning the difficulty of the presented items to the assumed…
Effects of Using Modified Items to Test Students with Persistent Academic Difficulties

ERIC Educational Resources Information Center

Elliott, Stephen N.; Kettler, Ryan J.; Beddow, Peter A.; Kurz, Alexander; Compton, Elizabeth; McGrath, Dawn; Bruen, Charles; Hinton, Kent; Palmer, Porter; Rodriguez, Michael C.; Bolt, Daniel; Roach, Andrew T.

2010-01-01

This study investigated the effects of using modified items in achievement tests to enhance accessibility. An experiment determined whether tests composed of modified items would reduce the performance gap between students eligible for an alternate assessment based on modified achievement standards (AA-MAS) and students not eligible, and the…

Regression Effects in Angoff Ratings: Examples from Credentialing Exams

ERIC Educational Resources Information Center

Wyse, Adam E.

2018-01-01

This article discusses regression effects that are commonly observed in Angoff ratings where panelists tend to think that hard items are easier than they are and easy items are more difficult than they are in comparison to estimated item difficulties. Analyses of data from two credentialing exams illustrate these regression effects and the…
A Five-Year Evaluation of Examination Structure in a Cardiovascular Pharmacotherapy Course

PubMed Central

Kolar, Claire; Janke, Kristin K.

2015-01-01

Objective. To evaluate the composition and effectiveness as an assessment tool of a criterion-referenced examination composed of clinical cases tied to practice decisions, to examine the effect of varying audience response system (ARS) questions on student examination preparation, and to articulate guidelines for structuring examinations to maximize evaluation of student learning. Design. Multiple-choice items developed over 5 years were evaluated using Bloom’s Taxonomy classification, point biserial correlation, item difficulty, and grade distribution. In addition, examination items were classified into categories based on similarity to items used in ARS preparation. Assessment. As the number of items directly tied to clinical practice rose, Bloom’s Taxonomy level and item difficulty also rose. In examination years where Bloom’s levels were high but preparation was minimal, average grade distribution was lower compared with years in which student preparation was higher. Conclusion. Criterion-referenced examinations can benefit from systematic evaluation of their composition and effectiveness as assessment tools. Calculated design and delivery of classroom preparation is an asset in improving examination performance on rigorous, practice-relevant examinations. PMID:27168611
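Two of the classical item statistics named above, item difficulty and the point-biserial correlation, can be computed directly from a scored response matrix. A minimal sketch, assuming dichotomous 0/1 scoring and the usual classical-test-theory definitions (difficulty as proportion correct; point-biserial as the Pearson correlation between the item score and the total score); the function names and the response matrix are hypothetical and not the course's data or the authors' procedure.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns NaN if either variable has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def item_statistics(responses):
    """responses: one row per examinee, each a list of 0/1 item scores.

    Returns (difficulty, point_biserial) per item, where difficulty is the
    proportion correct and point_biserial is the Pearson correlation between
    the 0/1 item score and the total score (item included, i.e. uncorrected).
    """
    totals = [sum(row) for row in responses]
    stats = []
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        difficulty = sum(item) / len(item)
        stats.append((difficulty, pearson(item, totals)))
    return stats


# Hypothetical 0/1 responses: 5 examinees x 3 items.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 0],
    [1, 0, 0],
]
for j, (p, r_pb) in enumerate(item_statistics(responses), start=1):
    print(f"item {j}: difficulty={p:.2f}  r_pb={r_pb:.3f}")
```

Note that a higher classical difficulty index means an easier item, and that the uncorrected point-biserial is somewhat inflated because each item contributes to the total it is correlated with.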
Disruption of amygdala-entorhinal-hippocampal network in late-life depression.

PubMed

Leal, Stephanie L; Noche, Jessica A; Murray, Elizabeth A; Yassa, Michael A

2017-04-01

Episodic memory deficits are evident in late-life depression (LLD) and are associated with subtle synaptic and neurochemical changes in the medial temporal lobes (MTL). However, the particular mechanisms by which memory impairment occurs in LLD are currently unknown. We tested older adults with (DS+) and without (DS-) depressive symptoms using high-resolution fMRI that is capable of discerning signals in hippocampal subfields and amygdala nuclei. Scanning was conducted during performance of an emotional discrimination task used previously to examine the relationship between depressive symptoms and amygdala-mediated emotional modulation of hippocampal pattern separation in young adults. We found that hippocampal dentate gyrus (DG)/CA3 activity was reduced during correct discrimination of negative stimuli and increased during correct discrimination of neutral items in DS+ compared to DS- adults. The extent of the latter increase was correlated with symptom severity. Furthermore, DG/CA3 and basolateral amygdala (BLA) activity predicted discrimination performance on negative trials, a relationship that depended on symptom severity. The impact of the BLA on depressive symptom severity was mediated by the DG/CA3 during discrimination of neutral items, and by the lateral entorhinal cortex (LEC) during false recognition of positive items. These results shed light on a novel mechanistic account for amygdala-hippocampal network changes and concurrent alterations in emotional episodic memory in LLD. The BLA-LEC-DG/CA3 network, which comprises a key pathway by which emotion modulates memory, is specifically implicated in LLD. © 2017 Wiley Periodicals, Inc.
The Amerasians.

ERIC Educational Resources Information Center

Ranard, Donald A.; Gilzow, Douglas F.

1989-01-01

Articles in this newsletter issue examine the experiences, strengths, and problems that Amerasian refugees from Vietnam have had while living in the United States. Topics of discussion include discrimination, educational difficulties, resettlement experiences, and cultural difficulties. The concept of cluster site resettlement, a possible solution…
Effects of spacing of item repetitions in continuous recognition memory: does item retrieval difficulty promote item retention in older adults?

PubMed

Kılıç, Aslı; Hoyer, William J; Howard, Marc W

2013-01-01

BACKGROUND/STUDY CONTEXT: Older adults exhibit an age-related deficit in item memory as a function of the length of the retention interval, but older adults and young adults usually show roughly equivalent benefits due to the spacing of item repetitions in continuous memory tasks. The current experiment investigates the seemingly paradoxical effects of retention interval and spacing in young and older adults using a continuous recognition memory procedure. Fifty young adults and 52 older adults gave memory confidence ratings to words that were presented once (P1), twice (P2), or three times (P3), and the effects of the lag length and retention interval were assessed at P2 and at P3, respectively. Response times at P2 were disproportionately longer for older adults than for younger adults as a function of the number of items occurring between P1 and P2, suggestive of age-related loss in item memory. Ratings of confidence in memory responses revealed that older adults remembered fewer items at P2 with a high degree of certainty. Confidence ratings given at P3 suggested that young and older adults derived equivalent benefits from the spacing between P1 and P2. Findings of this study support theoretical accounts that suggest that recursive reminding and/or item retrieval difficulty promote item retention in older adults.
The Social, Emotional and Behavioural Difficulties of Primary School Children with Poor Attendance Records

ERIC Educational Resources Information Center

Carroll, H. C. M.

2013-01-01

Two complementary studies of poor and better attenders are presented. To measure emotional and behavioural difficulties (EBD) different teacher-completed rating scales were employed, and to determine social difficulties, the studies used sociometry and some items from the scales. One study had a longitudinal design. It revealed that, after…
The diagnostic utility of separation anxiety disorder symptoms: An item response theory analysis

PubMed Central

Cooper-Vince, Christine E.; Emmert-Aronson, Benjamin O.; Pincus, Donna B.; Comer, Jonathan S.

2013-01-01

At present, it is not clear whether the current definition of separation anxiety disorder (SAD) is the optimal classification of developmentally inappropriate, severe, and interfering separation anxiety in youth. Much remains to be learned about the relative contributions of individual SAD symptoms for informing diagnosis. Two-parameter logistic Item Response Theory analyses were conducted on the eight core SAD symptoms in an outpatient anxiety sample of treatment-seeking children (N=359, 59.3% female, mean age=11.2) and their parents to determine the diagnostic utility of each of these symptoms. Analyses considered values of item threshold, which characterizes the SAD severity level at which each symptom has a 50% chance of being endorsed, and item discrimination, which characterizes how well each symptom distinguishes individuals with higher and lower levels of SAD. Distress related to separation and fear of being alone without major attachment figures showed the strongest discrimination properties and the lowest thresholds for being endorsed. In contrast, worry about harm befalling attachment figures showed the poorest discrimination properties, and nightmares about separation showed the highest threshold for being endorsed. Distress related to separation demonstrated crossing differential item functioning associated with age: at lower separation anxiety levels, excessive fear at separation was more likely to be endorsed for children ≥9 years, whereas at higher levels this symptom was more likely to be endorsed for children <9 years. Implications are discussed for optimizing the taxonomy of SAD in youth. PMID:23963543
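The two item properties described above map onto the two parameters of the two-parameter logistic (2PL) model: the threshold b is the latent severity at which the endorsement probability is exactly 0.5, and the discrimination a sets how steeply that probability rises around b. A minimal sketch under the logistic parameterization without a scaling constant; the symptom names and parameter values are hypothetical, not the study's estimates.

```python
import math


def two_pl(theta: float, a: float, b: float) -> float:
    """Two-parameter logistic IRT model: a = discrimination, b = threshold (location)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def slope_at_threshold(a: float) -> float:
    # dP/dtheta = a * P * (1 - P); at theta = b, P = 0.5, so the slope is a / 4.
    return a / 4.0


# Hypothetical symptoms: one low-threshold and highly discriminating,
# one high-threshold and weakly discriminating.
symptoms = {"symptom_A": (2.5, -0.5), "symptom_B": (0.8, 1.5)}
for name, (a, b) in symptoms.items():
    print(name, "P at threshold:", two_pl(b, a, b), "slope at threshold:", slope_at_threshold(a))
```

A symptom with a low threshold is endorsed even at mild severity, while a high discrimination means endorsement changes sharply near that threshold, which is what makes such a symptom diagnostically informative.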
Fourteen years of progress testing in radiology residency training: experiences from The Netherlands.

PubMed

Rutgers, D R; van Raamt, F; van Lankeren, W; Ravesloot, C J; van der Gijp, A; Ten Cate, Th J; van Schaik, J P J

2018-05-01

To describe the development of the Dutch Radiology Progress Test (DRPT) for knowledge testing in radiology residency training in The Netherlands from its start in 2003 up to 2016. We reviewed all DRPTs conducted since 2003. We assessed key changes and events in the test over the years, as well as resident participation and dispensation for the DRPT, test reliability, and the discriminative power of test items. The DRPT has been conducted semi-annually since 2003, except in 2015, when one digital DRPT failed. Key changes in these years were improvements in test analysis and feedback, test digitalization (2013), and the inclusion of test items on nuclear medicine (2016). From 2003 to 2016, resident dispensation rates increased (Pearson's correlation coefficient 0.74, P-value <0.01) to a maximum of 16%. Cronbach's alpha for test reliability varied between 0.83 and 0.93. The percentage of DRPT test items with negative item-rest correlations, indicating relatively poor discriminative power, varied between 4% and 11%. Progress testing has proven feasible and sustainable in Dutch radiology residency training, keeping up with innovations in the radiological profession. Test reliability and the discriminative power of test items have remained fair over the years, while resident dispensation rates have increased.
• Progress testing allows for monitoring knowledge development from novice to senior trainee.
• In postgraduate medical training, progress testing is used infrequently.
• Progress testing is feasible and sustainable in radiology residency training.
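Cronbach's alpha, reported above as the reliability of each test administration, compares the sum of the item variances with the variance of the total score: alpha = k/(k - 1) * (1 - sum of item variances / total-score variance). A minimal sketch of that standard formula, assuming a complete respondents-by-items score matrix with no missing data; the function name and example data are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """scores: list of respondent rows, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score)).
    Population variances are used throughout; the ratio is identical with
    sample variances because both are rescaled by the same factor.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 0/1 scores: 5 respondents x 4 items.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
print(f"alpha = {cronbach_alpha(scores):.3f}")
```

Items that covary strongly push the total-score variance above the sum of item variances and alpha toward 1; items that are unrelated to each other push alpha toward 0.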
The Effect of SSM Grading on Reliability When Residual Items Have No Discriminating Power.

ERIC Educational Resources Information Center

Kane, Michael T.; Moloney, James M.

Gilman and Ferry have shown that when the student's score on a multiple choice test is the total number of responses necessary to get all items correct, substantial increases in reliability can occur. In contrast, similar procedures giving partial credit on multiple choice items have resulted in relatively small gains in reliability. The analysis…
Sample Invariance of the Structural Equation Model and the Item Response Model: A Case Study.

ERIC Educational Resources Information Center

Breithaupt, Krista; Zumbo, Bruno D.

2002-01-01

Evaluated the sample invariance of item discrimination statistics in a case study using real data, responses of 10 random samples of 500 people to a depression scale. Results lend some support to the hypothesized superiority of a two-parameter item response model over the common form of structural equation modeling, at least when responses are…
Person Response Functions and the Definition of Units in the Social Sciences

ERIC Educational Resources Information Center

Engelhard, George, Jr.; Perkins, Aminah F.

2011-01-01

Humphry (this issue) has written a thought-provoking piece on the interpretation of item discrimination parameters as scale units in item response theory. One of the key features of his work is the description of an item response theory (IRT) model that he calls the logistic measurement function that combines aspects of two traditions in IRT that…
[Severe intimate partner violence risk prediction scale-revised].

PubMed

Echeburúa, Enrique; Amor, Pedro Javier; Loinaz, Ismael; de Corral, Paz

2010-11-01

The aim of this study was to describe the psychometric properties of the Severe Intimate Partner Violence Risk Prediction Scale and to revise it in order to weight the 20 items according to their discriminant capacity and to solve the missing-item problem. The sample for this study consisted of 450 male batterers who were reported to the police station. The victims were classified as high-risk (18.2%), moderate-risk (45.8%) and low-risk (36%), depending on the cutoff scores in the original scale. Internal consistency (Cronbach's alpha = .72) and interrater reliability (r = .73) were acceptable. The point-biserial correlation coefficient between each item and the corrected total score of the 20-item scale was calculated to determine the most discriminative items, which were associated with the context of intimate partner violence in the last month, with the male batterer's profile and with the victim's vulnerability. Based on these results, a revised scale (EPV-R) with new cutoff scores and guidance on handling missing items was proposed. This easy-to-use tool appears to be suitable to the requirements of criminal justice professionals and is intended for use in safety planning. Implications of these results for further research are discussed.
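Cronbach's alpha, reported here (.72) and in several other records on this page, can be computed from item variances and the variance of total scores. A minimal sketch with hypothetical data: it uses sample variances (n − 1) throughout, a choice that cancels in the ratio as long as it is applied consistently, and it raises `ZeroDivisionError` if every respondent has the same total score.

```python
from statistics import variance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)


# Three hypothetical respondents x two items.
print(round(cronbach_alpha([[2, 3], [4, 4], [6, 8]]), 3))
```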
Final Sampling Bias in Haptic Judgments: How Final Touch Affects Decision-Making.

PubMed

Mitsuda, Takashi; Yoshioka, Yuichi

2018-01-01

When people choose between multiple items, they usually evaluate each item repeatedly, one after the other. The effect of the order and number of item evaluations on choice is essential to understanding the decision-making process. Previous studies have shown that when people choose the preferred item of two, they tend to choose the item they evaluated last. This tendency has been observed regardless of sensory modality. This study investigated the origin of this bias in three experiments using two-alternative forced-choice tasks with handkerchiefs. First, the bias appeared in a smoothness discrimination task, which indicates that the bias was not based on judgments of preference. Second, the handkerchief that was touched more often tended to be chosen more frequently in the preference task, but not in the smoothness discrimination task, indicating that a mere exposure effect enhanced the bias. Third, when the number of touches did not differ between handkerchiefs, the bias appeared when participants freely chose which handkerchief to touch last, but not when the last-touched handkerchief was predetermined. This finding suggests a direct coupling between final voluntary touching and judgment.
Redress of Grievances.

ERIC Educational Resources Information Center

Davies, Helen C.; Davies, Robert E.

The status of women in higher education, sex discrimination, laws providing protection against sex discrimination, grievance procedures, and difficulties involved in filing complaints are addressed. Empirical evidence is cited that illustrates discrimination against women at the hiring level in the scientific academic community. To substantiate…
Quantifying traditional Chinese medicine patterns using modern test theory: an example of functional constipation.

PubMed

Shen, Minxue; Cui, Yuanwu; Hu, Ming; Xu, Linyong

2017-01-13

The study aimed to validate a scale to assess the severity of the "Yin deficiency, intestine heat" pattern of functional constipation based on modern test theory. Pooled longitudinal data of 237 patients with the "Yin deficiency, intestine heat" pattern of constipation from a prospective cohort study were used to validate the scale. Exploratory factor analysis was used to examine the common factors of items. A multidimensional item response model was used to assess the scale in the presence of multidimensionality. Cronbach's alpha ranged from 0.79 to 0.89, and split-half reliability ranged from 0.67 to 0.79 across measurement occasions. Exploratory factor analysis identified two common factors, and all items had cross factor loadings. The bidimensional model had better goodness of fit than the unidimensional model. The multidimensional item response model showed that all items had moderate to high discrimination parameters. The parameters indicated that the first latent trait signified intestine heat, while the second trait characterized Yin deficiency. The information function showed that the items discriminated best among patients with moderate to high disease severity. Multidimensional item response theory provides a useful and rational approach to validating scales for assessing the severity of patterns in traditional Chinese medicine.
Verbal Discrimination: Re-pairing, Language Frequency, and Associative Properties of the Stimuli

ERIC Educational Resources Information Center

Lovelace, Eugene A.; Bansal, Leslie

1973-01-01

The present paper reports the results of four experiments on verbal discrimination learning. These experiments manipulated the associative properties and the language frequency of stimuli, as well as the pairings of "right" and "wrong" items within a list. (Author)
Avoid Age Discrimination.

ERIC Educational Resources Information Center

Bernstein, Michael I.

1982-01-01

Steps a school board can take to minimize the risk of age discrimination suits include reviewing all written policies, forms, files, and collective bargaining agreements for age discriminatory items; preparing a detailed statistical analysis of the age of personnel; and reviewing reduction-in-force procedures. (Author/MLF)
Middle school students' reading comprehension of mathematical texts and algebraic equations

NASA Astrophysics Data System (ADS)

Duru, Adem; Koklu, Onder

2011-06-01

In this study, middle school students' abilities to translate mathematical texts into algebraic representations and vice versa were investigated. In addition, students' difficulties in making such translations and the potential sources of these difficulties were explored. Both qualitative and quantitative methods were used to collect data: a questionnaire and clinical interviews. The questionnaire consisted of two general types of items: (1) selected-response (multiple-choice) items for which the respondent selects from multiple options and (2) open-ended items for which the respondent constructs a response. To further investigate the students' strategies while they translated the given mathematical texts into algebraic equations and vice versa, five randomly chosen students (n = 5) were interviewed. Data were collected in the 2007-2008 school year from 185 middle-school students in five teachers' classrooms in three different schools in the city of Adıyaman, Turkey. Analysis of the data showed that the participating students had difficulties translating mathematical texts into algebraic equations using symbols. These students also had difficulties translating symbolic representations into mathematical texts because of their weak reading comprehension. In addition, the findings revealed that students' difficulties in translating the given mathematical texts into symbolic representations or vice versa come from different sources.
Comparative analysis of three screening instruments for autism spectrum disorder in toddlers at high risk.

PubMed

Oosterling, Iris J; Swinkels, Sophie H; van der Gaag, Rutger Jan; Visser, Janne C; Dietz, Claudine; Buitelaar, Jan K

2009-06-01

Several instruments have been developed to screen for autism spectrum disorders (ASD) in high-risk populations. However, few studies compare different instruments in one sample. Data were gathered from the Early Screening of Autistic Traits Questionnaire, Social Communication Questionnaire, Communication and Symbolic Behavior Scales-Developmental Profile, Infant-Toddler Checklist and key items of the Checklist for Autism in Toddlers in 238 children (mean age = 29.6 months, SD = 6.4) at risk for ASD. Discriminative properties are compared in the whole sample and in two age groups separately (8-24 months and 25-44 months). No instrument or individual item shows satisfying power in discriminating ASD from non-ASD, but pros and cons of instruments and items are discussed and directions for future research are proposed.
Assessing organizational climate: psychometric properties of the CLIOR Scale.

PubMed

Peña-Suárez, Elsa; Muñiz, José; Campillo-Álvarez, Angela; Fonseca-Pedrero, Eduardo; García-Cueto, Eduardo

2013-02-01

Organizational climate is the set of perceptions shared by workers who occupy the same workplace. The main goal of this study is to develop a new organizational climate scale and to determine its psychometric properties. The sample consisted of 3,163 Health Service workers. A total of 88.7% of participants worked in hospitals, and 11.3% in primary care; 80% were women and 20% men, with a mean age of 51.9 years (SD = 6.28). The proposed scale consists of 50 Likert-type items, with an alpha coefficient of 0.97, and an essentially one-dimensional structure. The discrimination indexes of the items are greater than 0.40, and the items show no differential item functioning in relation to participants' sex. A short version of the scale was developed, made up of 15 items, with discrimination indexes higher than 0.40, an alpha coefficient of 0.94, and its structure was clearly one-dimensional. These results indicate that the new scale has adequate psychometric properties, allowing a reliable and valid assessment of organizational climate.

Secondary Psychometric Examination of the Dimensional Obsessive-Compulsive Scale: Classical Testing, Item Response Theory, and Differential Item Functioning.

PubMed

Thibodeau, Michel A; Leonard, Rachel C; Abramowitz, Jonathan S; Riemann, Bradley C

2015-12-01

The Dimensional Obsessive-Compulsive Scale (DOCS) is a promising measure of obsessive-compulsive disorder (OCD) symptoms but has received minimal psychometric attention. We evaluated the utility and reliability of DOCS scores. The study included 832 students and 300 patients with OCD. Confirmatory factor analysis supported the originally proposed four-factor structure. DOCS total and subscale scores exhibited good to excellent internal consistency in both samples (α = .82 to α = .96). Patients' DOCS total scores decreased substantially during treatment (t = 16.01, d = 1.02). DOCS total scores discriminated between students and patients (sensitivity = 0.76, 1 - specificity = 0.23). The measure did not exhibit gender-based differential item functioning as tested by Mantel-Haenszel chi-square tests. Expected response options for each item were plotted against the latent trait using item response theory and demonstrated that DOCS scores incrementally discriminate OCD symptoms ranging from low to extremely high severity. Incremental differences in DOCS scores appear to represent unbiased and reliable differences in true OCD symptom severity. © The Author(s) 2014.
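The Mantel-Haenszel DIF procedure matches respondents on total score, builds a 2×2 table (group × item response) within each score stratum, and pools across strata. A minimal sketch, assuming dichotomous item responses and strata that have already been formed; if items are scored on a rating scale, a polytomous generalization would be needed instead, and the abstract does not describe the study's stratification. A common odds ratio α_MH near 1 means both groups have similar odds of endorsement at the same score level; the continuity-corrected chi-square has 1 degree of freedom (critical value 3.84 at the .05 level). The class and function names are hypothetical, and the code raises `ZeroDivisionError` if no stratum carries information.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    """2x2 table for one matched-score level (e.g. one total-score band)."""
    ref_yes: int    # reference group, item endorsed
    ref_no: int     # reference group, item not endorsed
    focal_yes: int  # focal group, item endorsed
    focal_no: int   # focal group, item not endorsed


def mantel_haenszel(strata: list[Stratum]) -> tuple[float, float]:
    """Return (common odds ratio alpha_MH, continuity-corrected MH chi-square, 1 df)."""
    num = den = 0.0
    sum_a = sum_expected = sum_var = 0.0
    for s in strata:
        a, b, c, d = s.ref_yes, s.ref_no, s.focal_yes, s.focal_no
        t = a + b + c + d
        if t < 2:
            continue  # a stratum with fewer than two people carries no information
        n_ref, n_focal = a + b, c + d
        m_yes, m_no = a + c, b + d
        num += a * d / t
        den += b * c / t
        sum_a += a
        sum_expected += n_ref * m_yes / t  # E(a) under no DIF
        sum_var += n_ref * n_focal * m_yes * m_no / (t * t * (t - 1))
    alpha_mh = num / den
    chi2 = (abs(sum_a - sum_expected) - 0.5) ** 2 / sum_var
    return alpha_mh, chi2


# Hypothetical strata with identical endorsement rates in both groups: no DIF.
print(mantel_haenszel([Stratum(10, 10, 10, 10), Stratum(20, 5, 20, 5)]))
```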
Optimization of perceptual learning: effects of task difficulty and external noise in older adults.

PubMed

DeLoss, Denton J; Watanabe, Takeo; Andersen, George J

2014-06-01

Previous research has shown a wide array of age-related declines in vision. The current study examined the effects of perceptual learning (PL), external noise, and task difficulty in fine orientation discrimination with older individuals (mean age 71.73, range 65-91). Thirty-two older subjects participated in seven 1.5-h sessions conducted on separate days over a three-week period. A two-alternative forced choice procedure was used in discriminating the orientation of Gabor patches. Four training groups were examined in which the standard orientations for training were either easy or difficult and included either external noise (additive Gaussian noise) or no external noise. In addition, transfer to an untrained orientation and to untrained noise levels was examined. An analysis of the four groups prior to training indicated no significant differences between the groups. An analysis of the change in performance post-training indicated that the degree of learning was related to task difficulty and the presence of external noise during training. In addition, measurements of pupil diameter indicated that changes in orientation discrimination were not associated with changes in retinal illuminance. These results suggest that task difficulty and training in noise are important factors for optimizing the effects of training among older individuals. Copyright © 2013 Elsevier B.V. All rights reserved.
Dorso-Lateral Frontal Cortex of the Ferret Encodes Perceptual Difficulty during Visual Discrimination

PubMed Central

Zhou, Zhe Charles; Yu, Chunxiu; Sellers, Kristin K.; Fröhlich, Flavio

2016-01-01

Visual discrimination requires sensory processing followed by a perceptual decision. Despite a growing understanding of visual areas in this behavior, it is unclear what role top-down signals from prefrontal cortex play, in particular as a function of perceptual difficulty. To address this gap, we investigated how neurons in dorso-lateral frontal cortex (dl-FC) of freely-moving ferrets encode task variables in a two-alternative forced choice visual discrimination task with high- and low-contrast visual input. About two-thirds of all recorded neurons in dl-FC were modulated by at least one of the two task variables, task difficulty and target location. More neurons in dl-FC preferred the hard trials; no such preference bias was found for target location. In individual neurons, this preference for specific task types was limited to brief epochs. Finally, optogenetic stimulation confirmed the functional role of the activity in dl-FC before target touch; suppression of activity in pyramidal neurons with the ArchT silencing opsin resulted in a decrease in reaction time to touch the target but not to retrieve reward. In conclusion, dl-FC activity is differentially recruited for high perceptual difficulty in the freely-moving ferret and the resulting signal may provide top-down behavioral inhibition. PMID:27025995
Dorso-Lateral Frontal Cortex of the Ferret Encodes Perceptual Difficulty during Visual Discrimination.

PubMed

Zhou, Zhe Charles; Yu, Chunxiu; Sellers, Kristin K; Fröhlich, Flavio

2016-03-30

Visual discrimination requires sensory processing followed by a perceptual decision. Despite a growing understanding of visual areas in this behavior, it is unclear what role top-down signals from prefrontal cortex play, in particular as a function of perceptual difficulty. To address this gap, we investigated how neurons in dorso-lateral frontal cortex (dl-FC) of freely-moving ferrets encode task variables in a two-alternative forced choice visual discrimination task with high- and low-contrast visual input. About two-thirds of all recorded neurons in dl-FC were modulated by at least one of the two task variables, task difficulty and target location. More neurons in dl-FC preferred the hard trials; no such preference bias was found for target location. In individual neurons, this preference for specific task types was limited to brief epochs. Finally, optogenetic stimulation confirmed the functional role of the activity in dl-FC before target touch; suppression of activity in pyramidal neurons with the ArchT silencing opsin resulted in a decrease in reaction time to touch the target but not to retrieve reward. In conclusion, dl-FC activity is differentially recruited for high perceptual difficulty in the freely-moving ferret and the resulting signal may provide top-down behavioral inhibition.
The perceptual learning of time-compressed speech: A comparison of training protocols with different levels of difficulty

PubMed Central

Gabay, Yafit; Karni, Avi; Banai, Karen

2017-01-01

Speech perception can improve substantially with practice (perceptual learning) even in adults. Here we compared the effects of four training protocols that differed in whether and how task difficulty was changed during a training session, in terms of the gains attained and the ability to apply (transfer) these gains to previously unencountered items (tokens) and to different talkers. Participants trained in judging the semantic plausibility of sentences presented as time-compressed speech and were tested on their ability to reproduce, in writing, the target sentences; trial-by-trial feedback was provided in all training conditions. In two conditions task difficulty (low or high compression) was kept constant throughout the training session, whereas in the other two conditions task difficulty was changed adaptively (incrementally from easy to difficult, or using a staircase procedure). Compared to a control group (no training), all four protocols resulted in significant post-training improvement in the ability to reproduce the trained sentences accurately. However, training in the constant-high-compression protocol elicited the smallest gains in deciphering and reproducing trained items and in reproducing novel, untrained items after training. Overall, these results suggest that training procedures that start off with relatively little signal distortion (“easy” items, not far removed from standard speech) may be advantageous compared to conditions wherein severe distortions are presented to participants from the very beginning of the training session. PMID:28545039
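The abstract does not specify which staircase rule was used. As an illustration only, here is a generic 2-down/1-up staircase (Levitt's transformed up-down method), which raises difficulty after two consecutive correct responses and lowers it after each error, converging on roughly 70.7% correct. The integer difficulty levels (for example, indices into a hypothetical list of compression ratios, higher meaning harder) and the function name are assumptions.

```python
def two_down_one_up(responses, start_level: int = 2, min_level: int = 0, max_level: int = 10):
    """Run a 2-down/1-up staircase over integer difficulty levels (higher = harder).

    responses: iterable of booleans, True for a correct trial.
    Returns (levels presented on each trial, level for the next trial).
    """
    level, streak, presented = start_level, 0, []
    for correct in responses:
        presented.append(level)
        if correct:
            streak += 1
            if streak == 2:  # two correct in a row: make it harder
                level = min(max_level, level + 1)
                streak = 0
        else:  # any error: make it easier and reset the run
            level = max(min_level, level - 1)
            streak = 0
    return presented, level


print(two_down_one_up([True, True, True, True, False]))  # ([2, 2, 3, 3, 4], 3)
```

In a real session the responses would arrive one trial at a time, so the loop body would sit inside the trial loop rather than consume a precomputed list.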
Adaptive Mental Testing: The State of the Art

DTIC Science & Technology

1979-11-01

typically vary in their psychometric properties (particularly in their difficulty), the test designer must decide what configuration of these item... psychometric properties best suits the test’s purpose. There are two extreme rationales to guide that decision. One rationale is to choose items that are... development of item response theory (Rasch, 1960; Lord, 1952, 1970, 1974a; Birnbaum, 1968) that provided the needed invariance properties for item
An Empirical Bayes Approach to Item Banking. Project Psychometric Aspects of Item Banking No. 6. Research Report 86-6.

ERIC Educational Resources Information Center

van der Linden, Wim J.; Eggen, Theo J. H. M.

A procedure for the sequential optimization of the calibration of an item bank is given. The procedure is based on an empirical Bayes approach to a reformulation of the Rasch model as a model for paired comparisons between the difficulties of test items in which ties are allowed to occur. First, it is indicated how a paired-comparisons design…
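One reason the Rasch model can be recast as paired comparisons between item difficulties: given that a person answers exactly one of items i and j correctly, the probability that it was item i equals 1/(1 + exp(b_i − b_j)), which does not depend on the person's ability θ (the Bradley-Terry form). The sketch below checks that standard identity numerically. It illustrates only this property; it is not the authors' empirical Bayes calibration procedure and does not model ties. Function names and values are hypothetical.

```python
import math


def rasch_p(theta: float, b: float) -> float:
    """Rasch (1PL) probability of a correct response to an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def p_i_beats_j(theta: float, b_i: float, b_j: float) -> float:
    """P(item i correct, item j wrong | exactly one of the two answered correctly)."""
    p_i, p_j = rasch_p(theta, b_i), rasch_p(theta, b_j)
    only_i = p_i * (1.0 - p_j)
    only_j = p_j * (1.0 - p_i)
    return only_i / (only_i + only_j)


def bradley_terry(b_i: float, b_j: float) -> float:
    """Paired-comparison form: depends only on the difference in difficulties."""
    return 1.0 / (1.0 + math.exp(b_i - b_j))


# The conditional probability is the same at every ability level.
for theta in (-2.0, 0.0, 1.5, 3.0):
    print(theta, round(p_i_beats_j(theta, -0.5, 1.0), 6), round(bradley_terry(-0.5, 1.0), 6))
```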
Assessment of item-writing flaws in multiple-choice questions.

PubMed

Nedeau-Cayo, Rosemarie; Laughlin, Deborah; Rus, Linda; Hall, John

2013-01-01

This study evaluated the quality of multiple-choice questions used in a hospital's e-learning system. Constructing well-written questions is fraught with difficulty, and item-writing flaws are common. Study results revealed that most items contained flaws and were written at the knowledge/comprehension level. Few items had linked objectives, and no association was found between the presence of objectives and flaws. Recommendations include education for writing test questions.
Measuring Black Men’s Police-Based Discrimination Experiences: Development and Validation of the Police and Law Enforcement (PLE) Scale

PubMed Central

English, Devin; Bowleg, Lisa; del Río-González, Ana Maria; Tschann, Jeanne M.; Agans, Robert; Malebranche, David J

2017-01-01

Objectives Although social science research has examined police and law enforcement-perpetrated discrimination against Black men using policing statistics and implicit bias studies, there is little quantitative evidence detailing this phenomenon from the perspective of Black men. Consequently, there is a dearth of research detailing how Black men’s perspectives on police and law enforcement-related stress predict negative physiological and psychological health outcomes. This study addresses these gaps with the qualitative development and quantitative test of the Police and Law Enforcement (PLE) scale. Methods In Study 1, we employed thematic analysis on transcripts of individual qualitative interviews with 90 Black men to assess key themes and concepts and develop quantitative items. In Study 2, we used 2 focus groups composed of 5 Black men each (n=10), intensive cognitive interviewing with a separate sample of Black men (n=15), and piloting with another sample of Black men (n=13) to assess the ecological validity of the quantitative items. In Study 3, we analyzed data from a sample of 633 Black men between the ages of 18 and 65 to test the factor structure of the PLE, as well as its concurrent validity and convergent/discriminant validity. Results Qualitative analyses and confirmatory factor analyses suggested that a 5-item, 1-factor measure appropriately represented respondents’ experiences of police/law enforcement discrimination. As hypothesized, the PLE was positively associated with measures of racial discrimination and depressive symptoms. Conclusions Preliminary evidence suggests that the PLE is a reliable and valid measure of Black men’s experiences of discrimination with police/law enforcement. PMID:28080104
Measuring Black men's police-based discrimination experiences: Development and validation of the Police and Law Enforcement (PLE) Scale.

PubMed

English, Devin; Bowleg, Lisa; Del Río-González, Ana Maria; Tschann, Jeanne M; Agans, Robert P; Malebranche, David J

2017-04-01

Although social science research has examined police and law enforcement-perpetrated discrimination against Black men using policing statistics and implicit bias studies, there is little quantitative evidence detailing this phenomenon from the perspective of Black men. Consequently, there is a dearth of research detailing how Black men's perspectives on police and law enforcement-related stress predict negative physiological and psychological health outcomes. This study addresses these gaps with the qualitative development and quantitative test of the Police and Law Enforcement (PLE) Scale. In Study 1, we used thematic analysis on transcripts of individual qualitative interviews with 90 Black men to assess key themes and concepts and develop quantitative items. In Study 2, we used 2 focus groups composed of 5 Black men each (n = 10), intensive cognitive interviewing with a separate sample of Black men (n = 15), and piloting with another sample of Black men (n = 13) to assess the ecological validity of the quantitative items. In Study 3, we analyzed data from a sample of 633 Black men between the ages of 18 and 65 to test the factor structure of the PLE, as well as its concurrent validity and convergent/discriminant validity. Qualitative analyses and confirmatory factor analyses suggested that a 5-item, 1-factor measure appropriately represented respondents' experiences of police/law enforcement discrimination. As hypothesized, the PLE was positively associated with measures of racial discrimination and depressive symptoms. Preliminary evidence suggests that the PLE is a reliable and valid measure of Black men's experiences of discrimination with police/law enforcement. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Item-Based Top-N Recommendation Algorithms

DTIC Science & Technology

2003-01-20

basket of items, utilized by many e-commerce sites, cannot take advantage of pre-computed user-to-user similarities. Finally, even though the... not discriminate between items that are present in frequent itemsets and items that are not, while still maintaining the computational advantages of... [Table fragment; column headings not included in the excerpt: ...453219 0.02% 7.74; ccard 42629 68793 398619 0.01% 9.35; ecommerce 6667 17491 91222 0.08% 13.68; em 8002 1648 769311 5.83% 96.14; ml 943 1682 100000 6.31...]
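The item-based idea can be sketched as follows, assuming binary purchase data and cosine similarity between item columns: score each candidate item by summing its similarity to the items already in the user's basket, exclude items the user already has, and return the N highest. This is a simplified illustration, not the report's algorithms; refinements such as keeping only the most similar items per item, or other similarity measures, are omitted. Names and data are hypothetical.

```python
import math
from collections import defaultdict


def item_cosine(baskets: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Cosine similarity between binary item columns, for every co-occurring pair."""
    count: dict[str, int] = defaultdict(int)
    co: dict[tuple[str, str], int] = defaultdict(int)
    for items in baskets.values():
        for i in items:
            count[i] += 1
            for j in items:
                if i != j:
                    co[(i, j)] += 1
    # For 0/1 vectors, cosine = |users with i and j| / sqrt(|users with i| * |users with j|).
    return {(i, j): c / math.sqrt(count[i] * count[j]) for (i, j), c in co.items()}


def top_n(basket: set[str], sims: dict[tuple[str, str], float], n: int) -> list[str]:
    """Score unseen items by summed similarity to the basket; return the n best."""
    scores: dict[str, float] = defaultdict(float)
    for (i, j), s in sims.items():
        if i in basket and j not in basket:
            scores[j] += s
    return sorted(scores, key=lambda j: (-scores[j], j))[:n]  # ties broken by item name


# Hypothetical purchase histories.
baskets = {"u1": {"a", "b"}, "u2": {"a", "b", "c"}, "u3": {"a", "c"}, "u4": {"b", "d"}}
sims = item_cosine(baskets)
print(top_n({"a"}, sims, 2))  # c outranks b after cosine normalization
```

Because the item-item similarities depend only on the purchase data, they can be precomputed once and reused for any new basket, which is the computational advantage the excerpt alludes to.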
Testing item response theory invariance of the standardized Quality-of-life Disease Impact Scale (QDIS(®)) in acute coronary syndrome patients: differential functioning of items and test.

PubMed

Deng, Nina; Anatchkova, Milena D; Waring, Molly E; Han, Kyung T; Ware, John E

2015-08-01

The Quality-of-life (QOL) Disease Impact Scale (QDIS(®)) standardizes the content and scoring of QOL impact attributed to different diseases using item response theory (IRT). This study examined the IRT invariance of the QDIS-standardized IRT parameters in an independent sample. The differential functioning of items and test (DFIT) of a static short-form (QDIS-7) was examined across two independent sources: patients hospitalized for acute coronary syndrome (ACS) in the TRACE-CORE study (N = 1,544) and chronically ill US adults in the QDIS standardization sample. "ACS-specific" IRT item parameters were calibrated and linearly transformed to compare to "standardized" IRT item parameters. Differences in IRT model-expected item, scale and theta scores were examined. The DFIT results were also compared with those of a standard logistic regression differential item functioning analysis. Item parameters estimated in the ACS sample showed lower discrimination than the standardized discrimination parameters, but only small differences were found for threshold parameters. In DFIT, results on the non-compensatory differential item functioning index (range 0.005-0.074) were all below the threshold of 0.096. Item differences were further canceled out at the scale level. IRT-based theta scores for ACS patients using standardized and ACS-specific item parameters were highly correlated (r = 0.995, root-mean-square difference = 0.09). Using standardized item parameters, ACS patients scored one-half standard deviation higher (indicating greater QOL impact) compared to chronically ill adults in the standardization sample. The study showed sufficient IRT invariance to warrant the use of standardized IRT scoring of QDIS-7 for studies comparing the QOL impact attributed to acute coronary disease and other chronic conditions.
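The abstract says the ACS-specific parameters were linearly transformed onto the standardized metric but does not name the method. One common choice is mean/sigma linking, sketched below under that assumption: constants A and B make the transformed thresholds match the mean and standard deviation of the target thresholds, discriminations are divided by A, and θ* = Aθ + B, which leaves a(θ − b), and therefore every response probability, unchanged. For simplicity each item is assumed to have a single threshold; items with several thresholds (as in a graded response model) would pool them. Function names and values are hypothetical.

```python
from statistics import mean, stdev


def mean_sigma(b_source: list[float], b_target: list[float]) -> tuple[float, float]:
    """Linking constants (A, B) mapping the source metric onto the target metric."""
    A = stdev(b_target) / stdev(b_source)
    B = mean(b_target) - A * mean(b_source)
    return A, B


def rescale_items(a: list[float], b: list[float], A: float, B: float):
    """Apply theta* = A*theta + B to 2PL-style parameters: a* = a / A, b* = A*b + B."""
    return [ai / A for ai in a], [A * bi + B for bi in b]


# Hypothetical thresholds for the same items calibrated in two samples.
A, B = mean_sigma([-1.0, 0.0, 1.0], [-1.5, 0.5, 2.5])
a_star, b_star = rescale_items([1.0, 2.0, 1.5], [-1.0, 0.0, 1.0], A, B)
print(A, B, a_star, b_star)
```

After linking, differences that remain between the transformed and the standardized parameters are what DIF and DFIT indices then evaluate.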
Spatial short-term memory in children with nonverbal learning disabilities: impairment in encoding spatial configuration.

PubMed

Narimoto, Tadamasa; Matsuura, Naomi; Takezawa, Tomohiro; Mitsuhashi, Yoshinori; Hiratani, Michio

2013-01-01

The authors investigated whether impaired spatial short-term memory exhibited by children with nonverbal learning disabilities is due to a problem in the encoding process. Children with or without nonverbal learning disabilities performed a simple spatial test that required them to remember 3, 5, or 7 spatial items presented simultaneously in random positions (i.e., a spatial configuration) and to decide whether a target item had changed position or all items, including the target, had remained in the same positions. The results showed that, even when the spatial positions in the encoding and probe phases were similar, the mean proportion correct of children with nonverbal learning disabilities was 0.58, whereas that of children without nonverbal learning disabilities was 0.84. Based on these results, the authors argue that children with nonverbal learning disabilities have difficulty encoding relational information between spatial items, and that this difficulty is responsible for their impaired spatial short-term memory.
Development, Validation, and Application of the Microbiology Concept Inventory †

PubMed Central

Paustian, Timothy D.; Briggs, Amy G.; Brennan, Robert E.; Boury, Nancy; Buchner, John; Harris, Shannon; Horak, Rachel E. A.; Hughes, Lee E.; Katz-Amburn, D. Sue; Massimelli, Maria J.; McDonald, Ann H.; Primm, Todd P.; Smith, Ann C.; Stevens, Ann M.; Yung, Sunny B.

2017-01-01

If we are to teach effectively, tools are needed to measure student learning. A widely used method for quickly measuring student understanding of core concepts in a discipline is the concept inventory (CI). Using the American Society for Microbiology Curriculum Guidelines (ASMCG) for microbiology, faculty from 11 academic institutions created and validated a new microbiology concept inventory (MCI). The MCI was developed in three phases. In phase one, learning outcomes and fundamental statements from the ASMCG were used to create T/F questions coupled with open responses. In phase two, the 743 responses to MCI 1.0 were examined to find the most common misconceptions, which were used to create distractors for multiple-choice questions. MCI 2.0 was then administered to 1,043 students. The responses of these students were used to create MCI 3.0, a 23-question CI that measures students’ understanding of all 27 fundamental statements. MCI 3.0 was found to be reliable, with a Cronbach’s alpha of 0.705 and a Ferguson’s delta of 0.97. Test item analysis demonstrated good validity and discriminatory power as judged by item difficulty, item discrimination, and the point-biserial correlation coefficient. Comparison of pre- and posttest scores indicated that microbiology students at 10 institutions increased their understanding of concepts after instruction, except on questions probing metabolism (average normalized learning gain was 0.15). The MCI will enable quantitative analysis of student learning gains in understanding microbiology, help to identify misconceptions, and point toward areas where efforts should be made to develop teaching approaches to overcome them. PMID:29854042
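The abstract names the classical item statistics but not their exact formulas. A common set of definitions is sketched below as an assumption: difficulty as the proportion answering correctly; discrimination as the proportion correct in the top 27% of examinees by total score minus that in the bottom 27%; and Ferguson's delta for whole-test discrimination, δ = (K + 1)(N² − Σf²) / (K·N²), where N is the number of examinees, K the number of items, and f the frequency of each total score. δ equals 1 when scores are spread evenly over all K + 1 possible totals and 0 when everyone earns the same score. Function names and data are hypothetical.

```python
from collections import Counter


def difficulty_index(item: list[int]) -> float:
    """Proportion of examinees answering the item correctly (higher = easier)."""
    return sum(item) / len(item)


def discrimination_index(item: list[int], totals: list[float], frac: float = 0.27) -> float:
    """Proportion correct in the top `frac` of examinees minus that in the bottom `frac`."""
    n = len(item)
    g = max(1, round(frac * n))  # size of each extreme group
    order = sorted(range(n), key=lambda k: totals[k])  # ascending total score
    lower, upper = order[:g], order[-g:]
    return (sum(item[k] for k in upper) - sum(item[k] for k in lower)) / g


def fergusons_delta(totals: list[int], n_items: int) -> float:
    """Whole-test discrimination: (K+1)(N^2 - sum f^2) / (K * N^2)."""
    n = len(totals)
    sum_sq = sum(f * f for f in Counter(totals).values())
    return (n_items + 1) * (n * n - sum_sq) / (n_items * n * n)


item = [1, 1, 0, 1, 0, 0]     # one hypothetical item, six students
totals = [10, 9, 8, 3, 2, 1]  # their hypothetical total scores
print(difficulty_index(item), discrimination_index(item, totals))
print(fergusons_delta([0, 1, 2, 3], n_items=3))
```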
Bayesian inference in an item response theory model with a generalized student t link function

NASA Astrophysics Data System (ADS)

Azevedo, Caio L. N.; Migon, Helio S.

2012-10-01

In this paper we introduce a new item response theory (IRT) model with a generalized Student t-link function with unknown degrees of freedom (df), named the generalized t-link (GtL) IRT model. In this model we consider only the difficulty parameter in the item response function. GtL is an alternative to the two-parameter logit and probit models, since the degrees of freedom (df) play a role similar to that of the discrimination parameter. However, the behavior of the GtL curves differs from that of the two-parameter models and the usual Student t link, since in GtL the curves obtained with different df values can cross the probit curve at more than one latent trait level. The GtL model has properties similar to those of generalized linear mixed models, such as the existence of sufficient statistics and easy parameter interpretation. Also, many techniques of parameter estimation, model fit assessment and residual analysis developed for those models can be used for the GtL model. We develop fully Bayesian estimation and model fit assessment tools through a Metropolis-Hastings step within a Gibbs sampling algorithm. We also consider the sensitivity of the results to the choice of prior for the degrees of freedom. The simulation study indicates that the algorithm recovers all parameters properly. In addition, some Bayesian model fit assessment tools are considered. Finally, a real data set is analyzed using our approach and other usual models. The results indicate that our model fits the data better than the two-parameter models.
HoNOSCA-D As a Measure of the Severity of Diagnosed Mental Disorders in Children and Adolescents—Psychometric Properties of the German Translation

PubMed Central

von Wyl, Agnes; Toggweiler, Stephan; Zollinger, Ruedi

2017-01-01

The Health of the Nation Outcome Scales for Children and Adolescents (HoNOSCA), in use worldwide, is a 13-item measure assessing the biopsychosocial severity of mental health problems in children and adolescents. This article introduces the authorized German-language version of HoNOSCA, the HoNOSCA-D, and examines and discusses its psychometric properties based on a clinical sample of 1,533 children and adolescents aged 4;0 to 17;11 years. For the HoNOSCA-D total score (severity of mental health problems), internal consistency (Cronbach’s alpha) was 0.63. The discriminative power of the items ranged from 0.07 to 0.44; the average interitem correlation was 0.11. Due to this stochastic independence, calculation of a total severity index is acceptable. Factor analysis using principal axis factoring with varimax rotation yielded a four-factor structure that explained 30.62% of total variance (Kaiser–Meyer–Olkin measure of sampling adequacy: 0.684). The convergent correlations with the German-language parent report version of the Strengths and Difficulties Questionnaire were as expected and showed a medium effect size. Gender and age differences in the HoNOSCA-D total score were small. For the 13 individual items, gender and age differences were negligible to medium. The highest severity was found for schizophrenia and psychotic disorders, followed by affective disorders and social behavior disorders. Overall, the validity of the HoNOSCA-D was clearly supported. PMID:29033858
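Two of the statistics reported here, item discriminative power and average inter-item correlation, can be computed directly from a ratings matrix. The sketch below assumes that "discriminative power" means the corrected item-total correlation, that is, each item correlated with the sum of the remaining items. That is a common definition, but it is an assumption here. The ratings and function names are hypothetical, and an item with constant ratings would make the correlation undefined.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (neither may be constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def corrected_item_total(ratings, j):
    """Correlation of item j with the sum of the other items (item excluded from the total)."""
    item = [row[j] for row in ratings]
    rest = [sum(row) - row[j] for row in ratings]
    return pearson(item, rest)


def mean_inter_item_r(ratings):
    """Average Pearson correlation over all pairs of items."""
    cols = list(zip(*ratings))
    rs = [pearson(cols[a], cols[b]) for a, b in combinations(range(len(cols)), 2)]
    return sum(rs) / len(rs)


# Hypothetical 0-4 severity ratings: 5 patients x 3 items
ratings = [[0, 1, 2], [1, 1, 3], [2, 3, 3], [4, 2, 4], [3, 0, 1]]
print([round(corrected_item_total(ratings, j), 2) for j in range(3)])
print(round(mean_inter_item_r(ratings), 2))
```

A low average inter-item correlation, such as the 0.11 reported above, means the items share little variance. That fits the modest alpha of 0.63.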
Pragmatic Difficulties in the Production of the Speech Act of Apology by Iraqi EFL Learners

ERIC Educational Resources Information Center

Al-Ghazalli, Mehdi Falih; Al-Shammary, Mohanad A. Amert

2014-01-01

The purpose of this paper is to investigate the pragmatic difficulties encountered by Iraqi EFL university students in producing the speech act of apology. Although the act of apology is easy for native speakers of English to recognize or use, non-native speakers generally encounter difficulties in discriminating one speech act from another. The…
Application of Computerized Adaptive Testing to Entrance Examination for Graduate Studies in Turkey

ERIC Educational Resources Information Center

Bulut, Okan; Kan, Adnan

2012-01-01

Problem Statement: Computerized adaptive testing (CAT) is a sophisticated and efficient way of delivering examinations. In CAT, items for each examinee are selected from an item bank based on the examinee's responses to the items. In this way, the difficulty level of the test is adjusted based on the examinee's ability level. Instead of…
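The adaptive loop described above can be sketched as follows. The sketch makes three assumptions: a Rasch-calibrated item bank, maximum-information item selection, and a maximum-likelihood ability update. These are common choices, but they are not necessarily the ones used by Bulut and Kan, and all names and difficulties are hypothetical. Because an all-correct or all-wrong response pattern has no finite maximum-likelihood estimate, the update caps each Newton step at one logit and clamps ability to [-4, 4].

```python
import math


def rasch_p(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1 / (1 + math.exp(-(theta - b)))


def item_information(theta, b):
    """Fisher information of a Rasch item at ability theta: p(1 - p)."""
    p = rasch_p(theta, b)
    return p * (1 - p)


def select_next_item(theta, bank, administered):
    """Pick the not-yet-administered item with maximum information at theta."""
    remaining = [i for i in range(len(bank)) if i not in administered]
    return max(remaining, key=lambda i: item_information(theta, bank[i]))


def update_theta(theta, responses, bank, steps=25):
    """Newton-Raphson ML estimate; step capped at 1 logit, theta clamped to [-4, 4]."""
    for _ in range(steps):
        grad = sum(u - rasch_p(theta, bank[i]) for i, u in responses)
        info = sum(item_information(theta, bank[i]) for i, _ in responses)
        step = max(-1.0, min(1.0, grad / info))
        theta = max(-4.0, min(4.0, theta + step))
    return theta


def run_cat(bank, answer, n_items, theta=0.0):
    """Administer n_items adaptively; answer(i) returns 1 (correct) or 0 (incorrect)."""
    administered, responses = set(), []
    for _ in range(n_items):
        i = select_next_item(theta, bank, administered)
        administered.add(i)
        responses.append((i, answer(i)))
        theta = update_theta(theta, responses, bank)
    return theta, [i for i, _ in responses]


bank = [-2.0, -1.0, 0.0, 1.0, 2.0]  # hypothetical item difficulties (logits)
print(run_cat(bank, lambda i: 1, n_items=3))  # (4.0, [2, 4, 3])
```

An examinee who answers everything correctly is routed to progressively harder items, so the test difficulty follows the ability estimate. Operational CATs often use Bayesian (EAP or MAP) estimates instead, to avoid the clamping needed here.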
Examining the Invariance of Rater and Project Calibrations Using a Multi-facet Rasch Model.

ERIC Educational Resources Information Center

O'Neill, Thomas R.; Lunz, Mary E.

To generalize test results beyond the particular test administration, an examinee's ability estimate must be independent of the particular items attempted, and the item difficulty calibrations must be independent of the particular sample of people attempting the items. This stability is a key concept of the Rasch model, a latent trait model of…
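The invariance described above follows from the additive structure of the model. In a many-facet Rasch model, the log-odds of success depend on examinee ability minus item (or project) difficulty minus rater severity. The sketch below uses the dichotomous form for simplicity; the study itself may have used a rating-scale version, and the values are hypothetical. It shows that the log-odds difference between two examinees equals their ability difference whichever item and rater are involved.

```python
import math


def facets_prob(ability, item_difficulty, rater_severity):
    """Dichotomous many-facet Rasch model: logit P = B_n - D_i - C_j."""
    return 1 / (1 + math.exp(-(ability - item_difficulty - rater_severity)))


def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1 - p))


# Two examinees compared under several (item difficulty, rater severity) pairs.
b1, b2 = 1.2, -0.3
for d, c in [(-1.0, 0.0), (0.5, 0.8), (2.0, -0.6)]:
    diff = logit(facets_prob(b1, d, c)) - logit(facets_prob(b2, d, c))
    print(round(diff, 6))  # 1.5 every time: the comparison does not depend on d or c
```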
Rasch Based Analysis of Oral Proficiency Test Data.

ERIC Educational Resources Information Center

Nakamura, Yuji

2001-01-01

This paper examines the rating scale data of oral proficiency tests analyzed by a Rasch Analysis focusing on an item map and factor analysis. In discussing the item map, the difficulty order of six items and students' answering patterns are analyzed using descriptive statistics and measures of central tendency of test scores. The data ranks the…
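For rating-scale data like these, a common Rasch formulation is Andrich's rating scale model. All items share the same category thresholds and differ only in difficulty, which is what places items in a single difficulty order on an item map. The sketch below assumes that model. It is not necessarily the exact specification used in this study, and the thresholds and difficulties are hypothetical.

```python
import math


def rsm_probs(theta, delta, taus):
    """Andrich rating scale model: P(X = k) is proportional to
    exp(sum over h = 1..k of (theta - delta - tau_h)); the empty sum for k = 0 is 0."""
    logits = [0.0]
    for tau in taus:
        logits.append(logits[-1] + (theta - delta - tau))
    top = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(theta, delta, taus):
    """Model-expected rating of a person at theta on an item of difficulty delta."""
    return sum(k * p for k, p in enumerate(rsm_probs(theta, delta, taus)))


taus = [-1.0, 1.0]  # hypothetical thresholds shared by all items (categories 0, 1, 2)
for delta in (-1.0, 0.0, 1.0):  # hypothetical item difficulties
    print(delta, round(expected_score(0.0, delta, taus), 2))
```

For a fixed person, harder items yield lower expected ratings. This ordering is what an item map displays.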

Application of Item Analysis to Assess Multiple-Choice Examinations in the Mississippi Master Cattle Producer Program

ERIC Educational Resources Information Center

Parish, Jane A.; Karisch, Brandi B.

2013-01-01

Item analysis can serve as a useful tool in improving multiple-choice questions used in Extension programming. It can identify gaps between instruction and assessment. An item analysis of Mississippi Master Cattle Producer program multiple-choice examination responses was performed to determine the difficulty of individual examinations, assess the…
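Two routine item-analysis checks for multiple-choice examinations are sketched below. The first is an upper-lower discrimination index: the proportion correct in the top-scoring group minus the proportion correct in the bottom-scoring group. The second is a count of how often each option was chosen. The 27% group size is a conventional choice, assumed here. The function names and data are hypothetical and not taken from the Parish and Karisch study.

```python
def discrimination_index(item_scores, totals, fraction=0.27):
    """Upper-lower discrimination index D = p(upper group) - p(lower group).

    Groups are the top and bottom `fraction` of examinees ranked by total score;
    ties at the cut-off are broken by the sort order.
    """
    n = len(totals)
    g = max(1, round(n * fraction))
    order = sorted(range(n), key=lambda i: totals[i])
    lower, upper = order[:g], order[-g:]
    p_upper = sum(item_scores[i] for i in upper) / g
    p_lower = sum(item_scores[i] for i in lower) / g
    return p_upper - p_lower


def distractor_counts(choices, options="ABCD"):
    """How often each option was chosen; rarely chosen distractors may need revision."""
    return {opt: choices.count(opt) for opt in options}


item = [0, 0, 1, 1, 1, 1, 0, 1, 1, 1]     # hypothetical 0/1 scores on one item
totals = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # the same examinees' total scores
print(round(discrimination_index(item, totals), 2))  # 0.67
print(distractor_counts(["A", "B", "A", "C", "A"]))  # {'A': 3, 'B': 1, 'C': 1, 'D': 0}
```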
Exploring the Manifestations of Anxiety in Children with Autism Spectrum Disorders

ERIC Educational Resources Information Center

Hallett, Victoria; Lecavalier, Luc; Sukhodolsky, Denis G.; Cipriano, Noreen; Aman, Michael G.; McCracken, James T.; McDougle, Christopher J.; Tierney, Elaine; King, Bryan H.; Hollander, Eric; Sikich, Linmarie; Bregman, Joel; Anagnostou, Evdokia; Donnelly, Craig; Katsovich, Lily; Dukes, Kimberly; Vitiello, Benedetto; Gadow, Kenneth; Scahill, Lawrence

2013-01-01

This study explores the manifestation and measurement of anxiety symptoms in 415 children with ASDs on a 20-item, parent-rated, DSM-IV referenced anxiety scale. In both high and low-functioning children (IQ above vs. below 70), commonly endorsed items assessed restlessness, tension and sleep difficulties. Items requiring verbal expression of worry…
Sensitivity of Equated Aggregate Scores to the Treatment of Misbehaving Common Items

ERIC Educational Resources Information Center

Michaelides, Michalis P.

2010-01-01

The delta-plot method (Angoff, 1972) is a graphical technique used in the context of test equating for identifying common items with aberrant changes in their item difficulties across administrations or alternate forms. This brief research report explores the effects on equated aggregate scores when delta-plot outliers are either retained in or…
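The delta-plot procedure can be sketched as follows. Each common item's proportion correct on two administrations is converted to the ETS delta scale, Δ = 13 + 4z, where z is the normal deviate exceeded by proportion p. A major (principal) axis is then fitted to the paired deltas, and items whose perpendicular distance from that axis exceeds a cut-off are flagged. The 1.5 delta-unit cut-off is a commonly used value and is treated here as an assumption. The data and function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean


def to_delta(p):
    """ETS delta scale: 13 + 4 * z, where z is the normal deviate exceeded by proportion p."""
    return 13 + 4 * NormalDist().inv_cdf(1 - p)


def delta_plot(p_form1, p_form2, threshold=1.5):
    """Fit the major axis to paired deltas; flag items farther than `threshold` from it."""
    x = [to_delta(p) for p in p_form1]
    y = [to_delta(p) for p in p_form2]
    mx, my = mean(x), mean(y)
    # Sums of squares: the major-axis slope is unchanged by dividing all three by n - 1.
    sxx = sum((v - mx) ** 2 for v in x)
    syy = sum((v - my) ** 2 for v in y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))  # assumed nonzero
    slope = (syy - sxx + sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    intercept = my - slope * mx
    dist = [abs(slope * u - v + intercept) / sqrt(slope ** 2 + 1) for u, v in zip(x, y)]
    return [i for i, d in enumerate(dist) if d > threshold], dist


# Hypothetical p-values; item 3 becomes much harder on the second form.
p1 = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3]
p2 = [0.9, 0.8, 0.7, 0.1, 0.5, 0.4, 0.3]
flagged, dist = delta_plot(p1, p2)
print(flagged)
```

A single aberrant item also tilts the fitted axis. This is one reason the choice between retaining and removing delta-plot outliers can matter for the equated scores.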
Associations Between Multiple Forms of Discrimination and Tobacco Use Among People Living With HIV: The Mediating Role of Avoidance Coping.

PubMed

Crockett, Kaylee B; Rice, Whitney S; Turan, Bulent

2018-05-01

People living with HIV (PLWH) have higher levels of tobacco use compared with the general population, increasing their risk of morbidity and mortality. PLWH also face potential chronic stressors related to the stigma and discrimination associated with HIV and other characteristics (eg, race and sexual orientation). These experiences may be associated with harmful health behaviors, such as tobacco use. The purpose of the current study is to explore the psychosocial context of tobacco use in PLWH, examining avoidance coping as a mediator in the relationship between multiple forms of discrimination and tobacco use. Participants included 202 PLWH recruited from an HIV primary care clinic in Birmingham, AL, between 2013 and 2015. Participants responded to parallel items assessing experiences of discrimination related to HIV status, race, and sexual orientation, as well as items assessing avoidance coping. Data on current tobacco use were obtained from participants' clinic records. Mediation models for each form of discrimination (HIV, race and sexual orientation) adjusting for demographic variables and the other forms of discrimination were evaluated. The indirect effect of HIV-related discrimination on likelihood of tobacco use through avoidance coping was significant, suggesting that avoidance coping mediates the association between HIV-related discrimination and tobacco use. However, the indirect effects of the other forms of discrimination were not significant. Given the disparity in tobacco use in PLWH, behavioral scientists and interventionists should consider including content specific to coping with experiences of discrimination in tobacco prevention and cessation programs for PLWH.
[Comment on] Statistical discrimination

NASA Astrophysics Data System (ADS)

Chinn, Douglas

In the December 8, 1981, issue of Eos, a news item reported the conclusion of a National Research Council study that sexual discrimination against women with Ph.D.'s exists in the field of geophysics. Basically, the item reported that even when allowances are made for motherhood the percentage of female Ph.D.'s holding high university and corporate positions is significantly lower than the percentage of male Ph.D.'s holding the same types of positions. The sexual discrimination conclusion, based only on these statistics, assumes that there are no basic psychological differences between men and women that might cause different populations in the employment group studied. Therefore, the reasoning goes, after taking into account possible effects from differences related to anatomy, such as women stopping their careers in order to bear and raise children, the statistical distributions of positions held by male and female Ph.D.'s ought to be very similar to one another. Any significant differences between the distributions must be caused primarily by sexual discrimination.
Using listening difficulty ratings of conditions for speech communication in rooms

NASA Astrophysics Data System (ADS)

Sato, Hiroshi; Bradley, John S.; Morimoto, Masayuki

2005-03-01

The use of listening difficulty ratings of speech communication in rooms is explored because, in common situations, word recognition scores do not discriminate well among conditions that are close to acceptable. In particular, the benefits of early reflections of speech sounds on listening difficulty were investigated and compared to the known benefits to word intelligibility scores. Listening tests were used to assess word intelligibility and perceived listening difficulty of speech in simulated sound fields. The experiments were conducted in three types of sound fields with constant levels of ambient noise: only direct sound, direct sound with early reflections, and direct sound with early reflections and reverberation. The results demonstrate that (1) listening difficulty can better discriminate among these conditions than can word recognition scores; (2) in the conditions without reverberation, added early reflections increase the effective signal-to-noise ratio by an amount equivalent to the added energy; (3) the benefit of early reflections on difficulty scores is greater than expected from the simple increase in early arriving speech energy with reverberation; (4) word intelligibility tests are most appropriate for conditions with signal-to-noise (S/N) ratios less than 0 dBA, whereas for S/N ratios between 0 and 15 dBA, listening difficulty is a more appropriate evaluation tool.
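Finding (2) can be illustrated with decibel arithmetic. If early-reflection energy simply adds to the direct-sound energy, the effective S/N is the energy sum of the two speech levels minus the noise level. The sketch below assumes that full energy addition, which the abstract reports only for conditions without reverberation. It also encodes finding (4) as a hypothetical helper. The levels are hypothetical, and A-weighting is not modeled.

```python
import math


def add_levels_db(*levels_db):
    """Energy sum of sound levels given in dB."""
    return 10 * math.log10(sum(10 ** (level / 10) for level in levels_db))


def effective_snr_db(direct_db, early_db, noise_db):
    """S/N treating early-reflection energy as fully additive to the direct sound."""
    return add_levels_db(direct_db, early_db) - noise_db


def suggested_measure(snr_db):
    """Hypothetical helper encoding finding (4) of the abstract."""
    if snr_db < 0:
        return "word intelligibility"
    if snr_db <= 15:
        return "listening difficulty"
    return "not covered by finding (4)"


print(round(effective_snr_db(60, 60, 60), 2))  # 3.01: equal early energy adds about 3 dB
```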
The measurement of threat orientations.

PubMed

Thompson, Suzanne C; Schlehofer, Michèle M; Bovin, Michelle J

2006-01-01

This study aimed to develop measures of 3 threat orientations that affect responses to health behavior messages. In Study 1, college students (N = 47) completed items assessing threat orientations and health behaviors. In Study 2, college students and community adults (N = 110) completed the threat orientation items and measures of convergent and discriminant validity. In Study 1, the control-based, denial-based, and heightened-sensitivity-based threat orientation scales demonstrated good internal consistency and correlated with engagement in health behaviors. In Study 2, the convergent and discriminant validity of the 3 measures was established. The 3 scales have good internal reliability and construct validity.
Using an analytical hierarchy process (AHP) for weighting items of a measurement scale: a pilot study.

PubMed

Benaïm, C; Perennou, D-A; Pelissier, J-Y; Daures, J-P

2010-02-01

Many clinical scales contain items that are scored separately prior to being compiled into a single score. However, if the items have different degrees of importance, they should be weighted differently before being compiled. The principal aims of this study were to show how the "analytic hierarchy process" (AHP), which has never been used for this purpose, can be applied to weighting the six items of the "London handicap scale", and to compare the AHP to the "conjoint analysis" (CA), which was previously implemented by Harwood et al. (1994) [1]. In order to assess the relative importance of the six items, we submitted AHP and CA to a group of 10 physiatrists. We compared the methods in terms of item ranking according to importance, assessment of fictitious patients based on weights determined by each method, and difficulty perceived by the physiatrists. For both techniques, "Physical independence" (PHY) was the most heavily weighted item, but other ranks varied depending on the technique. AHP was better than CA in terms of accuracy (global assessment of the clinical status) and perceived difficulty. AHP may be used to reveal the importance that experts assign to the items of a multidimensional scale, and to calculate the appropriate weights for specific items. For this purpose, AHP seems to be more accurate than CA.
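AHP derives item weights from a matrix of pairwise importance judgments. A standard way to do this is to take the normalized principal eigenvector, computed below by power iteration, and then check Saaty's consistency ratio. The sketch is hypothetical throughout. The 3 x 3 matrix is made up; the six London handicap scale items would need a 6 x 6 matrix, which this code also supports. How the study pooled the 10 physiatrists' judgments is not shown.

```python
# Saaty's random consistency index for matrix sizes 3 to 9.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(matrix, iterations=100):
    """Principal-eigenvector weights of a reciprocal pairwise comparison matrix
    (power iteration), plus the estimated principal eigenvalue lambda_max."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        s = sum(v)
        w = [x / s for x in v]
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    return w, lambda_max


def consistency_ratio(lambda_max, n):
    """CR = CI / RI with CI = (lambda_max - n) / (n - 1); CR below 0.1 is the usual rule of thumb."""
    return ((lambda_max - n) / (n - 1)) / RANDOM_INDEX[n]


# Hypothetical judgments: item 0 is 3x as important as item 1 and 5x as important as item 2.
A = [[1, 3, 5], [1 / 3, 1, 2], [1 / 5, 1 / 2, 1]]
w, lam = ahp_weights(A)
print([round(x, 3) for x in w], round(consistency_ratio(lam, 3), 3))
```

The weights sum to 1, so they can be applied directly when compiling item scores into a weighted total.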
Validity of a Protocol for Adult Self-Report of Dyslexia and Related Difficulties

ERIC Educational Resources Information Center

Snowling, Margaret; Dawes, Piers; Nash, Hannah; Hulme, Charles

2012-01-01

Background: There is an increased prevalence of reading and related difficulties in children of dyslexic parents. In order to understand the causes of these difficulties, it is important to quantify the risk factors passed from parents to their offspring. Method: 417 adults completed a protocol comprising a 15-item questionnaire rating reading and…
Type I Error Inflation in DIF Identification with Mantel-Haenszel: An Explanation and a Solution

ERIC Educational Resources Information Center

Magis, David; De Boeck, Paul

2014-01-01

It is known that sum score-based methods for the identification of differential item functioning (DIF), such as the Mantel-Haenszel (MH) approach, can be affected by Type I error inflation in the absence of any DIF effect. This may happen when the items differ in discrimination and when there is item impact. On the other hand, outlier DIF methods…
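For reference, the Mantel-Haenszel procedure discussed above stratifies examinees by sum score. Within each stratum it compares the odds of a correct response in the reference and focal groups, and then pools the strata into a common odds ratio. The sketch below is a minimal textbook version with the ETS D-DIF transformation. It is not the authors' code or their proposed solution, and the data are hypothetical.

```python
import math
from collections import defaultdict


def mantel_haenszel(item, total, group, reference="ref"):
    """MH common odds ratio and ETS MH D-DIF for one dichotomous item.

    In each total-score stratum, A/B are the reference group's correct/incorrect counts
    and C/D the focal group's; alpha = sum(A*D/N) / sum(B*C/N).
    """
    strata = defaultdict(lambda: [0, 0, 0, 0])  # [A, B, C, D] per score level
    for u, score, g in zip(item, total, group):
        offset = 0 if g == reference else 2
        strata[score][offset + (0 if u else 1)] += 1
    num = den = 0.0
    for a, b, c, d in strata.values():
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    alpha = num / den  # assumes at least one stratum with B*C > 0
    return alpha, -2.35 * math.log(alpha)  # negative D-DIF favours the reference group


# One hypothetical score stratum: reference 3/4 correct, focal 1/4 correct.
item = [1, 1, 1, 0, 1, 0, 0, 0]
total = [5] * 8
group = ["ref"] * 4 + ["foc"] * 4
alpha, d_dif = mantel_haenszel(item, total, group)
print(alpha, round(d_dif, 2))  # 9.0 -5.16
```

Matching on the observed sum score, rather than on the latent trait, is what allows items with different discriminations to show spurious DIF when the groups differ in ability.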
Social Loafing Construct Validity in Higher Education: How Well Do Three Measures of Social Loafing Stand up to Scrutiny?

ERIC Educational Resources Information Center

de l'Eau, Jacquelyn

2017-01-01

The purpose of this study was to examine the construct validity of social loafing using convergent and discriminant validity principles. Three instruments that purport to measure social loafing were factor analyzed: A ten-item instrument by George (1992), a 13-item instrument by Mulvey and Klein (1998), and a 22-item instrument by Jassawalla,…
The impact of racial discrimination on the health of Australian Indigenous children aged 5-10 years: analysis of national longitudinal data.

PubMed

Shepherd, Carrington C J; Li, Jianghong; Cooper, Matthew N; Hopkins, Katrina D; Farrant, Brad M

2017-07-03

A growing body of literature highlights that racial discrimination has negative impacts on child health, although most studies have been limited to an examination of direct forms of racism using cross-sectional data. We aim to provide further insights on the impact of early exposure to racism on child health using longitudinal data among Indigenous children in Australia and multiple indicators of racial discrimination. We used data on 1239 Indigenous children aged 5-10 years from Waves 1-6 (2008-2013) of Footprints in Time, a longitudinal study of Indigenous children across Australia. We examined associations between three dimensions of carer-reported racial discrimination (measuring the direct experiences of children and vicarious exposure by their primary carer and family) and a range of physical and mental health outcomes. Analysis was conducted using multivariate logistic regression within a multilevel framework. Two-fifths (40%) of primary carers, 45% of families and 14% of Indigenous children aged 5-10 years were reported to have experienced racial discrimination at some point in time, with 28-40% of these experiencing it persistently (reported at multiple time points). Primary carer and child experiences of racial discrimination were each associated with poor child mental health status (high risk of clinically significant emotional or behavioural difficulties), sleep difficulties, obesity and asthma, but not with child general health or injury. Children exposed to persistent vicarious racial discrimination were more likely to have sleep difficulties and asthma in multivariate models than those with a time-limited exposure. The findings indicate that direct and persistent vicarious racial discrimination are detrimental to the physical and mental health of Indigenous children in Australia, and suggest that prolonged and more frequent exposure to racial discrimination that starts in the early lifecourse can impact on multiple domains of health in later life. 
Tackling and reducing racism should be an integral part of policy and intervention aimed at improving the health of Australian Indigenous children and thereby reducing health disparities between Indigenous and non-Indigenous children.
[The influence of perceived discrimination on health in migrants].

PubMed

Igel, Ulrike; Brähler, Elmar; Grande, Gesine

2010-05-01

The aim of the study was to investigate the influence of racial discrimination on subjective health in migrants. The sample included 1,844 migrants from the German Socio-Economic Panel (SOEP). Discrimination was assessed by two items. Socioeconomic status, country of origin, and health behavior were included in multivariate regression models to control for effects on health. Separate models were analysed by gender and by origin. Migrants who experienced discrimination reported worse health. Discrimination determines mental and physical health of migrants. The models differed by gender and origin. In addition to socioeconomic factors, experienced discrimination should be taken into account as a psychosocial stressor for migrants.
TEDS-M 2008 User Guide for the International Database. Supplement 4: TEDS-M Released Mathematics and Mathematics Pedagogy Knowledge Assessment Items

ERIC Educational Resources Information Center

Brese, Falk, Ed.

2012-01-01

The goal for selecting the released set of test items was to have approximately 25% of each of the full item sets for mathematics content knowledge (MCK) and mathematics pedagogical content knowledge (MPCK) that would represent the full range of difficulty, content, and item format used in the TEDS-M study. The initial step in the selection was to…
The hierarchy of the activities of daily living in the Katz index in residents of skilled nursing facilities.

PubMed

Gerrard, Paul

2013-01-01

Nursing facility patients are a population whose functional status and independence have not previously been well studied. As such, the manner in which activities of daily living (ADL) relate to one another is not well understood in this population. An understanding of ADL difficulty ordering has helped to devise systems of functional independence grading in other populations, which have value in understanding patients' global levels of independence and providing expectations regarding changes in function. This study seeks to examine the hierarchy of ADL in the nursing facility population. Data were analyzed from the 2004 National Nursing Home Survey, a cross-sectional data set of 13 507 skilled nursing facility subjects with functional independence items. The ADL difficulty hierarchy was determined using Rasch analysis. Item fit to the Rasch model was also assessed using mean-square infit statistics. The robustness of the hierarchy was tested for each ADL. Two grading systems were devised from the results of the item difficulty ordering. One assigned a grade based on the most difficult item that a subject could perform, and the other assigned a grade based on the least difficult item that a subject could not perform. A total of 13 113 patients were included in this analysis, the majority of whom were female and white. They had an average age of 81 years. An ordered hierarchy of ADL was found, with eating being the easiest and bathing the most difficult. All items in the Katz index fit the Rasch model adequately well. The majority of patients able to perform any particular ADL were also able to perform all easier ADL. Cohen's κ for the 2 grading systems was 0.73. This study is the first to show the expected hierarchy of difficulty of the 6 activities of daily living proposed in the Katz index in the nursing facility population. The hierarchy found in this population matches the original hierarchy found in older adults in the community and acute care settings. It is also similar to the hierarchy found in the inpatient rehabilitation setting. Patients would be expected to lose or gain function based on the order of difficulty, but this remains to be confirmed. Of the 6 activities of daily living tested here, the order from easiest to most difficult is eating, maintaining continence, transferring, toileting, dressing, and bathing. In addition, the index formed by these 6 items has construct validity in the nursing facility population.
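The abstract describes two grading systems built on the ADL difficulty order and reports their agreement as Cohen's κ. The scoring rules below are one hypothetical reading of those two systems; the paper's exact grade definitions may differ. Under this reading, the two grades agree for patients whose abilities follow the hierarchy and disagree for patients whose abilities do not, and κ summarizes that agreement beyond chance. The patient data are hypothetical.

```python
from collections import Counter

# Difficulty order reported in the abstract, easiest first.
KATZ_ORDER = ["eating", "continence", "transferring", "toileting", "dressing", "bathing"]


def grade_hardest_performed(performed):
    """Hypothetical grade 1: rank (1-6) of the most difficult ADL performed, 0 if none."""
    return max((KATZ_ORDER.index(a) + 1 for a in performed), default=0)


def grade_easiest_failed(performed):
    """Hypothetical grade 2: number of ADLs passed before the easiest one not performed."""
    for rank, activity in enumerate(KATZ_ORDER):
        if activity not in performed:
            return rank
    return len(KATZ_ORDER)


def cohens_kappa(a, b):
    """Chance-corrected agreement between two categorical ratings of the same subjects."""
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    p_exp = sum(ca[k] * cb[k] for k in ca) / n ** 2
    return (p_obs - p_exp) / (1 - p_exp)


patients = [{"eating"}, {"eating", "continence"}, {"eating", "bathing"}, set(KATZ_ORDER)]
g1 = [grade_hardest_performed(p) for p in patients]  # [1, 2, 6, 6]
g2 = [grade_easiest_failed(p) for p in patients]     # [1, 2, 1, 6]
print(round(cohens_kappa(g1, g2), 3))  # 0.636
```

The third patient, who can bathe but not maintain continence, breaks the hierarchy and is the only source of disagreement.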
The effect of perceived racial discrimination on bodily pain among older African American men.

PubMed

Burgess, Diana J; Grill, Joseph; Noorbaloochi, Siamak; Griffin, Joan M; Ricards, Jennifer; van Ryn, Michelle; Partin, Melissa R

2009-11-01

We examined the extent to which experiences of racial discrimination are associated with bodily pain reported by African American men. The study sample consisted of 393 African American male veterans who responded to a national survey of patients aged 50-75 who received care from the Veterans Health Administration (VHA). Veterans were surveyed by mail, with a telephone follow-up. The response rate for African Americans in the sample was 60.5%. Pain (assessed using the bodily pain subscale of the 36-item short-form health survey), experiences of discrimination, employment, education, and income were obtained through the survey. Age, race, and mental health comorbidities were obtained from VA administrative data. Multiple regression analysis adjusting for item non-response (via imputation) and unit non-response (via propensity scores and weighting) was used to assess the association between racial discrimination and likelihood of experiencing moderate or severe pain over the past 4 weeks. Experiences of racial discrimination were associated with greater bodily pain (beta = -0.25, P < 0.0001), even after controlling for socioeconomic and health-related characteristics. Perceived racial discrimination was associated with greater pain among a sample of older African American male patients in the VA. Additional research is needed to replicate this finding among other populations of African Americans.
Migration Experiences of Foreign Educated Nurses: A Systematic Review of the Literature.

PubMed

Moyce, Sally; Lash, Rebecca; de Leon Siantz, Mary Lou

2016-03-01

Global nurse migration has a recognized impact on host and source countries, but the lived experience of foreign educated nurses is an important aspect of the success of this migration. A systematic review of the literature was conducted to understand the lived migration and acculturation experiences of foreign educated nurses. A systematic review of the literature, based on Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, was conducted. Primary research articles or secondary analyses were selected based on keyword and citation-based searches (n = 44). Nurses' experiences included migration and licensing barriers, difficulty with communication, racism and discrimination, skill underutilization, acculturation, and the role of the family. Barriers encountered in host countries may impede acculturation and successful nursing practice, resulting in circular migration and poor patient safety outcomes. Social support systems and cultural orientation programs can mitigate the impacts of social isolation and racism. Addressing common barriers can help minimize deskilling and allow safe and effective transitions to host countries. © The Author(s) 2015.
Development and validation of the Overall Depression Severity and Impairment Scale.

PubMed

Bentley, Kate H; Gallagher, Matthew W; Carl, Jenna R; Barlow, David H

2014-09-01

The need to capture severity and impairment of depressive symptomatology is widespread. Existing depression scales are lengthy and largely focus on individual symptoms rather than resulting impairment. The Overall Depression Severity and Impairment Scale (ODSIS) is a 5-item, continuous measure designed for use across heterogeneous mood disorders and with subthreshold depressive symptoms. This study examined the psychometric properties of the ODSIS in outpatients in a clinic for emotional disorders (N = 100), undergraduate students (N = 566), and community-based adults (N = 189). Internal consistency, latent structure, item response theory, classification accuracy, convergent and discriminant validity, and differential item functioning analyses were conducted. ODSIS scores exhibited excellent internal consistency, and confirmatory factor analyses supported a unidimensional structure. Item response theory results demonstrated that the ODSIS provides more information about individuals with high levels of depression than those with low levels of depression. Responses on the ODSIS discriminated well between individuals with and without a mood disorder and depression-related severity across clinical and subclinical levels. A cut score of 8 correctly classified 82% of outpatients as with or without a mood disorder; it evidenced a favorable balance of sensitivity and specificity and of positive and negative predictive values. The ODSIS demonstrated good convergent and discriminant validity, and results indicate that items function similarly across clinical and nonclinical samples. Overall, findings suggest that the ODSIS is a valid tool for measuring depression-related severity and impairment. The brevity and ease of use of the ODSIS support its utility for screening and monitoring treatment response across a variety of settings. PsycINFO Database Record (c) 2014 APA, all rights reserved.
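The classification results above (a cut score of 8, with sensitivity, specificity, and positive and negative predictive values) come from a 2 x 2 table of screening result against diagnosis. The sketch below assumes that a total score at or above the cut counts as a positive screen; the abstract does not state the direction. The scores and diagnoses are hypothetical.

```python
def classification_metrics(scores, has_disorder, cut=8):
    """Screening metrics when a total score >= cut is treated as a positive screen."""
    tp = fp = tn = fn = 0
    for score, truth in zip(scores, has_disorder):
        positive = score >= cut
        if positive and truth:
            tp += 1
        elif positive:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / len(scores),
    }


scores = [10, 9, 7, 8, 3, 2]                        # hypothetical total scores
truth = [True, True, True, False, False, False]     # hypothetical diagnoses
print({k: round(v, 2) for k, v in classification_metrics(scores, truth).items()})  # every value 0.67
```

The "correctly classified" figure in the abstract corresponds to the accuracy entry.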
A Comparison of Different Psychometric Approaches to Modeling Testlet Structures: An Example with C-Tests

ERIC Educational Resources Information Center

Schroeders, Ulrich; Robitzsch, Alexander; Schipolowski, Stefan

2014-01-01

C-tests are a specific variant of cloze tests that are considered time-efficient, valid indicators of general language proficiency. They are commonly analyzed with models of item response theory assuming local item independence. In this article we estimated local interdependencies for 12 C-tests and compared the changes in item difficulties,…
Demonstrating the Difference between Classical Test Theory and Item Response Theory Using Derived Test Data

ERIC Educational Resources Information Center

Magno, Carlo

2009-01-01

The present report demonstrates the difference between the classical test theory (CTT) and item response theory (IRT) approaches using actual test data from junior high school chemistry students. The CTT and IRT approaches were compared across two samples and two test forms on their item difficulty, internal consistency, and measurement errors. The specific…
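One contrast in how the two frameworks treat measurement error can be made concrete. CTT gives a single standard error of measurement for everyone, SD·√(1 − reliability). IRT gives a conditional standard error, 1/√I(θ), that depends on ability. The sketch assumes a Rasch model and uses hypothetical item difficulties; it is not the report's analysis.

```python
import math


def ctt_sem(sd, reliability):
    """Classical standard error of measurement: one value for every examinee."""
    return sd * math.sqrt(1 - reliability)


def rasch_se(theta, difficulties):
    """Rasch conditional standard error 1/sqrt(test information) at ability theta."""
    info = 0.0
    for b in difficulties:
        p = 1 / (1 + math.exp(-(theta - b)))
        info += p * (1 - p)
    return 1 / math.sqrt(info)


items = [-1.0, 0.0, 0.0, 1.0]  # hypothetical Rasch difficulties in logits
print(round(ctt_sem(10, 0.91), 2))  # 3.0
for theta in (-3, 0, 3):
    print(theta, round(rasch_se(theta, items), 2))
```

The IRT error is smallest where the items are concentrated and grows toward the extremes of the ability scale. CTT cannot express this pattern with its single SEM.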
