item response patterns: Topics by Science.gov

Sample records for item response patterns

Relationship between Item Responses of Negative Affect Items and the Distribution of the Sum of the Item Scores in the General Population

PubMed Central

Kawasaki, Yohei; Ide, Kazuki; Akutagawa, Maiko; Yamada, Hiroshi; Furukawa, Toshiaki A.; Ono, Yutaka

2016-01-01

Background Several studies have shown that total depressive symptom scores in the general population approximate an exponential pattern, except for the lower end of the distribution. The Center for Epidemiologic Studies Depression Scale (CES-D) consists of 20 items, each of which may take on four scores: “rarely,” “some,” “occasionally,” and “most of the time.” Recently, we reported that the item responses for 16 negative affect items commonly exhibit exponential patterns, except for the level of “rarely,” leading us to hypothesize that the item responses at the level of “rarely” may be related to the non-exponential pattern typical of the lower end of the distribution. To verify this hypothesis, we investigated how the item responses contribute to the distribution of the sum of the item scores. Methods Data collected from 21,040 subjects who had completed the CES-D questionnaire as part of a Japanese national survey were analyzed. To assess the item responses of negative affect items, we used a parameter r, which denotes the ratio of “rarely” to “some” in each item response. The distributions of the sum of negative affect items in various combinations were analyzed using log-normal scales and curve fitting. Results The sum of the item scores approximated an exponential pattern regardless of the combination of items, whereas, at the lower end of the distributions, there was a clear divergence between the actual data and the predicted exponential pattern. At the lower end of the distributions, the sum of the item scores with high values of r exhibited higher scores compared to those predicted from the exponential pattern, whereas the sum of the item scores with low values of r exhibited lower scores compared to those predicted. Conclusions The distributional pattern of the sum of the item scores could be predicted from the item responses of such items. PMID:27806132
1999 Survey of Active Duty Personnel: Administration, Datasets, and Codebook. Appendix G: Frequency and Percentage Distributions for Variables in the Survey Analysis Files.

DTIC Science & Technology

2000-12-01

A SKIP FLAG INDICATING THE RESULT OF CHECKING THE RESPONSE ON THE PARENT (SCREENING) ITEM AGAINST THE RESPONSE(S) ON THE ITEMS WITHIN THE SKIP...RESPONSE ON THE PARENT (SCREENING) ITEM AGAINST THE RESPONSE(S) ON THE ITEMS WITHIN THE SKIP PATTERN. SEE TABLE D-5, NOTE 2, IN APPENDIX D. G-52...RESULT OF CHECKING THE RESPONSE ON THE PARENT (SCREENING) ITEM AGAINST THE RESPONSE(S) ON THE ITEMS WITHIN THE SKIP PATTERN. SEE TABLE D-5
Pattern analysis of total item score and item response of the Kessler Screening Scale for Psychological Distress (K6) in a nationally representative sample of US adults

PubMed Central

Kawasaki, Yohei; Ide, Kazuki; Akutagawa, Maiko; Yamada, Hiroshi; Ono, Yutaka; Furukawa, Toshiaki A.

2017-01-01

Background Several recent studies have shown that total scores on depressive symptom measures in a general population approximate an exponential pattern except for the lower end of the distribution. Furthermore, we confirmed that the exponential pattern is present for the individual item responses on the Center for Epidemiologic Studies Depression Scale (CES-D). To confirm the reproducibility of such findings, we investigated the total score distribution and item responses of the Kessler Screening Scale for Psychological Distress (K6) in a nationally representative study. Methods Data were drawn from the National Survey of Midlife Development in the United States (MIDUS), which comprises four subsamples: (1) a national random digit dialing (RDD) sample, (2) oversamples from five metropolitan areas, (3) siblings of individuals from the RDD sample, and (4) a national RDD sample of twin pairs. K6 items are scored using a 5-point scale: “none of the time,” “a little of the time,” “some of the time,” “most of the time,” and “all of the time.” The patterns of the total score distribution and item responses were analyzed using graphical analysis and an exponential regression model. Results The total score distributions of the four subsamples exhibited an exponential pattern with similar rate parameters. The item responses of the K6 approximated a linear pattern from “a little of the time” to “all of the time” on log-normal scales, while the “none of the time” response was not related to this exponential pattern. Discussion The total score distribution and item responses of the K6 showed exponential patterns, consistent with other depressive symptom scales. PMID:28289560
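Illustrative sketch (not part of the record): one simple way to check whether a score distribution follows an exponential pattern is to regress the logarithm of the frequency on the score and read the rate from the slope. The function name fit_exponential and the synthetic counts below are assumptions made for illustration; the abstract does not say how the authors' exponential regression was implemented.

```python
import math

def fit_exponential(scores, counts):
    """Least-squares fit of log(count) = a + b * score; returns (a, b).

    Hypothetical helper: a negative slope b is the (negated) rate of an
    exponential pattern count ~ exp(a) * exp(b * score).
    """
    # Zero counts have no logarithm, so they are skipped.
    pts = [(s, math.log(c)) for s, c in zip(scores, counts) if c > 0]
    n = len(pts)
    mean_x = sum(s for s, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((s - mean_x) ** 2 for s, _ in pts)
    sxy = sum((s - mean_x) * (y - mean_y) for s, y in pts)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Synthetic frequencies that decay exactly exponentially with rate 0.2.
scores = list(range(1, 11))
counts = [1000 * math.exp(-0.2 * s) for s in scores]
a, b = fit_exponential(scores, counts)
```

With real survey data, the lowest score category would typically be left out of the fit, since the abstract reports that it departs from the exponential pattern.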
Analysis of Item Response Patterns: Consistency Indices and Their Application to Criterion-Referenced Tests.

ERIC Educational Resources Information Center

Harnisch, Delwyn L.

The major emphasis of this paper is in the examination of test item response patterns. Tatsuoka and Tatsuoka (1980) have developed two indices of response consistency: the norm-conformity index (NCI) and the individual consistency index (ICI). The NCI provides a measure of the degree of consistency between the response pattern of an individual and…
Influence of Skip Patterns on Item Non-Response in a Substance Use Survey of 7th to 12th Grade Students

ERIC Educational Resources Information Center

Ding, Kele; Olds, R. Scott; Thombs, Dennis L.

2009-01-01

This retrospective case study assessed the influence of item non-response error on subsequent response to questionnaire items assessing adolescent alcohol and marijuana use. Post-hoc analyses were conducted on survey results obtained from 4,371 7th to 12th grade students in Ohio in 2005. A skip pattern design in a conventional questionnaire…
The Consequences of Ignoring Item Parameter Drift in Longitudinal Item Response Models

ERIC Educational Resources Information Center

Lee, Wooyeol; Cho, Sun-Joo

2017-01-01

Utilizing a longitudinal item response model, this study investigated the effect of item parameter drift (IPD) on item parameters and person scores via a Monte Carlo study. Item parameter recovery was investigated for various IPD patterns in terms of bias and root mean-square error (RMSE), and percentage of time the 95% confidence interval covered…
Quantifying traditional Chinese medicine patterns using modern test theory: an example of functional constipation.

PubMed

Shen, Minxue; Cui, Yuanwu; Hu, Ming; Xu, Linyong

2017-01-13

The study aimed to validate a scale to assess the severity of the "Yin deficiency, intestine heat" pattern of functional constipation based on modern test theory. Pooled longitudinal data of 237 patients with the "Yin deficiency, intestine heat" pattern of constipation from a prospective cohort study were used to validate the scale. Exploratory factor analysis was used to examine the common factors of items. A multidimensional item response model was used to assess the scale in the presence of multidimensionality. The Cronbach's alpha ranged from 0.79 to 0.89, and the split-half reliability ranged from 0.67 to 0.79 at different measurements. Exploratory factor analysis identified two common factors, and all items had cross factor loadings. The bidimensional model had better goodness of fit than the unidimensional model. The multidimensional item response model showed that all items had moderate to high discrimination parameters. Parameters indicated that the first latent trait signified intestine heat, while the second trait characterized Yin deficiency. The information function showed that items demonstrated the highest discrimination power among patients with moderate to high levels of disease severity. Multidimensional item response theory provides a useful and rational approach in validating scales for assessing the severity of patterns in traditional Chinese medicine.
The Spanish version of the Self-Determination Inventory Student Report: application of item response theory to self-determination measurement.

PubMed

Mumbardó-Adam, C; Guàrdia-Olmos, J; Giné, C; Raley, S K; Shogren, K A

2018-04-01

A new measure of self-determination, the Self-Determination Inventory: Student Report (Spanish version), has recently been adapted and empirically validated in the Spanish language. As it is the first instrument intended to measure self-determination in youth with and without disabilities, there is a need to further explore and strengthen its psychometric analysis based on item response patterns. Through an item response theory approach, this study examined observed item distributions across the essential characteristics of self-determination. The results demonstrated satisfactory to excellent item functioning patterns across characteristics, particularly within agentic action domains. Increased variability across items was also found within action-control beliefs dimensions, specifically within the self-realisation subdomain. These findings further support the instrument's psychometric properties and outline future research directions. © 2017 MENCAP and International Association of the Scientific Study of Intellectual and Developmental Disabilities and John Wiley & Sons Ltd.
Converging evidence for control of color-word Stroop interference at the item level.

PubMed

Bugg, Julie M; Hutchison, Keith A

2013-04-01

Prior studies have shown that cognitive control is implemented at the list and context levels in the color-word Stroop task. At first blush, the finding that Stroop interference is reduced for mostly incongruent items as compared with mostly congruent items (i.e., the item-specific proportion congruence [ISPC] effect) appears to provide evidence for yet a third level of control, which modulates word reading at the item level. However, evidence to date favors the view that ISPC effects reflect the rapid prediction of high-contingency responses and not item-specific control. In Experiment 1, we first show that an ISPC effect is obtained when the relevant dimension (i.e., color) signals proportion congruency, a problematic pattern for theories based on differential response contingencies. In Experiment 2, we replicate and extend this pattern by showing that item-specific control settings transfer to new stimuli, ruling out alternative frequency-based accounts. In Experiment 3, we revert to the traditional design in which the irrelevant dimension (i.e., word) signals proportion congruency. Evidence for item-specific control, including transfer of the ISPC effect to new stimuli, is apparent when 4-item sets are employed but not when 2-item sets are employed. We attribute this pattern to the absence of high-contingency responses on incongruent trials in the 4-item set. These novel findings provide converging evidence for reactive control of color-word Stroop interference at the item level, reveal theoretically important factors that modulate reliance on item-specific control versus contingency learning, and suggest an update to the item-specific control account (Bugg, Jacoby, & Chanani, 2011).
A Polytomous Item Response Theory Analysis of Social Physique Anxiety Scale

ERIC Educational Resources Information Center

Fletcher, Richard B.; Crocker, Peter

2014-01-01

The present study investigated the social physique anxiety scale's factor structure and item properties using confirmatory factor analysis and item response theory. An additional aim was to identify differences in response patterns between groups (gender). A large sample of high school students aged 11-15 years (N = 1,529) consisting of n =…
Modeling Answer Change Behavior: An Application of a Generalized Item Response Tree Model

ERIC Educational Resources Information Center

Jeon, Minjeong; De Boeck, Paul; van der Linden, Wim

2017-01-01

We present a novel application of a generalized item response tree model to investigate test takers' answer change behavior. The model allows us to simultaneously model the observed patterns of the initial and final responses after an answer change as a function of a set of latent traits and item parameters. The proposed application is illustrated…
A Bayesian Semiparametric Item Response Model with Dirichlet Process Priors

ERIC Educational Resources Information Center

Miyazaki, Kei; Hoshino, Takahiro

2009-01-01

In Item Response Theory (IRT), item characteristic curves (ICCs) are illustrated through logistic models or normal ogive models, and the probability that examinees give the correct answer is usually a monotonically increasing function of their ability parameters. However, since only limited patterns of shapes can be obtained from logistic models…
An Analysis of Differential Response Patterns on the Peabody Picture Vocabulary Test-IIIB in Struggling Adult Readers and Third-Grade Children

ERIC Educational Resources Information Center

Pae, Hye K.; Greenberg, Daphne; Williams, Rihana S.

2012-01-01

This study examines the Peabody Picture Vocabulary Test-IIIB (PPVT-IIIB) performance of 130 adults identified as struggling readers, in comparison to 175 third-grade children. Response patterns to the items on the PPVT-IIIB by these two groups were investigated, focusing on items, semantic categories, and lexical features, including word length,…
Simulation-based Bayesian inference for latent traits of item response models: Introduction to the ltbayes package for R.

PubMed

Johnson, Timothy R; Kuhn, Kristine M

2015-12-01

This paper introduces the ltbayes package for R. This package includes a suite of functions for investigating the posterior distribution of latent traits of item response models. These include functions for simulating realizations from the posterior distribution, profiling the posterior density or likelihood function, calculation of posterior modes or means, Fisher information functions and observed information, and profile likelihood confidence intervals. Inferences can be based on individual response patterns or sets of response patterns such as sum scores. Functions are included for several common binary and polytomous item response models, but the package can also be used with user-specified models. This paper introduces some background and motivation for the package, and includes several detailed examples of its use.
Cognitive Diagnostic Attribute-Level Discrimination Indices

ERIC Educational Resources Information Center

Henson, Robert; Roussos, Louis; Douglas, Jeff; He, Xuming

2008-01-01

Cognitive diagnostic models (CDMs) model the probability of correctly answering an item as a function of an examinee's attribute mastery pattern. Because estimation of the mastery pattern involves more than a continuous measure of ability, reliability concepts introduced by classical test theory and item response theory do not apply. The cognitive…
Can Item Keyword Feedback Help Remediate Knowledge Gaps?

PubMed

Feinberg, Richard A; Clauser, Amanda L

2016-10-01

In graduate medical education, assessment results can effectively guide professional development when both assessment and feedback support a formative model. When individuals cannot directly access the test questions and responses, a way of using assessment results formatively is to provide item keyword feedback. The purpose of the following study was to investigate whether exposure to item keyword feedback aids in learner remediation. Participants included 319 trainees who completed a medical subspecialty in-training examination (ITE) in 2012 as first-year fellows, and then 1 year later in 2013 as second-year fellows. Performance on 2013 ITE items in which keywords were, or were not, exposed as part of the 2012 ITE score feedback was compared across groups based on the amount of time studying (preparation). For the same items common to both 2012 and 2013 ITEs, response patterns were analyzed to investigate changes in answer selection. Test takers who indicated greater amounts of preparation on the 2013 ITE did not perform better on the items in which keywords were exposed compared to those who were not exposed. The response pattern analysis substantiated overall growth in performance from the 2012 ITE. For items with incorrect responses on both attempts, examinees selected the same option 58% of the time. Results from the current study were unsuccessful in supporting the use of item keywords in aiding remediation. Unfortunately, the results did provide evidence of examinees retaining misinformation.
Best Design for Multidimensional Computerized Adaptive Testing With the Bifactor Model

PubMed Central

Seo, Dong Gi; Weiss, David J.

2015-01-01

Most computerized adaptive tests (CATs) have been studied using the framework of unidimensional item response theory. However, many psychological variables are multidimensional and might benefit from using a multidimensional approach to CATs. This study investigated the accuracy, fidelity, and efficiency of a fully multidimensional CAT algorithm (MCAT) with a bifactor model using simulated data. Four item selection methods in MCAT were examined for three bifactor pattern designs using two multidimensional item response theory models. To compare MCAT item selection and estimation methods, a fixed test length was used. The Ds-optimality item selection improved θ estimates with respect to a general factor, and either D- or A-optimality improved estimates of the group factors in three bifactor pattern designs under two multidimensional item response theory models. The MCAT model without a guessing parameter functioned better than the MCAT model with a guessing parameter. The MAP (maximum a posteriori) estimation method provided more accurate θ estimates than the EAP (expected a posteriori) method under most conditions, and MAP showed lower observed standard errors than EAP under most conditions, except for a general factor condition using Ds-optimality item selection. PMID:29795848
A randomized trial of mailed questionnaires versus telephone interviews: Response patterns in a survey

PubMed Central

Feveile, Helene; Olsen, Ole; Hogh, Annie

2007-01-01

Background Data for health surveys are often collected using either mailed questionnaires, telephone interviews or a combination. Mode of data collection can affect the propensity to refuse to respond and result in different patterns of responses. The objective of this paper is to examine and quantify effects of mode of data collection in health surveys. Methods A stratified sample of 4,000 adults residing in Denmark was randomised to mailed questionnaires or computer-assisted telephone interviews. 45 health-related items were analyzed; four concerning behaviour and 41 concerning self assessment. Odds ratios for more positive answers and more frequent use of extreme response categories (both positive and negative) among telephone respondents compared to questionnaire respondents were estimated. Tests were Bonferroni corrected. Results For the four health behaviour items there were no significant differences in the response patterns. For 32 of the 41 health self assessment items the response pattern was statistically significantly different and extreme response categories were used more frequently among telephone respondents (Median estimated odds ratio: 1.67). For a majority of these mode sensitive items (26/32), a more positive reporting was observed among telephone respondents (Median estimated odds ratio: 1.73). The overall response rate was similar among persons randomly assigned to questionnaires (58.1%) and to telephone interviews (56.2%). A differential nonresponse bias for age and gender was observed. The rate of missing responses was higher for questionnaires (0.73 – 6.00%) than for telephone interviews (0 – 0.51%). The "don't know" option was used more often by mail respondents (10 – 24%) than by telephone respondents (2 – 4%). Conclusion The mode of data collection affects the reporting of self assessed health items substantially. In epidemiological studies, the method effect may be as large as the effects under investigation. Caution is needed when comparing prevalences across surveys or when studying time trends. PMID:17592653
A Monte Carlo Simulation Investigating the Validity and Reliability of Ability Estimation in Item Response Theory with Speeded Computer Adaptive Tests

ERIC Educational Resources Information Center

Schmitt, T. A.; Sass, D. A.; Sullivan, J. R.; Walker, C. M.

2010-01-01

Imposed time limits on computer adaptive tests (CATs) can result in examinees having difficulty completing all items, thus compromising the validity and reliability of ability estimates. In this study, the effects of speededness were explored in a simulated CAT environment by varying examinee response patterns to end-of-test items. Expectedly,…
Modeling Nonignorable Missing Data in Speeded Tests

ERIC Educational Resources Information Center

Glas, Cees A. W.; Pimentel, Jonald L.

2008-01-01

In tests with time limits, items at the end are often not reached. Usually, the pattern of missing responses depends on the ability level of the respondents; therefore, missing data are not ignorable in statistical inference. This study models data using a combination of two item response theory (IRT) models: one for the observed response data and…

Item response theory in personality assessment: a demonstration using the MMPI-2 depression scale.

PubMed

Childs, R A; Dahlstrom, W G; Kemp, S M; Panter, A T

2000-03-01

Item response theory (IRT) analyses have, over the past 3 decades, added much to our understanding of the relationships among and characteristics of test items, as revealed in examinees' response patterns. Assessment instruments used outside the educational context have only infrequently been analyzed using IRT, however. This study demonstrates the relevance of IRT to personality data through analyses of Scale 2 (the Depression Scale) on the revised Minnesota Multiphasic Personality Inventory (MMPI-2). A rich set of hypotheses regarding the items on this scale, including contrasts among the Harris-Lingoes and Wiener-Harmon subscales and differences in the items' measurement characteristics for men and women, are investigated through the IRT analyses.
Design Patterns for Digital Item Types in Higher Education

ERIC Educational Resources Information Center

Draaijer, S.; Hartog, R. J. M.

2007-01-01

A set of design patterns for digital item types has been developed in response to challenges identified in various projects by teachers in higher education. The goal of the projects in question was to design and develop formative and summative tests, and to develop interactive learning material in the form of quizzes. The subject domains involved…
Can Item Keyword Feedback Help Remediate Knowledge Gaps?

PubMed Central

Feinberg, Richard A.; Clauser, Amanda L.

2016-01-01

ABSTRACT Background In graduate medical education, assessment results can effectively guide professional development when both assessment and feedback support a formative model. When individuals cannot directly access the test questions and responses, a way of using assessment results formatively is to provide item keyword feedback. Objective The purpose of the following study was to investigate whether exposure to item keyword feedback aids in learner remediation. Methods Participants included 319 trainees who completed a medical subspecialty in-training examination (ITE) in 2012 as first-year fellows, and then 1 year later in 2013 as second-year fellows. Performance on 2013 ITE items in which keywords were, or were not, exposed as part of the 2012 ITE score feedback was compared across groups based on the amount of time studying (preparation). For the same items common to both 2012 and 2013 ITEs, response patterns were analyzed to investigate changes in answer selection. Results Test takers who indicated greater amounts of preparation on the 2013 ITE did not perform better on the items in which keywords were exposed compared to those who were not exposed. The response pattern analysis substantiated overall growth in performance from the 2012 ITE. For items with incorrect responses on both attempts, examinees selected the same option 58% of the time. Conclusions Results from the current study were unsuccessful in supporting the use of item keywords in aiding remediation. Unfortunately, the results did provide evidence of examinees retaining misinformation. PMID:27777664
Response pattern of depressive symptoms among college students: What lies behind items of the Beck Depression Inventory-II?

PubMed

de Sá Junior, Antonio Reis; de Andrade, Arthur Guerra; Andrade, Laura Helena; Gorenstein, Clarice; Wang, Yuan-Pang

2018-07-01

This study examines the response pattern of depressive symptoms in a nationwide student sample, through item analyses of a rating scale by both classical test theory (CTT) and item response theory (IRT). The 21-item Beck Depression Inventory-II (BDI-II) was administered to 12,711 college students. First, the psychometric properties of the scale were described. Thereafter, the endorsement probability of the depressive symptom in each scale item was analyzed through CTT and IRT. Graphical plots depicted the endorsement probability of scale items and intensity of depression. Three items of different difficulty levels were compared through the CTT and IRT approaches. Four in five students reported the presence of depressive symptoms. The BDI-II items presented good reliability and were distributed along the symptomatic continuum of depression. Similarly, in both CTT and IRT approaches, the item 'changes in sleep' was easily endorsed, 'loss of interest' moderately and 'suicidal thoughts' hardly. Graphical representations of the BDI-II from both methods showed much equivalence in terms of item discrimination and item difficulty. The item characteristic curve of the IRT method provided informative evaluation of item performance. The inventory was applied only in college students. Depressive symptoms were frequent psychopathological manifestations among college students. The performance of the BDI-II items indicated convergent results from both methods of analysis. While the CTT was easy to understand and to apply, the IRT was more complex to understand and to implement. Comprehensive assessment of the functioning of each BDI-II item might be helpful in efficient detection of depressive conditions in college students. Copyright © 2018 Elsevier B.V. All rights reserved.
For Which Boys and Which Girls Are Reading Assessment Items Biased Against? Detection of Differential Item Functioning in Heterogeneous Gender Populations

ERIC Educational Resources Information Center

Grover, Raman K.; Ercikan, Kadriye

2017-01-01

In gender differential item functioning (DIF) research it is assumed that all members of a gender group have similar item response patterns and therefore generalizations from group level to subgroup and individual levels can be made accurately. However, DIF items do not necessarily disadvantage every member of a gender group to the same degree,…
The Heteroscedastic Graded Response Model with a Skewed Latent Trait: Testing Statistical and Substantive Hypotheses Related to Skewed Item Category Functions

ERIC Educational Resources Information Center

Molenaar, Dylan; Dolan, Conor V.; de Boeck, Paul

2012-01-01

The Graded Response Model (GRM; Samejima, "Estimation of ability using a response pattern of graded scores," Psychometric Monograph No. 17, Richmond, VA: The Psychometric Society, 1969) can be derived by assuming a linear regression of a continuous variable, Z, on the trait, [theta], to underlie the ordinal item scores (Takane & de Leeuw in…
A Practical Guide to Check the Consistency of Item Response Patterns in Clinical Research Through Person-Fit Statistics: Examples and a Computer Program.

PubMed

Meijer, Rob R; Niessen, A Susan M; Tendeiro, Jorge N

2016-02-01

Although there are many studies devoted to person-fit statistics to detect inconsistent item score patterns, most studies are difficult to understand for nonspecialists. The aim of this tutorial is to explain the principles of these statistics for researchers and clinicians who are interested in applying these statistics. In particular, we first explain how invalid test scores can be detected using person-fit statistics; second, we provide the reader practical examples of existing studies that used person-fit statistics to detect and to interpret inconsistent item score patterns; and third, we discuss a new R-package that can be used to identify and interpret inconsistent score patterns. © The Author(s) 2015.
Evidences of School Related Alienation in Elementary School Pupils.

ERIC Educational Resources Information Center

McElhinney, James H.; And Others.

In the spring of 1969 over 6,000 students in grades four through six responded to a 72 item questionnaire. Of the 72, 11 include responses which suggest possible alienation of this age group. Each school's pupils produced a unique pattern of responses to the 11 items, which suggests that the immediate school environment is one contributing factor…
A Mixture Rasch Model with a Covariate: A Simulation Study via Bayesian Markov Chain Monte Carlo Estimation

ERIC Educational Resources Information Center

Dai, Yunyun

2013-01-01

Mixtures of item response theory (IRT) models have been proposed as a technique to explore response patterns in test data related to cognitive strategies, instructional sensitivity, and differential item functioning (DIF). Estimation proves challenging due to difficulties in identification and questions of effect size needed to recover underlying…
The Probability of Exceedance as a Nonparametric Person-Fit Statistic for Tests of Moderate Length

ERIC Educational Resources Information Center

Tendeiro, Jorge N.; Meijer, Rob R.

2013-01-01

To classify an item score pattern as not fitting a nonparametric item response theory (NIRT) model, the probability of exceedance (PE) of an observed response vector x can be determined as the sum of the probabilities of all response vectors that are, at most, as likely as x, conditional on the test's total score. Vector x is to be considered…
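Illustrative sketch (not part of the record): the definition above can be computed by brute force for short tests. The sketch assumes dichotomous items that are independent with known success probabilities p; that likelihood model and the function name probability_of_exceedance are assumptions for illustration, and the record does not state how the authors obtain the probabilities.

```python
from itertools import combinations
from math import prod

def probability_of_exceedance(x, p):
    """PE of a binary response vector x given per-item success probabilities p.

    Assumes independent items (an illustrative simplification). Enumerates
    all vectors with the same total score, so it suits short tests only.
    """
    n, total = len(x), sum(x)

    def likelihood(v):
        return prod(pi if vi else 1.0 - pi for vi, pi in zip(v, p))

    lik_x = likelihood(x)
    # Likelihood of every response vector that has the same total score as x.
    liks = []
    for ones in combinations(range(n), total):
        v = [0] * n
        for i in ones:
            v[i] = 1
        liks.append(likelihood(v))
    norm = sum(liks)
    # Sum the conditional probabilities of vectors at most as likely as x.
    return sum(l for l in liks if l <= lik_x) / norm

p = [0.9, 0.7, 0.5, 0.3]                              # easiest item first
pe_expected = probability_of_exceedance([1, 1, 0, 0], p)  # most likely pattern
pe_aberrant = probability_of_exceedance([0, 0, 1, 1], p)  # least likely pattern
```

A small PE flags a pattern as unusual given its total score: here the pattern that passes only the two hardest items receives a much smaller PE than the pattern that passes only the two easiest.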
Searching for serial refreshing in working memory: Using response times to track the content of the focus of attention over time.

PubMed

Vergauwe, Evie; Hardman, Kyle O; Rouder, Jeffrey N; Roemer, Emily; McAllaster, Sara; Cowan, Nelson

2016-12-01

One popular idea is that, to support the maintenance of a set of elements over brief periods of time, the focus of attention rotates among the different elements, thereby serially refreshing the content of working memory (WM). In the research reported here, probe letters were presented between to-be-remembered letters, and response times to these probes were used to infer the status of the different items in WM. If the focus of attention cycles from one item to the next, its content should be different at different points in time, and this should be reflected in a change in the response time patterns over time. Across a set of four experiments, we demonstrated a striking pattern of invariance in the response time patterns over time, suggesting either that the content of the focus of attention did not change over time or that response times cannot be used to infer the content of the focus of attention. We discuss how this pattern constrains models of WM, attention, and human information processing.
Use of item response curves of the Force and Motion Conceptual Evaluation to compare Japanese and American students' views on force and motion

NASA Astrophysics Data System (ADS)

Ishimoto, Michi; Davenport, Glen; Wittmann, Michael C.

2017-12-01

Student views of force and motion reflect the personal experiences and physics education of the student. With a different language, culture, and educational system, we expect that Japanese students' views on force and motion might be different from those of American students. The Force and Motion Conceptual Evaluation (FMCE) is an instrument used to probe student views on force and motion. It was designed using research on American students, and, as such, the items might function differently for Japanese students. Preliminary results from a translated version indicated that Japanese students had similar misconceptions as those of American students. In this study, we used item response curves (IRCs) to make more detailed item-by-item comparisons. IRCs show the functioning of individual items across all levels of performance by plotting the proportion of each response as a function of the total score. Most of the IRCs showed very similar patterns on both correct and incorrect responses; however, a few of the plots indicate differences between the populations. The similar patterns indicate that students tend to interact with FMCE items similarly, despite differences in culture, language, and education. We speculate about the possible causes for the differences in some of the IRCs. This report is intended to show how IRCs can be used as a part of the validation process when making comparisons across languages and nationalities. Differences in IRCs can help to pinpoint artifacts of translation, contextual effects because of differences in culture, and perhaps intrinsic differences in student understanding of Newtonian motion.
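Illustrative sketch (not part of the record): an item response curve as described above tabulates, for one item, the proportion of students choosing each response option at each total score. The function name item_response_curve and the toy data are assumptions for illustration only.

```python
from collections import Counter, defaultdict

def item_response_curve(choices, total_scores):
    """Return {total_score: {option: proportion}} for a single item.

    choices[i] is the option chosen by student i on this item and
    total_scores[i] is that student's total test score.
    """
    counts = defaultdict(Counter)
    for choice, score in zip(choices, total_scores):
        counts[score][choice] += 1
    return {
        score: {opt: n / sum(c.values()) for opt, n in c.items()}
        for score, c in sorted(counts.items())
    }

# Toy data: four students, one item with options "A" and "B".
irc = item_response_curve(["A", "B", "A", "A"], [1, 1, 2, 2])
```

Plotting each option's proportion against the total score, separately for each population, gives curves that can be compared item by item.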
Comparison of response patterns in different survey designs: a longitudinal panel with mixed-mode and online-only design.

PubMed

Rübsamen, Nicole; Akmatov, Manas K; Castell, Stefanie; Karch, André; Mikolajczyk, Rafael T

2017-01-01

Increasing availability of the Internet makes online-only data collection feasible for more epidemiological studies. We compare response patterns in a population-based health survey using two survey designs: mixed-mode (choice between paper-and-pencil and online questionnaires) and online-only design (without choice). We used data from a longitudinal panel, the Hygiene and Behaviour Infectious Diseases Study (HaBIDS), conducted in 2014/2015 in four regions in Lower Saxony, Germany. Individuals were recruited using address-based probability sampling. In two regions, individuals could choose between paper-and-pencil and online questionnaires. In the other two regions, individuals were offered online-only participation. We compared sociodemographic characteristics of respondents who filled in all panel questionnaires between the mixed-mode group (n = 1110) and the online-only group (n = 482). Using 134 items, we performed multinomial logistic regression to compare responses between survey designs in terms of type (missing, "do not know" or valid response) and ordinal regression to compare responses in terms of content. We controlled the false discovery rate (FDR) to account for multiple testing and investigated the effects of adjusting for sociodemographic characteristics. For validation of the differential response patterns between mixed-mode and online-only, we compared the response patterns between paper and online mode among the respondents in the mixed-mode group in one region (n = 786). Respondents in the online-only group were older than those in the mixed-mode group, but the groups did not differ regarding sex or education. Type of response did not differ between the online-only and the mixed-mode group. Survey design was associated with different content of response in 18 of the 134 investigated items, which decreased to 11 after adjusting for sociodemographic variables.
In the validation within the mixed-mode group, only two of those were among the 11 significantly different items. The probability of observing by chance the same two or more significant differences in this setting was 22%. We found similar response patterns in both survey designs with only a few items being answered differently, likely attributable to chance. Our study supports the equivalence of the compared survey designs and suggests that, in the studied setting, using an online-only design does not cause strong distortion of the results.
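The abstract states that the false discovery rate was controlled across the 134 item-level tests but does not name the procedure. A common choice is the Benjamini-Hochberg step-up rule; the sketch below assumes that rule purely for illustration, and the function name and p-values are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected.

    Assumes the Benjamini-Hochberg step-up procedure at FDR level q.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * q.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff = rank
    rejected = [False] * m
    # Reject every hypothesis ranked at or below the cutoff.
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected

# Hypothetical p-values for four items.
p = [0.001, 0.030, 0.035, 0.20]
flags = benjamini_hochberg(p, q=0.05)
```

In this toy case the second p-value misses its own threshold (0.025) but is still rejected, because the third p-value passes its threshold (0.0375); that is the step-up behaviour that distinguishes the rule from a per-test cutoff.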
The Effect of Sequential Dependence on the Sampling Distributions of KR-20, KR-21, and Split-Halves Reliabilities.

ERIC Educational Resources Information Center

Sullins, Walter L.

Five-hundred dichotomously scored response patterns were generated with sequentially independent (SI) items and 500 with dependent (SD) items for each of thirty-six combinations of sampling parameters (i.e., three test lengths, three sample sizes, and four item difficulty distributions). KR-20, KR-21, and Split-Half (S-H) reliabilities were…
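For readers unfamiliar with the two Kuder-Richardson coefficients named here, the sketch below computes both from a matrix of dichotomous (0/1) item scores. It uses the population variance of total scores, which is one common convention and an assumption here; the data are a hypothetical toy matrix, not the simulated patterns from the study.

```python
from statistics import mean, pvariance

def kr20(matrix):
    """KR-20 for a list of examinee rows of 0/1 item scores."""
    k = len(matrix[0])
    totals = [sum(row) for row in matrix]
    var_total = pvariance(totals)
    # Sum of item variances p * (1 - p) over all k items.
    pq = 0.0
    for j in range(k):
        p = mean(row[j] for row in matrix)
        pq += p * (1 - p)
    return k / (k - 1) * (1 - pq / var_total)

def kr21(matrix):
    """KR-21: like KR-20 but assumes all items are equally difficult."""
    k = len(matrix[0])
    totals = [sum(row) for row in matrix]
    m = mean(totals)
    return k / (k - 1) * (1 - m * (k - m) / (k * pvariance(totals)))

# Hypothetical toy data: five examinees, four items.
data = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
r20 = kr20(data)
r21 = kr21(data)
```

Because KR-21 replaces the item variances with a value based on the mean score alone, it cannot exceed KR-20 on the same data; the two coincide only when every item has the same difficulty.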
Spotting Erroneous Rules of Operation by the Individual Consistency Index.

ERIC Educational Resources Information Center

Tatsuoka, Kikumi K.; Tatsuoka, Maurice M.

1983-01-01

This study introduces the individual consistency index (ICI), which measures the extent to which patterns of responses to parallel sets of items remain consistent over time. ICI is used as an error diagnostic tool to detect aberrant response patterns resulting from the consistent application of erroneous rules of operation. (Author/PN)
A Permutation Test for Correlated Errors in Adjacent Questionnaire Items

ERIC Educational Resources Information Center

Hildreth, Laura A.; Genschel, Ulrike; Lorenz, Frederick O.; Lesser, Virginia M.

2013-01-01

Response patterns are of importance to survey researchers because of the insight they provide into the thought processes respondents use to answer survey questions. In this article we propose the use of structural equation modeling to examine response patterns and develop a permutation test to quantify the likelihood of observing a specific…
The development of automaticity in short-term memory search: Item-response learning and category learning.

PubMed

Cao, Rui; Nosofsky, Robert M; Shiffrin, Richard M

2017-05-01

In short-term-memory (STM)-search tasks, observers judge whether a test probe was present in a short list of study items. Here we investigated the long-term learning mechanisms that lead to the highly efficient STM-search performance observed under conditions of consistent-mapping (CM) training, in which targets and foils never switch roles across trials. In item-response learning, subjects learn long-term mappings between individual items and target versus foil responses. In category learning, subjects learn high-level codes corresponding to separate sets of items and learn to attach old versus new responses to these category codes. To distinguish between these 2 forms of learning, we tested subjects in categorized varied mapping (CV) conditions: There were 2 distinct categories of items, but the assignment of categories to target versus foil responses varied across trials. In cases involving arbitrary categories, CV performance closely resembled standard varied-mapping performance without categories and departed dramatically from CM performance, supporting the item-response-learning hypothesis. In cases involving prelearned categories, CV performance resembled CM performance, as long as there was sufficient practice or steps taken to reduce trial-to-trial category-switching costs. This pattern of results supports the category-coding hypothesis for sufficiently well-learned categories. Thus, item-response learning occurs rapidly and is used early in CM training; category learning is much slower but is eventually adopted and is used to increase the efficiency of search beyond that available from item-response learning. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Examining student heuristic usage in a hydrogen bonding assessment.

PubMed

Miller, Kathryn; Kim, Thomas

2017-09-01

This study investigates the role of representational competence in student responses to an assessment of hydrogen bonding. The assessment couples the use of a multiple-select item ("Choose all that apply") with an open-ended item to allow for an examination of students' cognitive processes as they relate to the assignment of hydrogen bonding within a structural representation. Response patterns from the multiple-select item implicate heuristic usage as a contributing factor to students' incorrect responses. The use of heuristics is further supported by the students' corresponding responses to the open-ended assessment item. Taken together, these data suggest that poor representational competence may contribute to students' previously observed inability to correctly navigate the concept of hydrogen bonding. © 2017 by The International Union of Biochemistry and Molecular Biology, 45(5):411-416, 2017.
Detecting Measurement Disturbances in Rater-Mediated Assessments

ERIC Educational Resources Information Center

Wind, Stefanie A.; Schumacker, Randall E.

2017-01-01

The term measurement disturbance has been used to describe systematic conditions that affect a measurement process, resulting in a compromised interpretation of person or item estimates. Measurement disturbances have been discussed in relation to systematic response patterns associated with items and persons, such as start-up, plodding, boredom,…
Support for an auto-associative model of spoken cued recall: evidence from fMRI.

PubMed

de Zubicaray, Greig; McMahon, Katie; Eastburn, Mathew; Pringle, Alan J; Lorenz, Lina; Humphreys, Michael S

2007-03-02

Cued recall and item recognition are considered the standard episodic memory retrieval tasks. However, only the neural correlates of the latter have been studied in detail with fMRI. Using an event-related fMRI experimental design that permits spoken responses, we tested hypotheses from an auto-associative model of cued recall and item recognition [Chappell, M., & Humphreys, M. S. (1994). An auto-associative neural network for sparse representations: Analysis and application to models of recognition and cued recall. Psychological Review, 101, 103-128]. In brief, the model assumes that cues elicit a network of phonological short term memory (STM) and semantic long term memory (LTM) representations distributed throughout the neocortex as patterns of sparse activations. This information is transferred to the hippocampus which converges upon the item closest to a stored pattern and outputs a response. Word pairs were learned from a study list, with one member of the pair serving as the cue at test. Unstudied words were also intermingled at test in order to provide an analogue of yes/no recognition tasks. Compared to incorrectly rejected studied items (misses) and correctly rejected (CR) unstudied items, correctly recalled items (hits) elicited increased responses in the left hippocampus and neocortical regions including the left inferior prefrontal cortex (LIPC), left mid lateral temporal cortex and inferior parietal cortex, consistent with predictions from the model. This network was very similar to that observed in yes/no recognition studies, supporting proposals that cued recall and item recognition involve common rather than separate mechanisms.

[Effects of false memories on the Concealed Information Test].

PubMed

Zaitsu, Wataru

2012-10-01

The effects of false memories on polygraph examinations with the Concealed Information Test (CIT) were investigated by using the Deese-Roediger-McDermott (DRM) paradigm, which induces false memories in participants. Physiological responses to questions consisting of learned, lure, and unlearned items were measured and recorded. The results indicated that lure questions elicited the same critical responses as questions about learned items. These responses included repression of respiration, an increase in electrodermal activity, and a drop in heart rate. These results suggest that critical response patterns are generated in the peripheral nervous system by true and false memories.
Scoring and Classifying Examinees Using Measurement Decision Theory

ERIC Educational Resources Information Center

Rudner, Lawrence M.

2009-01-01

This paper describes and evaluates the use of measurement decision theory (MDT) to classify examinees based on their item response patterns. The model has a simple framework that starts with the conditional probabilities of examinees in each category or mastery state responding correctly to each item. The presented evaluation investigates: (1) the…
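The starting point described here (conditional probabilities of a correct response given each mastery state) lends itself to a Bayes-rule classifier. The sketch below assumes items are conditionally independent given the state and uses hypothetical probabilities and priors; it illustrates the general idea and is not the paper's implementation.

```python
def classify(pattern, p_correct, priors):
    """Posterior probability of each mastery state for one response pattern.

    pattern   -- list of 0/1 item scores for one examinee
    p_correct -- {state: [P(correct | state) for each item]}
    priors    -- {state: prior probability of the state}
    Assumes local independence of items given the state.
    """
    joint = {}
    for state, prior in priors.items():
        like = prior
        for x, p in zip(pattern, p_correct[state]):
            like *= p if x == 1 else 1 - p
        joint[state] = like
    z = sum(joint.values())
    return {state: value / z for state, value in joint.items()}

# Hypothetical three-item test with two mastery states.
p_correct = {"master": [0.9, 0.8, 0.7], "non-master": [0.4, 0.3, 0.2]}
priors = {"master": 0.5, "non-master": 0.5}
post = classify([1, 1, 0], p_correct, priors)
best = max(post, key=post.get)
```

Choosing the state with the highest posterior is the simplest decision rule; a rule that weights misclassification costs could be substituted without changing the likelihood step.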
Outlier Detection in High-Stakes Certification Testing. Research Report.

ERIC Educational Resources Information Center

Meijer, Rob R.

Recent developments of person-fit analysis in computerized adaptive testing (CAT) are discussed. Methods from statistical process control are presented that have been proposed to classify an item score pattern as fitting or misfitting the underlying item response theory (IRT) model in a CAT. Most person-fit research in CAT is restricted to…
Outlier Detection in High-Stakes Certification Testing.

ERIC Educational Resources Information Center

Meijer, Rob R.

2002-01-01

Used empirical data from a certification test to study methods from statistical process control that have been proposed to classify an item score pattern as fitting or misfitting the underlying item response theory model in computerized adaptive testing. Results for 1,392 examinees show that different types of misfit can be distinguished. (SLD)
Detection and validation of unscalable item score patterns using item response theory: an illustration with Harter's Self-Perception Profile for Children.

PubMed

Meijer, Rob R; Egberink, Iris J L; Emons, Wilco H M; Sijtsma, Klaas

2008-05-01

We illustrate the usefulness of person-fit methodology for personality assessment. For this purpose, we use person-fit methods from item response theory. First, we give a nontechnical introduction to existing person-fit statistics. Second, we analyze data from Harter's (1985) Self-Perception Profile for Children in a sample of children ranging from 8 to 12 years of age (N = 611) and argue that for some children, the scale scores should be interpreted with care and caution. Combined information from person-fit indexes and from observation, interviews, and self-concept theory showed that similar score profiles may have a different interpretation. For some children in the sample, item scores did not adequately reflect their trait level. Based on teacher interviews, this was found to be due most likely to a less developed self-concept and/or problems understanding the meaning of the questions. We recommend investigating the scalability of score patterns when using self-report inventories to help the researcher interpret respondents' behavior correctly.
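As a concrete example of a person-fit statistic, the sketch below computes the standardized log-likelihood statistic lz for dichotomous items under a two-parameter logistic (2PL) model. This is one widely used index chosen for illustration; the abstract does not say which statistics the study applied, and the item parameters and response patterns are hypothetical.

```python
import math

def p_2pl(theta, a, b):
    """Probability of a keyed response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def lz(pattern, theta, items):
    """Standardized log-likelihood person-fit statistic.

    pattern -- list of 0/1 item scores
    theta   -- the person's trait level (treated as known here)
    items   -- list of (discrimination a, difficulty b) pairs
    Large negative values flag patterns that fit the model poorly.
    """
    l0 = expected = variance = 0.0
    for x, (a, b) in zip(pattern, items):
        p = p_2pl(theta, a, b)
        l0 += x * math.log(p) + (1 - x) * math.log(1 - p)
        expected += p * math.log(p) + (1 - p) * math.log(1 - p)
        variance += p * (1 - p) * math.log(p / (1 - p)) ** 2
    return (l0 - expected) / math.sqrt(variance)

# Hypothetical items ordered from easy to hard.
items = [(1.0, -2.0), (1.0, -1.0), (1.0, 0.0), (1.0, 1.0), (1.0, 2.0)]
consistent = lz([1, 1, 1, 0, 0], theta=0.0, items=items)
aberrant = lz([0, 0, 1, 1, 1], theta=0.0, items=items)
```

The second pattern endorses the hard items while rejecting the easy ones, so its lz is strongly negative; this is the kind of unscalable pattern that would prompt a closer look at the respondent.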
Caregiver Appraisals of Functional Dependence in Individuals With Dementia and Associated Caregiver Upset: Psychometric Properties of a New Scale and Response Patterns by Caregiver and Care Recipient Characteristics

PubMed Central

Gitlin, Laura N.; Roth, David L.; Burgio, Louis D.; Loewenstein, David A.; Winter, Laraine; Nichols, Linda; Argüelles, Soledad; Corcoran, Mary; Burns, Robert; Martindale, Jennifer

2008-01-01

Objective To evaluate psychometric properties and response patterns of the Caregiver Assessment of Function and Upset (CAFU), a 15-item multidimensional measure of dependence in dementia patients and caregiver reaction. Method 640 families were administered the CAFU (53% White, 43% African American, and 4% mixed race and ethnicity). We created a random split of the sample and conducted exploratory factor analyses on Sample 1 and confirmatory factor analyses on Sample 2. Convergent and discriminant validity were evaluated using Spearman rank correlation coefficients. Results A two-factor structure for functional items was derived, and excellent factorial validity was obtained. Convergent and discriminant validity were obtained for function and upset measures. Differential response patterns for dependence and caregiver upset were found for caregiver race, relationship, and care recipient gender but not for caregiver gender. Discussion The CAFU is easily administered, reliable, and valid for evaluating appraisals of dependencies and upsetting care areas. PMID:15750049
The lz(p)* Person-Fit Statistic in an Unfolding Model Context.

PubMed

Tendeiro, Jorge N

2017-01-01

Although person-fit analysis has a long-standing tradition within item response theory, it has been applied in combination with dominance response models almost exclusively. In this article, a popular log likelihood-based parametric person-fit statistic under the framework of the generalized graded unfolding model is used. Results from a simulation study indicate that the person-fit statistic performed relatively well in detecting midpoint response style patterns and not so well in detecting extreme response style patterns.
Building an Evaluation Scale using Item Response Theory.

PubMed

Lalor, John P; Wu, Hao; Yu, Hong

2016-11-01

Evaluation of NLP methods requires testing against a previously vetted gold-standard test set and reporting standard metrics (accuracy/precision/recall/F1). The current assumption is that all items in a given test set are equal with regards to difficulty and discriminating power. We propose Item Response Theory (IRT) from psychometrics as an alternative means for gold-standard test-set generation and NLP system evaluation. IRT is able to describe characteristics of individual items - their difficulty and discriminating power - and can account for these characteristics in its estimation of human intelligence or ability for an NLP task. In this paper, we demonstrate IRT by generating a gold-standard test set for Recognizing Textual Entailment. By collecting a large number of human responses and fitting our IRT model, we show that our IRT model compares NLP systems with the performance in a human population and is able to provide more insight into system performance than standard evaluation metrics. We show that a high accuracy score does not always imply a high IRT score, which depends on the item characteristics and the response pattern.
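The closing claim, that equal accuracy need not mean equal IRT score, can be illustrated with a small ability-estimation sketch. It assumes a two-parameter logistic (2PL) model with hypothetical item parameters and a simple grid-search maximum-likelihood estimate; the model and fitting procedure in the paper may differ.

```python
import math

def p_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def log_lik(theta, pattern, items):
    """Log-likelihood of a 0/1 response pattern at ability theta."""
    total = 0.0
    for x, (a, b) in zip(pattern, items):
        p = p_2pl(theta, a, b)
        total += math.log(p) if x else math.log(1 - p)
    return total

def estimate_theta(pattern, items):
    """Grid-search maximum-likelihood ability estimate on [-4, 4]."""
    grid = [i / 100 for i in range(-400, 401)]
    return max(grid, key=lambda t: log_lik(t, pattern, items))

# Hypothetical items as (discrimination a, difficulty b):
# two easy, weakly discriminating items and two harder, sharper ones.
items = [(0.5, -1.0), (0.5, 0.0), (2.0, 0.5), (2.0, 1.0)]
system_a = [1, 1, 0, 0]  # correct only on the easy items
system_b = [0, 0, 1, 1]  # correct only on the hard items
theta_a = estimate_theta(system_a, items)
theta_b = estimate_theta(system_b, items)
```

Both hypothetical systems score 50% accuracy, yet the estimated ability is higher for the one that answers the harder, more discriminating items correctly.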
Building an Evaluation Scale using Item Response Theory

PubMed Central

Lalor, John P.; Wu, Hao; Yu, Hong

2016-01-01

Evaluation of NLP methods requires testing against a previously vetted gold-standard test set and reporting standard metrics (accuracy/precision/recall/F1). The current assumption is that all items in a given test set are equal with regards to difficulty and discriminating power. We propose Item Response Theory (IRT) from psychometrics as an alternative means for gold-standard test-set generation and NLP system evaluation. IRT is able to describe characteristics of individual items - their difficulty and discriminating power - and can account for these characteristics in its estimation of human intelligence or ability for an NLP task. In this paper, we demonstrate IRT by generating a gold-standard test set for Recognizing Textual Entailment. By collecting a large number of human responses and fitting our IRT model, we show that our IRT model compares NLP systems with the performance in a human population and is able to provide more insight into system performance than standard evaluation metrics. We show that a high accuracy score does not always imply a high IRT score, which depends on the item characteristics and the response pattern. PMID:28004039
Does the hippocampus mediate objective binding or subjective remembering?

PubMed

Slotnick, Scott D

2010-01-15

Human functional magnetic resonance imaging (fMRI) evidence suggests the hippocampus is associated with context memory to a greater degree than item memory (where only context memory requires item-in-context binding). A separate line of fMRI research suggests the hippocampus is associated with "remember" responses to a greater degree than "know" or familiarity based responses (where only remembering reflects the subjective experience of specific detail). Previous studies, however, have confounded context memory with remembering and item memory with knowing. The present fMRI study independently tested the binding hypothesis and remembering hypothesis of hippocampal function by evaluating activity within hippocampal regions-of-interest (ROIs). At encoding, participants were presented with colored and gray abstract shapes and instructed to remember each shape and whether it was colored or gray. At retrieval, old and new shapes were presented in gray and participants classified each shape as "old and previously colored", "old and previously gray", or "new", followed by a "remember" or "know" response. In 3 of 11 hippocampal ROIs, activity was significantly greater for context memory than item memory, the context memory-item memory by remember-know interaction was significant, and activity was significantly greater for context memory-knowing than item memory-remembering. This pattern of activity only supports the binding hypothesis. The analogous pattern of activity that would have supported the remembering hypothesis was never observed in the hippocampus. However, a targeted analysis revealed remembering specific activity in the left inferior parietal cortex. The present results suggest parietal cortex may be associated with subjective remembering while the hippocampus mediates binding.
Assessing the mechanism of response in the retrosplenial cortex of good and poor navigators

PubMed Central

Auger, Stephen D.; Maguire, Eleanor A.

2013-01-01

The retrosplenial cortex (RSC) is consistently engaged by a range of tasks that examine episodic memory, imagining the future, spatial navigation, and scene processing. Despite this, an account of its exact contribution to these cognitive functions remains elusive. Here, using functional MRI (fMRI) and multi-voxel pattern analysis (MVPA) we found that the RSC coded for the specific number of permanent outdoor items that were in view, that is, items which are fixed and never change their location. Moreover, this effect was selective, and was not apparent for other item features such as size and visual salience. This detailed detection of the number of permanent items in view was echoed in the parahippocampal cortex (PHC), although the two brain structures diverged when participants were divided into good and poor navigators. There was no difference in the responsivity of the PHC between the two groups, while significantly better decoding of the number of permanent items in view was possible from patterns of activity in the RSC of good compared to poor navigators. Within good navigators, the RSC also facilitated significantly better prediction of item permanence than the PHC. Overall, these findings suggest that the RSC in particular is concerned with coding the presence of every permanent item that is in view. This mechanism may represent a key building block for spatial and scene representations that are central to episodic memories and imagining the future, and could also be a prerequisite for successful navigation. PMID:24012136
Oropharyngeal dysphagia: surveying practice patterns of the speech-language pathologist.

PubMed

Martino, Rosemary; Pron, Gaylene; Diamant, Nicholas E

2004-01-01

The present study was designed to obtain a comprehensive view of the dysphagia assessment practice patterns of speech-language pathologists and their opinion on the importance of these practices using survey methods and taking into consideration clinician, patient, and practice-setting variables. A self-administered mail questionnaire was developed following established methodology to maximize response rates. Eight dysphagia experts independently rated the new survey for content validity. Test-retest reliability was assessed with a random sample of 23 participants. The survey was sent to 50 speech-language pathologists randomly selected from the Canadian professional association database of members who practice in dysphagia. Surveys were mailed according to the Dillman Total Design Method and included an incentive offer. High survey (64%) and item response (95%) rates were achieved and clinicians were reliable reporters of their practice behaviors (ICC>0.60). Of all the clinical assessment items, 36% were reported with high (>80%) utilization and 24% with low (<20%) utilization, the former pertaining to tongue motion and vocal quality after food/fluid intake and the latter to testing of oral sensation without food. One-third (33%) of instrumental assessment items were highly utilized and included assessment of bolus movement and laryngeal response to bolus misdirection. Overall, clinician experience and teaching institutions influenced greater utilization. Opinions of importance were similar to utilization behaviors (r = 0.947, p = 0.01). Of all patients referred for dysphagia assessment, full clinical assessments were administered to 71% of patients but instrumental assessments to only 36%. A hierarchical model of practice behavior is proposed to explain this pattern of progressively decreasing item utilization.
Segregating the significant from the mundane on a moment-to-moment basis via direct and indirect amygdala contributions

PubMed Central

Lim, Seung-Lark; Padmala, Srikanth; Pessoa, Luiz

2009-01-01

If the amygdala is involved in shaping perceptual experience when affectively significant visual items are encountered, responses in this structure should be correlated with both visual cortex responses and behavioral reports. Here, we investigated how affective significance shapes visual perception during an attentional blink paradigm combined with aversive conditioning. Behaviorally, following aversive learning, affectively significant scenes (CS+) were better detected than neutral (CS−) ones. In terms of mean brain responses, both amygdala and visual cortical responses were stronger during CS+ relative to CS− trials. Increased brain responses in these regions were associated with improved behavioral performance across participants and followed a mediation-like pattern. Importantly, the mediation pattern was observed in a trial-by-trial analysis, revealing that the specific pattern of trial-by-trial variability in brain responses was closely related to single-trial behavioral performance. Furthermore, the influence of the amygdala on visual cortical responses was consistent with a mediation, although partial, via frontal brain regions. Our results thus suggest that affective significance potentially determines the fate of a visual item during competitive interactions by enhancing sensory processing through both direct and indirect paths. In so doing, the amygdala helps separate the significant from the mundane. PMID:19805383
Predicting Survey Responses: How and Why Semantics Shape Survey Statistics on Organizational Behaviour

PubMed Central

Arnulf, Jan Ketil; Larsen, Kai Rune; Martinsen, Øyvind Lund; Bong, Chih How

2014-01-01

Some disciplines in the social sciences rely heavily on collecting survey responses to detect empirical relationships among variables. We explored whether these relationships were a priori predictable from the semantic properties of the survey items, using language processing algorithms which are now available as new research methods. Language processing algorithms were used to calculate the semantic similarity among all items in state-of-the-art surveys from Organisational Behaviour research. These surveys covered areas such as transformational leadership, work motivation and work outcomes. This information was used to explain and predict the response patterns from real subjects. Semantic algorithms explained 60–86% of the variance in the response patterns and allowed remarkably precise prediction of survey responses from humans, except in a personality test. Even the relationships between independent and their purported dependent variables were accurately predicted. This raises concern about the empirical nature of data collected through some surveys if results are already given a priori through the way subjects are being asked. Survey response patterns seem heavily determined by semantics. Language algorithms may suggest these prior to administering a survey. This study suggests that semantic algorithms are becoming new tools for the social sciences, opening perspectives on survey responses that prevalent psychometric theory cannot explain. PMID:25184672
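To make the idea of item-to-item similarity concrete, the sketch below computes cosine similarity between bag-of-words vectors of item wordings. This is only a crude lexical stand-in: the abstract does not specify the language processing algorithms used, which presumably capture meaning beyond shared words, and the item texts here are hypothetical rather than taken from any real survey.

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Cosine similarity between word-count vectors of two item texts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# Hypothetical item wordings.
item_1 = "my manager inspires me at work"
item_2 = "my manager motivates me at work"
item_3 = "i plan to quit soon"
sim_12 = cosine_similarity(item_1, item_2)
sim_13 = cosine_similarity(item_1, item_3)
```

A similarity matrix built this way for all item pairs could then be compared with the observed inter-item correlations, which is the kind of comparison the abstract reports in terms of explained variance.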
Predicting survey responses: how and why semantics shape survey statistics on organizational behaviour.

PubMed

Arnulf, Jan Ketil; Larsen, Kai Rune; Martinsen, Øyvind Lund; Bong, Chih How

2014-01-01

Some disciplines in the social sciences rely heavily on collecting survey responses to detect empirical relationships among variables. We explored whether these relationships were a priori predictable from the semantic properties of the survey items, using language processing algorithms which are now available as new research methods. Language processing algorithms were used to calculate the semantic similarity among all items in state-of-the-art surveys from Organisational Behaviour research. These surveys covered areas such as transformational leadership, work motivation and work outcomes. This information was used to explain and predict the response patterns from real subjects. Semantic algorithms explained 60-86% of the variance in the response patterns and allowed remarkably precise prediction of survey responses from humans, except in a personality test. Even the relationships between independent and their purported dependent variables were accurately predicted. This raises concern about the empirical nature of data collected through some surveys if results are already given a priori through the way subjects are being asked. Survey response patterns seem heavily determined by semantics. Language algorithms may suggest these prior to administering a survey. This study suggests that semantic algorithms are becoming new tools for the social sciences, opening perspectives on survey responses that prevalent psychometric theory cannot explain.
The effect of response modality on immediate serial recall in dementia of the Alzheimer type.

PubMed

Macé, Anne-Laure; Ergis, Anne-Marie; Caza, Nicole

2012-09-01

Contrary to traditional models of verbal short-term memory (STM), psycholinguistic accounts assume that temporary retention of verbal materials is an intrinsic property of word processing. Therefore, memory performance will depend on the nature of the STM tasks, which vary according to the linguistic representations they engage. The aim of this study was to explore the effect of response modality on verbal STM performance in individuals with dementia of the Alzheimer Type (DAT), and its relationship with the patients' word-processing deficits. Twenty individuals with mild DAT and 20 controls were tested on an immediate serial recall (ISR) task using the same items across two response modalities (oral and picture pointing) and completed a detailed language assessment. When scoring of ISR performance was based on item memory regardless of item order, a response modality effect was found for all participants, indicating that they recalled more items with picture pointing than with oral response. However, this effect was less marked in patients than in controls, resulting in an interaction. Interestingly, when recall of both item and order was considered, results indicated similar performance between response modalities in controls, whereas performance was worse for pointing than for oral response in patients. Picture-naming performance was also reduced in patients relative to controls. However, in the word-to-picture matching task, a similar pattern of responses was found between groups for incorrectly named pictures of the same items. The finding of a response modality effect in item memory for all participants is compatible with the assumption that semantic influences are greater in picture pointing than in oral response, as predicted by psycholinguistic models. Furthermore, patients' performance was modulated by their word-processing deficits, showing a reduced advantage relative to controls. 
Overall, the response modality effect observed in this study for item memory suggests that verbal STM performance is intrinsically linked with word processing capacities in both healthy controls and individuals with mild DAT, supporting psycholinguistic models of STM.
Person-Fit Statistics for Joint Models for Accuracy and Speed

ERIC Educational Resources Information Center

Fox, Jean-Paul; Marianti, Sukaesi

2017-01-01

Response accuracy and response time data can be analyzed with a joint model to measure ability and speed of working, while accounting for relationships between item and person characteristics. In this study, person-fit statistics are proposed for joint models to detect aberrant response accuracy and/or response time patterns. The person-fit tests…
Development of an abbreviated Career Indecision Profile-65 using item response theory: The CIP-Short.

PubMed

Xu, Hui; Tracey, Terence J G

2017-03-01

The current study developed an abbreviated version of the Career Indecision Profile-65 (CIP-65; Hacker, Carr, Abrams, & Brown, 2013) by using item response theory. In order to improve the efficiency of the CIP-65 in measuring career indecision, the individual item performance of the CIP-65 was examined with respect to the ordering of response occurrence and gender differential item functioning. The best 5 items of each scale of the CIP-65 (i.e., neuroticism/negative affectivity, choice/commitment anxiety, lack of readiness, and interpersonal conflicts) were retained in the CIP-Short using a sample of 588 college students. A validation sample (N = 174) supported the reliability and structural validity of the CIP-Short. The convergent and divergent validity of the CIP-Short was additionally supported in the findings of a hypothesized differential relational pattern in a separate sample (N = 360). While the current study supported the CIP-Short being a sound brief measure of career indecision, the limitations of this study and suggestions for future research were discussed as well. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Development of a PROMIS item bank to measure pain interference.

PubMed

Amtmann, Dagmar; Cook, Karon F; Jensen, Mark P; Chen, Wen-Hung; Choi, Seung; Revicki, Dennis; Cella, David; Rothrock, Nan; Keefe, Francis; Callahan, Leigh; Lai, Jin-Shei

2010-07-01

This paper describes the psychometric properties of the PROMIS-pain interference (PROMIS-PI) bank. An initial candidate item pool (n=644) was developed and evaluated based on the review of existing instruments, interviews with patients, and consultation with pain experts. From this pool, a candidate item bank of 56 items was selected and responses to the items were collected from large community and clinical samples. A total of 14,848 participants responded to all or a subset of candidate items. The responses were calibrated using an item response theory (IRT) model. A final 41-item bank was evaluated with respect to IRT assumptions, model fit, differential item function (DIF), precision, and construct and concurrent validity. Items of the revised bank had good fit to the IRT model (CFI and NNFI/TLI ranged from 0.974 to 0.997), and the data were strongly unidimensional (e.g., ratio of first and second eigenvalue=35). Nine items exhibited statistically significant DIF. However, adjusting for DIF had little practical impact on score estimates and the items were retained without modifying scoring. Scores provided substantial information across levels of pain; for scores in the T-score range 50-80, the reliability was equivalent to 0.96-0.99. Patterns of correlations with other health outcomes supported the construct validity of the item bank. The scores discriminated among persons with different numbers of chronic conditions, disabling conditions, levels of self-reported health, and pain intensity (p<0.0001). The results indicated that the PROMIS-PI items constitute a psychometrically sound bank. Computerized adaptive testing and short forms are available. Copyright 2010 International Association for the Study of Pain. All rights reserved.
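The abstract above reports reliability "equivalent to" 0.96-0.99 without saying how the figure was derived from test information. The sketch below shows one common convention, which is an assumption here rather than the authors' stated method: on a latent scale with unit variance, reliability is approximated as 1 - 1/information. The information values are hypothetical.

```python
def reliability_from_information(information: float) -> float:
    """Approximate reliability as 1 - SE^2, where SE^2 = 1 / information.

    Assumes the latent trait is scaled to unit variance (an assumption,
    not something stated in the abstract).
    """
    if information <= 0:
        raise ValueError("information must be positive")
    return 1.0 - 1.0 / information


# Hypothetical information values, not taken from the PROMIS-PI data.
for info in (25.0, 100.0):
    print(info, round(reliability_from_information(info), 2))
```

Under this convention, a reliability of 0.96 corresponds to a standard error of 0.2 on the latent scale, or 2 points on a T-score metric with a standard deviation of 10.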
Category-Specific Neural Oscillations Predict Recall Organization During Memory Search

PubMed Central

Morton, Neal W.; Kahana, Michael J.; Rosenberg, Emily A.; Baltuch, Gordon H.; Litt, Brian; Sharan, Ashwini D.; Sperling, Michael R.; Polyn, Sean M.

2013-01-01

Retrieved-context models of human memory propose that as material is studied, retrieval cues are constructed that allow one to target particular aspects of past experience. We examined the neural predictions of these models by using electrocorticographic/depth recordings and scalp electroencephalography (EEG) to characterize category-specific oscillatory activity, while participants studied and recalled items from distinct, neurally discriminable categories. During study, these category-specific patterns predict whether a studied item will be recalled. In the scalp EEG experiment, category-specific activity during study also predicts whether a given item will be recalled adjacent to other same-category items, consistent with the proposal that a category-specific retrieval cue is used to guide memory search. Retrieved-context models suggest that integrative neural circuitry is involved in the construction and maintenance of the retrieval cue. Consistent with this hypothesis, we observe category-specific patterns that rise in strength as multiple same-category items are studied sequentially, and find that individual differences in this category-specific neural integration during study predict the degree to which a participant will use category information to organize memory search. Finally, we track the deployment of this retrieval cue during memory search: Category-specific patterns are stronger when participants organize their responses according to the category of the studied material. PMID:22875859

Item response theory detects differential item functioning between healthy and ill children in QoL measures

PubMed Central

Langer, Michelle M.; Hill, Cheryl D.; Thissen, David; Burwinkle, Tasha M.; Varni, James W.; DeWalt, Darren A.

2008-01-01

Objective To demonstrate the value of item response theory (IRT) and differential item functioning (DIF) methods in examining a health-related quality of life (HRQOL) measure in children and adolescents. Study Design and Setting This illustration uses data from 5,429 children using the four subscales of the PedsQL™ 4.0 Generic Core Scales. The IRT model-based likelihood ratio test was used to detect and evaluate DIF between healthy children and children with a chronic condition. Results DIF was detected for a majority of items but cancelled out at the total test score level due to opposing directions of DIF. Post-hoc analysis indicated that this pattern of results may be due to multidimensionality. We discuss issues in detecting and handling DIF. Conclusion This paper describes how to perform DIF analyses in validating a questionnaire to ensure that scores have equivalent meaning across subgroups. It offers insight into ways information gained through the analysis can be used to evaluate an existing scale. PMID:18226750
The construction of categorization judgments: using subjective confidence and response latency to test a distributed model.

PubMed

Koriat, Asher; Sorka, Hila

2015-01-01

The classification of objects to natural categories exhibits cross-person consensus and within-person consistency, but also some degree of between-person variability and within-person instability. What is more, the variability in categorization is also not entirely random but discloses systematic patterns. In this study, we applied the Self-Consistency Model (SCM, Koriat, 2012) to category membership decisions, examining the possibility that confidence judgments and decision latency track the stable and variable components of categorization responses. The model assumes that category membership decisions are constructed on the fly depending on a small set of clues that are sampled from a commonly shared population of pertinent clues. The decision and confidence are based on the balance of evidence in favor of a positive or a negative response. The results confirmed several predictions derived from SCM. For each participant, consensual responses to items were more confident than non-consensual responses, and for each item, participants who made the consensual response tended to be more confident than those who made the nonconsensual response. The difference in confidence between consensual and nonconsensual responses increased with the proportion of participants who made the majority response for the item. A similar pattern was observed for response speed. The pattern of results obtained for cross-person consensus was replicated by the results for response consistency when the responses were classified in terms of within-person agreement across repeated presentations. These results accord with the sampling assumption of SCM, that confidence and response speed should be higher when the decision is consistent with what follows from the entire population of clues than when it deviates from it. Results also suggested that the context for classification can bias the sample of clues underlying the decision, and that confidence judgments mirror the effects of context on categorization decisions. The model and results offer a principled account of the stable and variable contributions to categorization behavior within a decision-making framework. Copyright © 2014 Elsevier B.V. All rights reserved.
Age-related increases in false recognition: the role of perceptual and conceptual similarity.

PubMed

Pidgeon, Laura M; Morcom, Alexa M

2014-01-01

Older adults (OAs) are more likely to falsely recognize novel events than young adults, and recent behavioral and neuroimaging evidence points to a reduced ability to distinguish overlapping information due to decline in hippocampal pattern separation. However, other data suggest a critical role for semantic similarity. Koutstaal et al. [(2003) false recognition of abstract vs. common objects in older and younger adults: testing the semantic categorization account, J. Exp. Psychol. Learn. 29, 499-510] reported that OAs were only vulnerable to false recognition of items with pre-existing semantic representations. We replicated Koutstaal et al.'s (2003) second experiment and examined the influence of independently rated perceptual and conceptual similarity between stimuli and lures. At study, young and OAs judged the pleasantness of pictures of abstract (unfamiliar) and concrete (familiar) items, followed by a surprise recognition test including studied items, similar lures, and novel unrelated items. Experiment 1 used dichotomous "old/new" responses at test, while in Experiment 2 participants were also asked to judge lures as "similar," to increase explicit demands on pattern separation. In both experiments, OAs showed a greater increase in false recognition for concrete than abstract items relative to the young, replicating Koutstaal et al.'s (2003) findings. However, unlike in the earlier study, there was also an age-related increase in false recognition of abstract lures when multiple similar images had been studied. In line with pattern separation accounts of false recognition, OAs were more likely to misclassify concrete lures with high and moderate, but not low degrees of rated similarity to studied items. Results are consistent with the view that OAs are particularly susceptible to semantic interference in recognition memory, and with the possibility that this reflects age-related decline in pattern separation.
Age-related increases in false recognition: the role of perceptual and conceptual similarity

PubMed Central

Pidgeon, Laura M.; Morcom, Alexa M.

2014-01-01

Older adults (OAs) are more likely to falsely recognize novel events than young adults, and recent behavioral and neuroimaging evidence points to a reduced ability to distinguish overlapping information due to decline in hippocampal pattern separation. However, other data suggest a critical role for semantic similarity. Koutstaal et al. [(2003) false recognition of abstract vs. common objects in older and younger adults: testing the semantic categorization account, J. Exp. Psychol. Learn. 29, 499–510] reported that OAs were only vulnerable to false recognition of items with pre-existing semantic representations. We replicated Koutstaal et al.’s (2003) second experiment and examined the influence of independently rated perceptual and conceptual similarity between stimuli and lures. At study, young and OAs judged the pleasantness of pictures of abstract (unfamiliar) and concrete (familiar) items, followed by a surprise recognition test including studied items, similar lures, and novel unrelated items. Experiment 1 used dichotomous “old/new” responses at test, while in Experiment 2 participants were also asked to judge lures as “similar,” to increase explicit demands on pattern separation. In both experiments, OAs showed a greater increase in false recognition for concrete than abstract items relative to the young, replicating Koutstaal et al.’s (2003) findings. However, unlike in the earlier study, there was also an age-related increase in false recognition of abstract lures when multiple similar images had been studied. In line with pattern separation accounts of false recognition, OAs were more likely to misclassify concrete lures with high and moderate, but not low degrees of rated similarity to studied items. Results are consistent with the view that OAs are particularly susceptible to semantic interference in recognition memory, and with the possibility that this reflects age-related decline in pattern separation. PMID:25368576
Construct Validation of the Self-Efficacy Teaching and Knowledge Instrument for Science Teachers-Revised (SETAKIST-R): Lessons Learned

NASA Astrophysics Data System (ADS)

Pruski, Linda A.; Blanco, Sharon L.; Riggs, Rosemary A.; Grimes, Kandi K.; Fordtran, Chase W.; Barbola, Gina M.; Cornell, John E.; Lichtenstein, Michael J.

2013-11-01

Described herein is the academic lineage and independent validation of the Self-Efficacy Teaching and Knowledge Instrument for Science Teachers-Revised (SETAKIST-R). Data from 334 K-12 science teachers were analyzed using Partial Credit Rasch models. Principal components analysis on the person-item residuals suggest two latent dimensions: Knowledge and Teaching Self-Efficacies. Item-fit statistics were used to select items for each subscale. Person and item separation (reliability) indices were quite low, and we noted disordered response patterns on the person-item maps that revealed problems with item content and/or scaling for both subscales. These issues include the presence of: verbal negatives, ambiguous modifiers, counter-intuitive scaling, and an "undecided/uncertain" option. The SETAKIST-R, in its current form, cannot be recommended as a measure of science teacher self-efficacy.
Development of the Computer-Adaptive Version of the Late-Life Function and Disability Instrument

PubMed Central

Tian, Feng; Kopits, Ilona M.; Moed, Richard; Pardasaney, Poonam K.; Jette, Alan M.

2012-01-01

Background. Having psychometrically strong disability measures that minimize response burden is important in assessing older adults. Methods. Using the original 48 items from the Late-Life Function and Disability Instrument and newly developed items, a 158-item Activity Limitation and a 62-item Participation Restriction item pool were developed. The item pools were administered to a convenience sample of 520 community-dwelling adults 60 years or older. Confirmatory factor analysis and item response theory were employed to identify content structure, calibrate items, and build the computer-adaptive tests (CATs). We evaluated real-data simulations of 10-item CAT subscales. We collected data from 102 older adults to validate the 10-item CATs against the Veteran’s Short Form-36 and assessed test–retest reliability in a subsample of 57 subjects. Results. Confirmatory factor analysis revealed a bifactor structure, and multi-dimensional item response theory was used to calibrate an overall Activity Limitation Scale (141 items) and an overall Participation Restriction Scale (55 items). Fit statistics were acceptable (Activity Limitation: comparative fit index = 0.95, Tucker Lewis Index = 0.95, root mean square error approximation = 0.03; Participation Restriction: comparative fit index = 0.95, Tucker Lewis Index = 0.95, root mean square error approximation = 0.05). Correlations of 10-item CATs with full item banks were substantial (Activity Limitation: r = .90; Participation Restriction: r = .95). Test–retest reliability estimates were high (Activity Limitation: r = .85; Participation Restriction: r = .80). Strength and pattern of correlations with Veteran’s Short Form-36 subscales were as hypothesized. Each CAT, on average, took 3.56 minutes to administer. Conclusions. The Late-Life Function and Disability Instrument CATs demonstrated strong reliability, validity, accuracy, and precision. The Late-Life Function and Disability Instrument CAT can achieve psychometrically sound disability assessment in older persons while reducing respondent burden. Further research is needed to assess their ability to measure change in older adults. PMID:22546960
The Job Responsibilities Scale: Invariance in a Longitudinal Prospective Study.

ERIC Educational Resources Information Center

Ludlow, Larry H.; Lunz, Mary E.

1998-01-01

The degree of invariance of the Job Responsibilities Scale for medical technologists was studied for 1993 and 1995, conducting factor analyses of data from each year (1063 and 665 individuals, respectively). Nearly identical factor patterns were found, and Rasch rating scale analyses found nearly identical pairs of item estimates. Implications are…
Missing data in FFQs: making assumptions about item non-response.

PubMed

Lamb, Karen E; Olstad, Dana Lee; Nguyen, Cattram; Milte, Catherine; McNaughton, Sarah A

2017-04-01

FFQs are a popular method of capturing dietary information in epidemiological studies and may be used to derive dietary exposures such as nutrient intake or overall dietary patterns and diet quality. As FFQs can involve large numbers of questions, participants may fail to respond to all questions, leaving researchers to decide how to deal with missing data when deriving intake measures. The aim of the present commentary is to discuss the current practice for dealing with item non-response in FFQs and to propose a research agenda for reporting and handling missing data in FFQs. Single imputation techniques, such as zero imputation (assuming no consumption of the item) or mean imputation, are commonly used to deal with item non-response in FFQs. However, single imputation methods make strong assumptions about the missing data mechanism and do not reflect the uncertainty created by the missing data. This can lead to incorrect inference about associations between diet and health outcomes. Although the use of multiple imputation methods in epidemiology has increased, these have seldom been used in the field of nutritional epidemiology to address missing data in FFQs. We discuss methods for dealing with item non-response in FFQs, highlighting the assumptions made under each approach. Researchers analysing FFQs should ensure that missing data are handled appropriately and clearly report how missing data were treated in analyses. Simulation studies are required to enable systematic evaluation of the utility of various methods for handling item non-response in FFQs under different assumptions about the missing data mechanism.
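The two single-imputation techniques named in the commentary above can be sketched in a few lines. This is a minimal illustration with made-up data, not an analysis from the commentary; the function names and the servings values are hypothetical.

```python
from statistics import mean

# Hypothetical weekly servings for one FFQ item; None marks item non-response.
servings = [2.0, None, 4.0, 0.0, None, 6.0]


def zero_impute(values):
    """Replace each missing answer with 0 (assumes the item was not consumed)."""
    return [0.0 if v is None else v for v in values]


def mean_impute(values):
    """Replace each missing answer with the mean of the observed answers."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


zero_filled = zero_impute(servings)
mean_filled = mean_impute(servings)

# The two assumptions give different estimates of average intake.
print(mean(zero_filled))  # 2.0
print(mean(mean_filled))  # 3.0
```

Both methods return a single completed data set, so neither conveys the uncertainty created by the missing answers, which is the limitation the commentary raises.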
Spontaneous generalization of abstract multimodal patterns in young domestic chicks.

PubMed

Versace, Elisabetta; Spierings, Michelle J; Caffini, Matteo; Ten Cate, Carel; Vallortigara, Giorgio

2017-05-01

From the early stages of life, learning the regularities associated with specific objects is crucial for making sense of experiences. Through filial imprinting, young precocial birds quickly learn the features of their social partners by mere exposure. It is not clear though to what extent chicks can extract abstract patterns of the visual and acoustic stimuli present in the imprinting object, and how they combine them. To investigate this issue, we exposed chicks (Gallus gallus) to three days of visual and acoustic imprinting, using either patterns with two identical items or patterns with two different items, presented visually, acoustically or in both modalities. Next, chicks were given a choice between the familiar and the unfamiliar pattern, present in either the multimodal, visual or acoustic modality. The responses to the novel stimuli were affected by their imprinting experience, and the effect was stronger for chicks imprinted with multimodal patterns than for the other groups. Interestingly, males and females adopted a different strategy, with males more attracted by unfamiliar patterns and females more attracted by familiar patterns. Our data show that chicks can generalize abstract patterns by mere exposure through filial imprinting and that multimodal stimulation is more effective than unimodal stimulation for pattern learning.
Measurement Invariance and the Five-Factor Model of Personality: Asian International and Euro American Cultural Groups.

PubMed

Rollock, David; Lui, P Priscilla

2016-10-01

This study examined measurement invariance of the NEO Five-Factor Inventory (NEO-FFI), assessing the five-factor model (FFM) of personality among Euro American (N = 290) and Asian international (N = 301) students (47.8% women, mean age = 19.69 years). The full 60-item NEO-FFI data fit the expected five-factor structure for both groups using exploratory structural equation modeling, and achieved configural invariance. Only 37 items significantly loaded onto the FFM-theorized factors for both groups and demonstrated metric invariance. Threshold invariance was not supported with this reduced item set. Groups differed the most in the item-factor relationships for Extraversion and Agreeableness, as well as in response styles. Asian internationals were more likely to use midpoint responses than Euro Americans. While the FFM can characterize broad nomothetic patterns of personality traits, metric invariance with only the subset of NEO-FFI items identified limits direct group comparisons of correlation coefficients among personality domains and with other constructs, and of mean differences on personality domains. © The Author(s) 2015.
Modeling Student Test-Taking Motivation in the Context of an Adaptive Achievement Test

ERIC Educational Resources Information Center

Wise, Steven L.; Kingsbury, G. Gage

2016-01-01

This study examined the utility of response time-based analyses in understanding the behavior of unmotivated test takers. For the data from an adaptive achievement test, patterns of observed rapid-guessing behavior and item response accuracy were compared to the behavior expected under several types of models that have been proposed to represent…
Attributions to Failure: The Effects of Effort, Ability, and Learning Strategy Use on Perceptions of Future Goals and Emotional Responses.

ERIC Educational Resources Information Center

Holschuh, Jodi Patrick; Nist, Sherrie L.; Olejnik, Stephen

2001-01-01

Examines college students' attributions to failure in an introductory biology course. Determines how males and females viewed the attributions of ability, effort, and learning strategy use. Concludes that collectively, results indicate differences in patterns of responses between future goal and emotional items. Notes the importance for…
Mixture IRT Model with a Higher-Order Structure for Latent Traits

ERIC Educational Resources Information Center

Huang, Hung-Yu

2017-01-01

Mixture item response theory (IRT) models have been suggested as an efficient method of detecting the different response patterns derived from latent classes when developing a test. In testing situations, multiple latent traits measured by a battery of tests can exhibit a higher-order structure, and mixtures of latent classes may occur on…
Do people with and without medical conditions respond similarly to the short health anxiety inventory? An assessment of differential item functioning using item response theory.

PubMed

LeBouthillier, Daniel M; Thibodeau, Michel A; Alberts, Nicole M; Hadjistavropoulos, Heather D; Asmundson, Gordon J G

2015-04-01

Individuals with medical conditions are likely to have elevated health anxiety; however, research has not demonstrated how medical status impacts response patterns on health anxiety measures. Measurement bias can undermine the validity of a questionnaire by overestimating or underestimating scores in groups of individuals. We investigated whether the Short Health Anxiety Inventory (SHAI), a widely-used measure of health anxiety, exhibits medical condition-based bias on item and subscale levels, and whether the SHAI subscales adequately assess the health anxiety continuum. Data were from 963 individuals with diabetes, breast cancer, or multiple sclerosis, and 372 healthy individuals. Mantel-Haenszel tests and item characteristic curves were used to classify the severity of item-level differential item functioning in all three medical groups compared to the healthy group. Test characteristic curves were used to assess scale-level differential item functioning and whether the SHAI subscales adequately assess the health anxiety continuum. Nine out of 14 items exhibited differential item functioning. Two items exhibited differential item functioning in all medical groups compared to the healthy group. In both Thought Intrusion and Fear of Illness subscales, differential item functioning was associated with mildly deflated scores in medical groups with very high levels of the latent traits. Fear of Illness items poorly discriminated between individuals with low and very low levels of the latent trait. While individuals with medical conditions may respond differentially to some items, clinicians and researchers can confidently use the SHAI with a variety of medical populations without concern of significant bias. Copyright © 2015 Elsevier Inc. All rights reserved.
Psychometric Properties of IRT Proficiency Estimates

ERIC Educational Resources Information Center

Kolen, Michael J.; Tong, Ye

2010-01-01

Psychometric properties of item response theory proficiency estimates are considered in this paper. Proficiency estimators based on summed scores and pattern scores include non-Bayes maximum likelihood and test characteristic curve estimators and Bayesian estimators. The psychometric properties investigated include reliability, conditional…
Predictors of maternal responsiveness.

PubMed

Drake, Emily E; Humenick, Sharron S; Amankwaa, Linda; Younger, Janet; Roux, Gayle

2007-01-01

To explore maternal responsiveness in the first 2 to 4 months after delivery and to evaluate potential predictors of maternal responsiveness, including infant feeding, maternal characteristics, and demographic factors such as age, socioeconomic status, and educational level. A cross-sectional survey design was used to assess the variables of maternal responsiveness, feeding patterns, and maternal characteristics in a convenience sample of 177 mothers in the first 2 to 4 months after delivery. The 60-item self-report instrument included scales to measure maternal responsiveness, self-esteem, and satisfaction with life as well as infant feeding questions and sociodemographic items. An online data-collection strategy was used, resulting in participants from 41 U.S. states. Multiple regression analysis showed that satisfaction with life, self-esteem, and number of children, but not breastfeeding, explained a significant portion of the variance in self-reported maternal responsiveness scores. In this analysis, sociodemographic variables such as age, education, income, and work status showed little or no relationship to maternal responsiveness scores. This study provides additional information about patterns of maternal behavior in the transition to motherhood and some of the variables that influence that transition. Satisfaction with life was a new predictor of maternal responsiveness. However, with only 15% of the variance explained by the predictors in this study, a large portion of the variance in maternal responsiveness remains unexplained. Further research in this area is needed.
Clusters of cultures: diversity in meaning of family value and gender role items across Europe.

PubMed

van Vlimmeren, Eva; Moors, Guy B D; Gelissen, John P T M

2017-01-01

Survey data are often used to map cultural diversity by aggregating scores of attitude and value items across countries. However, this procedure only makes sense if the same concept is measured in all countries. In this study we argue that when (co)variances among sets of items are similar across countries, these countries share a common way of assigning meaning to the items. Clusters of cultures can then be observed by doing a cluster analysis on the (co)variance matrices of sets of related items. This study focuses on family values and gender role attitudes. We find four clusters of cultures that assign a distinct meaning to these items, especially in the case of gender roles. Some of these differences reflect response style behavior in the form of acquiescence. Adjusting for this style effect impacts on country comparisons hence demonstrating the usefulness of investigating the patterns of meaning given to sets of items prior to aggregating scores into cultural characteristics.
Fighting bias with statistics: Detecting gender differences in responses to items on a preschool science assessment

NASA Astrophysics Data System (ADS)

Greenberg, Ariela Caren

Differential item functioning (DIF) and differential distractor functioning (DDF) are methods used to screen for item bias (Camilli & Shepard, 1994; Penfield, 2008). Using an applied empirical example, this mixed-methods study examined the congruency and relationship of DIF and DDF methods in screening multiple-choice items. Data for Study I were drawn from item responses of 271 female and 236 male low-income children on a preschool science assessment. Item analyses employed a common statistical approach of the Mantel-Haenszel log-odds ratio (MH-LOR) to detect DIF in dichotomously scored items (Holland & Thayer, 1988), and extended the approach to identify DDF (Penfield, 2008). Findings demonstrated that using MH-LOR to detect DIF and DDF supported the theoretical relationship that the magnitude and form of DIF are dependent on the DDF effects, and demonstrated the advantages of studying DIF and DDF in multiple-choice items. A total of 4 items with DIF and DDF and 5 items with only DDF were detected. Study II incorporated an item content review, an important but often overlooked and under-published step of DIF and DDF studies (Camilli & Shepard). Interviews with 25 female and 22 male low-income preschool children and an expert review helped to interpret the DIF and DDF results and their comparison, and determined that a content review process of studied items can reveal reasons for potential item bias that are often congruent with the statistical results. Patterns emerged and are discussed in detail. The quantitative and qualitative analyses were conducted in an applied framework of examining the validity of the preschool science assessment scores for evaluating science programs serving low-income children; however, the techniques can be generalized for use with measures across various disciplines of research.
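As a rough illustration of the MH-LOR statistic mentioned in the abstract above, the sketch below computes the Mantel-Haenszel common log-odds ratio for one dichotomous item from 2x2 tables stratified by total score. The counts are invented, and the choice of reference and focal group is an assumption; the abstract does not say how the female and male groups were assigned.

```python
import math


def mh_log_odds_ratio(strata):
    """Mantel-Haenszel common log-odds ratio for one dichotomous item.

    Each stratum (one matched score level) is a tuple (a, b, c, d):
      a = reference group correct,  b = reference group incorrect,
      c = focal group correct,      d = focal group incorrect.
    A value near 0 suggests no DIF; the sign shows which group is favored.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty score level contributes nothing
        numerator += a * d / n
        denominator += b * c / n
    if numerator == 0 or denominator == 0:
        raise ValueError("log-odds ratio is undefined for these counts")
    return math.log(numerator / denominator)


# Hypothetical counts at two score levels (not data from the study).
no_dif = [(20, 10, 20, 10), (5, 15, 5, 15)]
with_dif = [(30, 10, 20, 20)]

print(mh_log_odds_ratio(no_dif))    # 0.0
print(mh_log_odds_ratio(with_dif))  # log(3), about 1.10
```

The same pooling over score levels can in principle be repeated per distractor to screen for DDF, although the exact extension used by Penfield (2008) is not reproduced here.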
Distributed patterns of activity in sensory cortex reflect the precision of multiple items maintained in visual short-term memory.

PubMed

Emrich, Stephen M; Riggall, Adam C; Larocque, Joshua J; Postle, Bradley R

2013-04-10

Traditionally, load sensitivity of sustained, elevated activity has been taken as an index of storage for a limited number of items in visual short-term memory (VSTM). Recently, studies have demonstrated that the contents of a single item held in VSTM can be decoded from early visual cortex, despite the fact that these areas do not exhibit elevated, sustained activity. It is unknown, however, whether the patterns of neural activity decoded from sensory cortex change as a function of load, as one would expect from a region storing multiple representations. Here, we use multivoxel pattern analysis to examine the neural representations of VSTM in humans across multiple memory loads. In an important extension of previous findings, our results demonstrate that the contents of VSTM can be decoded from areas that exhibit a transient response to visual stimuli, but not from regions that exhibit elevated, sustained load-sensitive delay-period activity. Moreover, the neural information present in these transiently activated areas decreases significantly with increasing load, indicating load sensitivity of the patterns of activity that support VSTM maintenance. Importantly, the decrease in classification performance as a function of load is correlated with within-subject changes in mnemonic resolution. These findings indicate that distributed patterns of neural activity in putatively sensory visual cortex support the representation and precision of information in VSTM.
Retrieval orientation and the control of recollection: an fMRI study.

PubMed

Morcom, Alexa M; Rugg, Michael D

2012-12-01

This study used event-related fMRI to examine the impact of the adoption of different retrieval orientations on the neural correlates of recollection. In each of two study-test blocks, participants encoded a mixed list of words and pictures and then performed a recognition memory task with words as the test items. In one block, the requirement was to respond positively to test items corresponding to studied words and to reject both new items and items corresponding to the studied pictures. In the other block, positive responses were made to test items corresponding to pictures, and items corresponding to words were classified along with the new items. On the basis of previous ERP findings, we predicted that in the word task, recollection-related effects would be found for target information only. This prediction was fulfilled. In both tasks, targets elicited the characteristic pattern of recollection-related activity. By contrast, nontargets elicited this pattern in the picture task, but not in the word task. Importantly, the left angular gyrus was among the regions demonstrating this dissociation of nontarget recollection effects according to retrieval orientation. The findings for the angular gyrus parallel prior findings for the "left-parietal" ERP old/new effect and add to the evidence that the effect reflects recollection-related neural activity originating in left ventral parietal cortex. Thus, the results converge with the previous ERP findings to suggest that the processing of retrieval cues can be constrained to prevent the retrieval of goal-irrelevant information.

Using item response theory to address vulnerabilities in FFQ.

PubMed

Kazman, Josh B; Scott, Jonathan M; Deuster, Patricia A

2017-09-01

The limitations for self-reporting of dietary patterns are widely recognised as a major vulnerability of FFQ and the dietary screeners/scales derived from FFQ. Such instruments can yield inconsistent results to produce questionable interpretations. The present article discusses the value of psychometric approaches and standards in addressing these drawbacks for instruments used to estimate dietary habits and nutrient intake. We argue that a FFQ or screener that treats diet as a 'latent construct' can be optimised for both internal consistency and the value of the research results. Latent constructs, a foundation for item response theory (IRT)-based scales (e.g. Patient Reported Outcomes Measurement Information System) are typically introduced in the design stage of an instrument to elicit critical factors that cannot be observed or measured directly. We propose an iterative approach that uses such modelling to refine FFQ and similar instruments. To that end, we illustrate the benefits of psychometric modelling by using items and data from a sample of 12 370 Soldiers who completed the 2012 US Army Global Assessment Tool (GAT). We used factor analysis to build the scale incorporating five out of eleven survey items. An IRT-driven assessment of response category properties indicates likely problems in the ordering or wording of several response categories. Group comparisons, examined with differential item functioning (DIF), provided evidence of scale validity across each Army sub-population (sex, service component and officer status). Such an approach holds promise for future FFQ.
The effects of age on remembering and knowing misinformation.

PubMed

Saunders, Jo; Jess, Alice

2010-01-01

Previous research has suggested that older adults are more susceptible to misleading information. The current experiments examined the nature of older and younger participants' conscious experience of contradictory and additive misinformation (Experiment 1), and misinformation about a memorable or non-memorable item (Experiment 2). Participants watched a video of a burglary before answering questions about the event that contained misinformation. Participants then completed a cued recall task whereby they answered questions and indicated whether they remembered the item, knew the item, or were guessing. The results indicated that older adults were less likely to remember or know the original item in comparison to younger adults but were also more likely to know misinformation than younger adults. This pattern occurred for contradictory misinformation and misleading information about memorable and non-memorable items. Only additive misinformation was associated with more remember responses for older but not younger adults.
More Reasons to be Straightforward: Findings and Norms for Two Scales Relevant to Social Anxiety

PubMed Central

Rodebaugh, Thomas L.; Heimberg, Richard G.; Brown, Patrick J.; Fernandez, Katya C.; Blanco, Carlos; Schneier, Franklin R.; Liebowitz, Michael R.

2011-01-01

The validity of both the Social Interaction Anxiety Scale and Brief Fear of Negative Evaluation scale has been well-supported, yet the scales have a small number of reverse-scored items that may detract from the validity of their total scores. The current study investigates two characteristics of participants that may be associated with compromised validity of these items: higher age and lower levels of education. In community and clinical samples, the validity of each scale's reverse-scored items was moderated by age, years of education, or both. The straightforward items did not show this pattern. To encourage the use of the straightforward items of these scales, we provide normative data from the same samples as well as two large student samples. We contend that although response bias can be a substantial problem, the reverse-scored questions of these scales do not solve that problem and instead decrease overall validity. PMID:21388781
Non-ignorable missingness item response theory models for choice effects in examinee-selected items.

PubMed

Liu, Chen-Wei; Wang, Wen-Chung

2017-11-01

Examinee-selected item (ESI) design, in which examinees are required to respond to a fixed number of items in a given set, always yields incomplete data (i.e., when only the selected items are answered, data are missing for the others) that are likely non-ignorable in likelihood inference. Standard item response theory (IRT) models become infeasible when ESI data are missing not at random (MNAR). To solve this problem, the authors propose a two-dimensional IRT model that posits one unidimensional IRT model for observed data and another for nominal selection patterns. The two latent variables are assumed to follow a bivariate normal distribution. In this study, the mirt freeware package was adopted to estimate parameters. The authors conduct an experiment to demonstrate that ESI data are often non-ignorable and to determine how to apply the new model to the data collected. Two follow-up simulation studies are conducted to assess the parameter recovery of the new model and the consequences for parameter estimation of ignoring MNAR data. The results of the two simulation studies indicate good parameter recovery of the new model and poor parameter recovery when non-ignorable missing data were mistakenly treated as ignorable. © 2017 The British Psychological Society.
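The non-ignorability argument above can be illustrated with a small simulation. This is a minimal sketch under simplifying assumptions, not the authors' two-dimensional IRT model and not the mirt package: a single item with a Rasch-type response function, a logistic selection rule, and a hypothetical function name `simulate_selected_item`. Ability and selection propensity are drawn from a bivariate normal distribution with correlation `rho`; when `rho` is nonzero, the examinees who select the item are not a random subset, so the observed proportion correct drifts away from the value that would be seen if everyone answered.

```python
import math
import random


def simulate_selected_item(rho, n=20000, seed=1):
    """Return (proportion correct among selectors, proportion correct if all answered).

    Hypothetical illustration: theta is ability, xi is selection propensity,
    and the pair is standard bivariate normal with correlation rho.
    """
    rng = random.Random(seed)
    selected_total = 0
    selected_correct = 0
    everyone_correct = 0
    for _ in range(n):
        theta = rng.gauss(0.0, 1.0)
        # Build xi so that corr(theta, xi) = rho and var(xi) = 1.
        xi = rho * theta + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        # Rasch-type response with item difficulty 0 (an assumed value).
        correct = rng.random() < 1.0 / (1.0 + math.exp(-theta))
        everyone_correct += correct
        # Logistic selection rule: a higher xi makes selecting the item more likely.
        if rng.random() < 1.0 / (1.0 + math.exp(-xi)):
            selected_total += 1
            selected_correct += correct
    return selected_correct / selected_total, everyone_correct / n


# rho = 0: selection is unrelated to ability, so the missingness is ignorable.
observed_ignorable, complete_ignorable = simulate_selected_item(rho=0.0)
# rho = 0.8: abler examinees select the item more often (missing not at random).
observed_mnar, complete_mnar = simulate_selected_item(rho=0.8)
print(observed_ignorable, complete_ignorable)
print(observed_mnar, complete_mnar)
```

In the second run the selectors look more able than the full group, which is the kind of distortion a model that ignores the selection process cannot correct.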
Teacher perspectives after implementing a human sexuality education program.

PubMed

Gingiss, P L; Hamilton, R

1989-12-01

To help teachers enhance the effectiveness of their classroom instruction in human sexuality education, it is necessary to understand their attitudes and concerns about their teaching experiences. Forty-seven sixth grade teachers were surveyed one year after curriculum implementation to examine perceptions of themselves, their students, colleagues, and community. Teachers answered 70% of the knowledge items correctly and indicated slightly liberal orientations. Overall levels of teachers' views generally were positive on scales designed to measure: importance of the items studied, responsibility for student outcomes, three measures of comfort, adequacy of preparation, required changes, ease of use, social supports, and student responses. However, patterns of teacher responses within scales indicated numerous concerns related to curriculum implementation. The concerns and teacher-identified benefits and barriers to teaching the course indicate a focus for continuing education.
The modified Memorial Symptom Assessment Scale Short Form: a modified response format and rational scoring rules.

PubMed

Sharp, J L; Gough, K; Pascoe, M C; Drosdowsky, A; Chang, V T; Schofield, P

2018-07-01

The Memorial Symptom Assessment Scale Short Form (MSAS-SF) is a widely used symptom assessment instrument. Patients who self-complete the MSAS-SF have difficulty following the two-part response format, resulting in incorrectly completed responses. We describe modifications to the response format to improve useability, and rational scoring rules for incorrectly completed items. The modified MSAS-SF was completed by 311 women in our Peer and Nurse support Trial to Assist women in Gynaecological Oncology; the PeNTAGOn study. Descriptive statistics were used to summarise completion of the modified MSAS-SF, and provide symptom statistics before and after applying the rational scoring rules. Spearman's correlations with the Functional Assessment for Cancer Therapy-General (FACT-G) and Hospital Anxiety and Depression Scale (HADS) were assessed. Correct completion of the modified MSAS-SF items ranged from 91.5 to 98.7%. The rational scoring rules increased the percentage of useable responses on average 4% across all symptoms. MSAS-SF item statistics were similar with and without the scoring rules. The pattern of correlations with FACT-G and HADS was compatible with prior research. The modified MSAS-SF was useable for self-completion and responses demonstrated validity. The rational scoring rules can minimise loss of data from incorrectly completed responses. Further investigation is recommended.
MMPI-2 Item Endorsements in Dissociative Identity Disorder vs. Simulators.

PubMed

Brand, Bethany L; Chasson, Gregory S; Palermo, Cori A; Donato, Frank M; Rhodes, Kyle P; Voorhees, Emily F

2016-03-01

Elevated scores on some MMPI-2 (Minnesota Multiphasic Personality Inventory-2) validity scales are common among patients with dissociative identity disorder (DID), which raises questions about the validity of their responses. Such patients show elevated scores on atypical answers (F), F-psychopathology (Fp), atypical answers in the second half of the test (FB), schizophrenia (Sc), and depression (D) scales, with Fp showing the greatest utility in distinguishing them from coached and uncoached DID simulators. In the current study, we investigated the items on the MMPI-2 F, Fp, FB, Sc, and D scales that were most and least commonly endorsed by participants with DID in our 2014 study and compared these responses with those of coached and uncoached DID simulators. The comparisons revealed that patients with DID most frequently endorsed items related to dissociation, trauma, depression, fearfulness, conflict within family, and self-destructiveness. The coached group more successfully imitated item endorsements of the DID group than did the uncoached group. However, both simulating groups, especially the uncoached group, frequently endorsed items that were uncommonly endorsed by the DID group. The uncoached group endorsed items consistent with popular media portrayals of people with DID being violent, delusional, and unlawful. These results suggest that item endorsement patterns can provide useful information to clinicians making determinations about whether an individual is presenting with DID or feigning. © 2016 American Academy of Psychiatry and the Law.
The emotion regulation questionnaire in women with cancer: A psychometric evaluation and an item response theory analysis.

PubMed

Brandão, Tânia; Schulz, Marc S; Gross, James J; Matos, Paula Mena

2017-10-01

Emotion regulation is thought to play an important role in adaptation to cancer. However, the emotion regulation questionnaire (ERQ), a widely used instrument to assess emotion regulation, has not yet been validated in this context. This study addresses this gap by examining the psychometric properties of the ERQ in a sample of Portuguese women with cancer. The ERQ was administered to 204 women with cancer (mean age = 48.89 years, SD = 7.55). Confirmatory factor analysis and item response theory analysis were used to examine psychometric properties of the ERQ. Confirmatory factor analysis confirmed the 2-factor solution proposed by the original authors (expressive suppression and cognitive reappraisal). This solution was invariant across age and type of cancer. Item response theory analyses showed that all items were moderately to highly discriminant and that items are better suited for identifying moderate levels of expressive suppression and cognitive reappraisal. Support was found for the internal consistency and test-retest reliability of the ERQ. The pattern of relationships with emotional control, alexithymia, emotional self-efficacy, attachment, and quality of life provided evidence of the convergent and concurrent validity for both dimensions of the ERQ. Overall, the ERQ is a psychometrically sound approach for assessing emotion regulation strategies in the oncological context. Clinical implications are discussed. Copyright © 2016 John Wiley & Sons, Ltd.
Three-dimensional structural representation of the sleep-wake adaptability.

PubMed

Putilov, Arcady A

2016-01-01

Various characteristics of the sleep-wake cycle can determine the success or failure of individual adjustment to certain temporal conditions of today's society. However, it remains to be explored how many such characteristics can be self-assessed and how they are interrelated. The aim of the present report was to apply a three-dimensional structural representation of the sleep-wake adaptability in the form of a "rugby cake" (scalene or triaxial ellipsoid) to explain the results of an analysis of the pattern of correlations of the responses to the initial 320-item list of a new inventory with scores on the six scales designed for multidimensional self-assessment of the sleep-wake adaptability (Morning and Evening Lateness, Anytime and Nighttime Sleepability, and Anytime and Daytime Wakeability). The results obtained for a sample consisting of 149 respondents were confirmed by the results of a similar analysis of earlier collected responses of 139 respondents to the same list of 320 items and responses of 1213 respondents to the 72 items of one of the earlier established questionnaire tools. Empirical evidence was provided in support of the model-driven prediction of the possibility to identify items linked to as many as 36 narrow (6 core and 30 mixed) adaptabilities of the sleep-wake cycle. The results enabled the selection of 168 items for self-assessment of all these adaptabilities predicted by the rugby cake model.
Cross-ethnic measurement equivalence of measures of depression, social anxiety, and worry.

PubMed

Hambrick, James P; Rodebaugh, Thomas L; Balsis, Steve; Woods, Carol M; Mendez, Julia L; Heimberg, Richard G

2010-06-01

Although study of clinical phenomena in individuals from different ethnic backgrounds has improved over the years, African American and Asian American individuals continue to be underrepresented in research samples. Without adequate psychometric data about how questionnaires perform in individuals from different ethnic samples, findings from both within and across groups are arguably uninterpretable. Analyses based on item response theory (IRT) allow us to make fine-grained comparisons of the ways individuals from different ethnic groups respond to clinical measures. This study compared response patterns of African American and Asian American undergraduates to White undergraduates on measures of depression, social anxiety, and worry. On the Beck Depression Inventory-II, response patterns for African American participants were roughly equivalent to the response patterns of White participants. On measures of worry and social anxiety, there were substantial differences, suggesting that the use of these measures in African American and Asian American populations may lead to biased conclusions.
Well-being as a moving target: measurement equivalence of the Bradburn Affect Balance Scale.

PubMed

Maitland, S B; Dixon, R A; Hultsch, D F; Hertzog, C

2001-03-01

Although the Bradburn Affect Balance scale (ABS) is a frequently used two-factor indicator of well-being in later life, its measurement and invariance properties are not well documented. We examined these issues using confirmatory factor analyses of cross-sectional (adults ages 54-87 years) and longitudinal data from the Victoria Longitudinal Study. Stability of the positive and negative affect factors was moderate across a 3-year period. Overall, factor loadings for positive affect items were invariant over time with the exception of the pleased item. Negative affect items were time invariant. However, age-group comparisons between young-old and old-old groups revealed age differences in loadings for the upset item at Time 1. Finally, gender groups differed in loadings for the top of the world and going your way items. Thus a pattern of partial measurement equivalence characterized item response to the ABS. Our results suggest that group comparisons and longitudinal change in ABS scale scores of positive and negative affect should be interpreted with caution.
Reinstatement of Individual Past Events Revealed by the Similarity of Distributed Activation Patterns during Encoding and Retrieval

PubMed Central

Wing, Erik A.; Ritchey, Maureen; Cabeza, Roberto

2015-01-01

Neurobiological memory models assume memory traces are stored in neocortex, with pointers in the hippocampus, and are then reactivated during retrieval, yielding the experience of remembering. Whereas most prior neuroimaging studies on reactivation have focused on the reactivation of sets or categories of items, the current study sought to identify cortical patterns pertaining to memory for individual scenes. During encoding, participants viewed pictures of scenes paired with matching labels (e.g., “barn,” “tunnel”), and, during retrieval, they recalled the scenes in response to the labels and rated the quality of their visual memories. Using representational similarity analyses, we interrogated the similarity between activation patterns during encoding and retrieval both at the item level (individual scenes) and the set level (all scenes). The study yielded four main findings. First, in occipitotemporal cortex, memory success increased with encoding-retrieval similarity (ERS) at the item level but not at the set level, indicating the reactivation of individual scenes. Second, in ventrolateral pFC, memory increased with ERS for both item and set levels, indicating the recapitulation of memory processes that benefit encoding and retrieval of all scenes. Third, in retrosplenial/posterior cingulate cortex, ERS was sensitive to individual scene information irrespective of memory success, suggesting automatic activation of scene contexts. Finally, consistent with neurobiological models, hippocampal activity during encoding predicted the subsequent reactivation of individual items. These findings show the promise of studying memory with greater specificity by isolating individual mnemonic representations and determining their relationship to factors like the detail with which past events are remembered. PMID:25313659
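The item-level versus set-level comparison described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the activation patterns are made-up four-voxel vectors, similarity is a plain Pearson correlation, and set-level similarity is taken to be the mean correlation between one scene's encoding pattern and the retrieval patterns of the other scenes. The function names `pearson` and `encoding_retrieval_similarity` are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length activation patterns."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def encoding_retrieval_similarity(encoding, retrieval):
    """Map each item to (item-level ERS, set-level ERS).

    Item level: encoding pattern vs. retrieval pattern of the same item.
    Set level (assumed definition): mean similarity to retrieval patterns
    of all other items.
    """
    result = {}
    for item, enc_pattern in encoding.items():
        item_level = pearson(enc_pattern, retrieval[item])
        others = [pearson(enc_pattern, retrieval[other])
                  for other in retrieval if other != item]
        result[item] = (item_level, sum(others) / len(others))
    return result


# Toy voxel patterns (fabricated for illustration only).
encoding = {
    "barn": [1.0, 0.2, 0.8, 0.1],
    "tunnel": [0.1, 0.9, 0.2, 0.7],
    "beach": [0.5, 0.4, 0.1, 0.9],
}
retrieval = {
    "barn": [0.9, 0.3, 0.7, 0.2],
    "tunnel": [0.2, 0.8, 0.1, 0.8],
    "beach": [0.4, 0.5, 0.2, 0.8],
}

ers = encoding_retrieval_similarity(encoding, retrieval)
for scene, (item_level, set_level) in ers.items():
    print(scene, round(item_level, 3), round(set_level, 3))
```

An item-level value that exceeds the set-level value for the same scene is the signature of scene-specific reactivation; similar values at both levels would point to processes shared across all scenes.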
Trends in public perceptions and preferences on energy and environmental policy

DOE Office of Scientific and Technical Information (OSTI.GOV)

Farhar, B.C.

1993-02-01

This report presents selected results from a secondary analysis of public opinion surveys, taken at the national and state/local levels, relevant to energy and environmental policy choices. The data base used in the analysis includes about 2000 items from nearly 600 separate surveys conducted between 1979 and 1992. Answers to word-for-word questions were traced over time, permitting trend analysis. Patterns of response were also identified for findings from similarly worded survey items. The analysis identifies changes in public opinion concerning energy during the past 10 to 15 years.
Effect of response format on cognitive reflection: Validating a two- and four-option multiple choice question version of the Cognitive Reflection Test.

PubMed

Sirota, Miroslav; Juanchich, Marie

2018-03-27

The Cognitive Reflection Test, measuring intuition inhibition and cognitive reflection, has become extremely popular because it reliably predicts reasoning performance, decision-making, and beliefs. Across studies, the response format of CRT items sometimes differs, based on the assumed construct equivalence of tests with open-ended versus multiple-choice items (the equivalence hypothesis). Evidence and theoretical reasons, however, suggest that the cognitive processes measured by these response formats and their associated performances might differ (the nonequivalence hypothesis). We tested the two hypotheses experimentally by assessing the performance in tests with different response formats and by comparing their predictive and construct validity. In a between-subjects experiment (n = 452), participants answered stem-equivalent CRT items in an open-ended, a two-option, or a four-option response format and then completed tasks on belief bias, denominator neglect, and paranormal beliefs (benchmark indicators of predictive validity), as well as on actively open-minded thinking and numeracy (benchmark indicators of construct validity). We found no significant differences between the three response formats in the numbers of correct responses, the numbers of intuitive responses (with the exception of the two-option version, which had a higher number than the other tests), and the correlational patterns of the indicators of predictive and construct validity. All three test versions were similarly reliable, but the multiple-choice formats were completed more quickly. We speculate that the specific nature of the CRT items helps build construct equivalence among the different response formats. We recommend using the validated multiple-choice version of the CRT presented here, particularly the four-option CRT, for practical and methodological reasons. Supplementary materials and data are available at https://osf.io/mzhyc/ .
Comparison of Self-Reported Telephone Interviewing and Web-Based Survey Responses: Findings From the Second Australian Young and Well National Survey

PubMed Central

Milton, Alyssa C; Ellis, Louise A; Davenport, Tracey A; Burns, Jane M; Hickie, Ian B

2017-01-01

Background Web-based self-report surveying has increased in popularity, as it can rapidly yield large samples at a low cost. Despite this increase in popularity, in the area of youth mental health, there is a distinct lack of research comparing the results of Web-based self-report surveys with the more traditional and widely accepted computer-assisted telephone interviewing (CATI). Objective The Second Australian Young and Well National Survey 2014 sought to compare differences in respondent response patterns using matched items on CATI versus a Web-based self-report survey. The aim of this study was to examine whether responses varied as a result of item sensitivity, that is, the item’s susceptibility to exaggeration or underreporting, and to assess whether certain subgroups demonstrated this effect to a greater extent. Methods A subsample of young people aged 16 to 25 years (N=101), recruited through the Second Australian Young and Well National Survey 2014, completed the identical items on two occasions: via CATI and via Web-based self-report survey. Respondents also rated perceived item sensitivity. Results When comparing CATI with the Web-based self-report survey, a Wilcoxon signed-rank analysis showed that respondents answered 14 of the 42 matched items in a significantly different way. Significant variation in responses (CATI vs Web-based) was more frequent if the item was also rated by the respondents as highly sensitive in nature. Specifically, 63% (5/8) of the high sensitivity items, 43% (3/7) of the neutral sensitivity items, and 0% (0/4) of the low sensitivity items were answered in a significantly different manner by respondents when comparing their matched CATI and Web-based question responses. The items that were perceived as highly sensitive by respondents and demonstrated response variability included the following: sexting activities, body image concerns, experience of diagnosis, and suicidal ideation.
For high sensitivity items, a regression analysis showed respondents who were male (beta=−.19, P=.048) or who were not in employment, education, or training (NEET; beta=−.32, P=.001) were significantly more likely to provide different responses on matched items when responding in the CATI as compared with the Web-based self-report survey. The Web-based self-report survey, however, demonstrated some evidence of avidity and attrition bias. Conclusions Compared with CATI, Web-based self-report surveys are highly cost-effective and had higher rates of self-disclosure on sensitive items, particularly for respondents who identify as male and NEET. A drawback to Web-based surveying methodologies, however, includes the limited control over avidity bias and the greater incidence of attrition bias. These findings have important implications for further development of survey methods in the area of health and well-being, especially when considering research topics (in this case diagnosis, suicidal ideation, sexting, and body image) and groups that are being recruited (young people, males, and NEET). PMID:28951382
Retrieval orientation and the control of recollection: an fMRI study

PubMed Central

Morcom, Alexa M.; Rugg, Michael D.

2012-01-01

The present study used event-related fMRI to examine the impact of the adoption of different retrieval orientations on the neural correlates of recollection. In each of two study-test blocks, subjects encoded a mixed list of words and pictures, and then performed a recognition memory task with words as the test items. In one block, the requirement was to respond positively to test items corresponding to studied words, and to reject both new items and items corresponding to the studied pictures. In the other block, positive responses were made to test items corresponding to pictures, and items corresponding to words were classified along with the new items. Based on previous event-related potential (ERP) findings, we predicted that in the word task, recollection-related effects would be found for target information only. This prediction was fulfilled. In both tasks, targets elicited the characteristic pattern of recollection-related activity. By contrast, non-targets elicited this pattern in the picture task, but not in the word task. Importantly, the left angular gyrus was among the regions demonstrating this dissociation of non-target recollection effects according to retrieval orientation. The findings for the angular gyrus parallel prior findings for the `left-parietal' ERP old/new effect, and add to the evidence that the effect reflects recollection-related neural activity originating in left ventral parietal cortex. Thus, the results converge with the previous ERP findings to suggest that the processing of retrieval cues can be constrained to prevent the retrieval of goal-irrelevant information. PMID:23110678
Procrastination Revisited: The Constructive Use of Delayed Response.

ERIC Educational Resources Information Center

Subotnik, Rena F.; And Others

This study investigated patterns of procrastination in the domains of health, relationships, employment, and creative outlets in 19 former Westinghouse Science Talent Search winners, age 32 years. A model was synthesized from the available literature and an interview schedule of 14 open-ended items was developed to elicit self-assessments of…
Handling Dynamic Weights in Weighted Frequent Pattern Mining

NASA Astrophysics Data System (ADS)

Ahmed, Chowdhury Farhan; Tanbeer, Syed Khairuzzaman; Jeong, Byeong-Soo; Lee, Young-Koo

Even though weighted frequent pattern (WFP) mining is more effective than traditional frequent pattern mining because it can consider different semantic significances (weights) of items, existing WFP algorithms assume that each item has a fixed weight. In real-world scenarios, however, the weight (price or significance) of an item can vary with time. Reflecting these changes in item weight is necessary in several mining applications, such as retail market data analysis and web click stream analysis. In this paper, we introduce the concept of a dynamic weight for each item, and propose an algorithm, DWFPM (dynamic weighted frequent pattern mining), that makes use of this concept. Our algorithm can address situations where the weight (price or significance) of an item varies dynamically. It exploits a pattern growth mining technique to avoid the level-wise candidate set generation-and-test methodology. Furthermore, it requires only one database scan, so it is suitable for use in stream data mining. An extensive performance analysis shows that our algorithm is efficient and scalable for WFP mining using dynamic weights.
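The idea of a time-varying item weight can be made concrete with a small brute-force sketch. It is not DWFPM and does not use pattern growth: it simply enumerates small itemsets, which is only practical for toy data. The definitions are assumptions made for illustration: transactions arrive in batches, each batch carries its own weight table, the weight of a pattern within a batch is the average of its item weights, and the dynamic weighted support is the sum over batches of pattern weight times pattern frequency. The function name `dynamic_weighted_support` and the threshold are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def dynamic_weighted_support(batches, max_size=2):
    """Sum, over batches, of (average item weight in that batch) * (pattern count).

    batches: list of (transactions, weights) pairs, where weights maps each
    item to its weight during that batch.
    """
    totals = defaultdict(float)
    for transactions, weights in batches:
        counts = defaultdict(int)
        for transaction in transactions:
            items = sorted(set(transaction))
            # Naive enumeration of all itemsets up to max_size (not pattern growth).
            for size in range(1, max_size + 1):
                for pattern in combinations(items, size):
                    counts[pattern] += 1
        for pattern, count in counts.items():
            pattern_weight = sum(weights[i] for i in pattern) / len(pattern)
            totals[pattern] += pattern_weight * count
    return dict(totals)


# Two batches in which the weights of the same items change over time.
batches = [
    ([["a", "b"], ["a", "c"], ["a", "b", "c"]], {"a": 0.5, "b": 0.4, "c": 0.2}),
    ([["a", "b"], ["b", "c"]], {"a": 0.2, "b": 0.8, "c": 0.4}),
]

support = dynamic_weighted_support(batches)
min_support = 1.5  # assumed threshold for this toy example
frequent = {p: s for p, s in support.items() if s >= min_support}
print(sorted(frequent))
```

Because each batch is processed once and only running totals are kept, the sketch also hints at why a single pass over the data is attractive for streams, although a real algorithm would need a compact tree structure rather than explicit enumeration.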
A Mixed Effects Randomized Item Response Model

ERIC Educational Resources Information Center

Fox, J.-P.; Wyrick, Cheryl

2008-01-01

The randomized response technique ensures that individual item responses, denoted as true item responses, are randomized before observing them and so-called randomized item responses are observed. A relationship is specified between randomized item response data and true item response data. True item response data are modeled with a (non)linear…
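The relationship between randomized and true responses can be illustrated with a classic Warner-style design. This is a sketch under assumptions and may not match the design used in the record above: each respondent reports the true answer with a known probability `p_truth` and the opposite answer otherwise, so no individual response reveals the truth, yet the population prevalence can still be recovered. The function names and all parameter values are hypothetical.

```python
import random


def randomize(true_answer, p_truth, rng):
    """Report the true answer with probability p_truth, otherwise its opposite."""
    return true_answer if rng.random() < p_truth else not true_answer


def estimate_prevalence(observed_yes_rate, p_truth):
    """Invert P(yes) = p_truth * pi + (1 - p_truth) * (1 - pi) to recover pi."""
    if p_truth == 0.5:
        raise ValueError("p_truth = 0.5 makes the true prevalence unidentifiable")
    return (observed_yes_rate - (1.0 - p_truth)) / (2.0 * p_truth - 1.0)


rng = random.Random(0)
true_prevalence = 0.30  # assumed for the simulation
p_truth = 0.8           # known property of the randomizing device
n = 20000

true_answers = [rng.random() < true_prevalence for _ in range(n)]
observed = [randomize(answer, p_truth, rng) for answer in true_answers]
observed_yes_rate = sum(observed) / n

estimate = estimate_prevalence(observed_yes_rate, p_truth)
print(round(observed_yes_rate, 3), round(estimate, 3))
```

The observed "yes" rate is biased toward 0.5 by the randomization, and the estimator removes that known distortion at the cost of a larger sampling variance.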
An examination of gender bias on the eighth-grade MEAP science test as it relates to the Hunter Gatherer Theory of Spatial Sex Differences

NASA Astrophysics Data System (ADS)

Armstrong-Hall, Judy Gail

The purpose of this study was to apply the Hunter-Gatherer Theory of Spatial Sex Differences to responses to individual questions by eighth-grade students on the Science component of the Michigan Educational Assessment Program (MEAP) to determine whether sex bias was inherent in the test. The Hunter-Gatherer Theory of Spatial Sex Differences, an original theory, proposes a spatial dimorphism in which female spatial skill involves pattern recall of unconnected items and male spatial skills require mental movement. This is the first attempt to apply the Hunter-Gatherer Theory of Spatial Sex Differences to a standardized test. An overall hypothesis suggested that the Hunter-Gatherer Theory of Spatial Sex Differences could predict that males would perform better on problems involving mental movement and females would do better on problems involving the pattern recall of unconnected items. Responses to questions on the 1994-95 MEAP requiring the use of male spatial skills and female spatial skills were analyzed for 5,155 eighth-grade students. A panel composed of five educators and a theory developer determined which test items involved the use of male and female spatial skills. A MANOVA, using a random sample of 20% of the 5,155 students to compare male and female correct scores, was statistically significant, with males having higher scores on male spatial skills items and females having higher scores on female spatial skills items. Pearson product moment correlation analyses produced a positive correlation for both male and female performance on both types of spatial skills. The Hunter-Gatherer Theory of Spatial Sex Differences appears to be able to predict that males could perform better on the problems involving mental movement and females could perform better on problems involving the pattern recall of unconnected items.
Recommendations for further research included: examination of male/female spatial skill differences at early elementary and high school levels to determine the impact of gender on difficulties in solving spatial problems; investigation of the relationship between dominant female spatial skills for students diagnosed with ADHD; and study of the effects of teaching male spatial skills to female students, starting in early elementary school, on standardized testing.

Comparison of Self-Reported Telephone Interviewing and Web-Based Survey Responses: Findings From the Second Australian Young and Well National Survey.

PubMed

Milton, Alyssa C; Ellis, Louise A; Davenport, Tracey A; Burns, Jane M; Hickie, Ian B

2017-09-26

Web-based self-report surveying has increased in popularity, as it can rapidly yield large samples at a low cost. Despite this increase in popularity, in the area of youth mental health, there is a distinct lack of research comparing the results of Web-based self-report surveys with the more traditional and widely accepted computer-assisted telephone interviewing (CATI). The Second Australian Young and Well National Survey 2014 sought to compare differences in respondent response patterns using matched items on CATI versus a Web-based self-report survey. The aim of this study was to examine whether responses varied as a result of item sensitivity, that is, the item's susceptibility to exaggeration or underreporting, and to assess whether certain subgroups demonstrated this effect to a greater extent. A subsample of young people aged 16 to 25 years (N=101), recruited through the Second Australian Young and Well National Survey 2014, completed the identical items on two occasions: via CATI and via Web-based self-report survey. Respondents also rated perceived item sensitivity. When comparing CATI with the Web-based self-report survey, a Wilcoxon signed-rank analysis showed that respondents answered 14 of the 42 matched items in a significantly different way. Significant variation in responses (CATI vs Web-based) was more frequent if the item was also rated by the respondents as highly sensitive in nature. Specifically, 63% (5/8) of the high sensitivity items, 43% (3/7) of the neutral sensitivity items, and 0% (0/4) of the low sensitivity items were answered in a significantly different manner by respondents when comparing their matched CATI and Web-based question responses. The items that were perceived as highly sensitive by respondents and demonstrated response variability included the following: sexting activities, body image concerns, experience of diagnosis, and suicidal ideation.
For high sensitivity items, a regression analysis showed respondents who were male (beta=-.19, P=.048) or who were not in employment, education, or training (NEET; beta=-.32, P=.001) were significantly more likely to provide different responses on matched items when responding in the CATI as compared with the Web-based self-report survey. The Web-based self-report survey, however, demonstrated some evidence of avidity and attrition bias. Compared with CATI, Web-based self-report surveys are highly cost-effective and had higher rates of self-disclosure on sensitive items, particularly for respondents who identify as male and NEET. A drawback to Web-based surveying methodologies, however, includes the limited control over avidity bias and the greater incidence of attrition bias. These findings have important implications for further development of survey methods in the area of health and well-being, especially when considering research topics (in this case diagnosis, suicidal ideation, sexting, and body image) and groups that are being recruited (young people, males, and NEET). ©Alyssa C Milton, Louise A Ellis, Tracey A Davenport, Jane M Burns, Ian B Hickie. Originally published in JMIR Mental Health (http://mental.jmir.org), 26.09.2017.
Dietary patterns and whole grain cereals in the Scandinavian countries--differences and similarities. The HELGA project.

PubMed

Engeset, Dagrun; Hofoss, Dag; Nilsson, Lena M; Olsen, Anja; Tjønneland, Anne; Skeie, Guri

2015-04-01

To identify dietary patterns with whole grains as a main focus to see if there is a similar whole grain pattern in the three Scandinavian countries: Denmark, Sweden and Norway. Another objective is to see if items suggested for a Nordic Food Index will form a typical Nordic pattern when using factor analysis. The HELGA study population is based on samples of existing cohorts: the Norwegian Women and Cancer Study, the Swedish Västerbotten cohort and the Danish Diet, Cancer and Health study. The HELGA study aims to generate knowledge about the health effects of whole grain foods. The study included a total of 119 913 participants. The associations among food variables from FFQ were investigated by principal component analysis. Only food groups common to all three cohorts were included. High factor loading of a food item shows high correlation of the item to the specific diet pattern. The main whole grain for Denmark and Sweden was rye, while Norway had the highest consumption of wheat. Three similar patterns were found: a cereal pattern, a meat pattern and a bread pattern. However, even if the patterns look similar, the food items belonging to the patterns differ between countries. High loadings on breakfast cereals and whole grain oat were common in the cereal patterns for all three countries. Thus, the cereal pattern may be considered a common Scandinavian whole grain pattern. Food items belonging to a Nordic Food Index were distributed between different patterns.
Qualitative investigation of students' views about experimental physics

NASA Astrophysics Data System (ADS)

Hu, Dehui; Zwickl, Benjamin M.; Wilcox, Bethany R.; Lewandowski, H. J.

2017-12-01

This study examines students' reasoning surrounding seemingly contradictory Likert-scale responses within five items in the Colorado Learning Attitudes About Science Survey for Experimental Physics (E-CLASS). We administered the E-CLASS with embedded open-ended prompts, which asked students to provide explanations after making a Likert-scale selection. The quantitative scores on those items showed that our sample of the 216 students enrolled in first year and beyond first year physics courses demonstrated the same trends as previous national data. A qualitative analysis of students' open-ended responses was used to examine common reasoning patterns related to particular Likert-scale responses. When explaining responses to items regarding the role of experiments in confirming known results and also contributing to the growth of scientific knowledge, a common reasoning pattern suggested that confirming known results in a classroom experiment can help with understanding concepts. Thus, physics experiments contribute to students' personal scientific knowledge growth, while also confirming widely known results. Many students agreed that having correct formatting and making well-reasoned conclusions are the main goal for communicating experimental results. Students who focused on sections and formatting emphasized how it enables clear and efficient communication. However, very few students discussed the link between well-reasoned conclusions and effective scientific communication. Lastly, many students argued it was possible to complete experiments without understanding equations and physics concepts. The most common justification was that they could simply follow instructions to finish the lab without understanding. 
The findings suggest several implications for teaching physics laboratory courses, for example, incorporating some lab activities with outcomes that are unknown to the students might have a significant impact on students' understanding of experiments as an important approach for developing scientific knowledge.
The Effect of Mental Rotation on Surgical Pathological Diagnosis.

PubMed

Park, Heejung; Kim, Hyun Soo; Cha, Yoon Jin; Choi, Junjeong; Minn, Yangki; Kim, Kyung Sik; Kim, Se Hoon

2018-05-01

Pathological diagnosis involves very delicate and complex consequent processing that is conducted by a pathologist. The recognition of false patterns might be an important cause of misdiagnosis in the field of surgical pathology. In this study, we evaluated the influence of visual and cognitive bias in surgical pathologic diagnosis, focusing on the influence of "mental rotation." We designed three sets of the same images of uterine cervix biopsied specimens (original, left to right mirror images, and 180-degree rotated images), and recruited 32 pathologists to diagnose the 3 set items individually. First, the items were found to be adequate for analysis by classical test theory, Generalizability theory, and item response theory. The results showed statistically no differences in difficulty, discrimination indices, and response duration time between the image sets. Mental rotation did not influence the pathologists' diagnosis in practice. Interestingly, outliers were more frequent in rotated image sets, suggesting that the mental rotation process may influence the pathological diagnoses of a few individual pathologists. © Copyright: Yonsei University College of Medicine 2018.
More reasons to be straightforward: findings and norms for two scales relevant to social anxiety.

PubMed

Rodebaugh, Thomas L; Heimberg, Richard G; Brown, Patrick J; Fernandez, Katya C; Blanco, Carlos; Schneier, Franklin R; Liebowitz, Michael R

2011-06-01

The validity of both the Social Interaction Anxiety Scale and Brief Fear of Negative Evaluation scale has been well-supported, yet the scales have a small number of reverse-scored items that may detract from the validity of their total scores. The current study investigates two characteristics of participants that may be associated with compromised validity of these items: higher age and lower levels of education. In community and clinical samples, the validity of each scale's reverse-scored items was moderated by age, years of education, or both. The straightforward items did not show this pattern. To encourage the use of the straightforward items of these scales, we provide normative data from the same samples as well as two large student samples. We contend that although response bias can be a substantial problem, the reverse-scored questions of these scales do not solve that problem and instead decrease overall validity. Copyright © 2011 Elsevier Ltd. All rights reserved.
Development and psychometric validation of a child Racial Attitudes Index (RAI).

PubMed

Clark, Khaya D; Yovanoff, Paul; Tate, Charlotte Ursula

2017-12-01

The Racial Attitudes Index (RAI) measures a child's racial attitudes. Designed for children aged 5-9 years, the RAI is delivered over the Internet using Audio Computer Assisted Self-Interviewing (ACASI). Unlike traditional binary forced-choice instruments, the RAI uses an expanded response format permitting a more nuanced understanding of patterns of children's racial attitudes. In addition to establishing psychometric evidence of the RAI technical adequacy, hypotheses about RAI item response patterns were tested. The racial attitudes of 336 Black and White children in grades K-3 were assessed using a forced-choice instrument (Preschool Racial Attitudes Measure II) and the RAI. Findings from this study indicate measures obtained with the RAI are technically adequate, and the measure functions invariantly across racial groups. Also, patterns of children's racial attitudes measured with the RAI are more nuanced than those obtained using the forced-choice response format.
Measuring the Quality of Life of Visually Impaired Children: First Stage Psychometric Evaluation of the Novel VQoL_CYP Instrument.

PubMed

Tadić, Valerija; Cooper, Andrew; Cumberland, Phillippa; Lewando-Hundt, Gillian; Rahi, Jugnoo S

2016-01-01

To report piloting and initial validation of the VQoL_CYP, a novel age-appropriate vision-related quality of life (VQoL) instrument for self-reporting by children with visual impairment (VI). Participants were a random patient sample of children with VI aged 10-15 years. 69 patients, drawn from patient databases at Great Ormond Street Hospital and Moorfields Eye Hospital, United Kingdom, participated in piloting of the draft 47-item VQoL instrument, which enabled preliminary item reduction. Subsequent administration of the instrument, alongside functional vision (FV) and generic health-related quality of life (HRQoL) self-report measures, to 101 children with VI comprising a nationally representative sample enabled further item reduction and evaluation of psychometric properties using Rasch analysis. Construct validity was assessed through Pearson correlation coefficients. Item reduction through piloting (8 items removed for skewness and individual item response pattern) and validation (1 item removed for skewness and 3 for misfit in Rasch) produced a 35-item scale, with fit values within acceptable limits, no notable differential item functioning, good measurement precision, ordered response categories and acceptable targeting in Rasch. The VQoL_CYP showed good construct validity, correlating strongly with HRQoL scores, moderately with FV scores but not with acuity. Robust child-appropriate self-report VQoL measures for children with VI are necessary for understanding the broader impacts of living with a visual disability, distinguishing these from limited functioning per se. Future planned use in larger patient samples will allow further psychometric development of the VQoL_CYP as an adjunct to objective outcomes assessment.
"Don't know" responses to risk perception measures: implications for underserved populations.

PubMed

Waters, Erika A; Hay, Jennifer L; Orom, Heather; Kiviniemi, Marc T; Drake, Bettina F

2013-02-01

Risk perceptions are legitimate targets for behavioral interventions because they can motivate medical decisions and health behaviors. However, some survey respondents may not know (or may not indicate) their risk perceptions. The scope of "don't know" (DK) responding is unknown. Examine the prevalence and correlates of responding DK to items assessing perceived risk of colorectal cancer. Two nationally representative, population-based, cross-sectional surveys (2005 National Health Interview Survey [NHIS]; 2005 Health Information National Trends Survey [HINTS]), and one primary care clinic-based survey comprised of individuals from low-income communities. Analyses included 31,202 (NHIS), 1,937 (HINTS), and 769 (clinic) individuals. Five items assessed perceived risk of colorectal cancer. Four of the items differed in format and/or response scale: comparative risk (NHIS, HINTS); absolute risk (HINTS, clinic), and "likelihood" and "chance" response scales (clinic). Only the clinic-based survey included an explicit DK response option. "Don't know" responding was 6.9% (NHIS), 7.5% (HINTS-comparative), and 8.7% (HINTS-absolute). "Don't know" responding was 49.1% and 69.3% for the "chance" and "likely" response options (clinic). Correlates of DK responding were characteristics generally associated with disparities (e.g., low education), but the pattern of results varied among samples, question formats, and response scales. The surveys were developed independently and employed different methodologies and items. Consequently, the results were not directly comparable. There may be multiple explanations for differences in the magnitude and characteristics of DK responding. "Don't know" responding is more prevalent in populations affected by health disparities. Either not assessing or not analyzing DK responses could further disenfranchise these populations and negatively affect the validity of research and the efficacy of interventions seeking to eliminate health disparities.
Temporal-contextual processing in working memory: evidence from delayed cued recall and delayed free recall tests.

PubMed

Loaiza, Vanessa M; McCabe, David P

2012-02-01

Three experiments are reported that addressed the nature of processing in working memory by investigating patterns of delayed cued recall and free recall of items initially studied during complex and simple span tasks. In Experiment 1, items initially studied during a complex span task (i.e., operation span) were more likely to be recalled after a delay in response to temporal-contextual cues, relative to items from subspan and supraspan list lengths in a simple span task (i.e., word span). In Experiment 2, items initially studied during operation span were more likely to be recalled from neighboring serial positions during delayed free recall than were items studied during word span trials. Experiment 3 demonstrated that the number of attentional refreshing opportunities strongly predicts episodic memory performance, regardless of whether the information is presented in a spaced or massed format in a modified operation span task. The results indicate that the content-context bindings created during complex span trials reflect attentional refreshing opportunities that are used to maintain items in working memory.
Immediate Small Number Perception: Evidence from a New Numerical Carry-Over Procedure

ERIC Educational Resources Information Center

Demeyere, Nele; Humphreys, Glyn W.

2012-01-01

Evidence is presented for the immediate apprehension of exact small quantities. Participants performed a quantification task (are the number of items greater or smaller than?), and carry-over effects were examined between numbers requiring the same response. Carry-over effects between small numbers were strongly affected by repeats of pattern and…
Lexical Choice and Language Selection in Bilingual Preschoolers

ERIC Educational Resources Information Center

Greene, Kai J.; Pena, Elizabeth D.; Bedore, Lisa M.

2013-01-01

This study examined single-word code-mixing produced by bilingual preschoolers in order to better understand lexical choice patterns in each language. Analysis included item-level code-mixed responses of 606 five-year-old children. Per parent report, children were separated by language dominance based on language exposure and use. Children were…
Challenges to the Use of Artificial Neural Networks for Diagnostic Classifications with Student Test Data

ERIC Educational Resources Information Center

Briggs, Derek C.; Circi, Ruhan

2017-01-01

Artificial Neural Networks (ANNs) have been proposed as a promising approach for the classification of students into different levels of a psychological attribute hierarchy. Unfortunately, because such classifications typically rely upon internally produced item response patterns that have not been externally validated, the instability of ANN…
Conceptualizing and Measuring Weekend versus Weekday Alcohol Use: Item Response Theory and Confirmatory Factor Analysis

PubMed Central

Lac, Andrew; Handren, Lindsay; Crano, William D.

2018-01-01

Culturally, people tend to abstain from alcohol intake during the weekdays and wait to consume in greater frequency and quantity during the weekends. The current research sought to empirically justify the days representing weekday versus weekend alcohol consumption. In study 1 (N = 419), item response theory was applied to a two-parameter (difficulty and discrimination) model that evaluated the days of drinking (frequency) during the typical 7-day week. Item characteristic curves were most similar for Monday, Tuesday, and Wednesday (prototypical weekday) and for Friday and Saturday (prototypical weekend). Thursday and Sunday, however, exhibited item characteristics that bordered the properties of weekday and weekend consumption. In study 2 (N = 403), confirmatory factor analysis was applied to test six hypothesized measurement structures representing drinks per day (quantity) during the typical week. The measurement model producing the strongest fit indices was a correlated two-factor structure involving separate weekday and weekend factors that permitted Thursday and Sunday to double load on both dimensions. The proper conceptualization and accurate measurement of the days demarcating the normative boundaries of “dry” weekdays and “wet” weekends are imperative to inform research and prevention efforts targeting temporal alcohol intake patterns. PMID:27488456
Conceptualizing and Measuring Weekend versus Weekday Alcohol Use: Item Response Theory and Confirmatory Factor Analysis.

PubMed

Lac, Andrew; Handren, Lindsay; Crano, William D

2016-10-01

Culturally, people tend to abstain from alcohol intake during the weekdays and wait to consume in greater frequency and quantity during the weekends. The current research sought to empirically justify the days representing weekday versus weekend alcohol consumption. In study 1 (N = 419), item response theory was applied to a two-parameter (difficulty and discrimination) model that evaluated the days of drinking (frequency) during the typical 7-day week. Item characteristic curves were most similar for Monday, Tuesday, and Wednesday (prototypical weekday) and for Friday and Saturday (prototypical weekend). Thursday and Sunday, however, exhibited item characteristics that bordered the properties of weekday and weekend consumption. In study 2 (N = 403), confirmatory factor analysis was applied to test six hypothesized measurement structures representing drinks per day (quantity) during the typical week. The measurement model producing the strongest fit indices was a correlated two-factor structure involving separate weekday and weekend factors that permitted Thursday and Sunday to double load on both dimensions. The proper conceptualization and accurate measurement of the days demarcating the normative boundaries of "dry" weekdays and "wet" weekends are imperative to inform research and prevention efforts targeting temporal alcohol intake patterns.
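As a minimal sketch of the two-parameter (difficulty and discrimination) model named in this abstract, the item characteristic curve gives the probability of endorsing an item (here, drinking on a given day) as a logistic function of the latent trait. The function name and the parameter values below are illustrative assumptions, not estimates reported by the study.

```python
import math


def icc_2pl(theta: float, a: float, b: float) -> float:
    """Probability of endorsing an item under the two-parameter logistic model.

    theta: latent trait level of the respondent
    a:     item discrimination (slope of the curve)
    b:     item difficulty (trait level at which the probability is 0.5)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical parameters for illustration only (not taken from the study).
# A "harder" item (larger b) is endorsed less often at the same trait level.
weekday_like = icc_2pl(theta=0.0, a=1.5, b=1.0)
weekend_like = icc_2pl(theta=0.0, a=1.5, b=-0.5)
```

Under this parameterization, items with similar curves (similar a and b) would behave alike, which is the sense in which some days could be grouped as prototypical weekdays or weekends.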
Large-Scale Constraint-Based Pattern Mining

ERIC Educational Resources Information Center

Zhu, Feida

2009-01-01

We studied the problem of constraint-based pattern mining for three different data formats, item-set, sequence and graph, and focused on mining patterns of large sizes. Colossal patterns in each data format are studied to discover pruning properties that are useful for direct mining of these patterns. For item-set data, we observed robustness of…
Electrophysiologically dissociating episodic preretrieval processing.

PubMed

Bridger, Emma K; Mecklinger, Axel

2012-06-01

Contrasts between ERPs elicited by new items from tests with distinct episodic retrieval requirements index preretrieval processing. Preretrieval operations are thought to facilitate the recovery of task-relevant information because they have been shown to correlate with response accuracy in tasks in which prioritizing the retrieval of this information could be a useful strategy. This claim was tested here by contrasting new item ERPs from two retrieval tasks, each designed to explicitly require the recovery of a different kind of mnemonic information. New item ERPs differed from 400 msec poststimulus, but the distribution of these effects varied markedly, depending upon participants' response accuracy: A protracted posteriorly located effect was present for higher performing participants, whereas an anteriorly distributed effect occurred for lower performing participants. The magnitude of the posterior effect from 400 to 800 msec correlated with response accuracy, supporting the claim that preretrieval processes facilitate the recovery of task-relevant information. Additional contrasts between ERPs from these tasks and an old/new recognition task operating as a relative baseline revealed task-specific effects with nonoverlapping scalp topographies, in line with the assumption that these new item ERP effects reflect qualitatively distinct retrieval operations. Similarities in these effects were also used to reason about preretrieval processes related to the general requirement to recover contextual details. These insights, alongside the distinct pattern of effects for the two accuracy groups, reveal the multifarious nature of preretrieval processing while indicating that only some of these classes of operation are systematically related to response accuracy in recognition memory tasks.
Time manages interference in visual short-term memory.

PubMed

Smith, Amy V; McKeown, Denis; Bunce, David

2017-09-01

Emerging evidence suggests that age-related declines in memory may reflect a failure in pattern separation, a process that is believed to reduce the encoding overlap between similar stimulus representations during memory encoding. Indeed, behavioural pattern separation may be indexed by a visual continuous recognition task in which items are presented in sequence and observers report for each whether it is novel, previously viewed (old), or whether it shares features with a previously viewed item (similar). In comparison to young adults, older adults show a decreased pattern separation when the number of items between "old" and "similar" items is increased. Yet the mechanisms of forgetting underpinning this type of recognition task are yet to be explored in a cognitively homogenous group, with careful control over the parameters of the task, including elapsing time (a critical variable in models of forgetting). By extending the inter-item intervals, number of intervening items and overall decay interval, we observed in a young adult sample (N = 35, M age = 19.56 years) that the critical factor governing performance was inter-item interval. We argue that tasks using behavioural continuous recognition to index pattern separation in immediate memory will benefit from generous inter-item spacing, offering protection from inter-item interference.
Enhanced visual processing contributes to matrix reasoning in autism

PubMed Central

Soulières, Isabelle; Dawson, Michelle; Samson, Fabienne; Barbeau, Elise B.; Sahyoun, Cherif; Strangman, Gary E.; Zeffiro, Thomas A.; Mottron, Laurent

2009-01-01

Recent behavioral investigations have revealed that autistics perform more proficiently on Raven's Standard Progressive Matrices (RSPM) than would be predicted by their Wechsler intelligence scores. A widely-used test of fluid reasoning and intelligence, the RSPM assays abilities to flexibly infer rules, manage goal hierarchies, and perform high-level abstractions. The neural substrates for these abilities are known to encompass a large frontoparietal network, with different processing models placing variable emphasis on the specific roles of the prefrontal or posterior regions. We used functional magnetic resonance imaging to explore the neural bases of autistics' RSPM problem solving. Fifteen autistic and eighteen non-autistic participants, matched on age, sex, manual preference and Wechsler IQ, completed 60 self-paced randomly-ordered RSPM items along with a visually similar 60-item pattern matching comparison task. Accuracy and response times did not differ between groups in the pattern matching task. In the RSPM task, autistics performed with similar accuracy, but with shorter response times, compared to their non-autistic controls. In both the entire sample and a subsample of participants additionally matched on RSPM performance to control for potential response time confounds, neural activity was similar in both groups for the pattern matching task. However, for the RSPM task, autistics displayed relatively increased task-related activity in extrastriate areas (BA18), and decreased activity in the lateral prefrontal cortex (BA9) and the medial posterior parietal cortex (BA7). Visual processing mechanisms may therefore play a more prominent role in reasoning in autistics. PMID:19530215
The representation of order information in auditory-verbal short-term memory.

PubMed

Kalm, Kristjan; Norris, Dennis

2014-05-14

Here we investigate how order information is represented in auditory-verbal short-term memory (STM). We used fMRI and a serial recall task to dissociate neural activity patterns representing the phonological properties of the items stored in STM from the patterns representing their order. For this purpose, we analyzed fMRI activity patterns elicited by different item sets and different orderings of those items. These fMRI activity patterns were compared with the predictions made by positional and chaining models of serial order. The positional models encode associations between items and their positions in a sequence, whereas the chaining models encode associations between successive items and retain no position information. We show that a set of brain areas in the postero-dorsal stream of auditory processing store associations between items and order as predicted by a positional model. The chaining model of order representation generates a different pattern similarity prediction, which was shown to be inconsistent with the fMRI data. Our results thus favor a neural model of order representation that stores item codes, position codes, and the mapping between them. This study provides the first fMRI evidence for a specific model of order representation in the human brain. Copyright © 2014 the authors.
Partnering with patients using social media to develop a hypertension management instrument.

PubMed

Kear, Tamara; Harrington, Magdalena; Bhattacharya, Anand

2015-09-01

Hypertension is a lifelong condition; thus, long-term adherence to lifestyle modification, self-monitoring, and medication regimens remains a challenge for patients. The aim of this study was to develop a patient-reported hypertension instrument that measured attitudes, lifestyle behaviors, adherence, and barriers to hypertension management using patient-reported outcome data. The study was conducted using the Open Research Exchange software platform created by PatientsLikeMe. A total of 360 participants completed the psychometric phase of the study; incomplete responses were obtained from 147 patients, and 150 patients opted out. Principal component analysis with orthogonal (varimax) rotation was executed on a data set with all completed responses (N = 249) and applied to 43 items. Based on the review of the factor solution, eigenvalues, and item loadings, 16 items were eliminated and a model with 29 items was tested. The process was repeated two more times until a final model with 14 items was established. In interpreting the rotated factor pattern, an item was said to load on any given component if the factor loading was ≥0.40 for that component and was <0.40 for the other. In addition to the newly generated instrument, demographic and self-reported clinical characteristics of the study participants such as the type of prescribed hypertension medications, frequency of blood pressure monitoring, and comorbid conditions were examined. The Open Research Exchange platform allowed for ongoing input from patients through each stage of the 14-item instrument development. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
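The loading rule stated in this abstract (an item loads on a component if its loading is ≥0.40 there and <0.40 elsewhere) can be sketched as a small filter over a rotated loading matrix. This is only an illustration: the function name, the example loadings, and the use of absolute values for negative loadings are assumptions, not details given by the authors.

```python
def assign_items(loadings, threshold=0.40):
    """Map each item to the single component it loads on, or None.

    loadings: dict of item name -> list of rotated factor loadings.
    An item is assigned only when exactly one loading reaches the
    threshold; absolute values are used (an assumption), so a strong
    negative loading also counts.
    """
    assignment = {}
    for item, row in loadings.items():
        hits = [i for i, value in enumerate(row) if abs(value) >= threshold]
        assignment[item] = hits[0] if len(hits) == 1 else None
    return assignment


# Hypothetical loadings for illustration only (not data from the study).
example = {
    "item_a": [0.72, 0.10],   # loads cleanly on component 0
    "item_b": [0.45, 0.41],   # cross-loads, so it is not assigned
    "item_c": [0.12, 0.05],   # loads on nothing
    "item_d": [-0.55, 0.20],  # strong negative loading on component 0
}
result = assign_items(example)
```

Items that come back as None are the ones a procedure like this would flag as candidates for elimination or review.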

An international measure of awareness and beliefs about cancer: development and testing of the ABC

PubMed Central

Simon, Alice E; Forbes, Lindsay J L; Boniface, David; Warburton, Fiona; Brain, Kate E; Dessaix, Anita; Donnelly, Michael; Haynes, Kerry; Hvidberg, Line; Lagerlund, Magdalena; Petermann, Lisa; Tishelman, Carol; Vedsted, Peter; Vigmostad, Maria Nyre; Wardle, Jane; Ramirez, Amanda J

2012-01-01

Objectives To develop an internationally validated measure of cancer awareness and beliefs; the awareness and beliefs about cancer (ABC) measure. Design and setting Items modified from existing measures were assessed by a working group in six countries (Australia, Canada, Denmark, Norway, Sweden and the UK). Validation studies were completed in the UK, and cross-sectional surveys of the general population were carried out in the six participating countries. Participants Testing in UK English included cognitive interviewing for face validity (N=10), calculation of content validity indexes (six assessors), and assessment of test–retest reliability (N=97). Conceptual and cultural equivalence of modified (Canadian and Australian) and translated (Danish, Norwegian, Swedish and Canadian French) ABC versions were tested quantitatively for equivalence of meaning (≥4 assessors per country) and in bilingual cognitive interviews (three interviews per translation). Response patterns were assessed in surveys of adults aged 50+ years (N≥2000) in each country. Main outcomes Psychometric properties were evaluated through tests of validity and reliability, conceptual and cultural equivalence and systematic item analysis. Test–retest reliability used weighted-κ and intraclass correlations. Construction and validation of aggregate scores was by factor analysis for (1) beliefs about cancer outcomes, (2) beliefs about barriers to symptomatic presentation, and item summation for (3) awareness of cancer symptoms and (4) awareness of cancer risk factors. Results The English ABC had acceptable test–retest reliability and content validity. International assessments of equivalence identified a small number of items where wording needed adjustment. Survey response patterns showed that items performed well in terms of difficulty and discrimination across countries except for awareness of cancer outcomes in Australia. Aggregate scores had consistent factor structures across countries. 
Conclusions The ABC is a reliable and valid international measure of cancer awareness and beliefs. The methods used to validate and harmonise the ABC may serve as a methodological guide in international survey research. PMID:23253874
Cognitive and neural mechanisms of decision biases in recognition memory.

PubMed

Windmann, Sabine; Urbach, Thomas P; Kutas, Marta

2002-08-01

In recognition memory tasks, stimuli can be classified as "old" either on the basis of accurate memory or a bias to respond "old", yet bias has received little attention in the cognitive neuroscience literature. Here we examined the pattern and timing of bias-related effects in event-related brain potentials (ERPs) to determine whether the bias is linked more to memory retrieval or to response verification processes. Participants were divided into a High Bias and a Low Bias group according to their bias to respond "old". These groups did not differ in recognition accuracy or in the ERP pattern to items that actually were old versus new (Objective Old/New Effect). However, when the old/new distinction was based on each subject's perspective, i.e. when items judged "old" were compared with those judged "new" (Subjective Old/New Effect), significant group differences were observed over prefrontal sites with a timing (300-500 ms poststimulus) more consistent with bias acting early on memory retrieval processes than on post-retrieval response verification processes. In the standard old/new effect (Hits vs Correct Rejections), these group differences were intermediate to those for the Objective and the Subjective comparisons, indicating that such comparisons are confounded by response bias. We propose that these biases are top-down controlled processes mediated by prefrontal cortex areas.
The Disaggregation of Value-Added Test Scores to Assess Learning Outcomes in Economics Courses

ERIC Educational Resources Information Center

Walstad, William B.; Wagner, Jamie

2016-01-01

This study disaggregates posttest, pretest, and value-added or difference scores in economics into four types of economic learning: positive, retained, negative, and zero. The types are derived from patterns of student responses to individual items on a multiple-choice test. The micro and macro data from the "Test of Understanding in College…
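The four learning types can be illustrated with a minimal sketch. The abstract does not spell out how each type is derived, so the mapping below is an assumption: an item answered incorrectly on the pretest and correctly on the posttest counts as positive learning, correct on both as retained, correct then incorrect as negative, and incorrect on both as zero. The function names and sample responses are hypothetical.

```python
from collections import Counter

def classify_learning(pre_correct: bool, post_correct: bool) -> str:
    """Assumed mapping of a (pretest, posttest) response pair to a learning type."""
    if not pre_correct and post_correct:
        return "positive"
    if pre_correct and post_correct:
        return "retained"
    if pre_correct and not post_correct:
        return "negative"
    return "zero"

def disaggregate(pre, post):
    """Count learning types over paired item responses (1 = correct, 0 = incorrect)."""
    return Counter(classify_learning(bool(a), bool(b)) for a, b in zip(pre, post))

# Hypothetical responses of one student to the same five items.
pre = [0, 1, 1, 0, 0]
post = [1, 1, 0, 0, 1]
counts = disaggregate(pre, post)

# Under this mapping the difference score equals positive minus negative learning.
value_added = sum(post) - sum(pre)
```

Under this assumed mapping, a difference score near zero can hide offsetting positive and negative learning, which is the motivation for looking at the four counts separately.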
[Eight-step structured decision-making process to assign criminal responsibility and seven focal points for describing relationship between psychopathology and offense].

PubMed

Okada, Takayuki

2013-01-01

The author suggested that it is essential for lawyers and psychiatrists to have a common understanding of the mutual division of roles between them when determining criminal responsibility (CR) and, for this purpose, proposed an 8-step structured CR decision-making process. The 8 steps are: (1) gathering of information related to mental function and condition, (2) recognition of mental function and condition, (3) psychiatric diagnosis, (4) description of the relationship between psychiatric symptom or psychopathology and index offense, (5) focus on capacities of differentiation between right and wrong and behavioral control, (6) specification of elements of cognitive/volitional prong in legal context, (7) legal evaluation of degree of cognitive/volitional prong, and (8) final interpretation of CR as a legal conclusion. The author suggested that the CR decision-making process should proceed not in a step-like pattern from (1) to (2) to (3) to (8), but in a step-like pattern from (1) to (2) to (4) to (5) to (6) to (7) to (8), and that not steps after (5), which require the interpretation or the application of section 39 of the Penal Code, but Step (4), must be the core of psychiatric expert evidence. When explaining the relationship between the mental disorder and offense described in Step (4), the Seven Focal Points (7FP) are often used. The author urged basic precautions to prevent the misuse of 7FP, which are: (a) the priority of each item is not equal and the relative importance differs from case to case; (b) each item is not exclusively independent, there may be overlap between items; (c) the criminal responsibility shall not be judged because one item is applicable or because a number of items are applicable, i.e., 7FP are not "criteria," for example, the aim is not to decide such things as 'the motive is understandable' or 'the conduct is appropriate', but should be to describe how psychopathological factors affected the offense specifically in the context of understandability of motive or appropriateness of conduct; (d) it is essential to evaluate each item from a neutral point of view rather than only from one perspective, for example, looking at the case from the aspects of both comprehensibility and incomprehensibility of motive or from aspects of both oriented, purposeful, organized behavior and disoriented, purposeless, disorganized behavior during the offense; (e) depending on the case, there are some items that do not require any consideration (there are some cases in which there are less than seven items); (f) 7FP are not exhaustive and there are instances in which, depending on the case, there should be a focus on points that are not included in these.
An Item Response Theory Analysis of DSM-IV Cannabis Abuse and Dependence Criteria in Adolescents

PubMed Central

Hartman, Christie A.; Gelhorn, Heather; Crowley, Thomas J.; Sakai, Joseph T.; Stallings, Michael; Young, Susan E.; Rhee, Soo Hyun; Corley, Robin; Hewitt, John K.; Hopfer, Christian J.

2008-01-01

Objective To examine three aspects of adolescent cannabis problems: 1) do DSM-IV cannabis abuse and dependence criteria represent two different levels of severity of substance involvement, 2) to what degree do each of the 11 abuse and dependence criteria assess adolescent cannabis problems, and 3) do the DSM-IV items function similarly across different adolescent populations? Method We examined 5587 adolescents aged 11–19, including 615 youth in treatment for substance use disorders, 179 adjudicated youth, and 4793 youth from the community. All subjects were assessed with a structured diagnostic interview. Item response theory was utilized to analyze symptom endorsement patterns. Results Abuse and dependence criteria were not found to represent different levels of severity of problem cannabis use in any of the samples. Among the 11 abuse and dependence criteria, Problems cutting down and Legal problems were the least informative for distinguishing problem users. Two dependence criteria and three of the four abuse criteria indicated different severities of cannabis problems across samples. Conclusions We found little evidence to support the idea that abuse and dependence are separate constructs for adolescent cannabis problems. Furthermore, certain abuse criteria may indicate severe substance problems while specific dependence items may indicate less severe problems. The abuse items in particular need further study. These results have implications for the refinement of the current substance use disorder criteria for DSM-V. PMID:18176333
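What "least informative" means for a criterion can be sketched with a two-parameter logistic (2PL) item response function. The abstract does not state which IRT model was fitted, so the 2PL form, the function names, and the parameter values here are illustrative assumptions only.

```python
import math

def p_endorse(theta: float, a: float, b: float) -> float:
    """2PL probability of endorsing a criterion at severity level theta.

    a is the discrimination and b is the severity (location) of the criterion.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at theta: a^2 * P * (1 - P)."""
    p = p_endorse(theta, a, b)
    return a * a * p * (1.0 - p)

# Hypothetical criteria with the same severity but different discrimination.
sharp = item_information(0.0, a=2.0, b=0.0)
flat = item_information(0.0, a=0.5, b=0.0)
```

In this sketch a criterion with low discrimination contributes little information at any severity level, which is one way a criterion can be poor at distinguishing problem users.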
Conditional recall and the frequency effect in the serial recall task: an examination of item-to-item associativity.

PubMed

Miller, Leonie M; Roodenrys, Steven

2012-11-01

The frequency effect in short-term serial recall is influenced by the composition of lists. In pure lists, a robust advantage in the recall of high-frequency (HF) words is observed, yet in alternating mixed lists, HF and low-frequency (LF) words are recalled equally well. It has been argued that the preexisting associations between all list items determine a single, global level of supportive activation that assists item recall. Preexisting associations between items are assumed to be a function of language co-occurrence; HF-HF associations are high, LF-LF associations are low, and mixed associations are intermediate in activation strength. This account, however, is based on results when alternating lists with equal numbers of HF and LF words were used. It is possible that directional association between adjacent list items is responsible for the recall patterns reported. In the present experiment, the recall of three forms of mixed lists-those with equal numbers of HF and LF items and pure lists-was examined to test the extent to which item-to-item associations are present in serial recall. Furthermore, conditional probabilities were used to examine more closely the evidence for a contribution, since correct-in-position scoring may mask recall that is dependent on the recall of prior items. The results suggest that an item-to-item effect is clearly present for early but not late list items, and they implicate an additional factor, perhaps the availability of resources at output, in the recall of late list items.
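The contrast between correct-in-position scoring and conditional scoring can be sketched as follows. This is a minimal illustration, assuming each trial is recorded as a list of 0/1 flags per serial position and that the conditional measure is the probability of recalling an item given that the preceding item was recalled. The function names and trial data are hypothetical and not taken from the study.

```python
def correct_in_position(trials):
    """Proportion of trials with a correct response at each serial position."""
    n = len(trials)
    length = len(trials[0])
    return [sum(t[i] for t in trials) / n for i in range(length)]

def conditional_recall(trials):
    """P(position i correct | position i-1 correct) for i >= 1.

    Returns None for a position whose predecessor was never recalled.
    """
    length = len(trials[0])
    result = []
    for i in range(1, length):
        given = [t for t in trials if t[i - 1] == 1]
        result.append(sum(t[i] for t in given) / len(given) if given else None)
    return result

# Hypothetical recall outcomes for four trials of four-item lists.
trials = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [1, 1, 1, 0],
    [0, 1, 1, 0],
]
by_position = correct_in_position(trials)
conditional = conditional_recall(trials)
```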
Dietary quality among men and women in 187 countries in 1990 and 2010: a systematic assessment

PubMed Central

Imamura, Fumiaki; Micha, Renata; Khatibzadeh, Shahab; Fahimi, Saman; Shi, Peilin; Powles, John; Mozaffarian, Dariush

2015-01-01

Summary Background Healthy dietary patterns are a global priority to reduce non-communicable diseases. Yet neither worldwide patterns of diets nor their trends with time are well established. We aimed to characterise global changes (or trends) in dietary patterns nationally and regionally and to assess heterogeneity by age, sex, national income, and type of dietary pattern. Methods In this systematic assessment, we evaluated global consumption of key dietary items (foods and nutrients) by region, nation, age, and sex in 1990 and 2010. Consumption data were evaluated from 325 surveys (71·7% nationally representative) covering 88·7% of the global adult population. Two types of dietary pattern were assessed: one reflecting greater consumption of ten healthy dietary items and the other based on lesser consumption of seven unhealthy dietary items. The mean intakes of each dietary factor were divided into quintiles, and each quintile was assigned an ordinal score, with higher scores being equivalent to healthier diets (range 0–100). The dietary patterns were assessed by hierarchical linear regression including country, age, sex, national income, and time as exploratory variables. Findings From 1990 to 2010, diets based on healthy items improved globally (by 2·2 points, 95% uncertainty interval (UI) 0·9 to 3·5), whereas diets based on unhealthy items worsened (−2·5, −3·3 to −1·7). In 2010, the global mean scores were 44·0 (SD 10·5) for the healthy pattern and 52·1 (18·6) for the unhealthy pattern, with weak intercorrelation (r=–0·08) between countries. On average, better diets were seen in older adults compared with younger adults, and in women compared with men (p<0·0001 each). Compared with low-income nations, high-income nations had better diets based on healthy items (+2·5 points, 95% UI 0·3 to 4·1), but substantially poorer diets based on unhealthy items (−33·0, −37·8 to −28·3). Diets and their trends were very heterogeneous across the world regions. 
For example, both types of dietary patterns improved in high-income countries, but worsened in some low-income countries in Africa and Asia. Middle-income countries showed the largest improvement in dietary patterns based on healthy items, but the largest deterioration in dietary patterns based on unhealthy items. Interpretation Consumption of healthy items improved, while consumption of unhealthy items worsened across the world, with heterogeneity across regions and countries. These global data provide the best estimates to date of nutrition transitions across the world and inform policies and priorities for reducing the health and economic burdens of poor diet quality. Funding The Bill & Melinda Gates Foundation and Medical Research Council. PMID:25701991
Three pedagogical approaches to introductory physics labs and their effects on student learning outcomes

NASA Astrophysics Data System (ADS)

Chambers, Timothy

This dissertation presents the results of an experiment that measured the learning outcomes associated with three different pedagogical approaches to introductory physics labs. These three pedagogical approaches presented students with the same apparatus and covered the same physics content, but used different lab manuals to guide students through distinct cognitive processes in conducting their laboratory investigations. We administered post-tests containing multiple-choice conceptual questions and free-response quantitative problems one week after students completed these laboratory investigations. In addition, we collected data from the laboratory practical exam taken by students at the end of the semester. Using these data sets, we compared the learning outcomes for the three curricula in three dimensions of ability: conceptual understanding, quantitative problem-solving skill, and laboratory skills. Our three pedagogical approaches are as follows. Guided labs lead students through their investigations via a combination of Socratic-style questioning and direct instruction, while students record their data and answers to written questions in the manual during the experiment. Traditional labs provide detailed written instructions, which students follow to complete the lab objectives. Open labs provide students with a set of apparatus and a question to be answered, and leave students to devise and execute an experiment to answer the question. In general, we find that students performing Guided labs perform better on some conceptual assessment items, and that students performing Open labs perform significantly better on experimental tasks. Combining a classical test theory analysis of post-test results with in-lab classroom observations allows us to identify individual components of the laboratory manuals and investigations that are likely to have influenced the observed differences in learning outcomes associated with the different pedagogical approaches. 
Due to the novel nature of this research and the large number of item-level results we produced, we recommend additional research to determine the reproducibility of our results. Analyzing the data with item response theory yields additional information about the performance of our students on both conceptual questions and quantitative problems. We find that performing lab activities on a topic does lead to better-than-expected performance on some conceptual questions regardless of pedagogical approach, but that this acquired conceptual understanding is strongly context-dependent. The results also suggest that a single "Newtonian reasoning ability" is inadequate to explain student response patterns to items from the Force Concept Inventory. We develop a framework for applying polytomous item response theory to the analysis of quantitative free-response problems and for analyzing how features of student solutions are influenced by problem-solving ability. Patterns in how students at different abilities approach our post-test problems are revealed, and we find hints as to how features of a free-response problem influence its item parameters. The item-response theory framework we develop provides a foundation for future development of quantitative free-response research instruments. Chapter 1 of the dissertation presents a brief history of physics education research and motivates the present study. Chapter 2 describes our experimental methodology and discusses the treatments applied to students and the instruments used to measure their learning. Chapter 3 provides an introduction to the statistical and analytical methods used in our data analysis. Chapter 4 presents the full data set, analyzed using both classical test theory and item response theory. Chapter 5 contains a discussion of the implications of our results and a data-driven analysis of our experimental methods. 
Chapter 6 describes the importance of this work to the field and discusses the relevance of our research to curriculum development and to future work in physics education research.
Distribution of Total Depressive Symptoms Scores and Each Depressive Symptom Item in a Sample of Japanese Employees.

PubMed

Tomitaka, Shinichiro; Kawasaki, Yohei; Ide, Kazuki; Yamada, Hiroshi; Miyake, Hirotsugu; Furukawa, Toshiaki A

2016-01-01

In a previous study, we reported that the distribution of total depressive symptoms scores according to the Center for Epidemiologic Studies Depression Scale (CES-D) in a general population is stable throughout middle adulthood and follows an exponential pattern except for at the lowest end of the symptom score. Furthermore, the individual distributions of 16 negative symptom items of the CES-D exhibit a common mathematical pattern. To confirm the reproducibility of these findings, we investigated the distribution of total depressive symptoms scores and 16 negative symptom items in a sample of Japanese employees. We analyzed 7624 employees aged 20-59 years who had participated in the Northern Japan Occupational Health Promotion Centers Collaboration Study for Mental Health. Depressive symptoms were assessed using the CES-D. The CES-D contains 20 items, each of which is scored in four grades: "rarely," "some," "much," and "most of the time." The descriptive statistics and frequency curves of the distributions were then compared according to age group. The distribution of total depressive symptoms scores appeared to be stable from 30-59 years. The right tail of the distribution for ages 30-59 years exhibited a linear pattern with a log-normal scale. The distributions of the 16 individual negative symptom items of the CES-D exhibited a common mathematical pattern which displayed different distributions with a boundary at "some." The distributions of the 16 negative symptom items from "some" to "most" followed a linear pattern with a log-normal scale. The distributions of the total depressive symptoms scores and individual negative symptom items in a Japanese occupational setting show the same patterns as those observed in a general population. These results show that the specific mathematical patterns of the distributions of total depressive symptoms scores and individual negative symptom items can be reproduced in an occupational population.
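An exponential distribution appears as a straight line when frequencies are plotted on a logarithmic axis against scores on a linear axis, which is how the "linear pattern" above is read here (an interpretive assumption). The sketch below fits such a line by ordinary least squares. The function name and the synthetic frequencies are hypothetical and are not data from the study.

```python
import math

def fit_log_linear(scores, freqs):
    """Least-squares fit of log(freq) = intercept + slope * score."""
    logs = [math.log(f) for f in freqs]
    n = len(scores)
    mean_x = sum(scores) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in scores)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, logs))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Synthetic frequencies that decay exactly exponentially with the score.
scores = list(range(5, 30))
freqs = [1000.0 * math.exp(-0.15 * x) for x in scores]
intercept, slope = fit_log_linear(scores, freqs)
```

With real data, departures from the fitted line at the low end of the score range would correspond to the non-exponential region the abstract describes.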
On the Equivalence of a Likelihood Ratio of Drasgow, Levine, and Zickar (1996) and the Statistic Based on the Neyman-Pearson Lemma of Belov (2016).

PubMed

Sinharay, Sandip

2017-03-01

Levine and Drasgow (1988) suggested an approach based on the Neyman-Pearson lemma to detect examinees whose response patterns are "aberrant" due to cheating, language issues, and so on. Belov (2016) used the approach of Levine and Drasgow (1988) to suggest a statistic based on the Neyman-Pearson Lemma (SBNPL) to detect item preknowledge when the investigator knows which items are compromised. This brief report proves that the SBNPL of Belov (2016) is equivalent to a statistic suggested for the same purpose by Drasgow, Levine, and Zickar 20 years ago.
Development of an Inconsistent Responding Scale for the Triarchic Psychopathy Measure.

PubMed

Mowle, Elyse N; Kelley, Shannon E; Edens, John F; Donnellan, M Brent; Smith, Shannon Toney; Wygant, Dustin B; Sellbom, Martin

2017-08-01

Inconsistent or careless responding to self-report measures is estimated to occur in approximately 10% of university research participants and may be even more common among offender populations. Inconsistent responding may be a result of a number of factors including inattentiveness, reading or comprehension difficulties, and cognitive impairment. Many stand-alone personality scales used in applied and research settings, however, do not include validity indicators to help identify inattentive response patterns. Using multiple archival samples, the current study describes the development of an inconsistent responding scale for the Triarchic Psychopathy Measure (TriPM; Patrick, 2010), a widely used self-report measure of psychopathy. We first identified pairs of correlated TriPM items in a derivation sample (N = 2,138) and then created a total score based on the sum of the absolute value of the differences for each item pair. The resulting scale, the Triarchic Assessment Procedure for Inconsistent Responding (TAPIR), strongly differentiated between genuine TriPM protocols and randomly generated TriPM data (N = 1,000), as well as between genuine protocols and those in which 50% of the original data were replaced with random item responses. TAPIR scores demonstrated fairly consistent patterns of association with some theoretically relevant correlates (e.g., inconsistency scales embedded in other personality inventories), although not others (e.g., measures of conscientiousness) across our cross-validation samples. Tentative TAPIR cut scores that may discriminate between attentively and carelessly completed protocols are presented. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
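The scoring rule described above, a sum of absolute differences over pairs of correlated items, can be sketched in a few lines. The item pairs, response values, and function name below are hypothetical; the actual TAPIR item pairs and cut scores are not given in this record.

```python
def inconsistency_score(responses, item_pairs):
    """Sum of absolute differences between responses to each pair of items."""
    return sum(abs(responses[a] - responses[b]) for a, b in item_pairs)

# Hypothetical pairs of item indices whose responses should be similar.
item_pairs = [(0, 3), (1, 4), (2, 5)]

# Hypothetical responses on a 1-4 scale.
consistent = [1, 4, 2, 1, 4, 2]
careless = [1, 4, 2, 4, 1, 3]

consistent_score = inconsistency_score(consistent, item_pairs)
careless_score = inconsistency_score(careless, item_pairs)
```

A protocol answered attentively should yield small differences within each pair, so higher totals point toward inconsistent responding.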
Investigating Assessment Bias for Constructed Response Explanation Tasks: Implications for Evaluating Performance Expectations for Scientific Practice

NASA Astrophysics Data System (ADS)

Federer, Meghan Rector

Assessment is a key element in the process of science education teaching and research. Understanding sources of performance bias in science assessment is a major challenge for science education reforms. Prior research has documented several limitations of instrument types on the measurement of students' scientific knowledge (Liu et al., 2011; Messick, 1995; Popham, 2010). Furthermore, a large body of work has been devoted to reducing assessment biases that distort inferences about students' science understanding, particularly in multiple-choice [MC] instruments. Despite the above documented biases, much has yet to be determined for constructed response [CR] assessments in biology and their use for evaluating students' conceptual understanding of scientific practices (such as explanation). Understanding differences in science achievement provides important insights into whether science curricula and/or assessments are valid representations of student abilities. Using the integrative framework put forth by the National Research Council (2012), this dissertation aimed to explore whether assessment biases occur for assessment practices intended to measure students' conceptual understanding and proficiency in scientific practices. Using a large corpus of undergraduate biology students' explanations, three studies were conducted to examine whether known biases of MC instruments were also apparent in a CR instrument designed to assess students' explanatory practice and understanding of evolutionary change (ACORNS: Assessment of COntextual Reasoning about Natural Selection). The first study investigated the challenge of interpreting and scoring lexically ambiguous language in CR answers. The incorporation of 'multivalent' terms into scientific discourse practices often results in statements or explanations that are difficult to interpret and can produce faulty inferences about student knowledge. 
The results of this study indicate that many undergraduate biology majors frequently incorporate multivalent concepts into explanations of change, resulting in explanatory practices that were scientifically non-normative. However, use of follow-up question approaches was found to resolve this source of bias and thereby increase the validity of inferences about student understanding. The second study focused on issues of item and instrument structure, specifically item feature effects and item position effects, which have been shown to influence measures of student performance across assessment tasks. Results indicated that, along the instrument item sequence, items with similar surface features produced greater sequencing effects than sequences of items with dissimilar surface features. This bias could be addressed by use of a counterbalanced design (i.e., Latin Square) at the population level of analysis. Explanation scores were also highly correlated with student verbosity, despite verbosity being an intrinsically trivial aspect of explanation quality. Attempting to standardize student response length was one proposed solution to the verbosity bias. The third study explored gender differences in students' performance on constructed-response explanation tasks using impact (i.e., mean raw scores) and differential item function (i.e., item difficulties) patterns. While prior research in science education has suggested that females tend to perform better on constructed-response items, the results of this study revealed no overall differences in gender achievement. However, evaluation of specific item features patterns suggested that female respondents have a slight advantage on unfamiliar explanation tasks. That is, male students tended to incorporate fewer scientifically normative concepts (i.e., key concepts) than females for unfamiliar taxa. 
Conversely, females tended to incorporate more scientifically non-normative ideas (i.e., naive ideas) than males for familiar taxa. Together these results indicate that gender achievement differences for this CR instrument may be a result of differences in how males and females interpret and respond to combinations of item features. Overall, the results presented in the subsequent chapters suggest that as science education shifts toward the evaluation of fused scientific knowledge and practice (e.g., explanation), it is essential that educators and researchers investigate potential sources of bias inherent to specific assessment practices. This dissertation revealed significant sources of CR assessment bias, and provided solutions to address these problems.
General personality and psychopathology in referred and nonreferred children and adolescents: an investigation of continuity, pathoplasty, and complication models.

PubMed

De Bolle, Marleen; Beyers, Wim; De Clercq, Barbara; De Fruyt, Filip

2012-11-01

This study investigated the continuity, pathoplasty, and complication models as plausible explanations for personality-psychopathology relations in a combined sample of community (n = 571) and referred (n = 146) children and adolescents. Multivariate structural equation modeling was used to examine the structural relations between latent personality and psychopathology change across a 2-year period. Item response theory models were fitted as an additional test of the continuity hypothesis. Even after correcting for item overlap, the results provided strong support for the continuity model, demonstrating that personality and psychopathology displayed dynamic change patterns across time. Item response theory models further supported the continuity conceptualization for understanding the association between internalizing problems and emotional stability and extraversion as well as between externalizing problems and benevolence and conscientiousness. In addition to the continuity model, particular personality and psychopathology combinations provided evidence for the pathoplasty and complication models. The theoretical and practical implications of these results are discussed, and suggestions for future research are provided. (PsycINFO Database Record (c) 2012 APA, all rights reserved).
Sensory processing patterns predict the integration of information held in visual working memory.

PubMed

Lowe, Matthew X; Stevenson, Ryan A; Wilson, Kristin E; Ouslis, Natasha E; Barense, Morgan D; Cant, Jonathan S; Ferber, Susanne

2016-02-01

Given the limited resources of visual working memory, multiple items may be remembered as an averaged group or ensemble. As a result, local information may be ill-defined, but these ensemble representations provide accurate diagnostics of the natural world by combining gist information with item-level information held in visual working memory. Some neurodevelopmental disorders are characterized by sensory processing profiles that predispose individuals to avoid or seek-out sensory stimulation, fundamentally altering their perceptual experience. Here, we report such processing styles will affect the computation of ensemble statistics in the general population. We identified stable adult sensory processing patterns to demonstrate that individuals with low sensory thresholds who show a greater proclivity to engage in active response strategies to prevent sensory overstimulation are less likely to integrate mean size information across a set of similar items and are therefore more likely to be biased away from the mean size representation of an ensemble display. We therefore propose the study of ensemble processing should extend beyond the statistics of the display, and should also consider the statistics of the observer. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
A Comparison of Hispanic and White Non-Hispanic Students' Omit Patterns on the Scholastic Aptitude Test.

ERIC Educational Resources Information Center

Rivera, Charlene; Schmitt, Alicia P.

Standardization methodology was used to analyze omitted responses of Hispanic examinees on the Scholastic Aptitude Test. Study or focal groups were 2,956 Mexican-Americans, 3,230 Puerto Ricans, and 278,009 White test-takers. Results indicate that both Mexican-Americans and Puerto Rican students omitted fewer items than White students of comparable…
Response Mixture Modeling: Accounting for Heterogeneity in Item Characteristics across Response Times.

PubMed

Molenaar, Dylan; de Boeck, Paul

2018-06-01

In item response theory modeling of responses and response times, it is commonly assumed that the item responses have the same characteristics across the response times. However, heterogeneity might arise in the data if subjects resort to different response processes when solving the test items. These differences may be within-subject effects, that is, a subject might use a certain process on some of the items and a different process with different item characteristics on the other items. If the probability of using one process over the other process depends on the subject's response time, within-subject heterogeneity of the item characteristics across the response times arises. In this paper, the method of response mixture modeling is presented to account for such heterogeneity. Contrary to traditional mixture modeling where the full response vectors are classified, response mixture modeling involves classification of the individual elements in the response vector. In a simulation study, the response mixture model is shown to be viable in terms of parameter recovery. In addition, the response mixture model is applied to a real dataset to illustrate its use in investigating within-subject heterogeneity in the item characteristics across response times.
A general theoretical framework for interpreting patient-reported outcomes estimated from ordinally scaled item responses.

PubMed

Massof, Robert W

2014-10-01

A simple theoretical framework explains patient responses to items in rating scale questionnaires. Fixed latent variables position each patient and each item on the same linear scale. Item responses are governed by a set of fixed category thresholds, one for each ordinal response category. A patient's item responses are magnitude estimates of the difference between the patient variable and the patient's estimate of the item variable, relative to his/her personally defined response category thresholds. Differences between patients in their personal estimates of the item variable and in their personal choices of category thresholds are represented by random variables added to the corresponding fixed variables. Effects of intervention correspond to changes in the patient variable, the patient's response bias, and/or latent item variables for a subset of items. Intervention effects on patients' item responses were simulated by assuming the random variables are normally distributed with a constant scalar covariance matrix. Rasch analysis was used to estimate latent variables from the simulated responses. The simulations demonstrate that changes in the patient variable and changes in response bias produce indistinguishable effects on item responses and manifest as changes only in the estimated patient variable. Changes in a subset of item variables manifest as intervention-specific differential item functioning and as changes in the estimated person variable that equals the average of changes in the item variables. Simulations demonstrate that intervention-specific differential item functioning produces inefficiencies and inaccuracies in computer adaptive testing.
Science-Technology-Society literacy in college non-majors biology: Comparing problem/case studies based learning and traditional expository methods of instruction

NASA Astrophysics Data System (ADS)

Peters, John S.

This study used a multiple response model (MRM) on selected items from the Views on Science-Technology-Society (VOSTS) survey to examine science-technology-society (STS) literacy among college non-science majors taught using Problem/Case Studies Based Learning (PBL/CSBL) and traditional expository methods of instruction. An initial pilot investigation of 15 VOSTS items produced a valid and reliable scoring model which can be used to quantitatively assess student literacy on a variety of STS topics deemed important for informed civic engagement in science related social and environmental issues. The new scoring model allows for the use of parametric inferential statistics to test hypotheses about factors influencing STS literacy. The follow-up cross-institutional study comparing teaching methods employed Hierarchical Linear Modeling (HLM) to model the efficiency and equitability of instructional methods on STS literacy. A cluster analysis was also used to compare pre and post course patterns of student views on the set of positions expressed within VOSTS items. HLM analysis revealed significantly higher instructional efficiency in the PBL/CSBL study group for 4 of the 35 STS attitude indices (characterization of media vs. school science; tentativeness of scientific models; cultural influences on scientific research), and more equitable effects of traditional instruction on one attitude index (interdependence of science and technology). Cluster analysis revealed generally stable patterns of pre to post course views across study groups, but also revealed possible teaching method effects on the relationship between the views expressed within VOSTS items with respect to (1) interdependency of science and technology; (2) anti-technology; (3) socioscientific decision-making; (4) scientific/technological solutions to environmental problems; (5) usefulness of school vs. media characterizations of science; (6) social constructivist vs. objectivist views of theories; (7) impact of cultural religious/ethical views on science; (8) tentativeness of scientific models, evidence and predictions; (9) civic control of technological developments. This analysis also revealed common relationships between student views which would not have been revealed under the original unique response model (URM) of VOSTS and also common viewpoint patterns that warrant further qualitative exploration.
The feeding practices and structure questionnaire: construction and initial validation in a sample of Australian first-time mothers and their 2-year olds.

PubMed

Jansen, Elena; Mallan, Kimberley M; Nicholson, Jan M; Daniels, Lynne A

2014-06-04

Early feeding practices lay the foundation for children's eating habits and weight gain. Questionnaires are available to assess parental feeding but overlapping and inconsistent items, subscales and terminology limit conceptual clarity and between study comparisons. Our aim was to consolidate a range of existing items into a parsimonious and conceptually robust questionnaire for assessing feeding practices with very young children (<3 years). Data were from 462 mothers and children (age 21-27 months) from the NOURISH trial. Items from five questionnaires and two study-specific items were submitted to a priori item selection, allocation and verification, before theoretically-derived factors were tested using Confirmatory Factor Analysis. Construct validity of the new factors was examined by correlating these with child eating behaviours and weight. Following expert review 10 factors were specified. Of these, 9 factors (40 items) showed acceptable model fit and internal reliability (Cronbach's α: 0.61-0.89). Four factors reflected non-responsive feeding practices: 'Distrust in Appetite', 'Reward for Behaviour', 'Reward for Eating', and 'Persuasive Feeding'. Five factors reflected structure of the meal environment and limits: 'Structured Meal Setting', 'Structured Meal Timing', 'Family Meal Setting', 'Overt Restriction' and 'Covert Restriction'. Feeding practices generally showed the expected pattern of associations with child eating behaviours but none with weight. The Feeding Practices and Structure Questionnaire (FPSQ) provides a new reliable and valid measure of parental feeding practices, specifically maternal responsiveness to children's hunger/satiety signals facilitated by routine and structure in feeding. Further validation in more diverse samples is required.
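The internal reliability figures quoted above are Cronbach's α values. As a minimal sketch of how such a coefficient is commonly computed (the function name and the toy data are illustrative assumptions, not taken from the study), α compares the sum of the item variances with the variance of the total score:

```python
def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    Assumes at least two items and a non-constant total score.
    """
    k = len(scores[0])  # number of items

    def sample_variance(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

    item_variances = [sample_variance([row[j] for row in scores]) for j in range(k)]
    total_variance = sample_variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical toy data: 4 respondents answering 3 items on a 1-5 scale.
toy_scores = [[1, 2, 1], [3, 3, 2], [4, 5, 4], [5, 5, 5]]
alpha = cronbach_alpha(toy_scores)
```

Values closer to 1 indicate that the items in a factor vary together; the range reported in the abstract (0.61-0.89) comes from the authors' own data and is not reproduced by this toy example.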
Investigating burden of informal caregivers in England, Finland and Greece: an analysis with the short form of the Burden Scale for Family Caregivers (BSFC-s).

PubMed

Konerding, Uwe; Bowen, Tom; Forte, Paul; Karampli, Eleftheria; Malmström, Tomi; Pavi, Elpida; Torkki, Paulus; Graessel, Elmar

2018-02-01

The burden of informal caregivers might show itself in different ways in different cultures. Understanding these differences is important for developing culture-specific measures aimed at alleviating caregiver burden. Hitherto, no findings regarding such cultural differences between different European countries were available. In this paper, differences between English, Finnish and Greek informal caregivers of people with dementia are investigated. A secondary analysis was performed with data from 36 English, 42 Finnish and 46 Greek caregivers obtained with the short form of the Burden Scale for Family Caregivers (BSFC-s). The probabilities of endorsing the BSFC-s items were investigated by computing a logit model with items and countries as categorical factors. Statistically significant deviation of data from this model was taken as evidence for country-specific response patterns. The two-factorial logit model explains the responses to the items quite well (McFadden's pseudo-R-square: 0.77). There are, however, also statistically significant deviations (p < 0.05). English caregivers have a stronger tendency to endorse items addressing impairments in individual well-being; Finnish caregivers have a stronger tendency to endorse items addressing the conflict between the demands resulting from care and demands resulting from the remaining social life and Greek caregivers have a stronger tendency to endorse items addressing impairments in physical health. Caregiver burden shows itself differently in English, Finnish and Greek caregivers. Accordingly, measures for alleviating caregiver burden in these three countries should address different aspects of the caregivers' lives.
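The fit statistic quoted above, McFadden's pseudo-R-square, is conventionally defined as one minus the ratio of the fitted model's log-likelihood to that of an intercept-only model. The sketch below illustrates that standard definition for binary endorsement data; the helper names and the example probabilities are assumptions for illustration, and the abstract does not describe how the authors carried out the computation.

```python
import math


def bernoulli_log_likelihood(outcomes, probabilities):
    """Log-likelihood of 0/1 outcomes under the given endorsement probabilities."""
    return sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for y, p in zip(outcomes, probabilities)
    )


def mcfadden_pseudo_r2(outcomes, model_probabilities):
    """1 - LL(model) / LL(null), where the null model predicts the overall rate."""
    base_rate = sum(outcomes) / len(outcomes)
    ll_null = bernoulli_log_likelihood(outcomes, [base_rate] * len(outcomes))
    ll_model = bernoulli_log_likelihood(outcomes, model_probabilities)
    return 1.0 - ll_model / ll_null


# Hypothetical example: four responses and a model that predicts them well.
endorsed = [1, 0, 1, 0]
predicted = [0.9, 0.1, 0.9, 0.1]
r2 = mcfadden_pseudo_r2(endorsed, predicted)
```

A model that predicts no better than the overall endorsement rate yields a value of 0, and values approach 1 as the predicted probabilities approach the observed responses.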

Age and gender differences in depression across adolescence: real or 'bias'?

PubMed

van Beek, Yolanda; Hessen, David J; Hutteman, Roos; Verhulp, Esmée E; van Leuven, Mirande

2012-09-01

Since developmental psychologists are interested in explaining age and gender differences in depression across adolescence, it is important to investigate to what extent these observed differences can be attributed to measurement bias. Measurement bias may arise when the phenomenology of depression varies with age or gender, i.e., when younger versus older adolescents or girls versus boys differ in the way depression is experienced or expressed. The Children's Depression Inventory (CDI) was administered to a large school population (N = 4048) aged 8-17 years. A 4-factor model was selected by means of factor analyses for ordered categorical measures. For each of the four factor scales measurement invariance with respect to gender and age (late childhood, early and middle adolescence) was tested using item response theory analyses. Subsequently, to examine which items contributed to measurement bias, all items were studied for differential item functioning (DIF). Finally, it was investigated how developmental patterns changed if measurement biases were accounted for. For each of the factors Self-Deprecation, Dysphoria, School Problems, and Social Problems measurement bias with respect to both gender and age was found and many items showed DIF. Developmental patterns changed profoundly when measurement bias was taken into account. The CDI seemed to particularly overestimate depression in late childhood, and underestimate depression in middle adolescent boys. For scientific as well as clinical use of the CDI, measurement bias with respect to gender and age should be accounted for. © 2012 The Authors. Journal of Child Psychology and Psychiatry © 2012 Association for Child and Adolescent Mental Health.
Parent outcome expectancies for purchasing fruit and vegetables: a validation.

PubMed

Baranowski, Tom; Watson, Kathy; Missaghian, Mariam; Broadfoot, Alison; Baranowski, Janice; Cullen, Karen; Nicklas, Theresa; Fisher, Jennifer; O'Donnell, Sharon

2007-03-01

To validate four scales -- outcome expectancies for purchasing fruit and for purchasing vegetables, and comparative outcome expectancies for purchasing fresh fruit and for purchasing fresh vegetables versus other forms of fruit and vegetables (F&V). Survey instruments were administered twice, separated by 6 weeks. Recruited in front of supermarkets and grocery stores; interviews conducted by telephone. One hundred and sixty-one food shoppers with children (18 years or younger). Single dimension scales were specified for fruit and for vegetable purchasing outcome expectancies, and for comparative (fresh vs. other) fruit and vegetable purchasing outcome expectancies. Item Response Theory parameter estimates revealed easily interpreted patterns in the sequence of items by difficulty of response. Fruit and vegetable purchasing and fresh fruit comparative purchasing outcome expectancy scales were significantly correlated with home F&V availability, after controlling for social desirability of response. Comparative fresh vegetable outcome expectancy scale was significantly bivariately correlated with home vegetable availability, but not after controlling for social desirability. These scales are available to help better understand family F&V purchasing decisions.
Effects of mischievous responding on universal mental health screening: I love rum raisin ice cream, really I do!

PubMed

Furlong, Michael J; Fullchange, Aileen; Dowdy, Erin

2017-09-01

Student surveys are often used for school-based mental health screening; hence, it is critical to evaluate the authenticity of information obtained via the self-report format. The objective of this study was to examine the possible effects of mischievous response patterns on school-based screening results. The present study included 1,857 high school students who completed a schoolwide screening for complete mental health. Student responses were reviewed to detect possible mischievous responses and to examine their association with other survey results. Consistent with previous research, mischievous responding was evaluated by items that are legitimate to ask of all students (e.g., How much do you weigh? and How many siblings do you have?). Responses were considered "mischievous" when a student selected multiple extreme, unusual (less than 5% incidence) response options, such as weighing more than 225 pounds and having 10 or more siblings. Only 1.8% of the students responded in extreme ways to 2 or more of 7 mischievous response items. When compared with other students, the mischievous responders were less likely to declare that they answered items honestly, were more likely to finish the survey in less than 10 min, reported lower levels of life satisfaction and school connectedness, and reported higher levels of emotional and behavioral distress. When applying a dual-factor mental health screening framework to the responses, mischievous responders were less likely to be categorized as having complete mental health. Implications for school-based mental health screening are discussed. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
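The screening rule described above (flagging students who choose extreme, low-incidence options on 2 or more of the check items) can be sketched as a simple count. The item names and the sets of extreme options below are hypothetical placeholders; only the "2 or more" threshold comes from the abstract.

```python
def count_extreme_responses(responses, extreme_options):
    """Count the check items on which the chosen answer is an extreme option.

    responses: dict mapping item name -> chosen answer
    extreme_options: dict mapping item name -> set of answers treated as extreme
    """
    return sum(
        1
        for item, answer in responses.items()
        if answer in extreme_options.get(item, set())
    )


def is_mischievous(responses, extreme_options, min_flags=2):
    """Flag a respondent who gives extreme answers on at least min_flags items."""
    return count_extreme_responses(responses, extreme_options) >= min_flags


# Hypothetical check items and extreme options (illustrative only).
EXTREME = {
    "weight": {"more than 225 pounds"},
    "siblings": {"10 or more"},
}
student = {"weight": "more than 225 pounds", "siblings": "10 or more"}
flagged = is_mischievous(student, EXTREME)
```

Keeping the threshold as a parameter makes it easy to examine how the flagged proportion changes under stricter or looser rules.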
The Effects of Aging and IQ on Item and Associative Memory

PubMed Central

Ratcliff, Roger; Thapar, Anjali; McKoon, Gail

2011-01-01

The effects of aging and IQ on performance were examined in four memory tasks: item recognition, associative recognition, cued recall, and free recall. For item and associative recognition, accuracy and the response time distributions for correct and error responses were explained by Ratcliff’s (1978) diffusion model, at the level of individual participants. The values of the components of processing identified by the model for the recognition tasks, as well as accuracy for cued and free recall, were compared across levels of IQ ranging from 85 to 140 and age (college-age, 60-74 year olds, and 75-90 year olds). IQ had large effects on the quality of the evidence from memory on which decisions were based in the recognition tasks and accuracy in the recall tasks, except for the oldest participants for whom some of the measures were near floor values. Drift rates in the recognition tasks, accuracy in the recall tasks, and IQ all correlated strongly with each other. However, there was a small decline in drift rates for item recognition and a large decline for associative recognition and accuracy in cued recall (about 70 percent). In contrast, there were large age effects on boundary separation and nondecision time (which correlated across tasks), but little effect of IQ. The implications of these results for single- and dual- process models of item recognition are discussed and it is concluded that models that deal with both RTs and accuracy are subject to many more constraints than models that deal with only one of these measures. Overall, the results of the study show a complicated but interpretable pattern of interactions that present important targets for response time and memory models. PMID:21707207
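To illustrate how drift rate and boundary separation jointly determine accuracy in a diffusion model, the sketch below uses the standard closed-form result for a Wiener process with an unbiased starting point midway between two boundaries. The parameter values and the noise scale s are assumptions for illustration, not estimates from the study.

```python
import math


def upper_boundary_probability(drift, boundary_separation, noise_sd=0.1):
    """Probability of ending at the upper (correct) boundary.

    Assumes an unbiased start at boundary_separation / 2, for which the
    first-passage probability reduces to a logistic function of
    drift * boundary_separation / noise_sd ** 2.
    """
    x = drift * boundary_separation / noise_sd ** 2
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical values: same boundary separation, two evidence qualities.
p_low_drift = upper_boundary_probability(drift=0.1, boundary_separation=0.1)
p_high_drift = upper_boundary_probability(drift=0.2, boundary_separation=0.1)
```

Under these assumptions, higher drift rates (better evidence) and wider boundaries (more cautious responding) both raise accuracy, which is why modeling response times together with accuracy is needed to separate the two.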
EFL Learners' Grammatical Awareness through Accumulating Formulaic Sequences of Morphological Structure (-ing)

ERIC Educational Resources Information Center

Kashiwagi, Kazuko; Ito, Yukiko

2017-01-01

Even young EFL learners who have not yet learned L2 grammar will notice language patterns if, when retrieving exemplars ("item-based patterns"), they succeed in making form-meaning connections (FMCs). Item-based patterns, termed formulaic sequences (FS), serve as a basis for creative constructions. Although learners are implicitly…
Assessing item fit for unidimensional item response theory models using residuals from estimated item response functions.

PubMed

Haberman, Shelby J; Sinharay, Sandip; Chon, Kyong Hee

2013-07-01

Residual analysis (e.g. Hambleton & Swaminathan, Item response theory: principles and applications, Kluwer Academic, Boston, 1985; Hambleton, Swaminathan, & Rogers, Fundamentals of item response theory, Sage, Newbury Park, 1991) is a popular method to assess fit of item response theory (IRT) models. We suggest a form of residual analysis that may be applied to assess item fit for unidimensional IRT models. The residual analysis consists of a comparison of the maximum-likelihood estimate of the item characteristic curve with an alternative ratio estimate of the item characteristic curve. The large sample distribution of the residual is proved to be standardized normal when the IRT model fits the data. We compare the performance of our suggested residual to the standardized residual of Hambleton et al. (Fundamentals of item response theory, Sage, Newbury Park, 1991) in a detailed simulation study. We then calculate our suggested residuals using data from an operational test. The residuals appear to be useful in assessing the item fit for unidimensional IRT models.
The medial temporal lobes distinguish between within-item and item-context relations during autobiographical memory retrieval.

PubMed

Sheldon, Signy; Levine, Brian

2015-12-01

During autobiographical memory retrieval, the medial temporal lobes (MTL) relate together multiple event elements, including object (within-item relations) and context (item-context relations) information, to create a cohesive memory. There is consistent support for a functional specialization within the MTL according to these relational processes, much of which comes from recognition memory experiments. In this study, we compared brain activation patterns associated with retrieving within-item relations (i.e., associating conceptual and sensory-perceptual object features) and item-context relations (i.e., spatial relations among objects) with respect to naturalistic autobiographical retrieval. We developed a novel paradigm that cued participants to retrieve information about past autobiographical events, non-episodic within-item relations, and non-episodic item-context relations with the perceptuomotor aspects of retrieval equated across these conditions. We used multivariate analysis techniques to extract common and distinct patterns of activity among these conditions within the MTL and across the whole brain, both in terms of spatial and temporal patterns of activity. The anterior MTL (perirhinal cortex and anterior hippocampus) was preferentially recruited for generating within-item relations later in retrieval whereas the posterior MTL (posterior parahippocampal cortex and posterior hippocampus) was preferentially recruited for generating item-context relations across the retrieval phase. These findings provide novel evidence for functional specialization within the MTL with respect to naturalistic memory retrieval. © 2015 Wiley Periodicals, Inc.
The Effect of Test and Examinee Characteristics on the Occurrence of Aberrant Response Patterns in a Computerized Adaptive Test

ERIC Educational Resources Information Center

Rizavi, Saba; Hariharan, Swaminathan

2001-01-01

The advantages that computer adaptive testing offers over linear tests have been well documented. The Computer Adaptive Test (CAT) design is more efficient than the Linear test design as fewer items are needed to estimate an examinee's proficiency to a desired level of precision. In the ideal situation, a CAT will result in examinees answering…
Traditional Culture versus Traditional Assessment for American Indian Students: An Investigation of Potential Test Item Bias

ERIC Educational Resources Information Center

Hagie, Marilyn Urquhart; Gallipo, Peggy L.; Svien, Lana

2003-01-01

The Bayley Scales of Infant Development II (BSID-II) and the Wechsler Intelligence Scale for Children-Third Edition (WISC-III) are frequently used across cultures in standard assessment batteries for learners between 6 and 17 years of age, respectively. Responses of American Indian students on the BSID-II and WISC-III were examined for patterns of…
Pre-Experimental Familiarization Increases Hippocampal Activity for Both Targets and Lures in Recognition Memory: An fMRI Study

ERIC Educational Resources Information Center

de Zubicaray, Greig I.; McMahon, Katie L.; Hayward, Lydia; Dunn, John C.

2011-01-01

In the present study, items pre-exposed in a familiarization series were included in a list discrimination task to manipulate memory strength. At test, participants were required to discriminate strong targets and strong lures from weak targets and new lures. This resulted in a concordant pattern of increased "old" responses to strong targets and…
Stimulus exposure and gaze bias: a further test of the gaze cascade model.

PubMed

Glaholt, Mackenzie G; Reingold, Eyal M

2009-04-01

We tested predictions derived from the gaze cascade model of preference decision making (Shimojo, Simion, Shimojo, & Scheier, 2003; Simion & Shimojo, 2006, 2007). In each trial, participants' eye movements were monitored while they performed an eight-alternative decision task in which four of the items in the array were preexposed prior to the trial. Replicating previous findings, we found a gaze bias toward the chosen item prior to the response. However, contrary to the prediction of the gaze cascade model, preexposure of stimuli decreased, rather than increased, the magnitude of the gaze bias in preference decisions. Furthermore, unlike the prediction of the model, preexposure did not affect the likelihood of an item being chosen, and the pattern of looking behavior in preference decisions and on a non preference control task was remarkably similar. Implications of the present findings in multistage models of decision making are discussed.
An NCME Instructional Module on Polytomous Item Response Theory Models

ERIC Educational Resources Information Center

Penfield, Randall David

2014-01-01

A polytomous item is one for which the responses are scored according to three or more categories. Given the increasing use of polytomous items in assessment practices, item response theory (IRT) models specialized for polytomous items are becoming increasingly common. The purpose of this ITEMS module is to provide an accessible overview of…
Sexual orientation in the 2013 national health interview survey: a quality assessment.

PubMed

Dahlhamer, James M; Galinsky, Adena M; Joestl, Sarah S; Ward, Brian W

2014-12-01

Objective: This report presents a set of quality analyses of sexual orientation data collected in the 2013 National Health Interview Survey (NHIS). NHIS sexual orientation estimates are compared with those from the National Survey of Family Growth (NSFG) and the National Health and Nutrition Examination Survey (NHANES). Selected health outcomes by sexual orientation are compared between NHIS and NSFG. Assessments of item nonresponse, item response times, and responses to follow-up questions to the sexual orientation question are also presented. Methods: NHIS is a multipurpose health survey conducted continuously throughout the year by the Centers for Disease Control and Prevention's National Center for Health Statistics. Analyses in this report were based on NHIS data collected in 2013 from 34,557 adults aged 18 and over. Sampling weights were used to produce national estimates that are representative of the civilian noninstitutionalized U.S. adult population. Data from the 2006-2010 NSFG and 2009-2012 NHANES were used for the comparisons. Results: Based on the 2013 NHIS data, 96.6% of adults identified as straight, 1.6% identified as gay/lesbian, and 0.7% identified as bisexual. The remaining 1.1% of adults identified as "something else," stated "I don't know the answer," or refused to answer. Responses to follow-up questions suggest that the sexual orientation question is producing little classification error. In addition, largely similar patterns of association between sexual orientation and health were observed for NHIS and NSFG. Analyses of item nonresponse rates revealed few data quality issues, although item response times suggest possible shortcutting of the question and comprehension problems for select respondents. All material appearing in this report is in the public domain and may be reproduced or copied without permission; citation as to source, however, is appreciated.
Ramsay-Curve Item Response Theory for the Three-Parameter Logistic Item Response Model

ERIC Educational Resources Information Center

Woods, Carol M.

2008-01-01

In Ramsay-curve item response theory (RC-IRT), the latent variable distribution is estimated simultaneously with the item parameters of a unidimensional item response model using marginal maximum likelihood estimation. This study evaluates RC-IRT for the three-parameter logistic (3PL) model with comparisons to the normal model and to the empirical…
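For readers unfamiliar with the three-parameter logistic (3PL) model mentioned above, the following sketch shows its usual item response function. The parameter names follow common convention (a for discrimination, b for difficulty, c for the lower asymptote); the optional scaling constant used in some texts is omitted here as a simplifying assumption, and the example values are hypothetical.

```python
import math


def three_pl_probability(theta, a, b, c):
    """Probability of a correct response under the 3PL model.

    theta: latent trait level of the examinee
    a: item discrimination
    b: item difficulty
    c: lower asymptote (pseudo-guessing parameter)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item: moderate discrimination, average difficulty, 20% guessing.
p_at_difficulty = three_pl_probability(theta=0.0, a=1.2, b=0.0, c=0.2)
```

When theta equals b the probability lies halfway between c and 1, and as theta decreases the curve levels off at c rather than at 0.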
Using the Nominal Response Model to Evaluate Response Category Discrimination in the PROMIS Emotional Distress Item Pools

ERIC Educational Resources Information Center

Preston, Kathleen; Reise, Steven; Cai, Li; Hays, Ron D.

2011-01-01

The authors used a nominal response item response theory model to estimate category boundary discrimination (CBD) parameters for items drawn from the Emotional Distress item pools (Depression, Anxiety, and Anger) developed in the Patient-Reported Outcomes Measurement Information Systems (PROMIS) project. For polytomous items with ordered response…
A Bifactor Multidimensional Item Response Theory Model for Differential Item Functioning Analysis on Testlet-Based Items

ERIC Educational Resources Information Center

Fukuhara, Hirotaka; Kamata, Akihito

2011-01-01

A differential item functioning (DIF) detection method for testlet-based data was proposed and evaluated in this study. The proposed DIF model is an extension of a bifactor multidimensional item response theory (MIRT) model for testlets. Unlike traditional item response theory (IRT) DIF models, the proposed model takes testlet effects into…
Item Response Models for Examinee-Selected Items

ERIC Educational Resources Information Center

Wang, Wen-Chung; Jin, Kuan-Yu; Qiu, Xue-Lan; Wang, Lei

2012-01-01

In some tests, examinees are required to choose a fixed number of items from a set of given items to answer. This practice creates a challenge to standard item response models, because more capable examinees may have an advantage by making wiser choices. In this study, we developed a new class of item response models to account for the choice…
Detecting Differential Item Discrimination (DID) and the Consequences of Ignoring DID in Multilevel Item Response Models

ERIC Educational Resources Information Center

Lee, Woo-yeol; Cho, Sun-Joo

2017-01-01

Cross-level invariance in a multilevel item response model can be investigated by testing whether the within-level item discriminations are equal to the between-level item discriminations. Testing the cross-level invariance assumption is important to understand constructs in multilevel data. However, in most multilevel item response model…
An NCME Instructional Module on Latent DIF Analysis Using Mixture Item Response Models

ERIC Educational Resources Information Center

Cho, Sun-Joo; Suh, Youngsuk; Lee, Woo-yeol

2016-01-01

The purpose of this ITEMS module is to provide an introduction to differential item functioning (DIF) analysis using mixture item response models. The mixture item response models for DIF analysis involve comparing item profiles across latent groups, instead of manifest groups. First, an overview of DIF analysis based on latent groups, called…
Forced-Choice Assessment of Work-Related Maladaptive Personality Traits: Preliminary Evidence From an Application of Thurstonian Item Response Modeling.

PubMed

Guenole, Nigel; Brown, Anna A; Cooper, Andrew J

2018-06-01

This article describes an investigation of whether Thurstonian item response modeling is a viable method for assessment of maladaptive traits. Forced-choice responses from 420 working adults to a broad-range personality inventory assessing six maladaptive traits were considered. The Thurstonian item response model's fit to the forced-choice data was adequate, while the fit of a counterpart item response model to responses to the same items but arranged in a single-stimulus design was poor. Monotrait heteromethod correlations indicated corresponding traits in the two formats overlapped substantially, although they did not measure equivalent constructs. A better goodness of fit and higher factor loadings for the Thurstonian item response model, coupled with a clearer conceptual alignment to the theoretical trait definitions, suggested that the single-stimulus item responses were influenced by biases that the independent clusters measurement model did not account for. Researchers may wish to consider forced-choice designs and appropriate item response modeling techniques such as Thurstonian item response modeling for personality questionnaire applications in industrial psychology, especially when assessing maladaptive traits. We recommend further investigation of this approach in actual selection situations and with different assessment instruments.

Revisiting the fear of snakes in children: the role of aposematic signalling.

PubMed

Souchet, Jérémie; Aubret, Fabien

2016-11-25

Why humans fear snakes is an old, yet unresolved debate. Its innate origin from evolutionary causes is debated against the powerful influence early experience, culture, media and religion may have on people's aversion to snakes. Here we show that the aversion to snakes in human beings may have been mistaken for an aversion to aposematic signals that are commonly displayed by snakes. A total of 635 children were asked to rate single item images as "nice" or "mean". Snakes, pets and smiley emoticon items were not rated as "mean" unless they displayed subtle aposematic signals in the form of triangular (rather than round) shapes. Another 722 children were shown images featuring two items and asked which item was "nice" and which item was "mean". This context dependent comparison triggered even sharper responses to aposematic signals. We hypothesise that early primates evolved an aversion for aposematic signals in the form of potentially harmful triangular shapes such as teeth, claws or spikes, not for snakes per se. Further, we hypothesise that this adaptation was in turn exploited by snakes in their anti-predatory threat display as a triangular head or dorsal zig-zag pattern, and is currently the basis for efficient international road-danger signalling.
Development of a Computer-Adaptive Physical Function Instrument for Social Security Administration Disability Determination

PubMed Central

Ni, Pengsheng; McDonough, Christine M.; Jette, Alan M.; Bogusz, Kara; Marfeo, Elizabeth E.; Rasch, Elizabeth K.; Brandt, Diane E.; Meterko, Mark; Chan, Leighton

2014-01-01

Objectives To develop and test an instrument to assess physical function (PF) for Social Security Administration (SSA) disability programs, the SSA-PF. Item Response Theory (IRT) analyses were used to 1) create a calibrated item bank for each of the factors identified in prior factor analyses, 2) assess the fit of the items within each scale, 3) develop separate Computer-Adaptive Test (CAT) instruments for each scale, and 4) conduct initial psychometric testing. Design Cross-sectional data collection; IRT analyses; CAT simulation. Setting Telephone and internet survey. Participants Two samples: 1,017 SSA claimants, and 999 adults from the US general population. Interventions None. Main Outcome Measure Model fit statistics, correlation and reliability coefficients. Results IRT analyses resulted in five unidimensional SSA-PF scales: Changing & Maintaining Body Position, Whole Body Mobility, Upper Body Function, Upper Extremity Fine Motor, and Wheelchair Mobility for a total of 102 items. High CAT accuracy was demonstrated by strong correlations between simulated CAT scores and those from the full item banks. Comparing the simulated CATs to the full item banks, very little loss of reliability or precision was noted, except at the lower and upper ranges of each scale. No difference in response patterns by age or sex was noted. The distributions of claimant scores were shifted to the lower end of each scale compared to those of a sample of US adults. Conclusions The SSA-PF instrument contributes important new methodology for measuring the physical function of adults applying to the SSA disability programs. Initial evaluation revealed that the SSA-PF instrument achieved considerable breadth of coverage in each content domain and demonstrated noteworthy psychometric properties. PMID:23578594
The feeding practices and structure questionnaire: construction and initial validation in a sample of Australian first-time mothers and their 2-year olds

PubMed Central

2014-01-01

Background Early feeding practices lay the foundation for children’s eating habits and weight gain. Questionnaires are available to assess parental feeding but overlapping and inconsistent items, subscales and terminology limit conceptual clarity and between study comparisons. Our aim was to consolidate a range of existing items into a parsimonious and conceptually robust questionnaire for assessing feeding practices with very young children (<3 years). Methods Data were from 462 mothers and children (age 21–27 months) from the NOURISH trial. Items from five questionnaires and two study-specific items were submitted to a priori item selection, allocation and verification, before theoretically-derived factors were tested using Confirmatory Factor Analysis. Construct validity of the new factors was examined by correlating these with child eating behaviours and weight. Results Following expert review 10 factors were specified. Of these, 9 factors (40 items) showed acceptable model fit and internal reliability (Cronbach’s α: 0.61-0.89). Four factors reflected non-responsive feeding practices: ‘Distrust in Appetite’, ‘Reward for Behaviour’, ‘Reward for Eating’, and ‘Persuasive Feeding’. Five factors reflected structure of the meal environment and limits: ‘Structured Meal Setting’, ‘Structured Meal Timing’, ‘Family Meal Setting’, ‘Overt Restriction’ and ‘Covert Restriction’. Feeding practices generally showed the expected pattern of associations with child eating behaviours but none with weight. Conclusion The Feeding Practices and Structure Questionnaire (FPSQ) provides a new reliable and valid measure of parental feeding practices, specifically maternal responsiveness to children’s hunger/satiety signals facilitated by routine and structure in feeding. Further validation in more diverse samples is required. PMID:24898364
Development of a computer-adaptive physical function instrument for Social Security Administration disability determination.

PubMed

Ni, Pengsheng; McDonough, Christine M; Jette, Alan M; Bogusz, Kara; Marfeo, Elizabeth E; Rasch, Elizabeth K; Brandt, Diane E; Meterko, Mark; Haley, Stephen M; Chan, Leighton

2013-09-01

To develop and test an instrument to assess physical function for Social Security Administration (SSA) disability programs, the SSA-Physical Function (SSA-PF) instrument. Item response theory (IRT) analyses were used to (1) create a calibrated item bank for each of the factors identified in prior factor analyses, (2) assess the fit of the items within each scale, (3) develop separate computer-adaptive testing (CAT) instruments for each scale, and (4) conduct initial psychometric testing. Cross-sectional data collection; IRT analyses; CAT simulation. Telephone and Internet survey. Two samples: SSA claimants (n=1017) and adults from the U.S. general population (n=999). None. Model fit statistics, correlation, and reliability coefficients. IRT analyses resulted in 5 unidimensional SSA-PF scales: Changing & Maintaining Body Position, Whole Body Mobility, Upper Body Function, Upper Extremity Fine Motor, and Wheelchair Mobility for a total of 102 items. High CAT accuracy was demonstrated by strong correlations between simulated CAT scores and those from the full item banks. On comparing the simulated CATs with the full item banks, very little loss of reliability or precision was noted, except at the lower and upper ranges of each scale. No difference in response patterns by age or sex was noted. The distributions of claimant scores were shifted to the lower end of each scale compared with those of a sample of U.S. adults. The SSA-PF instrument contributes important new methodology for measuring the physical function of adults applying to the SSA disability programs. Initial evaluation revealed that the SSA-PF instrument achieved considerable breadth of coverage in each content domain and demonstrated noteworthy psychometric properties. Copyright © 2013 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
A Quasi-Parametric Method for Fitting Flexible Item Response Functions

ERIC Educational Resources Information Center

Liang, Longjuan; Browne, Michael W.

2015-01-01

If standard two-parameter item response functions are employed in the analysis of a test with some newly constructed items, it can be expected that, for some items, the item response function (IRF) will not fit the data well. This lack of fit can also occur when standard IRFs are fitted to personality or psychopathology items. When investigating…
A knowledge-based theory of rising scores on "culture-free" tests.

PubMed

Fox, Mark C; Mitchum, Ainsley L

2013-08-01

Secular gains in intelligence test scores have perplexed researchers since they were documented by Flynn (1984, 1987). Gains are most pronounced on abstract, so-called culture-free tests, prompting Flynn (2007) to attribute them to problem-solving skills availed by scientifically advanced cultures. We propose that recent-born individuals have adopted an approach to analogy that enables them to infer higher level relations requiring roles that are not intrinsic to the objects that constitute initial representations of items. This proposal is translated into item-specific predictions about differences between cohorts in pass rates and item-response patterns on the Raven's Matrices (Flynn, 1987), a seemingly culture-free test that registers the largest Flynn effect. Consistent with predictions, archival data reveal that individuals born around 1940 are less able to map objects at higher levels of relational abstraction than individuals born around 1990. Polytomous Rasch models verify predicted violations of measurement invariance, as raw scores are found to underestimate the number of analogical rules inferred by members of the earlier cohort relative to members of the later cohort who achieve the same overall score. The work provides a plausible cognitive account of the Flynn effect, furthers understanding of the cognition of matrix reasoning, and underscores the need to consider how test-takers select item responses. PsycINFO Database Record (c) 2013 APA, all rights reserved.
Qualitative Development of the PROMIS® Pediatric Stress Response Item Banks

PubMed Central

Gardner, William; Pajer, Kathleen; Riley, Anne W.; Forrest, Christopher B.

2013-01-01

Objective To describe the qualitative development of the Patient-Reported Outcome Measurement Information System (PROMIS®) Pediatric Stress Response item banks. Methods Stress response concepts were specified through a literature review and interviews with content experts, children, and parents. A library comprising 2,677 items derived from 71 instruments was developed. Items were classified into conceptual categories; new items were written and redundant items were removed. Items were then revised based on cognitive interviews (n = 39 children), readability analyses, and translatability reviews. Results 2 pediatric Stress Response sub-domains were identified: somatic experiences (43 items) and psychological experiences (64 items). Final item pools cover the full range of children’s stress experiences. Items are comprehensible among children aged ≥8 years and ready for translation. Conclusions Child- and parent-report versions of the item banks assess children’s somatic and psychological states when demands tax their adaptive capabilities. PMID:23124904
On the Complexity of Item Response Theory Models.

PubMed

Bonifay, Wes; Cai, Li

2017-01-01

Complexity in item response theory (IRT) has traditionally been quantified by simply counting the number of freely estimated parameters in the model. However, complexity is also contingent upon the functional form of the model. We examined four popular IRT models (exploratory factor analytic, bifactor, DINA, and DINO) with different functional forms but the same number of free parameters. In comparison, a simpler (unidimensional 3PL) model was specified such that it had 1 more parameter than the previous models. All models were then evaluated according to the minimum description length principle. Specifically, each model was fit to 1,000 data sets that were randomly and uniformly sampled from the complete data space and then assessed using global and item-level fit and diagnostic measures. The findings revealed that the factor analytic and bifactor models possess a strong tendency to fit any possible data. The unidimensional 3PL model displayed minimal fitting propensity, despite the fact that it included an additional free parameter. The DINA and DINO models did not demonstrate a proclivity to fit any possible data, but they did fit well to distinct data patterns. Applied researchers and psychometricians should therefore consider functional form, and not goodness-of-fit alone, when selecting an IRT model.
Measuring coping in parents of children with disabilities: a Rasch model approach.

PubMed

Gothwal, Vijaya K; Bharani, Seelam; Reddy, Shailaja P

2015-01-01

Parents of a child with disability must cope with greater demands than those living with a healthy child. Coping refers to a person's cognitive or behavioral efforts to manage the demands of a stressful situation. The Coping Health Inventory for Parents (CHIP) is a well-recognized measure of coping among parents of chronically ill children and assesses different coping patterns using its three subscales. The purpose of this study was to provide further insights into the psychometric properties of the CHIP subscales in a sample of parents of children with disabilities. In this cross-sectional study, 220 parents (mean age, 33.4 years; 85% mothers) caring for a child with disability enrolled in special schools as well as in mainstream schools completed the 45-item CHIP. Rasch analysis was applied to the CHIP data and the psychometric performance of each of the three subscales was tested. Subscale revision was performed in the context of Rasch analysis statistics. Response categories were not used as intended, necessitating combining categories, thereby reducing the number from 4 to 3. The subscale 'maintaining social support' satisfied all the Rasch model expectations. Four items misfit the Rasch model in the subscale 'maintaining family integration', but their deletion resulted in a 15-item scale with items that fit the Rasch model well. The remaining subscale, 'understanding the healthcare situation', lacked adequate measurement precision (<2.0 logits). The current Rasch analyses add to the evidence of measurement properties of the CHIP and show that two of its subscales (one original and the other revised) have good psychometric properties and work well to measure coping patterns in parents of children with disabilities. However, the third subscale is limited by its inadequate measurement precision and requires more items.
Practical methods for dealing with 'not applicable' item responses in the AMC Linear Disability Score project

PubMed Central

Holman, Rebecca; Glas, Cees AW; Lindeboom, Robert; Zwinderman, Aeilko H; de Haan, Rob J

2004-01-01

Background Whenever questionnaires are used to collect data on constructs, such as functional status or health related quality of life, it is unlikely that all respondents will respond to all items. This paper examines ways of dealing with responses in a 'not applicable' category to items included in the AMC Linear Disability Score (ALDS) project item bank. Methods The data examined in this paper come from the responses of 392 respondents to 32 items and form part of the calibration sample for the ALDS item bank. The data are analysed using the one-parameter logistic item response theory model. The four practical strategies for dealing with this type of response are: cold deck imputation; hot deck imputation; treating the missing responses as if these items had never been offered to those individual patients; and using a model which takes account of the 'tendency to respond to items'. Results The item and respondent population parameter estimates were very similar for the strategies involving hot deck imputation; treating the missing responses as if these items had never been offered to those individual patients; and using a model which takes account of the 'tendency to respond to items'. The estimates obtained using the cold deck imputation method were substantially different. Conclusions The cold deck imputation method was not considered suitable for use in the ALDS item bank. The other three methods described can be usefully implemented in the ALDS item bank, depending on the purpose of the data analysis to be carried out. These three methods may be useful for other data sets examining similar constructs, when item response theory based methods are used. PMID:15200681
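One of the strategies named above, treating 'not applicable' responses as if the item had never been offered, can be illustrated with a small sketch. It assumes a one-parameter logistic model with known item difficulties and uses a simple grid search for the ability estimate; the function names, the grid, and the encoding of 'not applicable' as `None` are illustrative assumptions, not the procedure used in the ALDS project.

```python
import math

def log_likelihood(theta, responses, difficulties):
    """1PL log-likelihood of one respondent; None means 'not applicable'."""
    ll = 0.0
    for x, b in zip(responses, difficulties):
        if x is None:
            # Treated as never offered: the item contributes nothing.
            continue
        p = 1.0 / (1.0 + math.exp(-(theta - b)))
        ll += math.log(p) if x == 1 else math.log(1.0 - p)
    return ll

def estimate_theta(responses, difficulties, grid=None):
    """Grid-search maximum likelihood estimate of ability (illustrative only)."""
    if grid is None:
        grid = [i / 100.0 for i in range(-400, 401)]
    return max(grid, key=lambda t: log_likelihood(t, responses, difficulties))
```

Under this strategy the estimate depends only on the items a respondent actually answered, so a 'not applicable' response to a very hard or very easy item does not pull the ability estimate in either direction.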
Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns.

PubMed

Wolfe, Edward W; McGill, Michael T

2011-01-01

This article summarizes a simulation study of the performance of five item quality indicators (the weighted and unweighted versions of the mean square and standardized mean square fit indices and the point-measure correlation) under conditions of relatively high and low amounts of missing data under both random and conditional patterns of missing data for testing contexts such as those encountered in operational administrations of a computerized adaptive certification or licensure examination. The results suggest that weighted fit indices, particularly the standardized mean square index, and the point-measure correlation provide the most consistent information between random and conditional missing data patterns and that these indices perform more comparably for items near the passing score than for items with extreme difficulty values.
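As a rough illustration of the unweighted and weighted mean square fit indices mentioned above, the sketch below computes both for a single item under a dichotomous Rasch model, skipping missing responses. The function names and the use of `None` for missing data are assumptions made for this example; the standardized versions and the point-measure correlation studied in the article are not shown.

```python
import math

def rasch_p(theta, b):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_fit(responses, thetas, b):
    """Unweighted (outfit) and weighted (infit) mean squares for one item.

    responses: 0/1 scores, with None for missing observations.
    thetas: person ability estimates aligned with responses.
    b: item difficulty.
    """
    sq_resid_sum = 0.0   # sum of squared residuals
    variance_sum = 0.0   # sum of model variances p(1 - p)
    z2_sum = 0.0         # sum of squared standardized residuals
    n = 0
    for x, theta in zip(responses, thetas):
        if x is None:
            continue  # sparse matrix: only observed responses count
        p = rasch_p(theta, b)
        variance = p * (1.0 - p)
        sq_resid = (x - p) ** 2
        sq_resid_sum += sq_resid
        variance_sum += variance
        z2_sum += sq_resid / variance
        n += 1
    return {"outfit": z2_sum / n, "infit": sq_resid_sum / variance_sum}
```

Both indices have an expected value near 1 when the data fit the model. The weighted version down-weights responses from persons far from the item difficulty, which is one reason it can behave differently from the unweighted version when missingness depends on ability.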
Development of the functional vision questionnaire for children and young people with visual impairment: the FVQ_CYP.

PubMed

Tadić, Valerija; Cooper, Andrew; Cumberland, Phillippa; Lewando-Hundt, Gillian; Rahi, Jugnoo S

2013-12-01

To develop a novel age-appropriate measure of functional vision (FV) for self-reporting by visually impaired (VI) children and young people. Questionnaire development. A representative patient sample of VI children and young people aged 10 to 15 years, visual acuity of the logarithm of the minimum angle of resolution (logMAR) worse than 0.48, and a school-based (nonrandom) expert group sample of VI students aged 12 to 17 years. A total of 32 qualitative semistructured interviews supplemented by narrative feedback from 15 eligible VI children and young people were used to generate draft instrument items. Seventeen VI students were consulted individually on item relevance and comprehensibility, instrument instructions, format, and administration methods. The resulting draft instrument was piloted with 101 VI children and young people comprising a nationally representative sample, drawn from 21 hospitals in the United Kingdom. Initial item reduction was informed by presence of missing data and individual item response pattern. Exploratory factor analysis (FA) and parallel analysis (PA), and Rasch analysis (RA) were applied to test the instrument's psychometric properties. Psychometric indices and validity assessment of the Functional Vision Questionnaire for Children and Young People (FVQ_CYP). A total of 712 qualitative statements became a 56-item draft scale, capturing the level of difficulty in performing vision-dependent activities. After piloting, items were removed iteratively as follows: 11 for high percentage of missing data, 4 for skewness, and 1 for inadequate item infit and outfit values in RA, 3 having shown differential item functioning across age groups and 1 across gender in RA. The remaining 36 items showed item fit values within acceptable limits, good measurement precision and targeting, and ordered response categories. 
The reduced scale has a clear unidimensional structure, with all items having a high factor loading on the single factor in FA and PA. The summary scores correlated significantly with visual acuity. We have developed a novel, psychometrically robust self-report questionnaire for children and young people, the FVQ_CYP, that captures the functional impact of visual disability from their perspective. The 36-item, 4-point unidimensional scale has potential as a complementary adjunct to objective clinical assessments in routine pediatric ophthalmology practice and in research. Copyright © 2013 American Academy of Ophthalmology. Published by Elsevier Inc. All rights reserved.
A Monte Carlo Study Investigating the Influence of Item Discrimination, Category Intersection Parameters, and Differential Item Functioning Patterns on the Detection of Differential Item Functioning in Polytomous Items

ERIC Educational Resources Information Center

Thurman, Carol

2009-01-01

The increased use of polytomous item formats has led assessment developers to pay greater attention to the detection of differential item functioning (DIF) in these items. DIF occurs when an item performs differently for two contrasting groups of respondents (e.g., males versus females) after controlling for differences in the abilities of the…
Effects of hemisphere speech dominance and seizure focus on patterns of behavioral response errors for three types of stimuli.

PubMed

Rausch, R; MacDonald, K

1997-03-01

We used a protocol consisting of a continuous presentation of stimuli with associated response requests during an intracarotid sodium amobarbital procedure (IAP) to study the effects of hemisphere injected (speech dominant vs. nondominant) and seizure focus (left temporal lobe vs. right temporal lobe) on the pattern of behavioral response errors for three types of visual stimuli (pictures of common objects, words, and abstract forms). Injection of the left speech dominant hemisphere compared to the right nondominant hemisphere increased overall errors and affected the pattern of behavioral errors. The presence of a seizure focus in the contralateral hemisphere increased overall errors, particularly for the right temporal lobe seizure patients, but did not affect the pattern of behavioral errors. Left hemisphere injections disrupted both naming and reading responses at a rate similar to that of matching-to-sample performance. Also, a short-term memory deficit was observed with all three stimuli. Long-term memory testing following the left hemisphere injection indicated that only for pictures of common objects were there fewer errors during the early postinjection period than for the later long-term memory testing. Therefore, despite the inability to respond to picture stimuli, picture items, but not words or forms, could be sufficiently encoded for later recall. In contrast, right hemisphere injections resulted in few errors, with a pattern suggesting a mild general cognitive decrease. A selective weakness in learning unfamiliar forms was found. Our findings indicate that different patterns of behavioral deficits occur following the left vs. right hemisphere injections, with selective patterns specific to stimulus type.
Stochastic Approximation Methods for Latent Regression Item Response Models

ERIC Educational Resources Information Center

von Davier, Matthias; Sinharay, Sandip

2010-01-01

This article presents an application of a stochastic approximation expectation maximization (EM) algorithm using a Metropolis-Hastings (MH) sampler to estimate the parameters of an item response latent regression model. Latent regression item response models are extensions of item response theory (IRT) to a latent variable model with covariates…
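The Metropolis-Hastings step at the heart of such an algorithm can be sketched for a single respondent. The example below is a simplification under stated assumptions: a 2PL model with known item parameters, a normal prior whose mean stands in for the latent regression prediction, and a random-walk proposal. All names, the step size, and the number of draws are hypothetical; the stochastic approximation update of the model parameters described in the article is not shown.

```python
import math
import random

def log_posterior(theta, responses, items, prior_mean=0.0, prior_sd=1.0):
    """Unnormalized log posterior of ability for one respondent (2PL items)."""
    lp = -0.5 * ((theta - prior_mean) / prior_sd) ** 2
    for x, (a, b) in zip(responses, items):
        p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
        lp += math.log(p if x == 1 else 1.0 - p)
    return lp

def mh_sample_theta(responses, items, n_draws=2000, step=0.8,
                    prior_mean=0.0, prior_sd=1.0, seed=0):
    """Random-walk Metropolis-Hastings draws of ability."""
    rng = random.Random(seed)
    theta = prior_mean
    current = log_posterior(theta, responses, items, prior_mean, prior_sd)
    draws = []
    for _ in range(n_draws):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, responses, items, prior_mean, prior_sd)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        draws.append(theta)
    return draws
```

In a latent regression setting, `prior_mean` would be replaced by the covariate-based prediction for each respondent, and the draws would feed the parameter update of the EM-type algorithm.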
Computerized Adaptive Test (CAT) Applications and Item Response Theory Models for Polytomous Items

ERIC Educational Resources Information Center

Aybek, Eren Can; Demirtasli, R. Nukhet

2017-01-01

This article aims to provide a theoretical framework for computerized adaptive tests (CAT) and item response theory models for polytomous items. Besides that, it aims to introduce the simulation and live CAT software to the related researchers. Computerized adaptive test algorithm, assumptions of item response theory models, nominal response…
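A core step of a typical CAT algorithm, choosing the next item, can be sketched as follows. For brevity the example assumes dichotomous 2PL items and the common maximum-information selection rule, although the article concerns polytomous models; the item bank layout and function names are illustrative assumptions.

```python
import math

def prob_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * p * (1 - p)."""
    p = prob_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def select_next_item(theta, item_bank, administered):
    """Index of the unused item with maximum information at theta, or None."""
    candidates = [i for i in range(len(item_bank)) if i not in administered]
    if not candidates:
        return None
    return max(candidates, key=lambda i: item_information(theta, *item_bank[i]))
```

A full CAT would alternate this selection step with an ability update after each response and stop once a precision or test-length criterion is met.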
An Evaluation of "Intentional" Weighting of Extended-Response or Constructed-Response Items in Tests with Mixed Item Types.

ERIC Educational Resources Information Center

Ito, Kyoko; Sykes, Robert C.

This study investigated the practice of weighting a type of test item, such as constructed response, more than other types of items, such as selected response, to compute student scores for a mixed-item type of test. The study used data from statewide writing field tests in grades 3, 5, and 8 and considered two contexts, that in which a single…
Sources of Response Bias in Older Ethnic Minorities: A Case of Korean American Elderly

PubMed Central

Kim, Miyong T.; Ko, Jisook; Yoon, Hyunwoo; Kim, Kim B.; Jang, Yuri

2015-01-01

The present study was undertaken to investigate potential sources of response bias in empirical research involving older ethnic minorities and to identify prudent strategies to reduce those biases, using Korean American elderly (KAE) as an example. Data were obtained from three independent studies of KAE (N=1,297; age ≥60) in three states (Florida, New York, and Maryland) from 2000 to 2008. Two common measures, Pearlin’s Mastery Scale and the CES-D scale, were selected for a series of psychometric tests based on classical measurement theory. Survey items were analyzed in depth, using psychometric properties generated from both exploratory factor analysis and confirmatory factor analysis as well as correlational analysis. Two types of potential sources of bias were identified as the most significant contributors to increases in error variances for these psychological instruments. Error variances were most prominent when (1) items were not presented in a manner that was culturally or contextually congruent with respect to the target population and/or (2) the response anchors for items were mixed (e.g., positive vs. negative). The systemic patterns and magnitudes of the biases were also cross-validated for the three studies. The results demonstrate sources and impacts of measurement biases in studies of older ethnic minorities. The identified response biases highlight the need for re-evaluation of current measurement practices, which are based on traditional recommendations that response anchors should be mixed or that the original wording of instruments should be rigidly followed. Specifically, systematic guidelines for accommodating cultural and contextual backgrounds into instrument design are warranted. PMID:26049971
Sources of Response Bias in Older Ethnic Minorities: A Case of Korean American Elderly.

PubMed

Kim, Miyong T; Lee, Ju-Young; Ko, Jisook; Yoon, Hyunwoo; Kim, Kim B; Jang, Yuri

2015-09-01

The present study was undertaken to investigate potential sources of response bias in empirical research involving older ethnic minorities and to identify prudent strategies to reduce those biases, using Korean American elderly (KAE) as an example. Data were obtained from three independent studies of KAE (N = 1,297; age ≥60) in three states (Florida, New York, and Maryland) from 2000 to 2008. Two common measures, Pearlin's Mastery Scale and the CES-D scale, were selected for a series of psychometric tests based on classical measurement theory. Survey items were analyzed in depth, using psychometric properties generated from both exploratory factor analysis and confirmatory factor analysis as well as correlational analysis. Two types of potential sources of bias were identified as the most significant contributors to increases in error variances for these psychological instruments. Error variances were most prominent when (1) items were not presented in a manner that was culturally or contextually congruent with respect to the target population and/or (2) the response anchors for items were mixed (e.g., positive vs. negative). The systemic patterns and magnitudes of the biases were also cross-validated for the three studies. The results demonstrate sources and impacts of measurement biases in studies of older ethnic minorities. The identified response biases highlight the need for re-evaluation of current measurement practices, which are based on traditional recommendations that response anchors should be mixed or that the original wording of instruments should be rigidly followed. Specifically, systematic guidelines for accommodating cultural and contextual backgrounds into instrument design are warranted.
Item Feature Effects in Evolution Assessment

ERIC Educational Resources Information Center

Nehm, Ross H.; Ha, Minsu

2011-01-01

Despite concerted efforts by science educators to understand patterns of evolutionary reasoning in science students and teachers, the vast majority of evolution education studies have failed to carefully consider or control for item feature effects in knowledge measurement. Our study explores whether robust contextualization patterns emerge within…

Sequential dependencies in recall of sequences: filling in the blanks.

PubMed

Farrell, Simon; Hurlstone, Mark J; Lewandowsky, Stephan

2013-08-01

Sequential dependencies can provide valuable information about the processes supporting memory, particularly memory for serial order. Earlier analyses have suggested that anticipation errors (reporting items ahead of their correct position in the sequence) tend to be followed by recall of the displaced item, consistent with primacy gradient models of serial recall. However, a more recent analysis instead suggests that anticipation errors are followed by further anticipation errors, consistent with chaining models. We report analyses of 21 conditions from published serial recall data sets, in which we observed a systematic pattern whereby anticipations tended to be followed by the "filling in" of displaced items. We note that cases where a different pattern held tended to apply to recall of longer lists under serial learning conditions or to conditions where participants were free to skip over items. Although the different patterns that can be observed might imply a dissociation (e.g., between short- and long-term memory), we show that these different patterns are naturally predicted by Farrell's (Psychological Review 119:223-271, 2012) model of short-term and episodic memory and relate to whether or not spontaneously formed groups of items can be skipped over during recall.
A Comparison of Linking and Concurrent Calibration under the Graded Response Model.

ERIC Educational Resources Information Center

Kim, Seock-Ho; Cohen, Allan S.

Applications of item response theory to practical testing problems including equating, differential item functioning, and computerized adaptive testing, require that item parameter estimates be placed onto a common metric. In this study, two methods for developing a common metric for the graded response model under item response theory were…
Writing, Evaluating and Assessing Data Response Items in Economics.

ERIC Educational Resources Information Center

Trotman-Dickenson, D. I.

1989-01-01

Describes some of the problems in writing data response items in economics for use by A Level and General Certificate of Secondary Education (GCSE) students. Examines the experience of two series of workshops on writing items, evaluating them and assessing responses from schools. Offers suggestions for producing packages of data response items as…
Item Response Modeling with Sum Scores

ERIC Educational Resources Information Center

Johnson, Timothy R.

2013-01-01

One of the distinctions between classical test theory and item response theory is that the former focuses on sum scores and their relationship to true scores, whereas the latter concerns item responses and their relationship to latent scores. Although item response theory is often viewed as the richer of the two theories, sum scores are still…
A Model-Free Diagnostic for Single-Peakedness of Item Responses Using Ordered Conditional Means

ERIC Educational Resources Information Center

Polak, Marike; De Rooij, Mark; Heiser, Willem J.

2012-01-01

In this article we propose a model-free diagnostic for single-peakedness (unimodality) of item responses. Presuming a unidimensional unfolding scale and a given item ordering, we approximate item response functions of all items based on ordered conditional means (OCM). The proposed OCM methodology is based on Thurstone & Chave's (1929) "criterion…
Investigation of relative risk estimates from studies of the same population with contrasting response rates and designs.

PubMed

Mealing, Nicole M; Banks, Emily; Jorm, Louisa R; Steel, David G; Clements, Mark S; Rogers, Kris D

2010-04-01

There is little empirical evidence regarding the generalisability of relative risk estimates from studies which have relatively low response rates or are of limited representativeness. The aim of this study was to investigate variation in exposure-outcome relationships in studies of the same population with different response rates and designs by comparing estimates from the 45 and Up Study, a population-based cohort study (self-administered postal questionnaire, response rate 18%), and the New South Wales Population Health Survey (PHS) (computer-assisted telephone interview, response rate ~60%). Logistic regression analysis of questionnaire data from 45 and Up Study participants (n = 101,812) and 2006/2007 PHS participants (n = 14,796) was used to calculate prevalence estimates and odds ratios (ORs) for comparable variables, adjusting for age, sex and remoteness. ORs were compared using Wald tests modelling each study separately, with and without sampling weights. Prevalence of some outcomes (smoking, private health insurance, diabetes, hypertension, asthma) varied between the two studies. For highly comparable questionnaire items, exposure-outcome relationship patterns were almost identical between the studies and ORs for eight of the ten relationships examined did not differ significantly. For questionnaire items that were only moderately comparable, the nature of the observed relationships did not differ materially between the two studies, although many ORs differed significantly. These findings show that for a broad range of risk factors, two studies of the same population with varying response rate, sampling frame and mode of questionnaire administration yielded consistent estimates of exposure-outcome relationships. However, ORs varied between the studies where they did not use identical questionnaire items.
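A Wald-type comparison of two odds ratios from separate studies can be sketched as below. This is a generic version that assumes the two estimates are independent and that each comes with a standard error on the log scale; it is not necessarily the exact test specification used by the authors, and the function name is hypothetical.

```python
import math

def compare_odds_ratios(or1, se_log_or1, or2, se_log_or2):
    """Wald z statistic and two-sided p-value for H0: OR1 == OR2.

    Assumes independent estimates with standard errors given on the log scale.
    """
    diff = math.log(or1) - math.log(or2)
    se_diff = math.sqrt(se_log_or1 ** 2 + se_log_or2 ** 2)
    z = diff / se_diff
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value
```

With a large sample in one study and a smaller one in the other, the combined standard error is dominated by the smaller study, so moderate differences between ORs may not reach significance.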
Prevalence of responsible hospitality policies in licensed premises that are associated with alcohol-related harm.

PubMed

Daly, Justine B; Campbell, Elizabeth M; Wiggers, John H; Considine, Robyn J

2002-06-01

This study aimed to determine the prevalence of responsible hospitality policies in a group of licensed premises associated with alcohol-related harm. During March 1999, 108 licensed premises with one or more police-identified alcohol-related incidents in the previous 3 months received a visit from a police officer. A 30-item audit checklist was used to determine the responsible hospitality policies being undertaken by each premises within eight policy domains: display required signage (three items); responsible host practices to prevent intoxication and under-age drinking (five items); written policies and guidelines for responsible service (three items); discouraging inappropriate promotions (three items); safe transport (two items); responsible management issues (seven items); physical environment (three items) and entry conditions (four items). No premises were undertaking all 30 items. Eighty per cent of the premises were undertaking 20 of the 30 items. All premises were undertaking at least 17 of the items. The proportion of premises undertaking individual items ranged from 16% to 100%. Premises were less likely to report having and providing written responsible hospitality documentation to staff, using door charges and having entry/re-entry rules. Significant differences between rural and urban premises were evident for four policies. Clubs were significantly more likely than hotels to have a written responsible service of alcohol policy and to clearly display codes of dress and conditions of entry. This study provides an indication of the extent and nature of responsible hospitality policies in a sample of licensed premises that are associated with a broad range of alcohol related harms. The finding that a large majority of such premises appear to adopt responsible hospitality policies suggests a need to assess the validity and reliability of tools used in the routine assessment of such policies, and of the potential for harm from licensed premises.
Item Response Data Analysis Using Stata Item Response Theory Package

ERIC Educational Resources Information Center

Yang, Ji Seung; Zheng, Xiaying

2018-01-01

The purpose of this article is to introduce and review the capability and performance of the Stata item response theory (IRT) package that is available from Stata v.14, 2015. Using a simulated data set and a publicly available item response data set extracted from Programme of International Student Assessment, we review the IRT package from…
Item Response Models for Local Dependence among Multiple Ratings

ERIC Educational Resources Information Center

Wang, Wen-Chung; Su, Chi-Ming; Qiu, Xue-Lan

2014-01-01

Ratings given to the same item response may have a stronger correlation than those given to different item responses, especially when raters interact with one another before giving ratings. The rater bundle model was developed to account for such local dependence by forming multiple ratings given to an item response as a bundle and assigning…
Item response theory - A first approach

NASA Astrophysics Data System (ADS)

Nunes, Sandra; Oliveira, Teresa; Oliveira, Amílcar

2017-07-01

Item Response Theory (IRT) has become one of the most popular scoring frameworks for measurement data, frequently used in computerized adaptive testing, cognitively diagnostic assessment and test equating. According to Andrade et al. (2000), IRT can be defined as a set of mathematical models (Item Response Models - IRM) constructed to represent the probability of an individual giving the right answer to an item of a particular test. The number of Item Response Models available for measurement analysis has increased considerably in the last fifteen years due to increasing computer power and a demand for accuracy and more meaningful inferences grounded in complex data. The developments in modeling with Item Response Theory were related to developments in estimation theory, most remarkably Bayesian estimation with Markov chain Monte Carlo algorithms (Patz & Junker, 1999). The popularity of Item Response Theory has also led to numerous overviews in books and journals, and many connections between IRT and other statistical estimation procedures, such as factor analysis and structural equation modeling, have been made repeatedly (van der Linden & Hambleton, 1997). As stated before, Item Response Theory covers a variety of measurement models, ranging from basic one-dimensional models for dichotomously and polytomously scored items and their multidimensional analogues to models that incorporate information about cognitive sub-processes which influence the overall item response process. The aim of this work is to introduce the main concepts associated with one-dimensional models of Item Response Theory, to specify the logistic models with one, two and three parameters, to discuss some properties of these models and to present the main estimation procedures.
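The one-, two- and three-parameter logistic models referred to above can be written as a single function. The sketch below uses the common parameterization with discrimination a, difficulty b and lower asymptote (guessing) c, and omits the optional scaling constant D = 1.7 that some texts include; the function names are illustrative.

```python
import math

def irf_3pl(theta, a=1.0, b=0.0, c=0.0):
    """P(correct | theta) = c + (1 - c) / (1 + exp(-a * (theta - b)))."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def irf_2pl(theta, a, b):
    """Two-parameter model: 3PL with no guessing (c = 0)."""
    return irf_3pl(theta, a=a, b=b, c=0.0)

def irf_1pl(theta, b):
    """One-parameter model: common discrimination fixed at 1, no guessing."""
    return irf_3pl(theta, a=1.0, b=b, c=0.0)
```

For example, when ability equals difficulty the 1PL and 2PL give a probability of 0.5, while a 3PL item with c = 0.2 gives 0.6, because the guessing parameter raises the lower asymptote.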
Tracking competition and cognitive control during language comprehension with multi-voxel pattern analysis

PubMed Central

Musz, Elizabeth; Thompson-Schill, Sharon L.

2017-01-01

To successfully comprehend a sentence that contains a homonym, readers must select the ambiguous word’s context-appropriate meaning. The outcome of this process is influenced both by top-down contextual support and bottom-up, word-specific characteristics. We examined how these factors jointly affect the neural signatures of lexical ambiguity resolution. We measured the similarity between multi-voxel patterns evoked by the same homonym in two distinct linguistic contexts: once after subjects read sentences that biased interpretation toward each homonym’s most frequent, dominant meaning, and again after interpretation was biased toward a weaker, subordinate meaning. We predicted that, following a subordinate-biasing context, the dominant yet inappropriate meaning would nevertheless compete for activation, manifesting in increased similarity between the neural patterns evoked by the two word meanings. In left anterior temporal lobe (ATL), degree of within-word pattern similarity was positively predicted by the association strength of each homonym’s dominant meaning. Further, within-word pattern similarity in left ATL was negatively predicted by item-specific responses in a region of left ventrolateral prefrontal cortex (VLPFC) sensitive to semantic conflict. These findings have implications for psycholinguistic models of lexical ambiguity resolution, and for the role of left VLPFC function during this process. Moreover, these findings demonstrate the utility of item-level, similarity-based analyses of fMRI data for our understanding of competition between co-activated word meanings during language comprehension. PMID:27898341
Exploring the role of contextual information in bloodstain pattern analysis: A qualitative approach.

PubMed

Osborne, Nikola K P; Taylor, Michael C; Zajac, Rachel

2016-03-01

During Bloodstain Pattern Analysis (BPA), an analyst may encounter various sources of contextual information. Although contextual bias has emerged as a valid concern for the discipline, little is understood about how contextual information informs BPA. To address this issue, we asked 15 experienced bloodstain pattern analysts from New Zealand and Australia to think aloud as they classified bloodstain patterns from two homicide cases. Analysts could request items of contextual information, and were required to state how each item would inform their analysis. Pathology reports and additional photographs of the scene were the most commonly requested items of information. We coded analysts' reasons for requesting contextual information--and the way in which they integrated this information--according to thematic analysis. We identified considerable variation in both of these variables, raising important questions about the role and necessity of contextual information in decisions about bloodstain pattern evidence. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Colour vision and response bias in a coral reef fish.

PubMed

Cheney, Karen L; Newport, Cait; McClure, Eva C; Marshall, N Justin

2013-08-01

Animals use coloured signals for a variety of communication purposes, including to attract potential mates, recognize individuals, defend territories and warn predators of secondary defences (aposematism). To understand the mechanisms that drive the evolution and design of such visual signals, it is important to understand the visual systems and potential response biases of signal receivers. Here, we provide raw data on the spectral capabilities of a coral reef fish, the Picasso triggerfish Rhinecanthus aculeatus, which is potentially trichromatic with three cone sensitivities of 413 nm (single cone), 480 nm (double cone, medium sensitivity) and 528 nm (double cone, long sensitivity), and a rod sensitivity of 498 nm. The ocular media have a 50% transmission cut off at 405 nm. Behavioural experiments confirmed colour vision over their spectral range; triggerfish were significantly more likely to choose coloured stimuli over grey distractors, irrespective of luminance. We then examined whether response biases existed towards coloured and patterned stimuli to provide insight into how visual signals - in particular, aposematic colouration - may evolve. Triggerfish showed a preferential foraging response bias to red and green stimuli, in contrast to blue and yellow, irrespective of pattern. There was no response bias to patterned over monochromatic non-patterned stimuli. A foraging response bias towards red in fish differs from that of avian predators, who often avoid red food items. Red is frequently associated with warning colouration in terrestrial environments (ladybirds, snakes, frogs), whilst blue is used in aquatic environments (blue-ringed octopus, nudibranchs); whether the design of warning (aposematic) displays is a cause or consequence of response biases is unclear.
A Multidimensional Ideal Point Item Response Theory Model for Binary Data

ERIC Educational Resources Information Center

Maydeu-Olivares, Albert; Hernandez, Adolfo; McDonald, Roderick P.

2006-01-01

We introduce a multidimensional item response theory (IRT) model for binary data based on a proximity response mechanism. Under the model, a respondent at the mode of the item response function (IRF) endorses the item with probability one. The mode of the IRF is the ideal point, or in the multidimensional case, an ideal hyperplane. The model…
A brief survey of patients' first impression after CPAP titration predicts future CPAP adherence: a pilot study.

PubMed

Balachandran, Jay S; Yu, Xiaohong; Wroblewski, Kristen; Mokhlesi, Babak

2013-03-15

CPAP adherence patterns are often established very early in the course of therapy. Our objective was to quantify patients' perception of CPAP therapy using a 6-item questionnaire administered in the morning following CPAP titration. We hypothesized that questionnaire responses would independently predict CPAP adherence during the first 30 days of therapy. We retrospectively reviewed the CPAP perception questionnaires of 403 CPAP-naïve adults who underwent in-laboratory titration and who had daily CPAP adherence data available for the first 30 days of therapy. Responses to the CPAP perception questionnaire were analyzed for their association with mean CPAP adherence and with changes in daily CPAP adherence over 30 days. Patients were aged 52 ± 14 years, 53% were women, 54% were African American, the mean body mass index (BMI) was 36.3 ± 9.1 kg/m², and most patients had moderate-to-severe OSA. Four of 6 items from the CPAP perception questionnaire (regarding difficulty tolerating CPAP, discomfort with CPAP pressure, likelihood of wearing CPAP, and perceived health benefit) were significantly correlated with mean 30-day CPAP adherence, and a composite score from these 4 questions was found to be internally consistent. Stepwise linear regression modeling demonstrated that 3 variables were significant and independent predictors of reduced mean CPAP adherence: worse score on the 4-item questionnaire, African American race, and non-sleep specialist ordering polysomnogram and CPAP therapy. Furthermore, a worse score on the 4-item CPAP perception questionnaire was consistently associated with decreased mean daily CPAP adherence over the first 30 days of therapy. In this pilot study, responses to a 4-item CPAP perception questionnaire administered to patients immediately following CPAP titration independently predicted mean CPAP adherence during the first 30 days. Further prospective validation of this questionnaire in different patient populations is warranted.
Mapping the cortical representation of speech sounds in a syllable repetition task.

PubMed

Markiewicz, Christopher J; Bohland, Jason W

2016-11-01

Speech repetition relies on a series of distributed cortical representations and functional pathways. A speaker must map auditory representations of incoming sounds onto learned speech items, maintain an accurate representation of those items in short-term memory, interface that representation with the motor output system, and fluently articulate the target sequence. A "dorsal stream" consisting of posterior temporal, inferior parietal and premotor regions is thought to mediate auditory-motor representations and transformations, but the nature and activation of these representations for different portions of speech repetition tasks remain unclear. Here we mapped the correlates of phonetic and/or phonological information related to the specific phonemes and syllables that were heard, remembered, and produced using a series of cortical searchlight multi-voxel pattern analyses trained on estimates of BOLD responses from individual trials. Based on responses linked to input events (auditory syllable presentation), predictive vowel-level information was found in the left inferior frontal sulcus, while syllable prediction revealed significant clusters in the left ventral premotor cortex and central sulcus and the left mid superior temporal sulcus. Responses linked to output events (the GO signal cueing overt production) revealed strong clusters of vowel-related information bilaterally in the mid to posterior superior temporal sulcus. For the prediction of onset and coda consonants, input-linked responses yielded distributed clusters in the superior temporal cortices, which were further informative for classifiers trained on output-linked responses. Output-linked responses in the Rolandic cortex made strong predictions for the syllables and consonants produced, but their predictive power was reduced for vowels. The results of this study provide a systematic survey of how cortical response patterns covary with the identity of speech sounds, which will help to constrain and guide theoretical models of speech perception, speech production, and phonological working memory. Copyright © 2016 Elsevier Inc. All rights reserved.
The failing measurement of attitudes: How semantic determinants of individual survey responses come to replace measures of attitude strength.

PubMed

Arnulf, Jan Ketil; Larsen, Kai Rune; Martinsen, Øyvind Lund; Egeland, Thore

2018-01-12

The traditional understanding of data from Likert scales is that the quantifications involved result from measures of attitude strength. Applying a recently proposed semantic theory of survey response, we claim that survey responses tap two different sources: a mixture of attitudes plus the semantic structure of the survey. Exploring the degree to which individual responses are influenced by semantics, we hypothesized that in many cases, information about attitude strength is actually filtered out as noise in the commonly used correlation matrix. We developed a procedure to separate the semantic influence from attitude strength in individual response patterns, and compared these results to, respectively, the observed sample correlation matrices and the semantic similarity structures arising from text analysis algorithms. This was done with four datasets, comprising a total of 7,787 subjects and 27,461,502 observed item pair responses. As we argued, attitude strength seemed to account for much information about the individual respondents. However, this information did not seem to carry over into the observed sample correlation matrices, which instead converged around the semantic structures offered by the survey items. This is potentially disturbing for the traditional understanding of what survey data represent. We argue that this approach contributes to a better understanding of the cognitive processes involved in survey responses. In turn, this could help us make better use of the data that such methods provide.
A Two-Decision Model for Responses to Likert-Type Items

ERIC Educational Resources Information Center

Thissen-Roe, Anne; Thissen, David

2013-01-01

Extreme response set, the tendency to prefer the lowest or highest response option when confronted with a Likert-type response scale, can lead to misfit of item response models such as the generalized partial credit model. Recently, a series of intrinsically multidimensional item response models have been hypothesized, wherein tendency toward…
Grouping Influences Output Interference in Short-term Memory: A Mixture Modeling Study.

PubMed

Kang, Min-Suk; Oh, Byung-Il

2016-01-01

Output interference is a source of forgetting induced by recalling. We investigated how grouping influences output interference in short-term memory. In Experiment 1, the participants were asked to remember four colored items. Those items were grouped by temporal coincidence as well as spatial alignment: two items were presented in the first memory array and two were presented in the second, and the items in both arrays were either vertically or horizontally aligned as well. The participants then performed two recall tasks in sequence by selecting a color presented at a cued location from a color wheel. In the same-group condition, the participants reported both items from the same memory array; however, in the different-group condition, the participants reported one item from each memory array. We analyzed participant responses with a mixture model, which yielded two measures: guess rate and precision of recalled memories. The guess rate in the second recall was higher for the different-group condition than for the same-group condition; however, the memory precisions obtained for both conditions were similarly degraded in the second recall. In Experiment 2, we varied the probability of the same- and different-group conditions with a ratio of 3 to 7. We expected output interference to be higher in the same-group condition than in the different-group condition. This is because items of the other group are more likely to be probed in the second recall phase and, thus, protecting those items during the first recall phase leads to a better performance. Nevertheless, the same pattern of results was robustly reproduced, suggesting grouping shields the grouped items from output interference because of the secured accessibility. We discussed how grouping influences output interference.
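A mixture model that yields a guess rate and a memory precision can be sketched as below. The abstract does not give the model's exact form, so this sketch assumes a widely used variant: recall errors on the color wheel come from a uniform distribution (guesses) mixed with a von Mises distribution centered on the target (remembered items). The function names and the grid-search fit are illustrative assumptions.

```python
import numpy as np


def mixture_neg_log_lik(errors, guess_rate, kappa):
    """Negative log-likelihood of recall errors (radians, in [-pi, pi)).

    Assumed model: with probability guess_rate the response is a uniform
    guess; otherwise it follows a von Mises density centered on zero error
    with concentration kappa (larger kappa means higher precision).
    """
    von_mises = np.exp(kappa * np.cos(errors)) / (2 * np.pi * np.i0(kappa))
    uniform = 1.0 / (2 * np.pi)
    likelihood = (1.0 - guess_rate) * von_mises + guess_rate * uniform
    return -np.sum(np.log(likelihood))


def fit_mixture(errors):
    """Coarse grid-search estimate of (guess_rate, kappa)."""
    errors = np.asarray(errors, dtype=float)
    guess_grid = np.linspace(0.0, 0.95, 96)
    kappa_grid = np.linspace(0.5, 40.0, 80)
    best = min(
        (mixture_neg_log_lik(errors, g, k), g, k)
        for g in guess_grid
        for k in kappa_grid
    )
    return float(best[1]), float(best[2])
```

A grid search is used only to keep the sketch dependency-light; a numerical optimizer would be the more usual choice for real data.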
Reevaluation of the Amsterdam Inventory for Auditory Disability and Handicap Using Item Response Theory.

PubMed

Boeschen Hospers, J Mirjam; Smits, Niels; Smits, Cas; Stam, Mariska; Terwee, Caroline B; Kramer, Sophia E

2016-04-01

We reevaluated the psychometric properties of the Amsterdam Inventory for Auditory Disability and Handicap (AIADH; Kramer, Kapteyn, Festen, & Tobi, 1995) using item response theory. Item response theory describes item functioning along an ability continuum. Cross-sectional data from 2,352 adults with and without hearing impairment, ages 18-70 years, were analyzed. They completed the AIADH in the web-based prospective cohort study "Netherlands Longitudinal Study on Hearing." A graded response model was fitted to the AIADH data. Category response curves, item information curves, and the standard error as a function of self-reported hearing ability were plotted. The graded response model showed a good fit. Item information curves were most reliable for adults who reported having hearing disability and less reliable for adults with normal hearing. The standard error plot showed that self-reported hearing ability is most reliably measured for adults reporting mild up to moderate hearing disability. This is one of the few item response theory studies on audiological self-reports. All AIADH items could be hierarchically placed on the self-reported hearing ability continuum, meaning they measure the same construct. This provides a promising basis for developing a clinically useful computerized adaptive test, where item selection adapts to the hearing ability of individuals, resulting in efficient assessment of hearing disability.
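The category response curves mentioned here follow from the graded response model's cumulative structure. The sketch below assumes the standard logistic form of the model; the function name and parameter values are hypothetical and are not estimates from the AIADH data.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under the graded response model.

    theta: latent trait level (here, self-reported hearing ability)
    a: item discrimination
    thresholds: ordered boundaries b_1 < ... < b_{K-1} for K response categories
    Returns a list of K probabilities that sum to 1.
    """
    # Cumulative probability of responding in category k or higher,
    # padded with 1 (lowest category or higher) and 0 (above the highest).
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    # Each category probability is the difference of adjacent cumulative curves.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]
```

Evaluating this function over a range of theta values for a fixed item traces out one curve per response category.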

On the Relationship Between Classical Test Theory and Item Response Theory: From One to the Other and Back.

PubMed

Raykov, Tenko; Marcoulides, George A

2016-04-01

The frequently neglected and often misunderstood relationship between classical test theory and item response theory is discussed for the unidimensional case with binary measures and no guessing. It is pointed out that popular item response models can be directly obtained from classical test theory-based models by accounting for the discrete nature of the observed items. Two distinct observational equivalence approaches are outlined that render the item response models from corresponding classical test theory-based models, and can each be used to obtain the former from the latter models. Similarly, classical test theory models can be furnished using the reverse application of either of those approaches from corresponding item response models.
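One standard way to see how a binary item response model can arise from a linear, classical-test-theory-style model is sketched below. It is offered as an illustration only and is not necessarily either of the two approaches outlined by the authors; the symbols $\lambda_i$, $\tau_i$, and $\varepsilon_i$ are assumptions introduced for this sketch.

```latex
% Assumed underlying continuous response for item i, linear in the trait:
X_i^{*} = \lambda_i \theta + \varepsilon_i, \qquad \varepsilon_i \sim N(0, 1)

% The observed binary score results from discretizing at a threshold:
X_i = \begin{cases} 1 & \text{if } X_i^{*} > \tau_i \\ 0 & \text{otherwise} \end{cases}

% The resulting item response function is the normal-ogive model:
P(X_i = 1 \mid \theta)
  = P(\varepsilon_i > \tau_i - \lambda_i \theta)
  = \Phi(\lambda_i \theta - \tau_i)
  = \Phi\bigl(a_i (\theta - b_i)\bigr),
\qquad a_i = \lambda_i, \quad b_i = \tau_i / \lambda_i
```

Reading the last line from right to left gives the reverse direction: an item with discrimination $a_i$ and difficulty $b_i$ corresponds to a linear model with loading $\lambda_i = a_i$ and threshold $\tau_i = a_i b_i$.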
[Instrument to measure adherence in hypertensive patients: contribution of Item Response Theory].

PubMed

Rodrigues, Malvina Thaís Pacheco; Moreira, Thereza Maria Magalhaes; Vasconcelos, Alexandre Meira de; Andrade, Dalton Francisco de; Silva, Daniele Braz da; Barbetta, Pedro Alberto

2013-06-01

To analyze, by means of item response theory (IRT), an instrument to measure adherence to treatment for hypertension. This analytical study used IRT with 406 hypertensive patients with associated complications seen in primary care in Fortaleza, CE, Northeastern Brazil, in 2011. The stages were: dimensionality testing, item calibration, data processing, and scale construction, analyzed using the graded response model. The dimensionality of the instrument was studied by analyzing the polychoric correlation matrix and by full-information factor analysis. Multilog software was used to calibrate items and estimate the scores. Items relating to drug therapy are the most directly related to adherence, while those relating to drug-free therapy need to be reworked because they have less psychometric information and low discrimination. The independence of items, the small number of levels in the scale, and the low explained variance in the adjustment of the models are the main weaknesses of the instrument analyzed. IRT proved to be a relevant analysis technique because it evaluated respondents' adherence to treatment for hypertension, the level of difficulty of the items, and their ability to discriminate between individuals with different levels of adherence, which generates a greater amount of information. According to the IRT analysis, the instrument is limited in measuring adherence to hypertension treatment and needs adjustment. The proper formulation of the items is important in order to accurately measure the desired latent trait.
Assessing the Item Response Theory with Covariate (IRT-C) Procedure for Ascertaining Differential Item Functioning

ERIC Educational Resources Information Center

Tay, Louis; Vermunt, Jeroen K.; Wang, Chun

2013-01-01

We evaluate the item response theory with covariates (IRT-C) procedure for assessing differential item functioning (DIF) without preknowledge of anchor items (Tay, Newman, & Vermunt, 2011). This procedure begins with a fully constrained baseline model, and candidate items are tested for uniform and/or nonuniform DIF using the Wald statistic.…
On Multidimensional Item Response Theory: A Coordinate-Free Approach. Research Report. ETS RR-07-30

ERIC Educational Resources Information Center

Antal, Tamás

2007-01-01

A coordinate-free definition of complex-structure multidimensional item response theory (MIRT) for dichotomously scored items is presented. The point of view taken emphasizes the possibilities and subtleties of understanding MIRT as a multidimensional extension of the classical unidimensional item response theory models. The main theorem of the…
Missouri Assessment Program (MAP), Spring 2000: Elementary Health/Physical Education, Released Items, Grade 5.

ERIC Educational Resources Information Center

Missouri State Dept. of Elementary and Secondary Education, Jefferson City.

This document presents 10 released items from the Health/Physical Education Missouri Assessment Program (MAP) test given in the spring of 2000 to fifth graders. Items from the test sessions include: selected-response (multiple choice), constructed-response, and a performance event. The selected-response items consist of individual questions…
Reevaluation of the Amsterdam Inventory for Auditory Disability and Handicap Using Item Response Theory

ERIC Educational Resources Information Center

Hospers, J. Mirjam Boeschen; Smits, Niels; Smits, Cas; Stam, Mariska; Terwee, Caroline B.; Kramer, Sophia E.

2016-01-01

Purpose: We reevaluated the psychometric properties of the Amsterdam Inventory for Auditory Disability and Handicap (AIADH; Kramer, Kapteyn, Festen, & Tobi, 1995) using item response theory. Item response theory describes item functioning along an ability continuum. Method: Cross-sectional data from 2,352 adults with and without hearing…
The Relationship of Expert-System Scored Constrained Free-Response Items to Multiple-Choice and Open-Ended Items.

ERIC Educational Resources Information Center

Bennett, Randy Elliot; And Others

1990-01-01

The relationship of an expert-system-scored constrained free-response item type to multiple-choice and free-response items was studied using data for 614 students on the College Board's Advanced Placement Computer Science (APCS) Examination. Implications for testing and the APCS test are discussed. (SLD)
Asymmetric effects of emotion on mnemonic interference

PubMed Central

Leal, Stephanie L.; Tighe, Sarah K.; Yassa, Michael A.

2014-01-01

Emotional experiences can strengthen memories so that they can be used to guide future behavior. Emotional arousal, mediated by the amygdala, is thought to modulate storage by the hippocampus, which may encode unique episodic memories via pattern separation – the process by which similar memories are stored using non-overlapping representations. While prior work has examined mnemonic interference due to similarity and emotional modulation of memory independently, examining the mechanisms by which emotion influences mnemonic interference has not been previously accomplished in humans. To this end, we developed an emotional memory task where emotional content and stimulus similarity were varied to examine the effect of emotion on fine mnemonic discrimination (a putative behavioral correlate of hippocampal pattern separation). When tested immediately after encoding, discrimination was reduced for similar emotional items compared to similar neutral items, consistent with a reduced bias towards pattern separation. After 24 h, recognition of emotional target items was preserved compared to neutral items, whereas similar emotional item discrimination was further diminished. This suggests a potential mechanism for the emotional modulation of memory with a selective remembering of gist, as well as a selective forgetting of detail, indicating an emotion-induced reduction in pattern separation. This can potentially increase the effective signal-to-noise ratio in any given situation to promote survival. Furthermore, we found that individuals with depressive symptoms hyper-discriminate negative items, which correlated with their symptom severity. This suggests that utilizing mnemonic discrimination paradigms allows us to tease apart the nuances of disorders with aberrant emotional mnemonic processing. PMID:24607286
A Diffusion Model Analysis of Decision Biases Affecting Delayed Recognition of Emotional Stimuli.

PubMed

Bowen, Holly J; Spaniol, Julia; Patel, Ronak; Voss, Andreas

2016-01-01

Previous empirical work suggests that emotion can influence accuracy and cognitive biases underlying recognition memory, depending on the experimental conditions. The current study examines the effects of arousal and valence on delayed recognition memory using the diffusion model, which allows the separation of two decision biases thought to underlie memory: response bias and memory bias. Memory bias has not been given much attention in the literature but can provide insight into the retrieval dynamics of emotion-modulated memory. Participants viewed emotional pictorial stimuli; half were given a recognition test 1 day later and the other half 7 days later. Analyses revealed that emotional valence generally evokes liberal responding, whereas high arousal evokes liberal responding only at a short retention interval. The memory bias analyses indicated that participants experienced greater familiarity with high-arousal compared to low-arousal items, and this pattern became more pronounced as study-test lag increased; positive items evoked greater familiarity than negative items, and this pattern remained stable across retention intervals. The findings provide insight into the separate contributions of valence and arousal to the cognitive mechanisms underlying delayed emotion-modulated memory.
Psychometric properties and a latent class analysis of the 12-item World Health Organization Disability Assessment Schedule 2.0 (WHODAS 2.0) in a pooled dataset of community samples.

PubMed

MacLeod, Melissa A; Tremblay, Paul F; Graham, Kathryn; Bernards, Sharon; Rehm, Jürgen; Wells, Samantha

2016-12-01

The 12-item World Health Organization Disability Assessment Schedule 2.0 (WHODAS 2.0) is a brief measurement tool used cross-culturally to capture the multi-dimensional nature of disablement through six domains, including: understanding and interacting with the world; moving and getting around; self-care; getting on with people; life activities; and participation in society. Previous psychometric research supports that the WHODAS 2.0 functions as a general factor of disablement. In a pooled dataset from community samples of adults (N = 447) we used confirmatory factor analysis to confirm a one-factor structure. Latent class analysis was used to identify subgroups of individuals based on their patterns of responses. We identified four distinct classes, or patterns of disablement: (1) pervasive disability; (2) physical disability; (3) emotional, cognitive, or interpersonal disability; (4) no/low disability. Convergent validity of the latent class subgroups was found with respect to socio-demographic characteristics, number of days affected by disabilities, stress, mental health, and substance use. These classes offer a simple and meaningful way to classify people with disabilities based on the 12-item WHODAS 2.0. Focusing on individuals with a high probability of being in the first three classes may help guide interventions. Copyright © 2016 John Wiley & Sons, Ltd.
Perceived freedom-responsibility covariation among Cypriot adolescents.

PubMed

Frangou, Georgia; Wilkerson, Keith; McGahan, Joseph R

2008-04-01

Participants were 67 Cypriot adolescents who responded to propositions regarding positive, negative, and noncontingent relations between freedom and responsibility. The authors framed items so that half dealt with freedom given responsibility, and the other half dealt with responsibility given freedom. Results indicated participants were more likely to endorse positive-contingency items than negative-contingency and noncontingency items when items were framed around freedom given responsibility. However, when items were framed around responsibility given freedom, no such differences emerged. The authors discuss results relative to cultural and sociopolitical differences and similarities between children in Cyprus and participants in the United States and implications concerning the present study and previous studies regarding these constructs.
Dust control technology usage patterns in the drywall finishing industry.

PubMed

Young-Corbett, Deborah E; Nussbaum, Maury A

2009-06-01

A telephone survey was conducted to quantify drywall finishing industry usage rates of dust control technology, identify barriers to technology adoption, and explore firm owner perception of risk. Industry use of the following technologies was described: wet methods, respiratory protection, pole sanders, ventilated sanders, and low-dust joint compound. A survey instrument composed of both Likert-type scaled items and open-ended items was developed and administered by telephone to the census population of the owners of member firms of trade associations: Finishing Contractors Association and Association of the Wall and Ceiling Industries. Of 857 firms, 264 interviews were completed. Along with descriptive statistics, results were analyzed to examine effects of firm size and union affiliation on responses. Responses to open-ended items were analyzed using content analysis procedures. Firm owners rated the risk of dust to productivity and customer satisfaction as low-moderate. Half rated the dust as having some impact on worker health, with higher impacts indicated by owners of small firms. Among the available control technologies, respiratory protection was used most frequently. Several barriers to implementation of the more effective control technologies were identified. Barriers associated with technology usability, productivity, and cost, as well as misperceptions of risk, should be addressed to improve dust control in the drywall finishing industry.
Boundary curves of individual items in the distribution of total depressive symptom scores approximate an exponential pattern in a general population.

PubMed

Tomitaka, Shinichiro; Kawasaki, Yohei; Ide, Kazuki; Akutagawa, Maiko; Yamada, Hiroshi; Furukawa, Toshiaki A; Ono, Yutaka

2016-01-01

Previously, we proposed a model for ordinal scale scoring in which individual thresholds for each item constitute a distribution by each item. This led us to hypothesize that the boundary curves of each depressive symptom score in the distribution of total depressive symptom scores follow a common mathematical model, which is expressed as the product of the frequency of the total depressive symptom scores and the probability of the cumulative distribution function of each item threshold. To verify this hypothesis, we investigated the boundary curves of the distribution of total depressive symptom scores in a general population. Data collected from 21,040 subjects who had completed the Center for Epidemiologic Studies Depression Scale (CES-D) questionnaire as part of a national Japanese survey were analyzed. The CES-D consists of 20 items (16 negative items and four positive items). The boundary curves of adjacent item scores in the distribution of total depressive symptom scores for the 16 negative items were analyzed using log-normal scales and curve fitting. The boundary curves of adjacent item scores for a given symptom approximated a common linear pattern on a log-normal scale. Curve fitting showed that an exponential fit had a markedly higher coefficient of determination than either linear or quadratic fits. With negative affect items, the gap between the total score curve and boundary curve continuously increased with increasing total depressive symptom scores on a log-normal scale, whereas the boundary curves of positive affect items, which are not considered manifest variables of the latent trait, did not exhibit such increases in this gap. The results of the present study support the hypothesis that the boundary curves of each depressive symptom score in the distribution of total depressive symptom scores commonly follow the predicted mathematical model, which was verified to approximate an exponential mathematical pattern.
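The comparison of exponential, linear, and quadratic fits by coefficient of determination can be sketched as follows. This is an illustration on whatever data the caller supplies, not a reproduction of the authors' procedure: the function names are hypothetical, and the exponential fit is obtained here by regressing log(y) on x, which is one of several possible fitting methods and requires strictly positive y.

```python
import numpy as np


def r_squared(y, y_hat):
    """Coefficient of determination of predictions y_hat for observations y."""
    y = np.asarray(y, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)


def compare_fits(x, y):
    """Return R^2 on the original scale for linear, quadratic, and exponential fits."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    linear = np.polyval(np.polyfit(x, y, 1), x)
    quadratic = np.polyval(np.polyfit(x, y, 2), x)
    # Exponential model y = exp(intercept + slope * x), fitted on the log scale.
    slope, intercept = np.polyfit(x, np.log(y), 1)
    exponential = np.exp(intercept + slope * x)
    return {
        "linear": r_squared(y, linear),
        "quadratic": r_squared(y, quadratic),
        "exponential": r_squared(y, exponential),
    }
```

A curve that is close to a straight line on a semi-logarithmic plot will typically show a higher exponential R² than linear or quadratic R² under this comparison.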
Boundary curves of individual items in the distribution of total depressive symptom scores approximate an exponential pattern in a general population

PubMed Central

Kawasaki, Yohei; Akutagawa, Maiko; Yamada, Hiroshi; Furukawa, Toshiaki A.; Ono, Yutaka

2016-01-01

Background Previously, we proposed a model for ordinal scale scoring in which individual thresholds for each item constitute a distribution by each item. This led us to hypothesize that the boundary curves of each depressive symptom score in the distribution of total depressive symptom scores follow a common mathematical model, which is expressed as the product of the frequency of the total depressive symptom scores and the probability of the cumulative distribution function of each item threshold. To verify this hypothesis, we investigated the boundary curves of the distribution of total depressive symptom scores in a general population. Methods Data collected from 21,040 subjects who had completed the Center for Epidemiologic Studies Depression Scale (CES-D) questionnaire as part of a national Japanese survey were analyzed. The CES-D consists of 20 items (16 negative items and four positive items). The boundary curves of adjacent item scores in the distribution of total depressive symptom scores for the 16 negative items were analyzed using log-normal scales and curve fitting. Results The boundary curves of adjacent item scores for a given symptom approximated a common linear pattern on a log-normal scale. Curve fitting showed that an exponential fit had a markedly higher coefficient of determination than either linear or quadratic fits. With negative affect items, the gap between the total score curve and boundary curve continuously increased with increasing total depressive symptom scores on a log-normal scale, whereas the boundary curves of positive affect items, which are not considered manifest variables of the latent trait, did not exhibit such increases in this gap. Discussion The results of the present study support the hypothesis that the boundary curves of each depressive symptom score in the distribution of total depressive symptom scores commonly follow the predicted mathematical model, which was verified to approximate an exponential mathematical pattern. PMID:27761346
Dealing with Omitted and Not-Reached Items in Competence Tests: Evaluating Approaches Accounting for Missing Responses in Item Response Theory Models

ERIC Educational Resources Information Center

Pohl, Steffi; Gräfe, Linda; Rose, Norman

2014-01-01

Data from competence tests usually show a number of missing responses on test items due to both omitted and not-reached items. Different approaches for dealing with missing responses exist, and there are no clear guidelines on which of those to use. While classical approaches rely on an ignorable missing data mechanism, the most recently developed…
Age differences in visual search for compound patterns: long- versus short-range grouping.

PubMed

Burack, J A; Enns, J T; Iarocci, G; Randolph, B

2000-11-01

Visual search for compound patterns was examined in observers aged 6, 8, 10, and 22 years. The main question was whether age-related improvement in search rate (response time slope over number of items) was different for patterns defined by short- versus long-range spatial relations. Perceptual access to each type of relation was varied by using elements of same contrast (easy to access) or mixed contrast (hard to access). The results showed large improvements with age in search rate for long-range targets; search rate for short-range targets was fairly constant across age. This pattern held regardless of whether perceptual access to a target was easy or hard, supporting the hypothesis that different processes are involved in perceptual grouping at these two levels. The results also point to important links between ontogenic and microgenic change in perception (H. Werner, 1948, 1957).
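The search rate named in this abstract (response time slope over number of items) can be illustrated with an ordinary least-squares slope. The following is a minimal sketch, not the authors' analysis; the function name `search_rate` and the sample values are hypothetical.

```python
def search_rate(set_sizes, response_times):
    """Least-squares slope of response time over number of display items.

    Hypothetical helper for illustration: returns the extra response time
    (e.g. in ms) associated with each additional item in the display.
    """
    n = len(set_sizes)
    mean_x = sum(set_sizes) / n
    mean_y = sum(response_times) / n
    # Sum of squared deviations of set size, and cross-products with RT.
    sxx = sum((x - mean_x) ** 2 for x in set_sizes)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(set_sizes, response_times))
    return sxy / sxx


# Made-up data: mean RT rises by 100 ms for every 4 added items.
slope = search_rate([4, 8, 12], [500, 600, 700])
```

A shallower slope corresponds to a faster search rate, which is the sense in which the abstract reports improvement with age.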
Generalizing attentional control across dimensions and tasks: evidence from transfer of proportion-congruent effects.

PubMed

Wühr, Peter; Duthoo, Wout; Notebaert, Wim

2015-01-01

Three experiments investigated transfer of list-wide proportion congruent (LWPC) effects from a set of congruent and incongruent items with different frequency (inducer task) to a set of congruent and incongruent items with equal frequency (diagnostic task). Experiments 1 and 2 mixed items from horizontal and vertical Simon tasks. Tasks always involved different stimuli that varied on the same dimension (colour) in Experiment 1 and on different dimensions (colour, shape) in Experiment 2. Experiment 3 mixed trials from a manual Simon task with trials from a vocal Stroop task, with colour being the relevant stimulus in both tasks. There were two major results. First, we observed transfer of LWPC effects in Experiments 1 and 3, when tasks shared the relevant dimension, but not in Experiment 2. Second, sequential modulations of congruency effects transferred in Experiment 1 only. Hence, the different transfer patterns suggest that LWPC effects and sequential modulations arise from different mechanisms. Moreover, the observation of transfer supports an account of LWPC effects in terms of list-wide cognitive control, while being at odds with accounts in terms of stimulus-response (contingency) learning and item-specific control.
Cutoffs, Norms, and Patterns of Comorbid Difficulties in Children with Developmental Disabilities on the Baby and Infant Screen for Children with aUtIsm Traits (BISCUIT-Part 2)

ERIC Educational Resources Information Center

Matson, Johnny L.; Fodstad, Jill C.; Mahan, Sara

2009-01-01

Behavioral symptoms of comorbid psychopathology of 651 children 17-37 months of age who were at risk for developmental disabilities were studied using the BISCUIT-Part 2. In Study 1, norms and cutoff scores were established for this new scale on this sample. In Study 2, frequency of response on the 52 items measured was reported. Problems in…
External validity of the pediatric cardiac quality of life inventory

PubMed Central

Marino, Bradley S.; Drotar, Dennis; Cassedy, Amy; Davis, Richard; Tomlinson, Ryan S.; Mellion, Katelyn; Mussatto, Kathleen; Mahony, Lynn; Newburger, Jane W.; Tong, Elizabeth; Cohen, Mitchell I.; Helfaer, Mark A.; Kazak, Anne E.; Wray, Jo; Wernovsky, Gil; Shea, Judy A.; Ittenbach, Richard

2012-01-01

Purpose The Pediatric Cardiac Quality of Life Inventory (PCQLI) is a disease-specific, health-related quality of life (HRQOL) measure for pediatric heart disease (HD). The purpose of this study was to demonstrate the external validity of PCQLI scores. Methods The PCQLI development site (Development sample) and six geographically diverse centers in the United States (Composite sample) recruited pediatric patients with acquired or congenital HD. Item response option variability, scores [Total (TS); Disease Impact (DI) and Psychosocial Impact (PI) subscales], patterns of correlation, and internal consistency were compared between samples. Results A total of 3,128 patients and parent participants (1,113 Development; 2,015 Composite) were analyzed. Response option variability patterns of all items in both samples were acceptable. Inter-sample score comparisons revealed no differences. Median item–total (Development, 0.57; Composite, 0.59) and item–subscale (Development, DI 0.58, PI 0.59; Composite, DI 0.58, PI 0.56) correlations were moderate. Subscale–subscale (0.79 for both samples) and subscale–total (Development, DI 0.95, PI 0.95; Composite, DI 0.95, PI 0.94) correlations and internal consistency (Development, TS 0.93, DI 0.90, PI 0.84; Composite, TS 0.93, DI 0.89, PI 0.85) were high in both samples. Conclusion PCQLI scores are externally valid across the US pediatric HD population and may be used for multi-center HRQOL studies. PMID:21188538
Patterns of hand preference for pairs of actions and the classification of handedness.

PubMed

Annett, Marian

2009-08-01

Pairs of actions such as write x throw and throw x racquet were examined for items of the Annett hand preference questionnaire (AHPQ). Right (R) and left (L) responses were described for frequencies of RR, RL, LR, and LL pairings (write x throw etc.) in a large representative combined sample with the aim of discovering the distribution over the population as a whole. The frequencies of RL pairings varied significantly over the different item pairs but the frequencies of LR pairings were fairly constant. An important difference was found between primary actions (originally write, throw, racquet, match, toothbrush, hammer with the later addition of scissors for right-handers) and non-primary actions (needle and thread, broom, spade, dealing playing cards, and unscrewing the lid of a jar). For primary actions, there were similar numbers of right and left writers using the 'other' hand. For non-primary actions more right-handers used the left hand than for primary actions but more left-handers did not use the right hand. That is, different frequencies of response to primary versus non-primary actions were found for right-handers but not for left-handers. The pattern of findings was repeated for a corresponding analysis of left-handed throwing x AHPQ actions. The findings have implications for the classification of hand preferences and for analyses of the nature of hand skill.

A Comparison of Limited-Information and Full-Information Methods in Mplus for Estimating Item Response Theory Parameters for Nonnormal Populations

ERIC Educational Resources Information Center

DeMars, Christine E.

2012-01-01

In structural equation modeling software, either limited-information (bivariate proportions) or full-information item parameter estimation routines could be used for the 2-parameter item response theory (IRT) model. Limited-information methods assume the continuous variable underlying an item response is normally distributed. For skewed and…
Estimation of Item Response Theory Parameters in the Presence of Missing Data

ERIC Educational Resources Information Center

Finch, Holmes

2008-01-01

Missing data are a common problem in a variety of measurement settings, including responses to items on both cognitive and affective assessments. Researchers have shown that such missing data may create problems in the estimation of item difficulty parameters in the Item Response Theory (IRT) context, particularly if they are ignored. At the same…
Examination of Different Item Response Theory Models on Tests Composed of Testlets

ERIC Educational Resources Information Center

Kogar, Esin Yilmaz; Kelecioglu, Hülya

2017-01-01

The purpose of this research is to first estimate the item and ability parameters and the standard error values related to those parameters obtained from Unidimensional Item Response Theory (UIRT), bifactor (BIF) and Testlet Response Theory models (TRT) in the tests including testlets, when the number of testlets, number of independent items, and…
A Semiparametric Model for Jointly Analyzing Response Times and Accuracy in Computerized Testing

ERIC Educational Resources Information Center

Wang, Chun; Fan, Zhewen; Chang, Hua-Hua; Douglas, Jeffrey A.

2013-01-01

The item response times (RTs) collected from computerized testing represent an underutilized type of information about items and examinees. In addition to knowing the examinees' responses to each item, we can investigate the amount of time examinees spend on each item. Current models for RTs mainly focus on parametric models, which have the…
Missouri Assessment Program (MAP), Spring 2000: High School Health/Physical Education, Released Items, Grade 9.

ERIC Educational Resources Information Center

Missouri State Dept. of Elementary and Secondary Education, Jefferson City.

This document presents 10 released items from the Health/Physical Education Missouri Assessment Program (MAP) test given in the spring of 2000 to ninth graders. Items from the test sessions include: selected-response (multiple choice), constructed-response, and a performance event. The selected-response items consist of individual questions…
Bi-dimensional acculturation and cultural response set in CES-D among Korean immigrants

PubMed Central

Kim, Eunjung; Seo, Kumin; Cain, Kevin C.

2017-01-01

This study examined a cultural response set to positive affect items and depressive symptom items in CES-D among 172 Korean immigrants. Bi-dimensional acculturation approach, which considers maintenance of Korean Orientation and adoption of American Orientation, was utilized. As Korean immigrants increased American Orientation, they tended to score higher on positive affect items, while no changes occurred in depressive symptom items. Korean Orientation was not related to either positive affect items or depressive symptom items. Korean immigrants have response bias toward positive affect items in CES-D, which decreases as they adopt more American Orientation. CES-D lacks cultural equivalence for Korean immigrants. PMID:20701420
Vegetable parenting practices scale. Item response modeling analyses

PubMed Central

Chen, Tzu-An; O’Connor, Teresia; Hughes, Sheryl; Beltran, Alicia; Baranowski, Janice; Diep, Cassandra; Baranowski, Tom

2015-01-01

Objective To evaluate the psychometric properties of a vegetable parenting practices scale using multidimensional polytomous item response modeling which enables assessing item fit to latent variables and the distributional characteristics of the items in comparison to the respondents. We also tested for differences in the ways items function (called differential item functioning) across child’s gender, ethnicity, age, and household income groups. Method Parents of 3–5 year old children completed a self-reported vegetable parenting practices scale online. Vegetable parenting practices consisted of 14 effective vegetable parenting practices and 12 ineffective vegetable parenting practices items, each with three subscales (responsiveness, structure, and control). Multidimensional polytomous item response modeling was conducted separately on effective vegetable parenting practices and ineffective vegetable parenting practices. Results One effective vegetable parenting practice item did not fit the model well in the full sample or across demographic groups, and another was a misfit in differential item functioning analyses across child’s gender. Significant differential item functioning was detected across children’s age and ethnicity groups, and more among effective vegetable parenting practices than ineffective vegetable parenting practices items. Wright maps showed items only covered parts of the latent trait distribution. The harder- and easier-to-respond ends of the construct were not covered by items for effective vegetable parenting practices and ineffective vegetable parenting practices, respectively. Conclusions Several effective vegetable parenting practices and ineffective vegetable parenting practices scale items functioned differently on the basis of child’s demographic characteristics; therefore, researchers should use these vegetable parenting practices scales with caution.
Item response modeling should be incorporated in analyses of parenting practice questionnaires to better assess differences across demographic characteristics. PMID:25895694
A Model-Free Diagnostic for Single-Peakedness of Item Responses Using Ordered Conditional Means.

PubMed

Polak, Marike; de Rooij, Mark; Heiser, Willem J

2012-09-01

In this article we propose a model-free diagnostic for single-peakedness (unimodality) of item responses. Presuming a unidimensional unfolding scale and a given item ordering, we approximate item response functions of all items based on ordered conditional means (OCM). The proposed OCM methodology is based on Thurstone & Chave's (1929) criterion of irrelevance, which is a graphical, exploratory method for evaluating the "relevance" of dichotomous attitude items. We generalized this criterion to graded response items and quantified the relevance by fitting a unimodal smoother. The resulting goodness-of-fit was used to determine item fit and aggregated scale fit. Based on a simulation procedure, cutoff values were proposed for the measures of item fit. These cutoff values showed high power rates and acceptable Type I error rates. We present 2 applications of the OCM method. First, we apply the OCM method to personality data from the Developmental Profile; second, we analyze attitude data collected by Roberts and Laughlin (1996) concerning opinions of capital punishment.
Confirming the cognition of rising scores: Fox and Mitchum (2013) predicts violations of measurement invariance in series completion between age-matched cohorts.

PubMed

Fox, Mark C; Mitchum, Ainsley L

2014-01-01

The trend of rising scores on intelligence tests raises important questions about the comparability of variation within and between time periods. Descriptions of the processes that mediate selection of item responses provide meaningful psychological criteria upon which to base such comparisons. In a recent paper, Fox and Mitchum presented and tested a cognitive theory of rising scores on analogical and inductive reasoning tests that is specific enough to make novel predictions about cohort differences in patterns of item responses for tests such as the Raven's Matrices. In this paper we extend the same proposal in two important ways by (1) testing it against a dataset that enables the effects of cohort to be isolated from those of age, and (2) applying it to two other inductive reasoning tests that exhibit large Flynn effects: Letter Series and Word Series. Following specification and testing of a confirmatory item response model, predicted violations of measurement invariance are observed between two age-matched cohorts that are separated by only 20 years, as members of the later cohort are found to map objects at higher levels of abstraction than members of the earlier cohort who possess the same overall level of ability. Results have implications for the Flynn effect and cognitive aging while underscoring the value of establishing psychological criteria for equating members of distinct groups who achieve the same scores.
Item response theory analysis of the Pain Self-Efficacy Questionnaire.

PubMed

Costa, Daniel S J; Asghari, Ali; Nicholas, Michael K

2017-01-01

The Pain Self-Efficacy Questionnaire (PSEQ) is a 10-item instrument designed to assess the extent to which a person in pain believes s/he is able to accomplish various activities despite their pain. There is strong evidence for the validity and reliability of both the full-length PSEQ and a 2-item version. The purpose of this study is to further examine the properties of the PSEQ using an item response theory (IRT) approach. We used the two-parameter graded response model to examine the category probability curves, and location and discrimination parameters of the 10 PSEQ items. In item response theory, responses to a set of items are assumed to be probabilistically determined by a latent (unobserved) variable. In the graded-response model specifically, item response threshold (the value of the latent variable for which adjacent response categories are equally likely) and discrimination parameters are estimated for each item. Participants were 1511 mixed, chronic pain patients attending for initial assessment at a tertiary pain management centre. All items except item 7 ('I can cope with my pain without medication') performed well in IRT analysis, and the category probability curves suggested that participants used the 7-point response scale consistently. Items 6 ('I can still do many of the things I enjoy doing, such as hobbies or leisure activity, despite pain'), 8 ('I can still accomplish most of my goals in life, despite the pain') and 9 ('I can live a normal lifestyle, despite the pain') captured higher levels of the latent variable with greater precision. The results from this IRT analysis add to the body of evidence based on classical test theory illustrating the strong psychometric properties of the PSEQ. Despite the relatively poor performance of Item 7, its clinical utility warrants its retention in the questionnaire. The strong psychometric properties of the PSEQ support its use as an effective tool for assessing self-efficacy in people with pain. 
Copyright © 2016 Scandinavian Association for the Study of Pain. Published by Elsevier B.V. All rights reserved.
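As an illustration of the graded response model mentioned in this abstract, the sketch below computes category probabilities for one item from a discrimination parameter and a set of ordered thresholds. It assumes the common logistic (Samejima) parameterization, in which each threshold is the trait value at which the probability of responding in that category or higher is 0.5; the function name and all parameter values are hypothetical and are not estimates from the study.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities under a logistic graded response model.

    theta: latent trait value; a: item discrimination;
    thresholds: increasing values b_1 < ... < b_{K-1} for K ordered categories.
    """
    # Cumulative curves P(X >= k), padded with 1 (lowest) and 0 (beyond highest).
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum += [0.0]
    # Each category probability is the difference of adjacent cumulative curves.
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


# A 7-point item needs 6 thresholds; these values are made up.
probs = grm_category_probs(theta=0.0, a=1.5,
                           thresholds=[-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
```

Plotting these probabilities across a range of theta values gives category probability curves of the kind the abstract describes.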
Item analysis of three Spanish naming tests: a cross-cultural investigation.

PubMed

Marquez de la Plata, Carlos; Arango-Lasprilla, Juan Carlos; Alegret, Montse; Moreno, Alexander; Tárraga, Luis; Lara, Mar; Hewlitt, Margaret; Hynan, Linda; Cullum, C Munro

2009-01-01

Neuropsychological evaluations conducted in the United States and abroad commonly include the use of tests translated from English to Spanish. The use of translated naming tests for evaluating predominantly Spanish-speaking individuals has recently been challenged on the grounds that translating test items may compromise a test's construct validity. The Texas Spanish Naming Test (TNT) has been developed in Spanish specifically for use with Spanish-speakers; however, it is unlikely that patients from diverse Spanish-speaking geographical regions will perform uniformly on a naming test. The present study evaluated and compared the internal consistency and patterns of item difficulty and item discrimination for the TNT and two commonly used translated naming tests in three countries (i.e., United States, Colombia, Spain). Two hundred fifty-two subjects (136 demented, 116 nondemented) across three countries were administered the TNT, the Modified Boston Naming Test-Spanish (MBNT-S), and the naming subtest from the CERAD. Across countries, the TNT demonstrated internal consistency superior to that of its counterparts, an item difficulty pattern superior to that of the CERAD naming test, and an item discrimination pattern superior to that of the MBNT-S. Overall, all three Spanish naming tests differentiated nondemented and moderately demented individuals, but the results suggest the items of the TNT are most appropriate to use with Spanish-speakers. Preliminary normative data for the three tests examined in each country are provided.
Overview of Classical Test Theory and Item Response Theory for Quantitative Assessment of Items in Developing Patient-Reported Outcome Measures

PubMed Central

Cappelleri, Joseph C.; Lundy, J. Jason; Hays, Ron D.

2014-01-01

Introduction The U.S. Food and Drug Administration’s patient-reported outcome (PRO) guidance document defines content validity as “the extent to which the instrument measures the concept of interest” (FDA, 2009, p. 12). “Construct validity is now generally viewed as a unifying form of validity for psychological measurements, subsuming both content and criterion validity” (Strauss & Smith, 2009, p. 7). Hence both qualitative and quantitative information are essential in evaluating the validity of measures. Methods We review classical test theory and item response theory approaches to evaluating PRO measures including frequency of responses to each category of the items in a multi-item scale, the distribution of scale scores, floor and ceiling effects, the relationship between item response options and the total score, and the extent to which hypothesized “difficulty” (severity) order of items is represented by observed responses. Conclusion Classical test theory and item response theory can be useful in providing a quantitative assessment of items and scales during the content validity phase of patient-reported outcome measures. Depending on the particular type of measure and the specific circumstances, either one or both approaches should be considered to help maximize the content validity of PRO measures. PMID:24811753
Item Response Theory Using Hierarchical Generalized Linear Models

ERIC Educational Resources Information Center

Ravand, Hamdollah

2015-01-01

Multilevel models (MLMs) are flexible in that they can be employed to obtain item and person parameters, test for differential item functioning (DIF) and capture both local item and person dependence. Papers on the MLM analysis of item response data have focused mostly on theoretical issues where applications have been add-ons to simulation…
Item Response Theory Equating Using Bayesian Informative Priors.

ERIC Educational Resources Information Center

de la Torre, Jimmy; Patz, Richard J.

This paper seeks to extend the application of Markov chain Monte Carlo (MCMC) methods in item response theory (IRT) to include the estimation of equating relationships along with the estimation of test item parameters. A method is proposed that incorporates estimation of the equating relationship in the item calibration phase. Item parameters from…
Instrument Formatting with Computer Data Entry in Mind.

ERIC Educational Resources Information Center

Boser, Judith A.; And Others

Different formats for four types of research items were studied for ease of computer data entry. The types were: (1) numeric response items; (2) individual multiple choice items; (3) multiple choice items with the same response items; and (4) card column indicator placement. Each of the 13 experienced staff members of a major university's Data…
Missing data methods for dealing with missing items in quality of life questionnaires. A comparison by simulation of personal mean score, full information maximum likelihood, multiple imputation, and hot deck techniques applied to the SF-36 in the French 2003 decennial health survey.

PubMed

Peyre, Hugo; Leplège, Alain; Coste, Joël

2011-03-01

Missing items are common in quality of life (QoL) questionnaires and present a challenge for research in this field. It remains unclear which of the various methods proposed to deal with missing data performs best in this context. We compared personal mean score, full information maximum likelihood, multiple imputation, and hot deck techniques using various realistic simulation scenarios of item missingness in QoL questionnaires constructed within the framework of classical test theory. Samples of 300 and 1,000 subjects were randomly drawn from the 2003 INSEE Decennial Health Survey (of 23,018 subjects representative of the French population and having completed the SF-36) and various patterns of missing data were generated according to three different item non-response rates (3, 6, and 9%) and three types of missing data (Little and Rubin's "missing completely at random," "missing at random," and "missing not at random"). The missing data methods were evaluated in terms of accuracy and precision for the analysis of one descriptive and one association parameter for three different scales of the SF-36. For all item non-response rates and types of missing data, multiple imputation and full information maximum likelihood appeared superior to the personal mean score and especially to hot deck in terms of accuracy and precision; however, the use of personal mean score was associated with insignificant bias (relative bias <2%) in all studied situations. Whereas multiple imputation and full information maximum likelihood are confirmed as reference methods, the personal mean score appears nonetheless appropriate for dealing with items missing from completed SF-36 questionnaires in most situations of routine use. These results can reasonably be extended to other questionnaires constructed according to classical test theory.
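The personal mean score approach compared in this abstract replaces a respondent's missing items with the mean of that respondent's answered items on the same scale. The following is a minimal sketch of that idea. The function name is hypothetical, and the rule of requiring at least half of the items to be answered is an assumption added here as a common convention; the abstract does not state it.

```python
def personal_mean_impute(responses, min_answered_fraction=0.5):
    """Fill missing items (None) with the respondent's own mean item score.

    Returns None when too few items were answered to impute; the 0.5
    default is an assumed convention, not taken from the abstract.
    """
    answered = [r for r in responses if r is not None]
    if not responses or len(answered) < min_answered_fraction * len(responses):
        return None
    mean = sum(answered) / len(answered)
    # Keep observed answers as they are; substitute the mean for gaps only.
    return [mean if r is None else r for r in responses]


# One respondent, four-item scale, second item skipped (made-up values).
completed = personal_mean_impute([1, None, 3, 2])
```

Because the imputed value depends only on the same respondent's other answers, the method needs no model of the sample, which may help explain why the abstract considers it suitable for routine use.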
Measurement Equivalence of the Patient Reported Outcomes Measurement Information System® (PROMIS®) Applied Cognition – General Concerns, Short Forms in Ethnically Diverse Groups

PubMed Central

Fieo, Robert; Ocepek-Welikson, Katja; Kleinman, Marjorie; Eimicke, Joseph P.; Crane, Paul K.; Cella, David; Teresi, Jeanne A.

2017-01-01

Aims The goals of these analyses were to examine the psychometric properties and measurement equivalence of a self-reported cognition measure, the Patient Reported Outcome Measurement Information System® (PROMIS®) Applied Cognition – General Concerns short form. These items are also found in the PROMIS Cognitive Function (version 2) item bank. This scale consists of eight items related to subjective cognitive concerns. Differential item functioning (DIF) analyses of gender, education, race, age, and (Spanish) language were performed using an ethnically diverse sample (n = 5,477) of individuals with cancer. This is the first analysis examining DIF in this item set across ethnic and racial groups. Methods DIF hypotheses were derived by asking content experts to indicate whether they posited DIF for each item and to specify the direction. The principal DIF analytic model was item response theory (IRT) using the graded response model for polytomous data, with accompanying Wald tests and measures of magnitude. Sensitivity analyses were conducted using ordinal logistic regression (OLR) with a latent conditioning variable. IRT-based reliability, precision and information indices were estimated. Results DIF was identified consistently only for the item, brain not working as well as usual. After correction for multiple comparisons, this item showed significant DIF for both the primary and sensitivity analyses. Black respondents and Hispanics in comparison to White non-Hispanic respondents evidenced a lower conditional probability of endorsing the item, brain not working as well as usual. The same pattern was observed for the education grouping variable: as compared to those with a graduate degree, conditioning on overall level of subjective cognitive concerns, those with less than high school education also had a lower probability of endorsing this item. 
DIF was also observed for age for two items after correction for multiple comparisons for both the IRT and OLR-based models: “I have had to work really hard to pay attention or I would make a mistake” and “I have had trouble shifting back and forth between different activities that require thinking”. For both items, conditional on cognitive complaints, older respondents had a higher likelihood than younger respondents of endorsing the item in the cognitive complaints direction. The magnitude and impact of DIF was minimal. The scale showed high precision along much of the subjective cognitive concerns continuum; the overall IRT-based reliability estimate for the total sample was 0.88 and the estimates for subgroups ranged from 0.87 to 0.92. Conclusion Little DIF of high magnitude or impact was observed in the PROMIS Applied Cognition – General Concerns short form item set. One item, “It has seemed like my brain was not working as well as usual” might be singled out for further study. However, in general the short form item set was highly reliable, informative, and invariant across differing race/ethnic, educational, age, gender, and language groups. PMID:28523238
Measurement Equivalence of the Patient Reported Outcomes Measurement Information System® (PROMIS®) Applied Cognition - General Concerns, Short Forms in Ethnically Diverse Groups.

PubMed

Fieo, Robert; Ocepek-Welikson, Katja; Kleinman, Marjorie; Eimicke, Joseph P; Crane, Paul K; Cella, David; Teresi, Jeanne A

2016-01-01

The goals of these analyses were to examine the psychometric properties and measurement equivalence of a self-reported cognition measure, the Patient Reported Outcome Measurement Information System® (PROMIS®) Applied Cognition - General Concerns short form. These items are also found in the PROMIS Cognitive Function (version 2) item bank. This scale consists of eight items related to subjective cognitive concerns. Differential item functioning (DIF) analyses of gender, education, race, age, and (Spanish) language were performed using an ethnically diverse sample (n = 5,477) of individuals with cancer. This is the first analysis examining DIF in this item set across ethnic and racial groups. DIF hypotheses were derived by asking content experts to indicate whether they posited DIF for each item and to specify the direction. The principal DIF analytic model was item response theory (IRT) using the graded response model for polytomous data, with accompanying Wald tests and measures of magnitude. Sensitivity analyses were conducted using ordinal logistic regression (OLR) with a latent conditioning variable. IRT-based reliability, precision and information indices were estimated. DIF was identified consistently only for the item, brain not working as well as usual. After correction for multiple comparisons, this item showed significant DIF for both the primary and sensitivity analyses. Black respondents and Hispanics in comparison to White non-Hispanic respondents evidenced a lower conditional probability of endorsing the item, brain not working as well as usual. The same pattern was observed for the education grouping variable: as compared to those with a graduate degree, conditioning on overall level of subjective cognitive concerns, those with less than high school education also had a lower probability of endorsing this item.
DIF was also observed for age for two items after correction for multiple comparisons for both the IRT and OLR-based models: "I have had to work really hard to pay attention or I would make a mistake" and "I have had trouble shifting back and forth between different activities that require thinking". For both items, conditional on cognitive complaints, older respondents had a higher likelihood than younger respondents of endorsing the item in the cognitive complaints direction. The magnitude and impact of DIF was minimal. The scale showed high precision along much of the subjective cognitive concerns continuum; the overall IRT-based reliability estimate for the total sample was 0.88 and the estimates for subgroups ranged from 0.87 to 0.92. Little DIF of high magnitude or impact was observed in the PROMIS Applied Cognition - General Concerns short form item set. One item, "It has seemed like my brain was not working as well as usual" might be singled out for further study. However, in general the short form item set was highly reliable, informative, and invariant across differing race/ethnic, educational, age, gender, and language groups.
Consequences of Ignoring Guessing when Estimating the Latent Density in Item Response Theory

ERIC Educational Resources Information Center

Woods, Carol M.

2008-01-01

In Ramsay-curve item response theory (RC-IRT), the latent variable distribution is estimated simultaneously with the item parameters. In extant Monte Carlo evaluations of RC-IRT, the item response function (IRF) used to fit the data is the same one used to generate the data. The present simulation study examines RC-IRT when the IRF is imperfectly…
Asymptotic Properties of Induced Maximum Likelihood Estimates of Nonlinear Models for Item Response Variables: The Finite-Generic-Item-Pool Case.

ERIC Educational Resources Information Center

Jones, Douglas H.

The progress of modern mental test theory depends very much on the techniques of maximum likelihood estimation, and many popular applications make use of likelihoods induced by logistic item response models. While, in reality, item responses are nonreplicate within a single examinee and the logistic models are only ideal, practitioners make…

Limits on Log Cross-Product Ratios for Item Response Models. Research Report. ETS RR-06-10

ERIC Educational Resources Information Center

Haberman, Shelby J.; Holland, Paul W.; Sinharay, Sandip

2006-01-01

Bounds are established for log cross-product ratios (log odds ratios) involving pairs of items for item response models. First, expressions for bounds on log cross-product ratios are provided for unidimensional item response models in general. Then, explicit bounds are obtained for the Rasch model and the two-parameter logistic (2PL) model.…
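For readers unfamiliar with the quantity being bounded, the log cross-product ratio for a pair of dichotomous items can be computed from the 2×2 table of joint responses. The sketch below is illustrative only: the function name and the counts are hypothetical and are not taken from the report, and it computes the sample statistic, not the bounds the report derives.

```python
import math

def log_cross_product_ratio(n11: int, n10: int, n01: int, n00: int) -> float:
    """Log odds ratio for two dichotomous items.

    n11: both items correct, n10: only the first correct,
    n01: only the second correct, n00: both incorrect.
    All four counts must be positive.
    """
    if min(n11, n10, n01, n00) <= 0:
        raise ValueError("all cell counts must be positive")
    return math.log((n11 * n00) / (n10 * n01))

# Hypothetical counts for 200 examinees answering two items.
positive_association = log_cross_product_ratio(80, 20, 20, 80)
no_association = log_cross_product_ratio(50, 50, 50, 50)
```

A value of zero corresponds to no association between the two items; positive values mean that examinees who answer one item correctly tend to answer the other correctly as well.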
Improving the Reliability of Student Scores from Speeded Assessments: An Illustration of Conditional Item Response Theory Using a Computer-Administered Measure of Vocabulary.

PubMed

Petscher, Yaacov; Mitchell, Alison M; Foorman, Barbara R

2015-01-01

A growing body of literature suggests that response latency, the amount of time it takes an individual to respond to an item, may be an important factor to consider when using assessment data to estimate the ability of an individual. Considering that tests of passage and list fluency are being adapted to a computer administration format, it is possible that accounting for individual differences in response times may be an increasingly feasible option to strengthen the precision of individual scores. The present research evaluated the differential reliability of scores when using classical test theory and item response theory as compared to a conditional item response model which includes response time as an item parameter. Results indicated that the precision of student ability scores increased by an average of 5 % when using the conditional item response model, with greater improvements for those who were average or high ability. Implications for measurement models of speeded assessments are discussed.
Improving the Reliability of Student Scores from Speeded Assessments: An Illustration of Conditional Item Response Theory Using a Computer-Administered Measure of Vocabulary

PubMed Central

Petscher, Yaacov; Mitchell, Alison M.; Foorman, Barbara R.

2016-01-01

A growing body of literature suggests that response latency, the amount of time it takes an individual to respond to an item, may be an important factor to consider when using assessment data to estimate the ability of an individual. Considering that tests of passage and list fluency are being adapted to a computer administration format, it is possible that accounting for individual differences in response times may be an increasingly feasible option to strengthen the precision of individual scores. The present research evaluated the differential reliability of scores when using classical test theory and item response theory as compared to a conditional item response model which includes response time as an item parameter. Results indicated that the precision of student ability scores increased by an average of 5 % when using the conditional item response model, with greater improvements for those who were average or high ability. Implications for measurement models of speeded assessments are discussed. PMID:27721568
Metric equivalence assessment in cross-cultural research: using an example of the Center for Epidemiological Studies--Depression Scale.

PubMed

Kim, Miyong; Han, Hae-Ra; Phillips, Linda

2003-01-01

Metric equivalence is a quantitative way to assess cross-cultural equivalences of translated instruments by examining the patterns of psychometric properties based on cross-cultural data derived from both versions of the instrument. Metric equivalence checks at item and instrument levels can be used as a valuable tool to refine cross-cultural instruments. Korean and English versions of the Center for Epidemiological Studies-Depression Scale (CES-D) were administered to 154 Korean Americans and 151 Anglo Americans to illustrate approaches to assessing their metric equivalence. Inter-item and item-total correlations, Cronbach's alpha coefficients, and factor analysis were used for metric equivalence checks. The alpha coefficient for the Korean-American sample was 0.85 and 0.92 for the Anglo American sample. Although all items of the CES-D surpassed the desirable minimum of 0.30 in the Anglo American sample, four items did not meet the standard in the Korean American sample. Differences in average inter-item correlations were also noted between the two groups (0.25 for Korean Americans and 0.37 for Anglo Americans). Factor analysis identified two factors for both groups, and factor loadings showed similar patterns and congruence coefficients. Results of the item analysis procedures suggest the possibility of bias in certain items that may influence the sensitivity of the Korean version of the CES-D. These item biases also provide a possible explanation for the alpha differences. Although factor loadings showed similar patterns for the Korean and English versions of the CES-D, factorial similarity alone is not sufficient for testing the universality of the structure underlying an instrument.
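As a rough illustration of one of the metric equivalence checks named above, Cronbach's alpha can be computed from a respondents-by-items score table with the standard formula alpha = k/(k-1) × (1 - sum of item variances / variance of total scores). This is a minimal sketch: the function name and the toy scores are hypothetical and are unrelated to the CES-D data in the study.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item (column) across respondents.
    item_variances = [variance(column) for column in zip(*scores)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical data: three items that rank respondents identically.
consistent_scores = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
alpha_consistent = cronbach_alpha(consistent_scores)
```

Computing alpha separately for each language group, as the study does, only requires calling the function once per group's score table.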
An Investigation of the Effects of Different Pulse Patterns of Transcutaneous Electrical Nerve Stimulation (TENS) on Perceptual Embodiment of a Rubber Hand in Healthy Human Participants With Intact Limbs.

PubMed

Mulvey, Matthew R; Fawkner, Helen J; Johnson, Mark I

2015-12-01

The aim of this study was to investigate the strength of perceptual embodiment achieved during an adapted version of the rubber hand illusion (RHI) in response to a series of modified transcutaneous electrical nerve stimulation (TENS) pulse patterns with dynamic temporal and spatial characteristics which are more akin to the mechanical brush stroke in the original RHI. A repeated-measures counterbalanced experimental study was conducted where each participant was exposed to four TENS interventions: continuous pattern TENS; burst pattern TENS (fixed frequency of 2 bursts per second of 100 pulses per second); amplitude-modulated pattern TENS (intensity increasing from zero to a preset level, then back to zero again in a cyclical fashion); and sham (no current) TENS. Participants rated the intensity of the RHI using a three-item numerical rating scale (each item was ranked from 0 to 10). Friedman's analysis of ranks (one-factor repeated measure) was used to test the differences in perceptual embodiment between TENS interventions; alpha was set at p ≤ 0.05. There were statistically significant differences in the intensity of misattribution and perceptual embodiment between sham and active TENS interventions, but no significant differences between the three active TENS conditions (amplitude-modulated TENS, burst TENS, and continuous TENS). Amplitude-modulated and burst TENS produced significantly higher intensity scores for misattribution sensation and perceptual embodiment compared with sham (no current) TENS, whereas continuous TENS did not. The findings provide tentative, but not definitive, evidence that TENS parameters with dynamic spatial and temporal characteristics may produce more intense misattribution sensations and intense perceptual embodiment than parameters with static characteristics (e.g., continuous pulse patterns). © 2015 International Neuromodulation Society.
Lost in the supermarket: Quantifying the cost of partitioning memory sets in hybrid search.

PubMed

Boettcher, Sage E P; Drew, Trafton; Wolfe, Jeremy M

2018-01-01

The items on a memorized grocery list are not relevant in every aisle; for example, it is useless to search for the cabbage in the cereal aisle. It might be beneficial if one could mentally partition the list so only the relevant subset was active, so that vegetables would be activated in the produce section. In four experiments, we explored observers' abilities to partition memory searches. For example, if observers held 16 items in memory, but only eight of the items were relevant, would response times resemble a search through eight or 16 items? In Experiments 1a and 1b, observers were not faster for the partition set; however, they suffered relatively small deficits when "lures" (items from the irrelevant subset) were presented, indicating that they were aware of the partition. In Experiment 2 the partitions were based on semantic distinctions, and again, observers were unable to restrict search to the relevant items. In Experiments 3a and 3b, observers attempted to remove items from the list one trial at a time but did not speed up over the course of a block, indicating that they also could not limit their memory searches. Finally, Experiments 4a, 4b, 4c, and 4d showed that observers were able to limit their memory searches when a subset was relevant for a run of trials. Overall, observers appear to be unable or unwilling to partition memory sets from trial to trial, yet they are capable of restricting search to a memory subset that remains relevant for several trials. This pattern is consistent with a cost to switching between currently relevant memory items.
The Effect of Response Format on the Psychometric Properties of the Narcissistic Personality Inventory: Consequences for Item Meaning and Factor Structure.

PubMed

Ackerman, Robert A; Donnellan, M Brent; Roberts, Brent W; Fraley, R Chris

2016-04-01

The Narcissistic Personality Inventory (NPI) is currently the most widely used measure of narcissism in social/personality psychology. It is also relatively unique because it uses a forced-choice response format. We investigate the consequences of changing the NPI's response format for item meaning and factor structure. Participants were randomly assigned to one of three conditions: 40 forced-choice items (n = 2,754), 80 single-stimulus dichotomous items (i.e., separate true/false responses for each item; n = 2,275), or 80 single-stimulus rating scale items (i.e., 5-point Likert-type response scales for each item; n = 2,156). Analyses suggested that the "narcissistic" and "nonnarcissistic" response options from the Entitlement and Superiority subscales refer to independent personality dimensions rather than high and low levels of the same attribute. In addition, factor analyses revealed that although the Leadership dimension was evident across formats, dimensions with entitlement and superiority were not as robust. Implications for continued use of the NPI are discussed. © The Author(s) 2015.
Comparison of methods for dealing with missing values in the EPV-R.

PubMed

Paniagua, David; Amor, Pedro J; Echeburúa, Enrique; Abad, Francisco J

2017-08-01

The development of an effective instrument to assess the risk of partner violence is a topic of great social relevance. This study evaluates the scale of “Predicción del Riesgo de Violencia Grave Contra la Pareja” –Revisada– (EPV-R - Severe Intimate Partner Violence Risk Prediction Scale-Revised), a tool developed in Spain, which is facing the problem of how to treat the high rate of missing values, as is usual in this type of scale. First, responses to the EPV-R in a sample of 1215 male abusers who were reported to the police were used to analyze the patterns of occurrence of missing values, as well as the factor structure. Second, we analyzed the performance of various imputation methods using simulated data that emulates the missing data mechanism found in the empirical database. The imputation procedure originally proposed by the authors of the scale provides acceptable results, although the application of a method based on the Item Response Theory could provide greater accuracy and offers some additional advantages. Item Response Theory appears to be a useful tool for imputing missing data in this type of questionnaire.
Asymptotic Standard Errors for Item Response Theory True Score Equating of Polytomous Items

ERIC Educational Resources Information Center

Cher Wong, Cheow

2015-01-01

Building on previous works by Lord and Ogasawara for dichotomous items, this article proposes an approach to derive the asymptotic standard errors of item response theory true score equating involving polytomous items, for equivalent and nonequivalent groups of examinees. This analytical approach could be used in place of empirical methods like…
Evaluation of Northwest University, Kano Post-UTME Test Items Using Item Response Theory

ERIC Educational Resources Information Center

Bichi, Ado Abdu; Hafiz, Hadiza; Bello, Samira Abdullahi

2016-01-01

High-stakes testing is used for the purposes of providing results that have important consequences. Validity is the cornerstone upon which all measurement systems are built. This study applied the Item Response Theory principles to analyse Northwest University Kano Post-UTME Economics test items. The developed fifty (50) economics test items was…
Examination of Polytomous Items' Psychometric Properties According to Nonparametric Item Response Theory Models in Different Test Conditions

ERIC Educational Resources Information Center

Sengul Avsar, Asiye; Tavsancil, Ezel

2017-01-01

This study analysed polytomous items' psychometric properties according to nonparametric item response theory (NIRT) models. Thus, simulated datasets--three different test lengths (10, 20 and 30 items), three sample distributions (normal, right and left skewed) and three samples sizes (100, 250 and 500)--were generated by conducting 20…
Rasch Measurement and Item Banking: Theory and Practice.

ERIC Educational Resources Information Center

Nakamura, Yuji

The Rasch Model is a one-parameter item response theory model which states that the probability of a correct response on a test item is a function of the difficulty of the item and the ability of the candidate. Item banking is useful for language testing. The Rasch Model provides estimates of item difficulties that are meaningful,…
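The relationship described above can be sketched with the usual logistic form of the Rasch model, P(correct) = 1 / (1 + exp(-(ability - difficulty))). The function name and the example values below are hypothetical and chosen only for illustration; they do not come from the paper.

```python
import math

def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct response under the Rasch (one-parameter logistic) model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# When ability equals item difficulty, the probability of success is one half.
p_equal = rasch_probability(0.0, 0.0)
# On the same item, a more able candidate is more likely to succeed.
p_high = rasch_probability(1.5, 0.0)
p_low = rasch_probability(-1.5, 0.0)
```

Because only the difference between ability and difficulty enters the formula, candidates and items are placed on a single common scale, which is what makes calibrated difficulties usable in an item bank.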
Item Response Theory Models for Wording Effects in Mixed-Format Scales

ERIC Educational Resources Information Center

Wang, Wen-Chung; Chen, Hui-Fang; Jin, Kuan-Yu

2015-01-01

Many scales contain both positively and negatively worded items. Reverse recoding of negatively worded items might not be enough for them to function as positively worded items do. In this study, we commented on the drawbacks of existing approaches to wording effect in mixed-format scales and used bi-factor item response theory (IRT) models to…
Measuring subjective response to aircraft noise: the effects of survey context.

PubMed

Kroesen, Maarten; Molin, Eric J E; van Wee, Bert

2013-01-01

In applied research, noise annoyance is often used as indicator of subjective reaction to aircraft noise in residential areas. The present study aims to show that the meaning which respondents attach to the concept of aircraft noise annoyance is partly a function of survey context. To this purpose a survey is conducted among residents living near Schiphol Airport, the largest airport in the Netherlands. In line with the formulated hypotheses it is shown that different sets of preceding questionnaire items influence the response distribution of aircraft noise annoyance as well as the correlational patterns between aircraft noise annoyance and other relevant scales.
Diet patterns of island foxes on San Nicolas Island relative to feral cat removal

USGS Publications Warehouse

Cypher, Brian L.; Kelly, Erica C.; Ferrara, Francesca J.; Drost, Charles A.; Westall, Tory L.; Hudgens, Brian

2017-01-01

Island foxes (Urocyon littoralis) are a species of conservation concern that occur on six of the Channel Islands off the coast of southern California. We analysed island fox diet on San Nicolas Island during 2006–12 to assess the influence of the removal of feral cats (Felis catus) on the food use by foxes. Our objective was to determine whether fox diet patterns shifted in response to the cat removal conducted during 2009–10, thus indicating that cats were competing with foxes for food items. We also examined the influence of annual precipitation patterns and fox abundance on fox diet. On the basis of an analysis of 1975 fox scats, use of vertebrate prey – deer mice (Peromyscus maniculatus), birds, and lizards – increased significantly during and after the complete removal of cats (n = 66) from the island. Deer mouse abundance increased markedly during and after cat removal and use of mice by foxes was significantly related to mouse abundance. The increase in mice and shift in item use by the foxes was consistent with a reduction in exploitative competition associated with the cat removal. However, fox abundance declined markedly coincident with the removal of cats and deer mouse abundance was negatively related to fox numbers. Also, annual precipitation increased markedly during and after cat removal and deer mouse abundance closely tracked precipitation. Thus, our results indicate that other confounding factors, particularly precipitation, may have had a greater influence on fox diet patterns.
Vegetable parenting practices scale: Item response modeling analyses

USDA-ARS's Scientific Manuscript database

Our objective was to evaluate the psychometric properties of a vegetable parenting practices scale using multidimensional polytomous item response modeling which enables assessing item fit to latent variables and the distributional characteristics of the items in comparison to the respondents. We al...
A HO-IRT Based Diagnostic Assessment System with Constructed Response Items

ERIC Educational Resources Information Center

Yang, Chih-Wei; Kuo, Bor-Chen; Liao, Chen-Huei

2011-01-01

The aim of the present study was to develop an on-line assessment system with constructed response items in the context of elementary mathematics curriculum. The system recorded the problem solving process of constructed response items and transferred the process to response codes for further analyses. An inference mechanism based on artificial…
Structural Equation Model Approach to the Use of Response Times for Improving Estimation in Item Response Models

ERIC Educational Resources Information Center

Sen, Rohini

2012-01-01

In the last five decades, research on the uses of response time has extended into the field of psychometrics (Schnipke & Scrams, 1999; van der Linden, 2006; van der Linden, 2007), where interest has centered around the usefulness of response time information in item calibration and person measurement within an item response theory framework.…
A Primer on the 2- and 3-Parameter Item Response Theory Models.

ERIC Educational Resources Information Center

Thornton, Artist

Item response theory (IRT) is a useful and effective tool for item response measurement if used in the proper context. This paper discusses the sets of assumptions under which responses can be modeled while exploring the framework of the IRT models relative to response testing. The one parameter model, or one parameter logistic model, is perhaps…
Analysis Test of Understanding of Vectors with the Three-Parameter Logistic Model of Item Response Theory and Item Response Curves Technique

ERIC Educational Resources Information Center

Rakkapao, Suttida; Prasitpong, Singha; Arayathanitkul, Kwan

2016-01-01

This study investigated the multiple-choice test of understanding of vectors (TUV), by applying item response theory (IRT). The difficulty, discriminatory, and guessing parameters of the TUV items were fit with the three-parameter logistic model of IRT, using the PARSCALE program. The TUV ability is an ability parameter, here estimated assuming…
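To make the three parameters concrete, the three-parameter logistic item response function can be sketched as P(θ) = c + (1 - c) / (1 + exp(-a(θ - b))), with discrimination a, difficulty b, and guessing parameter c. This is a minimal sketch under stated assumptions: the function name and parameter values are hypothetical, the optional scaling constant (often written D = 1.7) is omitted, and nothing here reproduces the study's PARSCALE calibration.

```python
import math

def three_pl_probability(theta: float, a: float, b: float, c: float) -> float:
    """Probability of a correct answer under the three-parameter logistic model."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical four-option multiple-choice item with a guessing floor of 0.25.
p_at_difficulty = three_pl_probability(theta=0.5, a=1.2, b=0.5, c=0.25)
p_very_low_ability = three_pl_probability(theta=-50.0, a=1.0, b=0.0, c=0.25)
```

At θ = b the probability sits halfway between the guessing floor and 1, and for very low ability it approaches c instead of 0, which is the feature that distinguishes this model from the two-parameter one.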

Social competence: evaluation of assertiveness in Spanish adolescents.

PubMed

Castedo, Antonio López; Juste, Margarita Pino; Alonso, José Domínguez

2015-02-01

Relations between assertiveness in adolescents' social behavior and demographic variables were assessed in 4,943 Spanish adolescents, ages 12 to 17 years, enrolled in 32 schools for Compulsory Secondary Education. Province of residence, school size, age, grade, and academic focus were statistically significant sources of variance in assertiveness scores. All effects were small. Patterns in responses indicate the items should be reviewed to improve the measure for adolescents, and as a tool for addressing teens' social competence in real life situations.
Developing a short version of the Toronto Structured Interview for Alexithymia using item response theory.

PubMed

Sekely, Angela; Taylor, Graeme J; Bagby, R Michael

2018-03-17

The Toronto Structured Interview for Alexithymia (TSIA) was developed to provide a structured interview method for assessing alexithymia. One drawback of this instrument is the amount of time it takes to administer and score. The current study used item response theory (IRT) methods to analyze data from a large heterogeneous multi-language sample (N = 842) to investigate whether a subset of items could be selected to create a short version of the instrument. Samejima's (1969) graded response model was used to fit the item responses. Items providing maximum information were retained in the short model, resulting in the elimination of 12 items from the original 24 items. Despite the 50% reduction in the number of items, 65.22% of the information was retained. Further studies are needed to validate the short version. A short version of the TSIA is potentially of practical value to clinicians and researchers with time constraints. Copyright © 2018. Published by Elsevier B.V.
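For orientation, the graded response model mentioned above treats the probability of scoring in category k or higher as a logistic function of the trait, and obtains each category probability as the difference between adjacent cumulative curves. The sketch below assumes the common logistic parameterization with one discrimination a and ordered thresholds per item; the function name and values are hypothetical and are not TSIA estimates.

```python
import math

def grm_category_probabilities(theta: float, a: float, thresholds: list) -> list:
    """Category probabilities for one item under the graded response model.

    thresholds must be in increasing order; an item with K response
    categories has K - 1 thresholds.
    """
    # P(X >= k): 1 for the lowest category, 0 beyond the highest.
    cumulative = (
        [1.0]
        + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
        + [0.0]
    )
    # P(X = k) is the drop between adjacent cumulative curves.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]

# Hypothetical four-category item with thresholds symmetric around zero.
probs = grm_category_probabilities(theta=0.0, a=1.0, thresholds=[-1.0, 0.0, 1.0])
```

Item information, the criterion used to choose items for the short form, is derived from these same category curves.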
Contextual behavior and neural circuits

PubMed Central

Lee, Inah; Lee, Choong-Hee

2013-01-01

Animals including humans engage in goal-directed behavior flexibly in response to items and their background, which is called contextual behavior in this review. Although the concept of context has long been studied, there are differences among researchers in defining and experimenting with the concept. The current review aims to provide a categorical framework within which not only the neural mechanisms of contextual information processing but also the contextual behavior can be studied in more concrete ways. For this purpose, we categorize contextual behavior into three subcategories as follows by considering the types of interactions among context, item, and response: contextual response selection, contextual item selection, and contextual item–response selection. Contextual response selection refers to the animal emitting different types of responses to the same item depending on the context in the background. Contextual item selection occurs when there are multiple items that need to be chosen in a contextual manner. Finally, when multiple items and multiple contexts are involved, contextual item–response selection takes place whereby the animal either chooses an item or inhibits such a response depending on item–context paired association. The literature suggests that the rhinal cortical regions and the hippocampal formation play key roles in mnemonically categorizing and recognizing contextual representations and the associated items. In addition, it appears that the fronto-striatal cortical loops in connection with the contextual information-processing areas critically control the flexible deployment of adaptive action sets and motor responses for maximizing goals. We suggest that contextual information processing should be investigated in experimental settings where contextual stimuli and resulting behaviors are clearly defined and measurable, considering the dynamic top-down and bottom-up interactions among the neural systems for contextual behavior. PMID:23675321
Item Response Theory Analysis of the Psychopathic Personality Inventory-Revised.

PubMed

Eichenbaum, Alexander E; Marcus, David K; French, Brian F

2017-06-01

This study examined item and scale functioning in the Psychopathic Personality Inventory-Revised (PPI-R) using an item response theory analysis. PPI-R protocols from 1,052 college student participants (348 male, 704 female) were analyzed. Analyses were conducted on the 131 self-report items comprising the PPI-R's eight content scales, using a graded response model. Scales collected a majority of their information about respondents possessing higher than average levels of the traits being measured. Each scale contained at least some items that evidenced limited ability to differentiate between respondents with differing levels of the trait being measured. Moreover, 80 items (61.1%) yielded significantly different responses between men and women presumably possessing similar levels of the trait being measured. Item performance was also influenced by the scoring format (directly scored vs. reverse-scored) of the items. Overall, the results suggest that the PPI-R, despite identifying psychopathic personality traits in individuals possessing high levels of those traits, may not identify these traits equally well for men and women, and scores are likely influenced by the scoring format of the individual item and scale.
A Machine Learning Recommender System to Tailor Preference Assessments to Enhance Person-Centered Care Among Nursing Home Residents.

PubMed

Gannod, Gerald C; Abbott, Katherine M; Van Haitsma, Kimberly; Martindale, Nathan; Heppner, Alexandra

2018-05-21

Nursing homes (NHs) using the Preferences for Everyday Living Inventory (PELI-NH) to assess important preferences and provide person-centered care find the number of items (72) to be a barrier to using the assessment. Using a sample of n = 255 NH resident responses to the PELI-NH, we used the 16 preference items from the MDS 3.0 Section F to develop a machine learning recommender system to identify additional PELI-NH items that may be important to specific residents. Much like the Netflix recommender system, our system is based on the concept of collaborative filtering whereby insights and predictions (e.g., filters) are created using the interests and preferences of many users. The algorithm identifies multiple sets of "you might also like" patterns called association rules, based upon responses to the 16 MDS preferences that recommends an additional set of preferences with a high likelihood of being important to a specific resident. In the evaluation of the combined apriori and logistic regression approach, we obtained a high recall performance (i.e., the ratio of correctly predicted preferences compared with all predicted preferences and nonpreferences) and high precision (i.e., the ratio of correctly predicted rules with respect to the rules predicted to be true) of 80.2% and 79.2%, respectively. The recommender system successfully provides guidance on how to best tailor the preference items asked of residents and can support preference capture in busy clinical environments, contributing to the feasibility of delivering person-centered care.
Decoding the content of recollection within the core recollection network and beyond.

PubMed

Thakral, Preston P; Wang, Tracy H; Rugg, Michael D

2017-06-01

Recollection - retrieval of qualitative information about a past event - is associated with enhanced neural activity in a consistent set of neural regions (the 'core recollection network') seemingly regardless of the nature of the recollected content. Here, we employed multi-voxel pattern analysis (MVPA) to assess whether retrieval-related functional magnetic resonance imaging (fMRI) activity in core recollection regions - including the hippocampus, angular gyrus, medial prefrontal cortex, retrosplenial/posterior cingulate cortex, and middle temporal gyrus - contain information about studied content and thus demonstrate retrieval-related 'reinstatement' effects. During study, participants viewed objects and concrete words that were subjected to different encoding tasks. Test items included studied words, the names of studied objects, or unstudied words. Participants judged whether the items were recollected, familiar, or new by making 'remember', 'know', and 'new' responses, respectively. The study history of remembered test items could be reliably decoded using MVPA in most regions, as well as from the dorsolateral prefrontal cortex, a region where univariate recollection effects could not be detected. The findings add to evidence that members of the core recollection network, as well as at least one neural region where mean signal is insensitive to recollection success, carry information about recollected content. Importantly, the study history of recognized items endorsed with a 'know' response could be decoded with equal accuracy. The results thus demonstrate a striking dissociation between mean signal and multi-voxel indices of recollection. Moreover, they converge with prior findings in suggesting that, as it is operationalized by classification-based MVPA, reinstatement is not uniquely a signature of recollection. Copyright © 2016 Elsevier Ltd. All rights reserved.
The Effects of Test Length and Sample Size on Item Parameters in Item Response Theory

ERIC Educational Resources Information Center

Sahin, Alper; Anil, Duygu

2017-01-01

This study investigates the effects of sample size and test length on item-parameter estimation in test development utilizing three unidimensional dichotomous models of item response theory (IRT). For this purpose, a real language test comprised of 50 items was administered to 6,288 students. Data from this test was used to obtain data sets of…
Investigating Separate and Concurrent Approaches for Item Parameter Drift in 3PL Item Response Theory Equating

ERIC Educational Resources Information Center

Arce-Ferrer, Alvaro J.; Bulut, Okan

2017-01-01

This study examines separate and concurrent approaches to combine the detection of item parameter drift (IPD) and the estimation of scale transformation coefficients in the context of the common item nonequivalent groups design with the three-parameter item response theory equating. The study uses real and synthetic data sets to compare the two…
Item Response Theory with Covariates (IRT-C): Assessing Item Recovery and Differential Item Functioning for the Three-Parameter Logistic Model

ERIC Educational Resources Information Center

Tay, Louis; Huang, Qiming; Vermunt, Jeroen K.

2016-01-01

In large-scale testing, the use of multigroup approaches is limited for assessing differential item functioning (DIF) across multiple variables as DIF is examined for each variable separately. In contrast, the item response theory with covariate (IRT-C) procedure can be used to examine DIF across multiple variables (covariates) simultaneously. To…
Numerical Differentiation Methods for Computing Error Covariance Matrices in Item Response Theory Modeling: An Evaluation and a New Proposal

ERIC Educational Resources Information Center

Tian, Wei; Cai, Li; Thissen, David; Xin, Tao

2013-01-01

In item response theory (IRT) modeling, the item parameter error covariance matrix plays a critical role in statistical inference procedures. When item parameters are estimated using the EM algorithm, the parameter error covariance matrix is not an automatic by-product of item calibration. Cai proposed the use of Supplemented EM algorithm for…
Modeling the Severity of Drinking Consequences in First-Year College Women: An Item Response Theory Analysis of the Rutgers Alcohol Problem Index

PubMed Central

Cohn, Amy M.; Hagman, Brett T.; Graff, Fiona S.; Noel, Nora E.

2011-01-01

Objective: The present study examined the latent continuum of alcohol-related negative consequences among first-year college women using methods from item response theory and classical test theory. Method: Participants (N = 315) were college women in their freshman year who reported consuming any alcohol in the past 90 days and who completed assessments of alcohol consumption and alcohol-related negative consequences using the Rutgers Alcohol Problem Index. Results: Item response theory analyses showed poor model fit for five items identified in the Rutgers Alcohol Problem Index. Two-parameter item response theory logistic models were applied to the remaining 18 items to examine estimates of item difficulty (i.e., severity) and discrimination parameters. The item difficulty parameters ranged from 0.591 to 2.031, and the discrimination parameters ranged from 0.321 to 2.371. Classical test theory analyses indicated that the omission of the five misfit items did not significantly alter the psychometric properties of the construct. Conclusions: Findings suggest that those consequences that had greater severity and discrimination parameters may be used as screening items to identify female problem drinkers at risk for an alcohol use disorder. PMID:22051212
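As an illustrative sketch (not the authors' analysis), the two-parameter logistic model referred to in this abstract can be written as P(θ) = 1 / (1 + exp(−a(θ − b))), where b is the item severity and a the discrimination. The item names below are hypothetical, and pairing the extreme reported values of a and b within a single item is an assumption made only to show how the curve behaves.

```python
import math

def two_pl_probability(theta, a, b):
    """Probability of endorsing an item under the two-parameter logistic model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical items: the abstract reports only parameter ranges, so these
# (a, b) pairings are assumptions chosen from inside those ranges.
items = {
    "sharp_low_severity_item": {"a": 2.371, "b": 0.591},
    "flat_high_severity_item": {"a": 0.321, "b": 2.031},
}

for name, params in items.items():
    for theta in (-1.0, 0.0, 1.0, 2.0, 3.0):
        prob = two_pl_probability(theta, params["a"], params["b"])
        print(f"{name}: theta={theta:+.1f} P(endorse)={prob:.3f}")
```

An item with a large a separates respondents just below and just above its severity b much more sharply than an item with a small a, which is why high-discrimination items are natural screening candidates.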
Generalizability in Item Response Modeling

ERIC Educational Resources Information Center

Briggs, Derek C.; Wilson, Mark

2007-01-01

An approach called generalizability in item response modeling (GIRM) is introduced in this article. The GIRM approach essentially incorporates the sampling model of generalizability theory (GT) into the scaling model of item response theory (IRT) by making distributional assumptions about the relevant measurement facets. By specifying a random…
Identifying and addressing the limitations of safety climate surveys.

PubMed

O'Connor, Paul; Buttrey, Samuel E; O'Dea, Angela; Kennedy, Quinn

2011-08-01

There are a variety of qualitative and quantitative tools for measuring safety climate. However, questionnaires are by far the most commonly used methodology. This paper reports the descriptive analysis of a large sample of safety climate survey data (n=110,014) collected over 10 years from U.S. Naval aircrew using the Command Safety Assessment Survey (CSAS). The analysis demonstrated that there was substantial non-random response bias associated with the data (the reverse worded items had a unique pattern of responses, there was an increasing tendency over time to only provide a modal response, the responses to the same item towards the beginning and end of the questionnaire did not correlate as highly as might be expected, and the faster the questionnaire was completed the higher the frequency of modal responses). It is suggested that the non-random response bias was due to the negative effect on participant motivation of a number of factors (questionnaire design, lack of a belief in the importance of the response, participant fatigue, and questionnaire administration). Researchers must consider the factors that increase the likelihood of non-random measurement error in safety climate survey data and cease to rely on data that are solely collected using a long and complex questionnaire. In the absence of valid and reliable data it will not be possible for organizations to take the measures required to improve safety climate. Copyright © 2011 Elsevier B.V. All rights reserved.
Mechanisms supporting superior source memory for familiar items: A multi-voxel pattern analysis study

PubMed Central

Poppenk, Jordan; Norman, Kenneth A.

2012-01-01

Recent cognitive research has revealed better source memory performance for familiar relative to novel stimuli. Here we consider two possible explanations for this finding. The source memory advantage for familiar stimuli could arise because stimulus novelty induces attention to stimulus features at the expense of contextual processing, resulting in diminished overall levels of contextual processing at study for novel (vs. familiar) stimuli. Another possibility is that stimulus information retrieved from long-term memory (LTM) provides scaffolding that facilitates the formation of item-context associations. If contextual features are indeed more effectively bound to familiar (vs. novel) items, the relationship between contextual processing at study and subsequent source memory should be stronger for familiar items. We tested these possibilities by applying multi-voxel pattern analysis (MVPA) to a recently collected functional magnetic resonance imaging (fMRI) dataset, with the goal of measuring contextual processing at study and relating it to subsequent source memory performance. Participants were scanned with fMRI while viewing novel proverbs, repeated proverbs (previously novel proverbs that were shown in a pre-study phase), and previously known proverbs in the context of one of two experimental tasks. After scanning was complete, we evaluated participants’ source memory for the task associated with each proverb. Drawing upon fMRI data from the study phase, we trained a classifier to detect on-task processing (i.e., how strongly was the correct task set activated). On-task processing was greater for previously known than novel proverbs and similar for repeated and novel proverbs. However, both within- and across participants, the relationship between on-task processing and subsequent source memory was stronger for repeated than novel proverbs and similar for previously known and novel proverbs. 
Finally, focusing on the repeated condition, we found that higher levels of hippocampal activity during the pre-study phase, which we used as an index of episodic encoding, led to a stronger relationship between on-task processing at study and subsequent memory. Together, these findings suggest different mechanisms may be primarily responsible for superior source memory for repeated and previously known stimuli. Specifically, they suggest that prior stimulus knowledge enhances memory by boosting the overall level of contextual processing, whereas stimulus repetition enhances the probability that contextual features will be successfully bound to item features. Several possible theoretical explanations for this pattern are discussed. PMID:22820636
Quantifying Local, Response Dependence between Two Polytomous Items Using the Rasch Model

ERIC Educational Resources Information Center

Andrich, David; Humphry, Stephen M.; Marais, Ida

2012-01-01

Models of modern test theory imply statistical independence among responses, generally referred to as "local independence." One violation of local independence occurs when the response to one item governs the response to a subsequent item. Expanding on a formulation of this kind of violation as a process in the dichotomous Rasch model,…
Using Response Times for Item Selection in Adaptive Testing

ERIC Educational Resources Information Center

van der Linden, Wim J.

2008-01-01

Response times on items can be used to improve item selection in adaptive testing provided that a probabilistic model for their distribution is available. In this research, the author used a hierarchical modeling framework with separate first-level models for the responses and response times and a second-level model for the distribution of the…
The Influence of Item Response Indecision on the Self-Directed Search

ERIC Educational Resources Information Center

Sampson, James P., Jr.; Shy, Jonathan D.; Hartley, Sarah Lucas; Reardon, Robert C.; Peterson, Gary W.

2009-01-01

Students (N = 247) responded to Self-Directed Search (SDS) per the standard response format and were also instructed to record a question mark (?) for items about which they were uncertain (item response indecision [IRI]). The initial responses of the 114 participants with a (?) were then reversed and a second SDS summary code was obtained and…
Improving measurement of injection drug risk behavior using item response theory.

PubMed

Janulis, Patrick

2014-03-01

Recent research highlights the multiple steps to preparing and injecting drugs and the resultant viral threats faced by drug users. This research suggests that more sensitive measurement of injection drug HIV risk behavior is required. In addition, growing evidence suggests there are gender differences in injection risk behavior. However, the potential for differential item functioning between genders has not been explored. The aims were to explore item response theory as an improved measurement modeling technique that provides empirically justified scaling of injection risk behavior and to examine potential gender-based differential item functioning. Data were drawn from three studies in the National Institute on Drug Abuse's Criminal Justice Drug Abuse Treatment Studies. A two-parameter item response theory model was used to scale injection risk behavior and logistic regression was used to examine differential item functioning. Item fit statistics suggest that item response theory can be used to scale injection risk behavior and these models can provide more sensitive estimates of risk behavior. Additionally, gender-based differential item functioning is present in the current data. Improved measurement of injection risk behavior using item response theory should be encouraged as these models provide increased congruence between construct measurement and the complexity of injection-related HIV risk. Suggestions are made to further improve injection risk behavior measurement. Furthermore, results suggest direct comparisons of composite scores between males and females may be misleading and future work should account for differential item functioning before comparing levels of injection risk behavior.
Measuring sexual orientation in adolescent health surveys: evaluation of eight school-based surveys.

PubMed

Saewyc, Elizabeth M; Bauer, Greta R; Skay, Carol L; Bearinger, Linda H; Resnick, Michael D; Reis, Elizabeth; Murphy, Aileen

2004-10-01

To examine the performance of various items measuring sexual orientation within 8 school-based adolescent health surveys in the United States and Canada from 1986 through 1999. Analyses examined nonresponse and unsure responses to sexual orientation items compared with other survey items, demographic differences in responses, tests for response set bias, and congruence of responses to multiple orientation items; analytical methods included frequencies, contingency tables with Chi-square, and ANOVA with least significant differences (LSD) post hoc tests; all analyses were conducted separately by gender. In all surveys, nonresponse rates for orientation questions were similar to other sexual questions, but not higher; younger students, immigrants, and students with learning disabilities were more likely to skip items or select "unsure." Sexual behavior items had the lowest nonresponse, but fewer than half of all students reported sexual behavior, limiting its usefulness for indicating orientation. Item placement in the survey, wording, and response set bias all appeared to influence nonresponse and unsure rates. Specific recommendations include standardizing wording across future surveys, and pilot testing items with diverse ages and ethnic groups of teens before use. All three dimensions of orientation should be assessed where possible; when limited to single items, sexual attraction may be the best choice. Specific wording suggestions are offered for future surveys.
ITEM ANALYSIS OF THREE SPANISH NAMING TESTS: A CROSS-CULTURAL INVESTIGATION

PubMed Central

de la Plata, Carlos Marquez; Arango-Lasprilla, Juan Carlos; Alegret, Montse; Moreno, Alexander; Tárraga, Luis; Lara, Mar; Hewlitt, Margaret; Hynan, Linda; Cullum, C. Munro

2009-01-01

Neuropsychological evaluations conducted in the United States and abroad commonly include the use of tests translated from English to Spanish. The use of translated naming tests for evaluating predominately Spanish-speakers has recently been challenged on the grounds that translating test items may compromise a test’s construct validity. The Texas Spanish Naming Test (TNT) has been developed in Spanish specifically for use with Spanish-speakers; however, it is unlikely patients from diverse Spanish-speaking geographical regions will perform uniformly on a naming test. The present study evaluated and compared the internal consistency and patterns of item-difficulty and -discrimination for the TNT and two commonly used translated naming tests in three countries (i.e., United States, Colombia, Spain). Two hundred fifty two subjects (126 demented, 116 nondemented) across three countries were administered the TNT, Modified Boston Naming Test-Spanish, and the naming subtest from the CERAD. The TNT demonstrated superior internal consistency to its counterparts, a superior item difficulty pattern than the CERAD naming test, and a superior item discrimination pattern than the MBNT-S across countries. Overall, all three Spanish naming tests differentiated nondemented and moderately demented individuals, but the results suggest the items of the TNT are most appropriate to use with Spanish-speakers. Preliminary normative data for the three tests examined in each country are provided. PMID:19208960

Application of a General Polytomous Testlet Model to the Reading Section of a Large-Scale English Language Assessment. Research Report. ETS RR-10-21

ERIC Educational Resources Information Center

Li, Yanmei; Li, Shuhong; Wang, Lin

2010-01-01

Many standardized educational tests include groups of items based on a common stimulus, known as "testlets". Standard unidimensional item response theory (IRT) models are commonly used to model examinees' responses to testlet items. However, it is known that local dependence among testlet items can lead to biased item parameter estimates…
Assessing the Utility of Item Response Theory Models: Differential Item Functioning.

ERIC Educational Resources Information Center

Scheuneman, Janice Dowd

The current status of item response theory (IRT) is discussed. Several IRT methods exist for assessing whether an item is biased. Focus is on methods proposed by L. M. Rudner (1975), F. M. Lord (1977), D. Thissen et al. (1988) and R. L. Linn and D. Harnisch (1981). Rudner suggested a measure of the area lying between the two item characteristic…
A Comparison of the One-, the Modified Three-, and the Three-Parameter Item Response Theory Models in the Test Development Item Selection Process.

ERIC Educational Resources Information Center

Eignor, Daniel R.; Douglass, James B.

This paper attempts to provide some initial information about the use of a variety of item response theory (IRT) models in the item selection process; its purpose is to compare the information curves derived from the selection of items characterized by several different IRT models and their associated parameter estimation programs. These…
Item Response Modeling of Multivariate Count Data with Zero Inflation, Maximum Inflation, and Heaping

ERIC Educational Resources Information Center

Magnus, Brooke E.; Thissen, David

2017-01-01

Questionnaires that include items eliciting count responses are becoming increasingly common in psychology. This study proposes methodological techniques to overcome some of the challenges associated with analyzing multivariate item response data that exhibit zero inflation, maximum inflation, and heaping at preferred digits. The modeling…
Nested Logit Models for Multiple-Choice Item Response Data

ERIC Educational Resources Information Center

Suh, Youngsuk; Bolt, Daniel M.

2010-01-01

Nested logit item response models for multiple-choice data are presented. Relative to previous models, the new models are suggested to provide a better approximation to multiple-choice items where the application of a solution strategy precedes consideration of response options. In practice, the models also accommodate collapsibility across all…
The Dutch Identity: A New Tool for the Study of Item Response Models.

ERIC Educational Resources Information Center

Holland, Paul W.

1990-01-01

The Dutch Identity is presented as a useful tool for expressing the basic equations of item response models that relate the manifest probabilities to the item response functions and the latent trait distribution. Ways in which the identity may be exploited are suggested and illustrated. (SLD)
Validation of the Dutch language version of the Safety Attitudes Questionnaire (SAQ-NL).

PubMed

Haerkens, Marck Htm; van Leeuwen, Wouter; Sexton, J Bryan; Pickkers, Peter; van der Hoeven, Johannes G

2016-08-15

As the first objective of caring for patients is to do no harm, patient safety is a priority in delivering clinical care. An essential component of safe care in a clinical department is its safety climate. Safety climate correlates with safety-specific behaviour, injury rates, and accidents. Safety climate in healthcare can be assessed by the Safety Attitudes Questionnaire (SAQ), which provides insight by scoring six dimensions: Teamwork Climate, Job Satisfaction, Safety Climate, Stress Recognition, Working Conditions and Perceptions of Management. The objective of this study was to assess the psychometric properties of the Dutch language version of the SAQ in a variety of clinical departments in Dutch hospitals. The Dutch version (SAQ-NL) of the SAQ was back translated, and analyzed for semantic characteristics and content. From October 2010 to November 2015 SAQ-NL surveys were carried out in 17 departments in two university and seven large non-university teaching hospitals in the Netherlands, prior to a Crew Resource Management human factors intervention. Statistical analyses were used to examine response patterns, mean scores, correlations, internal consistency reliability and model fit. Cronbach's α's and inter-item correlations were calculated to examine internal consistency reliability. One thousand three hundred fourteen completed questionnaires were returned from 2113 administered to health care workers, resulting in a response rate of 62 %. Confirmatory Factor Analysis revealed the 6-factor structure fit the data adequately. Response patterns were similar for professional positions, departments, physicians and nurses, and university and non-university teaching hospitals. The SAQ-NL showed strong internal consistency (α = .87). Exploratory analysis revealed differences in scores on the SAQ dimensions when comparing different professional positions, when comparing physicians to nurses and when comparing university to non-university hospitals. 
The SAQ-NL demonstrated good psychometric properties and is therefore a useful instrument to measure patient safety climate in Dutch clinical work settings. As removal of one item resulted in an increased reliability of the Working Conditions dimension, revision or deletion of this item should be considered. The results from this study provide researchers and practitioners with insight into safety climate in a variety of departments and functional positions in Dutch hospitals.
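For readers unfamiliar with the internal consistency statistic reported above, the following is a minimal sketch of Cronbach's α, computed as α = k/(k − 1) · (1 − Σ item variances / variance of the total score). The response matrix is invented toy data, not SAQ-NL data, and the function name is a hypothetical helper.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Toy data (assumption): five respondents answering three Likert-type items.
toy_scores = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 1],
    [3, 3, 3],
]

print(f"alpha = {cronbach_alpha(toy_scores):.3f}")
```

Dropping one item and recomputing α on the remaining columns is the usual way to see whether removing an item raises the reliability of a dimension, as the abstract reports for Working Conditions.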
Item response theory analysis of the mechanics baseline test

NASA Astrophysics Data System (ADS)

Cardamone, Caroline N.; Abbott, Jonathan E.; Rayyan, Saif; Seaton, Daniel T.; Pawl, Andrew; Pritchard, David E.

2012-02-01

Item response theory is useful in both the development and evaluation of assessments and in computing standardized measures of student performance. In item response theory, individual parameters (difficulty, discrimination) for each item or question are fit by item response models. These parameters provide a means for evaluating a test and offer a better measure of student skill than a raw test score, because each skill calculation considers not only the number of questions answered correctly, but the individual properties of all questions answered. Here, we present the results from an analysis of the Mechanics Baseline Test given at MIT during 2005-2010. Using the item parameters, we identify questions on the Mechanics Baseline Test that are not effective in discriminating between MIT students of different abilities. We show that a limited subset of the highest quality questions on the Mechanics Baseline Test returns accurate measures of student skill. We compare student skills as determined by item response theory to the more traditional measurement of the raw score and show that a comparable measure of learning gain can be computed.
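The claim that an IRT skill estimate depends on which questions were answered correctly, not only on how many, can be illustrated with a small sketch. It assumes a two-parameter logistic model and a simple grid search for the maximum-likelihood ability; the item parameters and response patterns are hypothetical and are not taken from the Mechanics Baseline Test.

```python
import math

def p_correct(theta, a, b):
    """Two-parameter logistic probability of a correct answer."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def log_likelihood(theta, responses, items):
    """Log-likelihood of a 0/1 response pattern at ability theta."""
    total = 0.0
    for x, (a, b) in zip(responses, items):
        p = p_correct(theta, a, b)
        total += math.log(p) if x == 1 else math.log(1.0 - p)
    return total

def estimate_ability(responses, items):
    """Grid-search maximum-likelihood ability on [-4, 4] in steps of 0.01."""
    grid = [i / 100.0 for i in range(-400, 401)]
    return max(grid, key=lambda t: log_likelihood(t, responses, items))

# Hypothetical (discrimination, difficulty) pairs for three questions.
items = [(0.5, -1.0), (1.0, 0.0), (2.0, 1.0)]

student_a = [1, 1, 0]  # raw score 2: missed the hard, discriminating item
student_b = [0, 1, 1]  # raw score 2: missed the easy, weakly discriminating item

print("student A:", estimate_ability(student_a, items))
print("student B:", estimate_ability(student_b, items))
```

Both students have the same raw score, yet student B receives the higher ability estimate because the correct answers fall on the more discriminating, more difficult items.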
Uncovering underlying processes of semantic priming by correlating item-level effects.

PubMed

Heyman, Tom; Hutchison, Keith A; Storms, Gert

2016-04-01

The current study examines the underlying processes of semantic priming using the largest priming database available (i.e., Semantic Priming Project, Hutchison et al. Behavior Research Methods, 45(4), 1099-1114, 2013). Specifically, it compares priming effects in two tasks: lexical decision and pronunciation. Task similarities were assessed at two different stimulus onset asynchronies (SOAs) (i.e., 200 and 1,200 ms) and for both primary and other associates. To evaluate how consistent priming is across these two tasks, item-level priming effects obtained in each task were correlated for each condition separately. The results revealed significant correlations at the short SOA for both primary and other associates. The correlations at the long SOA were significantly smaller and only reached significance when z-transformed response times were used. Furthermore, this pattern remained essentially the same when only asymmetric forward associates (e.g., panda-bear) were considered, suggesting that the cross-task stability at the short SOA was not merely caused by retrospective processes such as semantic matching. Instead, these findings provide evidence for a rapidly operating, item-based, relational characteristic such as spreading activation.
Sample Invariance of the Structural Equation Model and the Item Response Model: A Case Study.

ERIC Educational Resources Information Center

Breithaupt, Krista; Zumbo, Bruno D.

2002-01-01

Evaluated the sample invariance of item discrimination statistics in a case study using real data, responses of 10 random samples of 500 people to a depression scale. Results lend some support to the hypothesized superiority of a two-parameter item response model over the common form of structural equation modeling, at least when responses are…
A Method for Imputing Response Options for Missing Data on Multiple-Choice Assessments

ERIC Educational Resources Information Center

Wolkowitz, Amanda A.; Skorupski, William P.

2013-01-01

When missing values are present in item response data, there are a number of ways one might impute a correct or incorrect response to a multiple-choice item. There are significantly fewer methods for imputing the actual response option an examinee may have provided if he or she had not omitted the item either purposely or accidentally. This…
The neural dynamics of task context in free recall.

PubMed

Polyn, Sean M; Kragel, James E; Morton, Neal W; McCluey, Joshua D; Cohen, Zachary D

2012-03-01

Multivariate pattern analysis (MVPA) is a powerful tool for relating theories of cognitive function to the neural dynamics observed while people engage in cognitive tasks. Here, we use the Context Maintenance and Retrieval model of free recall (CMR; Polyn et al., 2009a) to interpret variability in the strength of task-specific patterns of distributed neural activity as participants study and recall lists of words. The CMR model describes how temporal and source-related (here, encoding task) information combine in a contextual representation that is responsible for guiding memory search. Each studied word in the free-recall paradigm is associated with one of two encoding tasks (size and animacy) that have distinct neural representations during encoding. We find evidence for the context retrieval hypothesis central to the CMR model: Task-specific patterns of neural activity are reactivated during memory search, as the participant recalls an item previously associated with a particular task. Furthermore, we find that the fidelity of these task representations during study is related to task-shifting, the serial position of the studied item, and variability in the magnitude of the recency effect across participants. The CMR model suggests that these effects may be related to a central parameter of the model that controls the rate that an internal contextual representation integrates information from the surrounding environment. Copyright © 2011 Elsevier Ltd. All rights reserved.
Tamper-indicating seal

DOEpatents

Fiarman, S.; Degen, M.F.; Peters, H.F.

1982-08-13

There is disclosed a tamper-indicating seal that permits in-the-field inspection and detection of tampering. Said seal comprises a shrinkable tube having a visible pattern of markings which is shrunk over the item to be sealed, and a second transparent tube, having a second visible marking pattern, which is shrunk over the item and the first tube. The relationship between the first and second set of markings produces a pattern so that the seal may not be removed without detection. The seal is particularly applicable to UF6 cylinder valves.
Gender differences in National Assessment of Educational Progress science items: What does "I don't know" really mean?

NASA Astrophysics Data System (ADS)

Linn, Marcia C.; de Benedictis, Tina; Delucchi, Kevin; Harris, Abigail; Stage, Elizabeth

The National Assessment of Educational Progress Science Assessment has consistently revealed small gender differences on science content items but not on science inquiry items. This assessment differs from others in that respondents can choose I don't know rather than guessing. This paper examines explanations for the gender differences including (a) differential prior instruction, (b) differential response to uncertainty and use of the I don't know response, (c) differential response to figurally presented items, and (d) different attitudes towards science. Of these possible explanations, the first two received support. Females are more likely to use the I don't know response, especially for items with physical science content or masculine themes such as football. To ameliorate this situation we need more effective science instruction and more gender-neutral assessment items.
A Brief Survey of Patients' First Impression after CPAP Titration Predicts Future CPAP Adherence: A Pilot Study

PubMed Central

Balachandran, Jay S.; Yu, Xiaohong; Wroblewski, Kristen; Mokhlesi, Babak

2013-01-01

Background: CPAP adherence patterns are often established very early in the course of therapy. Our objective was to quantify patients' perception of CPAP therapy using a 6-item questionnaire administered in the morning following CPAP titration. We hypothesized that questionnaire responses would independently predict CPAP adherence during the first 30 days of therapy. Methods: We retrospectively reviewed the CPAP perception questionnaires of 403 CPAP-naïve adults who underwent in-laboratory titration and who had daily CPAP adherence data available for the first 30 days of therapy. Responses to the CPAP perception questionnaire were analyzed for their association with mean CPAP adherence and with changes in daily CPAP adherence over 30 days. Results: Patients were aged 52 ± 14 years, 53% were women, 54% were African American, the mean body mass index (BMI) was 36.3 ± 9.1 kg/m2, and most patients had moderate-severe OSA. Four of 6 items from the CPAP perception questionnaire— regarding difficulty tolerating CPAP, discomfort with CPAP pressure, likelihood of wearing CPAP, and perceived health benefit—were significantly correlated with mean 30-day CPAP adherence, and a composite score from these 4 questions was found to be internally consistent. Stepwise linear regression modeling demonstrated that 3 variables were significant and independent predictors of reduced mean CPAP adherence: worse score on the 4-item questionnaire, African American race, and non-sleep specialist ordering polysomnogram and CPAP therapy. Furthermore, a worse score on the 4-item CPAP perception questionnaire was consistently associated with decreased mean daily CPAP adherence over the first 30 days of therapy. Conclusions: In this pilot study, responses to a 4-item CPAP perception questionnaire administered to patients immediately following CPAP titration independently predicted mean CPAP adherence during the first 30 days. 
Further prospective validation of this questionnaire in different patient populations is warranted. Commentary: A commentary on this article appears in this issue on page 207. Citation: Balachandran JS; Yu X; Wroblewski K; Mokhlesi B. A brief survey of patients' first impression after CPAP titration predicts future CPAP adherence: a pilot study. J Clin Sleep Med 2013;9(3):199-205. PMID:23493772
A Comparison between Discrimination Indices and Item-Response Theory Using the Rasch Model in a Clinical Course Written Examination of a Medical School.

PubMed

Park, Jong Cook; Kim, Kwang Sig

2012-03-01

The reliability of a test is determined by the characteristics of each item. Item analysis can be carried out with classical test theory and item response theory. The purpose of the study was to compare discrimination indices with item response theory using the Rasch model. Thirty-one 4th-year medical school students participated in the clinical course written examination, which included 22 A-type items and 3 R-type items. The point biserial correlation coefficient (C(pbs)) was compared to the method of extreme group (D), biserial correlation coefficient (C(bs)), item-total correlation coefficient (C(it)), and corrected item-total correlation coefficient (C(cit)). The Rasch model was applied to estimate item difficulty and examinee's ability and to calculate item fit statistics using joint maximum likelihood. The explanatory power (r²) of C(pbs) decreased in the following order: C(cit) (1.00), C(it) (0.99), C(bs) (0.94), and D (0.45). The ranges of difficulty logit and standard error and ability logit and standard error were -0.82 to 0.80 and 0.37 to 0.76, -3.69 to 3.19 and 0.45 to 1.03, respectively. Items 9 and 23 had outfit ≥ 1.3. Students 1, 5, 7, 18, 26, 30, and 32 had fit ≥ 1.3. C(pbs), C(cit), and C(it) are good discrimination parameters. The Rasch model can estimate the item difficulty parameter and the examinee's ability parameter with standard errors. The fit statistics can identify bad items and unpredictable examinee responses.
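As a hedged illustration of one of the classical indices compared in this abstract, the point biserial coefficient for a dichotomous item can be computed as r_pb = (M1 − M0) / s · sqrt(p·q), where M1 and M0 are the mean total scores of examinees who answered the item correctly and incorrectly, s is the population standard deviation of total scores, and p is the proportion correct. The data below are invented for the example and are not from the examination described.

```python
import math
from statistics import mean, pstdev

def point_biserial(item, totals):
    """Point biserial correlation between a 0/1 item and total test scores."""
    p = mean(item)          # proportion answering the item correctly
    q = 1 - p
    mean_correct = mean([t for x, t in zip(item, totals) if x == 1])
    mean_incorrect = mean([t for x, t in zip(item, totals) if x == 0])
    return (mean_correct - mean_incorrect) / pstdev(totals) * math.sqrt(p * q)

# Toy data (assumption): six examinees, one item, and their total scores.
item_scores = [1, 1, 0, 1, 0, 0]
total_scores = [9, 8, 4, 7, 5, 3]

print(f"r_pb = {point_biserial(item_scores, total_scores):.4f}")
```

Subtracting the item's own score from each total before calling the function gives the corrected item-total variant, which avoids inflating the correlation on short tests.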
Using Item Response Theory to Describe the Nonverbal Literacy Assessment (NVLA)

ERIC Educational Resources Information Center

Fleming, Danielle; Wilson, Mark; Ahlgrim-Delzell, Lynn

2018-01-01

The Nonverbal Literacy Assessment (NVLA) is a literacy assessment designed for students with significant intellectual disabilities. The 218-item test was initially examined using confirmatory factor analysis. This method showed that the test worked as expected, but the items loaded onto a single factor. This article uses item response theory to…
Measuring Student Learning with Item Response Theory

ERIC Educational Resources Information Center

Lee, Young-Jin; Palazzo, David J.; Warnakulasooriya, Rasil; Pritchard, David E.

2008-01-01

We investigate short-term learning from hints and feedback in a Web-based physics tutoring system. Both the skill of students and the difficulty and discrimination of items were determined by applying item response theory (IRT) to the first answers of students who are working on for-credit homework items in an introductory Newtonian physics…
Higher-Order Item Response Models for Hierarchical Latent Traits

ERIC Educational Resources Information Center

Huang, Hung-Yu; Wang, Wen-Chung; Chen, Po-Hsi; Su, Chi-Ming

2013-01-01

Many latent traits in the human sciences have a hierarchical structure. This study aimed to develop a new class of higher order item response theory models for hierarchical latent traits that are flexible in accommodating both dichotomous and polytomous items, to estimate both item and person parameters jointly, to allow users to specify…
Evaluating Item Fit for Multidimensional Item Response Models

ERIC Educational Resources Information Center

Zhang, Bo; Stone, Clement A.

2008-01-01

This research examines the utility of the s-x[superscript 2] statistic proposed by Orlando and Thissen (2000) in evaluating item fit for multidimensional item response models. Monte Carlo simulation was conducted to investigate both the Type I error and statistical power of this fit statistic in analyzing two kinds of multidimensional test…

An Item Response Theory Model for Test Bias.

ERIC Educational Resources Information Center

Shealy, Robin; Stout, William

This paper presents a conceptualization of test bias for standardized ability tests which is based on multidimensional, non-parametric, item response theory. An explanation of how individually-biased items can combine through a test score to produce test bias is provided. It is contended that bias, although expressed at the item level, should be…
Analysis test of understanding of vectors with the three-parameter logistic model of item response theory and item response curves technique

NASA Astrophysics Data System (ADS)

Rakkapao, Suttida; Prasitpong, Singha; Arayathanitkul, Kwan

2016-12-01

This study investigated the multiple-choice test of understanding of vectors (TUV) by applying item response theory (IRT). The difficulty, discrimination, and guessing parameters of the TUV items were fit with the three-parameter logistic model of IRT, using the PARSCALE program. The TUV ability is an ability parameter, here estimated assuming unidimensionality and local independence. Moreover, all distractors of the TUV were analyzed from item response curves (IRC) that represent simplified IRT. Data were gathered on 2392 science and engineering freshmen from three universities in Thailand. The results revealed IRT analysis to be useful in assessing the test since its item parameters are independent of the ability parameters. The IRT framework reveals item-level information, and indicates appropriate ability ranges for the test. Moreover, the IRC analysis can be used to assess the effectiveness of the test's distractors. Both IRT and IRC approaches reveal test characteristics beyond those revealed by the classical analysis methods of tests. Test developers can apply these methods to diagnose and evaluate the features of items at various ability levels of test takers.
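For readers unfamiliar with the three-parameter logistic model mentioned above, here is a minimal sketch of its item characteristic curve. The parameter values are invented for illustration and are not TUV estimates, and the optional scaling constant (often written D = 1.7) is left out as a simplifying assumption.

```python
import math

def p_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model.

    theta: examinee ability
    a: discrimination, b: difficulty, c: guessing (lower asymptote)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical item: moderate discrimination, average difficulty,
# and a guessing floor of 0.2 (e.g. a five-option multiple-choice item).
curve = [(theta, p_3pl(theta, a=1.2, b=0.0, c=0.2)) for theta in (-3, -1, 0, 1, 3)]
```

At theta equal to the difficulty b, the probability is halfway between the guessing floor c and 1, which is why the guessing parameter shifts the curve's midpoint above 0.5.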
A dimensional approach to understanding severity estimates and risk correlates of marijuana abuse and dependence in adults

PubMed Central

WU, LI-TZY; WOODY, GEORGE E.; YANG, CHONGMING; PAN, JENG-JONG; REEVE, BRYCE B.; BLAZER, DAN G.

2012-01-01

While item response theory (IRT) research shows a latent severity trait underlying response patterns of substance abuse and dependence symptoms, little is known about IRT-based severity estimates in relation to clinically relevant measures. In response to increased prevalences of marijuana-related treatment admissions, an elevated level of marijuana potency, and the debate on medical marijuana use, we applied dimensional approaches to understand IRT-based severity estimates for marijuana use disorders (MUDs) and their correlates while simultaneously considering gender- and race/ethnicity-related differential item functioning (DIF). Using adult data from the 2008 National Survey on Drug Use and Health (N=37,897), DSM-IV criteria for MUDs among past-year marijuana users were examined by IRT, logistic regression, and multiple indicators–multiple causes (MIMIC) approaches. Among 6,917 marijuana users, 15% met criteria for a MUD; another 24% exhibited subthreshold dependence. Abuse criteria were highly correlated with dependence criteria (correlation=0.90), indicating unidimensionality; item information curves revealed redundancy in multiple criteria. MIMIC analyses showed that MUD criteria were positively associated with weekly marijuana use, early marijuana use, other substance use disorders, substance abuse treatment, and serious psychological distress. African Americans and Hispanics showed higher levels of MUDs than whites, even after adjusting for race/ethnicity-related DIF. The redundancy in multiple criteria suggests an opportunity to improve efficiency in measuring symptom-level manifestations by removing low-informative criteria. Elevated rates of MUDs among African Americans and Hispanics require research to elucidate risk factors and improve assessments of MUDs for different racial/ethnic groups. PMID:22351489
Neural Overlap in Item Representations Across Episodes Impairs Context Memory.

PubMed

Kim, Ghootae; Norman, Kenneth A; Turk-Browne, Nicholas B

2018-06-12

We frequently encounter the same item in different contexts, and when that happens, memories of earlier encounters can get reactivated. We examined how existing memories are changed as a result of such reactivation. We hypothesized that when an item's initial and subsequent neural representations overlap, this allows the initial item to become associated with novel contextual information, interfering with later retrieval of the initial context. Specifically, we predicted a negative relationship between representational similarity across repeated experiences of an item and subsequent source memory for the initial context. We tested this hypothesis in an fMRI study, in which objects were presented multiple times during different tasks. We measured the similarity of the neural patterns in lateral occipital cortex that were elicited by the first and second presentations of objects, and related this neural overlap score to subsequent source memory. Consistent with our hypothesis, greater item-specific pattern similarity was linked to worse source memory for the initial task. In contrast, greater reactivation of the initial context was associated with better source memory. Our findings suggest that the influence of novel experiences on an existing context memory depends on how reliably a shared component (i.e., item) is represented across these episodes.
Children's patterns of reasoning about reading and addition concepts.

PubMed

Farrington-Flint, Lee; Canobi, Katherine H; Wood, Clare; Faulkner, Dorothy

2010-06-01

Children's reasoning was examined within two educational contexts (word reading and addition) so as to understand the factors that contribute to relational reasoning in the two domains. Sixty-seven 5- to 7-year-olds were given a series of related words to read or single-digit addition items to solve (interspersed with unrelated items). The frequency, accuracy, and response times of children's self-reports on the conceptually related items provided a measure of relational reasoning, while performance on the unrelated addition and reading items provided a measure of procedural skill. The results indicated that the children's ability to use conceptual relations to solve both reading and addition problems enhanced speed and accuracy levels, increased with age, and was related to procedural skill. However, regression analyses revealed that domain-specific competencies can best explain the use of conceptual relations in both reading and addition. Moreover, a cluster analysis revealed that children differ according to the academic domain in which they first apply conceptual relations and these differences are related to individual variation in their procedural skills within these particular domains. These results highlight the developmental significance of relational reasoning in the context of reading and addition and underscore the importance of concept-procedure links in explaining children's literacy and arithmetical development.
Capturing specific abilities as a window into human individuality: the example of face recognition.

PubMed

Wilmer, Jeremy B; Germine, Laura; Chabris, Christopher F; Chatterjee, Garga; Gerbasi, Margaret; Nakayama, Ken

2012-01-01

Proper characterization of each individual's unique pattern of strengths and weaknesses requires good measures of diverse abilities. Here, we advocate combining our growing understanding of neural and cognitive mechanisms with modern psychometric methods in a renewed effort to capture human individuality through a consideration of specific abilities. We articulate five criteria for the isolation and measurement of specific abilities, then apply these criteria to face recognition. We cleanly dissociate face recognition from more general visual and verbal recognition. This dissociation stretches across ability as well as disability, suggesting that specific developmental face recognition deficits are a special case of a broader specificity that spans the entire spectrum of human face recognition performance. Item-by-item results from 1,471 web-tested participants, included as supplementary information, fuel item analyses, validation, norming, and item response theory (IRT) analyses of our three tests: (a) the widely used Cambridge Face Memory Test (CFMT); (b) an Abstract Art Memory Test (AAMT), and (c) a Verbal Paired-Associates Memory Test (VPMT). The availability of this data set provides a solid foundation for interpreting future scores on these tests. We argue that the allied fields of experimental psychology, cognitive neuroscience, and vision science could fuel the discovery of additional specific abilities to add to face recognition, thereby providing new perspectives on human individuality.
A Risk-Based Approach to Variable Load Configuration Validation in Steam Sterilization: Application of PDA Technical Report 1 Load Equivalence Topic.

PubMed

Pavell, Anthony; Hughes, Keith A

2010-01-01

This article describes a method for achieving the load equivalence model, described in Parenteral Drug Association Technical Report 1, using a mass-based approach. The item and load bracketing approach allows for mixed equipment load size variation for operational flexibility along with decreased time to introduce new items to the operation. The article discusses the utilization of approximately 67 items/components (Table IV) identified for routine sterilization with varying quantities required weekly. The items were assessed for worst-case identification using four temperature-related criteria. The criteria were used to provide a data-based identification of worst-case items, and/or item equivalence, to carry forward into cycle validation using a variable load pattern. The mass approach to maximum load determination was used to bracket routine production use and allows for variable loading patterns. The result of the item mapping and load bracketing data is "a proven acceptable range" of sterilizing conditions including loading configuration and location. The application of these approaches, while initially more time/test-intensive than alternate approaches, provides a method of cycle validation with long-term benefit of ease of ongoing qualification, minimizing time and requirements for new equipment qualification for similar loads/use, and for rapid and rigorous assessment of new items for sterilization.
A Diffusion Model Analysis of Decision Biases Affecting Delayed Recognition of Emotional Stimuli

PubMed Central

Bowen, Holly J.; Spaniol, Julia; Patel, Ronak; Voss, Andreas

2016-01-01

Previous empirical work suggests that emotion can influence accuracy and cognitive biases underlying recognition memory, depending on the experimental conditions. The current study examines the effects of arousal and valence on delayed recognition memory using the diffusion model, which allows the separation of two decision biases thought to underlie memory: response bias and memory bias. Memory bias has not been given much attention in the literature but can provide insight into the retrieval dynamics of emotion modulated memory. Participants viewed emotional pictorial stimuli; half were given a recognition test 1-day later and the other half 7-days later. Analyses revealed that emotional valence generally evokes liberal responding, whereas high arousal evokes liberal responding only at a short retention interval. The memory bias analyses indicated that participants experienced greater familiarity with high-arousal compared to low-arousal items and this pattern became more pronounced as study-test lag increased; positive items evoke greater familiarity compared to negative and this pattern remained stable across retention interval. The findings provide insight into the separate contributions of valence and arousal to the cognitive mechanisms underlying delayed emotion modulated memory. PMID:26784108
Psychiatric symptoms and response quality to self-rated personality tests: Evidence from the PsyCoLaus study.

PubMed

Dupuis, Marc; Meier, Emanuele; Rudaz, Dominique; Strippoli, Marie-Pierre F; Castelao, Enrique; Preisig, Martin; Capel, Roland; Vandeleur, Caroline L

2017-06-01

Despite the fact that research has demonstrated consistent associations between self-rated measures of personality dimensions and mental disorders, little has been undertaken to investigate the relation between psychiatric symptoms and response patterns to self-rated tests. The aim of this study was to investigate the association between psychiatric symptoms and response quality using indices from our functional method. A sample of 1,784 participants from a Swiss population-based cohort completed a personality inventory (NEO-FFI) and a symptom checklist of 90 items (SCL-90-R). Different indices of response quality were calculated based on the responses given to the NEO-FFI. Associations among the responses to indices of response quality, sociodemographic characteristics and the SCL-90-R dimensions were then established. Psychiatric symptoms were associated with several important differences in response quality, questioning subjects' ability to provide valid information using self-rated instruments. As suggested by authors, psychiatric symptoms seem associated with differences in personality scores. Nonetheless, our study shows that symptoms are also related to differences in terms of response patterns as sources of differences in personality scores. This could constitute a bias for clinical assessment. Future studies could still determine whether certain subpopulations of subjects are more unable to provide valid information to self-rated questionnaires than others. Copyright © 2017 Elsevier Ireland Ltd. All rights reserved.
Comparison of 7-day recall and daily diary reports of COPD symptoms and impacts.

PubMed

Bennett, Antonia V; Amtmann, Dagmar; Diehr, Paula; Patrick, Donald L

2012-05-01

Patient reporting of symptoms in a questionnaire with a 7-day recall period was expected to differ from symptom reporting in a 7-day symptom diary on the basis of cognitive theory of memory processes and several studies of symptoms and health behaviors. A total of 101 adults with chronic obstructive pulmonary disease (COPD) completed a daily diary of items measuring symptoms and impacts of COPD for 7 days, and on the seventh day they completed a questionnaire of the same items with a 7-day recall period. The analysis examined concordance of 7-day recall with summary descriptors of the daily responses, examined the magnitude and covariates (patient characteristics and response patterns) of the difference between 7-day recall and mean of daily responses, and compared the discriminant ability and ability to detect change of 7-day recall and mean of daily responses. A 7-day recall was moderately concordant with the mean and maximum of daily responses and was 0.34 to 0.50 SDs higher than the mean of daily responses. Only the weekly report itself was a covariate of the difference. The discriminant ability and ability to detect change were equivalent. In measuring the weeklong experience of COPD symptoms and impacts on groups of patients, the 7-day recall scores were higher than the daily diary scores, but equivalent in detecting change over time. Copyright © 2012 International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc. All rights reserved.
Judgment and judgment latency for freedom and responsibility relatedness as a function of subtle linguistic variations.

PubMed

Wilkerson, Keith; McGahan, Joseph R; Stevens, Rick; Williamson, David; Low, Jean

2009-12-01

The goal of this study was to determine whether differential response formats to covariation problems influence corresponding response latencies. The authors provided participants with 3 trials of 16 statements addressing positive and negative relations between freedom and responsibility. The authors framed half of the items around responsibility given freedom and the other half around freedom given responsibility. Response formats comprised true-false, agree-disagree, and yes-no answers as a between-participants factor. Results indicated that the manipulation of response format did not affect latencies. However, latencies differed according to the framing of the items. For items framed around freedom given responsibility, latencies were shorter. In addition, participants were more likely to report a positive relation between freedom and responsibility when items were framed around freedom given responsibility. The authors discuss implications relative to previous research in this area and give recommendations for future research.
Psychometric properties of the Chinese version of resilience scale specific to cancer: an item response theory analysis.

PubMed

Ye, Zeng Jie; Liang, Mu Zi; Zhang, Hao Wei; Li, Peng Fei; Ouyang, Xue Ren; Yu, Yuan Liang; Liu, Mei Ling; Qiu, Hong Zhong

2018-06-01

Classical test theory has been used to develop and validate the 25-item Resilience Scale Specific to Cancer (RS-SC) in Chinese patients with cancer. This study was designed to provide additional information about the discriminative value of the individual items tested with an item response theory analysis. A two-parameter graded response model was fitted to examine whether any of the items of the RS-SC exhibited problems with the ordering and steps of thresholds, as well as the ability of items to discriminate patients with different resilience levels using item characteristic curves. A sample of 214 Chinese patients with a cancer diagnosis was analyzed. The established three-dimension structure of the RS-SC was confirmed. Several items showed problematic thresholds or discrimination ability and require further revision. Some problematic items should be refined, and a short form of the RS-SC may be feasible in clinical settings in order to reduce burden on patients. However, the generalizability of these findings warrants further investigation.
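The graded response model referred to above can be sketched as follows, assuming the usual Samejima formulation in which each category boundary has its own logistic curve and category probabilities are differences between adjacent boundary curves. The function name and the parameter values are hypothetical and are not estimates from the RS-SC study.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one polytomous item under the graded response model.

    theta: latent trait level
    a: item discrimination
    thresholds: strictly increasing boundary locations b_1 < b_2 < ... < b_m,
                giving m + 1 ordered response categories
    """
    if any(t2 <= t1 for t1, t2 in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # P(response at or above boundary k), padded with 1 below and 0 above.
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum += [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]

# Hypothetical 5-category item evaluated at an average trait level.
probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-2.0, -0.5, 0.5, 2.0])
```

The ordering check reflects why threshold ordering matters in this kind of analysis: with disordered boundaries the differences above would no longer all be valid probabilities.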
Mining for preparatory processes of transfer learning in a blended course

NASA Astrophysics Data System (ADS)

Ng, K.; Hartman, K.; Goodkin, N.; Wai Hoong Andy, K.

2017-12-01

585 undergraduate science students enrolled in a multidisciplinary environmental sustainability course. Each week, students were given the opportunity to read online materials, answer multiple choice and short answer questions, and attend a three-hour lecture. The online materials and questions were released one week prior to the lecture. After each week, we mined the student data logs exported from the course learning management system and used a model-based clustering algorithm to divide the class into six groups according to resource access patterns. The patterns were mostly based on the frequency with which a student accessed the items in the growing set of online resources and whether those resources were relevant to the upcoming exam. Each exam was self-contained—meaning the second exam did not reference content taught during the first half of the course. The exam items themselves were intentionally designed to provide a mix of recall, application, and transfer items. Recall items referenced facts and examples provided during the lectures and course materials. Application items asked students to solve problems using the methods shown during lecture. Transfer items asked students to use what they had learned to analyze new data sets and unfamiliar problems. We then used a log-likelihood analysis to determine if there were differences in item accuracy on the exams by resource pattern clusters. We found students who deviated from the majority of student access patterns by accessing prior material during the recess break before new material had been assigned and introduced performed significantly more accurately on the transfer items than the other cluster groups. This finding fits with the concept of Preparation for Future Learning (Bransford & Schwartz, 1999) which suggests learners can be strategic about their learning to prepare themselves to complete new tasks in the future. 
Our findings also suggest that using learning analytics to call attention to activity during expected lulls in a course might be a productive method of predicting future performance. Bransford, J. D., & Schwartz, D. L. (1999). Rethinking transfer: A simple proposal with multiple implications. In A. Iran-Nejad & P. D. Pearson (Eds.), Review of research in education, 24 (pp. 61-101). Washington, DC: American Educational Research Association.
Automatic Scoring of Paper-and-Pencil Figural Responses. Research Report.

ERIC Educational Resources Information Center

Martinez, Michael E.; And Others

Large-scale testing is dominated by the multiple-choice question format. Widespread use of the format is due, in part, to the ease with which multiple-choice items can be scored automatically. This paper examines automatic scoring procedures for an alternative item type: figural response. Figural response items call for the completion or…
Introduction to Multilevel Item Response Theory Analysis: Descriptive and Explanatory Models

ERIC Educational Resources Information Center

Sulis, Isabella; Toland, Michael D.

2017-01-01

Item response theory (IRT) models are the main psychometric approach for the development, evaluation, and refinement of multi-item instruments and scaling of latent traits, whereas multilevel models are the primary statistical method when considering the dependence between person responses when primary units (e.g., students) are nested within…
An Extension of IRT-Based Equating to the Dichotomous Testlet Response Theory Model

ERIC Educational Resources Information Center

Tao, Wei; Cao, Yi

2016-01-01

Current procedures for equating number-correct scores using traditional item response theory (IRT) methods assume local independence. However, when tests are constructed using testlets, one concern is the violation of the local item independence assumption. The testlet response theory (TRT) model is one way to accommodate local item dependence.…
Estimating Ordinal Reliability for Likert-Type and Ordinal Item Response Data: A Conceptual, Empirical, and Practical Guide

ERIC Educational Resources Information Center

Gadermann, Anne M.; Guhn, Martin; Zumbo, Bruno D.

2012-01-01

This paper provides a conceptual, empirical, and practical guide for estimating ordinal reliability coefficients for ordinal item response data (also referred to as Likert, Likert-type, ordered categorical, or rating scale item responses). Conventionally, reliability coefficients, such as Cronbach's alpha, are calculated using a Pearson…
IRTPRO 2.1 for Windows (Item Response Theory for Patient-Reported Outcomes)

ERIC Educational Resources Information Center

Paek, Insu; Han, Kyung T.

2013-01-01

This article reviews a new item response theory (IRT) model estimation program, IRTPRO 2.1, for Windows that is capable of unidimensional and multidimensional IRT model estimation for existing and user-specified constrained IRT models for dichotomously and polytomously scored item response data. (Contains 1 figure and 2 notes.)
The Robustness of LOGIST and BILOG IRT Estimation Programs to Violations of Local Independence.

ERIC Educational Resources Information Center

Ackerman, Terry A.

One of the important underlying assumptions of all item response theory (IRT) models is that of local independence. This assumption requires that the response to an item on a test not be influenced by the response to any other items. This assumption is often taken for granted, with little or no scrutiny of the response process required to answer…
Semantic representation in the white matter pathway

PubMed Central

Fang, Yuxing; Wang, Xiaosha; Zhong, Suyu; Song, Luping; Han, Zaizhu; Gong, Gaolang

2018-01-01

Object conceptual processing has been localized to distributed cortical regions that represent specific attributes. A challenging question is how object semantic space is formed. We tested a novel framework of representing semantic space in the pattern of white matter (WM) connections by extending the representational similarity analysis (RSA) to structural lesion pattern and behavioral data in 80 brain-damaged patients. For each WM connection, a neural representational dissimilarity matrix (RDM) was computed by first building machine-learning models with the voxel-wise WM lesion patterns as features to predict naming performance of a particular item and then computing the correlation between the predicted naming score and the actual naming score of another item in the testing patients. This correlation was used to build the neural RDM based on the assumption that if the connection pattern contains certain aspects of information shared by the naming processes of these two items, models trained with one item should also predict naming accuracy of the other. Correlating the neural RDM with various cognitive RDMs revealed that neural patterns in several WM connections that connect left occipital/middle temporal regions and anterior temporal regions were associated with the object semantic space. Such associations were not attributable to modality-specific attributes (shape, manipulation, color, and motion), to peripheral picture-naming processes (picture visual similarity, phonological similarity), to broad semantic categories, or to the properties of the cortical regions that they connected, which tended to represent multiple modality-specific attributes. That is, the semantic space could be represented through WM connection patterns across cortical regions representing modality-specific attributes. PMID:29624578

Cultural differences in responses to a Likert scale.

PubMed

Lee, Jerry W; Jones, Patricia S; Mineyama, Yoshimitsu; Zhang, Xinwei Esther

2002-08-01

Cultural differences in responses to a Likert scale were examined. Self-identified Chinese, Japanese, and Americans (N=136, 323, and 160, respectively) recruited at ethnic or general supermarkets in Southern California completed a 13-question Sense of Coherence scale with a choice of either four, five, or seven responses in either Chinese, Japanese, or English. The Japanese respondents more frequently reported difficulty with the scale, the Chinese more frequently skipped questions, and both these groups selected the midpoint more frequently on items that involved admitting to a positive emotion than did the Americans, who were more likely to indicate a positive emotion. Construct validity of the scale tended to be better for the Chinese and the Americans when there were four response choices and for the Japanese when there were seven. Although culture affected response patterns, the association of sense of coherence and health was positive in all three cultural groups. Copyright 2002 Wiley Periodicals, Inc.
Item response theory scoring and the detection of curvilinear relationships.

PubMed

Carter, Nathan T; Dalal, Dev K; Guan, Li; LoPilato, Alexander C; Withrow, Scott A

2017-03-01

Psychologists are increasingly positing theories of behavior that suggest psychological constructs are curvilinearly related to outcomes. However, results from empirical tests for such curvilinear relations have been mixed. We propose that correctly identifying the response process underlying responses to measures is important for the accuracy of these tests. Indeed, past research has indicated that item responses to many self-report measures follow an ideal point response process, wherein respondents agree only to items that reflect their own standing on the measured variable, as opposed to a dominance process, wherein stronger agreement, regardless of item content, is always indicative of higher standing on the construct. We test whether item response theory (IRT) scoring appropriate for the underlying response process to self-report measures results in more accurate tests for curvilinearity. In 2 simulation studies, we show that, regardless of the underlying response process used to generate the data, using the traditional sum-score generally results in high Type 1 error rates or low power for detecting curvilinearity, depending on the distribution of item locations. With few exceptions, appropriate power and Type 1 error rates are achieved when dominance-based and ideal point-based IRT scoring are correctly used to score dominance and ideal point response data, respectively. We conclude that (a) researchers should be theory-guided when hypothesizing and testing for curvilinear relations; (b) correctly identifying whether responses follow an ideal point versus dominance process, particularly when items are not extreme, is critical; and (c) IRT model-based scoring is crucial for accurate tests of curvilinearity. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Validating dental and medical students' evaluations of faculty teaching in an integrated, multi-instructor course.

PubMed

Stratton, Terry D; Witzke, Donald B; Freund, Mary Jane; Wilson, Martha T; Jacob, Robert J

2005-06-01

As more students from various health professions are combined into integrated courses, evaluating the teaching quality of individual faculty in these typically large, multi-instructor contexts becomes increasingly difficult. Indeed, students who lack sufficient recall of a given faculty member or are not committed to the evaluation process may respond by marking identical responses to all evaluation items (e.g., 3-3-3-3-3), regardless of the specific content of the items on the faculty evaluation questionnaire. These "straight-lining" behaviors, more formally referred to as monotonic response patterns (MRPs), often reflect students' inattention to the task at hand or lack of motivation to be discriminating, which may result in invalid data. This study examines the prevalence of MRP ratings in relation to indicators reflective of students' lack of attention to evaluating the quality of faculty teaching. Dental and medical students in a required, second-year (medicine) basic science course conducted by the medical school and taught primarily by medical school faculty completed seven-item faculty evaluation forms, along with an anonymous questionnaire measuring their need to evaluate, attitudes toward faculty evaluation, and recall of instructors. MRP ratings failed to correlate significantly with students' need to evaluate or their attitudes toward faculty evaluation. However, among medical students, MRP "straight-line" responses were more prevalent for raters who recalled faculty members "very well" (p=.04). For dental students, MRPs were associated with less accurate recall (p=.01). As such, the validity of faculty evaluations within integrated, multi-instructor courses may vary when students rate distinct aspects of a teacher's performance identically. In this case, in which medical students' greater recall of instructors coincides with MRPs, ratings may suffice as global, holistic assessments of an instructor's teaching. 
For dental students, similar ratings may be less viable. Individual item analysis is cautioned under any circumstances.
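The "straight-lining" pattern described in this abstract (identical marks on every item, e.g. 3-3-3-3-3) is simple to flag mechanically. The sketch below is one possible way to do so and is not the authors' method: the function names are hypothetical, and treating skipped items as `None` and requiring at least two answered items are assumptions made for this example.

```python
def is_straight_line(ratings):
    """True when every answered item on one form carries the same rating.

    ratings: list of numeric ratings for one evaluation form;
             None marks a skipped item (an assumption of this sketch).
    """
    answered = [r for r in ratings if r is not None]
    return len(answered) > 1 and len(set(answered)) == 1

def straight_line_rate(forms):
    """Share of forms flagged as straight-lined."""
    flagged = sum(is_straight_line(form) for form in forms)
    return flagged / len(forms)

# Invented seven-item forms for illustration.
forms = [
    [3, 3, 3, 3, 3, 3, 3],     # straight-lined
    [4, 5, 3, 4, 4, 2, 5],     # differentiated
    [5, 5, 5, None, 5, 5, 5],  # straight-lined with one skipped item
    [2, 3, 3, 3, 3, 3, 3],     # differentiated
]
rate = straight_line_rate(forms)
```

A flag of this kind only identifies the response pattern; as the abstract notes, whether such ratings are invalid depends on why the rater produced them.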
Assessing Construct Validity Using Multidimensional Item Response Theory.

ERIC Educational Resources Information Center

Ackerman, Terry A.

The concept of a user-specified validity sector is discussed. The idea of the validity sector combines the work of M. D. Reckase (1986) and R. Shealy and W. Stout (1991). Reckase developed a methodology to represent an item in a multidimensional latent space as a vector. Item vectors are computed using multidimensional item response theory item…
Least Squares Distance Method of Cognitive Validation and Analysis for Binary Items Using Their Item Response Theory Parameters

ERIC Educational Resources Information Center

Dimitrov, Dimiter M.

2007-01-01

The validation of cognitive attributes required for correct answers on binary test items or tasks has been addressed in previous research through the integration of cognitive psychology and psychometric models using parametric or nonparametric item response theory, latent class modeling, and Bayesian modeling. All previous models, each with their…
Mixture Item Response Theory-MIMIC Model: Simultaneous Estimation of Differential Item Functioning for Manifest Groups and Latent Classes

ERIC Educational Resources Information Center

Bilir, Mustafa Kuzey

2009-01-01

This study uses a new psychometric model (mixture item response theory-MIMIC model) that simultaneously estimates differential item functioning (DIF) across manifest groups and latent classes. Current DIF detection methods investigate DIF from only one side, either across manifest groups (e.g., gender, ethnicity, etc.), or across latent classes…
Item Response Theory and Health Outcomes Measurement in the 21st Century

PubMed Central

Hays, Ron D.; Morales, Leo S.; Reise, Steve P.

2006-01-01

Item response theory (IRT) has a number of potential advantages over classical test theory in assessing self-reported health outcomes. IRT models yield invariant item and latent trait estimates (within a linear transformation), standard errors conditional on trait level, and trait estimates anchored to item content. IRT also facilitates evaluation of differential item functioning, inclusion of items with different response formats in the same scale, and assessment of person fit and is ideally suited for implementing computer adaptive testing. Finally, IRT methods can be helpful in developing better health outcome measures and in assessing change over time. These issues are reviewed, along with a discussion of some of the methodological and practical challenges in applying IRT methods. PMID:10982088
A Study of General Education Astronomy Students' Understandings of Cosmology. Part III. Evaluating Four Conceptual Cosmology Surveys: An Item Response Theory Approach

ERIC Educational Resources Information Center

Wallace, Colin S.; Prather, Edward E.; Duncan, Douglas K.

2012-01-01

This is the third of five papers detailing our national study of general education astronomy students' conceptual and reasoning difficulties with cosmology. In this paper, we use item response theory to analyze students' responses to three out of the four conceptual cosmology surveys we developed. The specific item response theory model we use is…
Validation of the PTSD Checklist-Civilian Version in survivors of bone marrow transplantation.

PubMed

Smith, M Y; Redd, W; DuHamel, K; Vickberg, S J; Ricketts, P

1999-07-01

Life-threatening illness now qualifies as a precipitating stressor for posttraumatic stress disorder (PTSD). We examined the validity of the PTSD Checklist-Civilian Version (PCL-C; Weathers, Litz, Herman, Juska, & Keane, 1993), a brief 17-item inventory of PTSD-like symptoms, in a sample of 111 adults who had undergone bone marrow transplantation an average of 4.04 years previously. Exploratory factor analysis of the PCL-C identified four distinct patterns of symptom responses: Numbing-Hyperarousal, Dreams-Memories of the Cancer Treatment, General Hyperarousal, Responses to Cancer-Related Reminders and Avoidance-Numbing. Respondents meeting PTSD symptom criteria on the PCL-C had significantly lower physical, role, and social functioning, greater distress and anxiety, and significantly more intrusive and avoidant responses than individuals who did not meet PTSD symptom criteria.
A Comparison of Measurement Equivalence Methods Based on Confirmatory Factor Analysis and Item Response Theory.

ERIC Educational Resources Information Center

Flowers, Claudia P.; Raju, Nambury S.; Oshima, T. C.

Current interest in the assessment of measurement equivalence emphasizes two methods of analysis, linear, and nonlinear procedures. This study simulated data using the graded response model to examine the performance of linear (confirmatory factor analysis or CFA) and nonlinear (item-response-theory-based differential item function or IRT-Based…
Item Response Theory at Subject- and Group-Level. Research Report 90-1.

ERIC Educational Resources Information Center

Tobi, Hilde

This paper reviews the literature about item response models for the subject level and aggregated level (group level). Group-level item response models (IRMs) are used in the United States in large-scale assessment programs such as the National Assessment of Educational Progress and the California Assessment Program. In the Netherlands, these…
The Role of Psychometric Modeling in Test Validation: An Application of Multidimensional Item Response Theory

ERIC Educational Resources Information Center

Schilling, Stephen G.

2007-01-01

In this paper the author examines the role of item response theory (IRT), particularly multidimensional item response theory (MIRT) in test validation from a validity argument perspective. The author provides justification for several structural assumptions and interpretations, taking care to describe the role he believes they should play in any…
Stochastic Approximation Methods for Latent Regression Item Response Models. Research Report. ETS RR-09-09

ERIC Educational Resources Information Center

von Davier, Matthias; Sinharay, Sandip

2009-01-01

This paper presents an application of a stochastic approximation EM-algorithm using a Metropolis-Hastings sampler to estimate the parameters of an item response latent regression model. Latent regression models are extensions of item response theory (IRT) to a 2-level latent variable model in which covariates serve as predictors of the…
Exploring the Robustness of a Unidimensional Item Response Theory Model with Empirically Multidimensional Data

ERIC Educational Resources Information Center

Anderson, Daniel; Kahn, Joshua D.; Tindal, Gerald

2017-01-01

Unidimensionality and local independence are two common assumptions of item response theory. The former implies that all items measure a common latent trait, while the latter implies that responses are independent, conditional on respondents' location on the latent trait. Yet, few tests are truly unidimensional. Unmodeled dimensions may result in…
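As a brief aid to the two assumptions named in the abstract above, local independence is commonly written as a factorization of the joint response probability given a single latent trait. The notation below ($X_i$ for the response to item $i$, $\theta$ for the latent trait, $n$ for the number of items) is assumed here for illustration and is not taken from the paper.

```latex
% Local independence under a unidimensional model:
% conditional on the single latent trait \theta, item responses are independent.
P(X_1 = x_1, \dots, X_n = x_n \mid \theta) \;=\; \prod_{i=1}^{n} P(X_i = x_i \mid \theta)
```

Unidimensionality corresponds to $\theta$ being a scalar in this expression; when unmodeled dimensions are present, the factorization generally fails for a scalar $\theta$.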
The Random Response Technique as an Indicator of Questionnaire Item Social Desirability/Personal Sensitivity.

ERIC Educational Resources Information Center

Crino, Michael D.; And Others

1985-01-01

The random response technique was compared to a direct questionnaire, administered to college students, to investigate whether or not the responses predicted the social desirability of the item. Results suggest support for the hypothesis. A 33-item version of the Marlowe-Crowne Social Desirability Scale which was used is included. (GDC)
Pediatricians' knowledge, attitudes, and practice patterns regarding special education and individualized education programs.

PubMed

Shah, Reshma P; Kunnavakkam, Rangesh; Msall, Michael E

2013-01-01

The medical community has called upon pediatricians to be knowledgeable about an individualized education program (IEP). We sought to: 1) evaluate pediatricians' knowledge and attitudes regarding special education; 2) examine the relationship between perceived responsibilities and practice patterns; and 3) identify barriers that impact pediatricians' ability to provide comprehensive care to children with educational difficulties. Surveys were mailed to a national sample of 1000 randomly selected general pediatricians and pediatric residents from October 2010 to February 2011. The response rate was 47%. Of the knowledge items, respondents answered an average of 59% correctly. The majority of respondents thought pediatricians should be responsible for identifying children who may benefit from special education services and assist families in obtaining services, but less than 50% thought they should assist in the development of an IEP. The majority of pediatricians inquired whether a child is having difficulty at school, but far fewer conducted screening tests or asked parents if they needed assistance obtaining services. Overall, the prevalence of considering a practice a pediatrician's responsibility is significantly higher than examples of such a practice pattern being reported. Financial reimbursement and insufficient training were among the most significant barriers affecting a pediatrician's ability to provide care to children with educational difficulties. In order to provide a comprehensive medical home, pediatricians must be informed about the special education process. This study demonstrates that there are gaps in pediatricians' knowledge and practice patterns regarding special education that must be addressed. Copyright © 2013 Academic Pediatric Association. Published by Elsevier Inc. All rights reserved.
Statistical evaluation of synchronous spike patterns extracted by frequent item set mining

PubMed Central

Torre, Emiliano; Picado-Muiño, David; Denker, Michael; Borgelt, Christian; Grün, Sonja

2013-01-01

We recently proposed frequent itemset mining (FIM) as a method to perform an optimized search for patterns of synchronous spikes (item sets) in massively parallel spike trains. This search outputs the occurrence count (support) of individual patterns that are not trivially explained by the counts of any superset (closed frequent item sets). The number of patterns found by FIM makes direct statistical tests infeasible due to severe multiple testing. To overcome this issue, we proposed to test the significance not of individual patterns, but instead of their signatures, defined as the pairs of pattern size z and support c. Here, we derive in detail a statistical test for the significance of the signatures under the null hypothesis of full independence (pattern spectrum filtering, PSF) by means of surrogate data. As a result, injected spike patterns that mimic assembly activity are well detected, yielding a low false negative rate. However, this approach is prone to additionally classify patterns resulting from chance overlap of real assembly activity and background spiking as significant. These patterns represent false positives with respect to the null hypothesis of having one assembly of given signature embedded in otherwise independent spiking activity. We propose the additional method of pattern set reduction (PSR) to remove these false positives by conditional filtering. By employing stochastic simulations of parallel spike trains with correlated activity in the form of injected spike synchrony in subsets of the neurons, we demonstrate for a range of parameter settings that the analysis scheme composed of FIM, PSF and PSR reliably detects active assemblies in massively parallel spike trains. PMID:24167487
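The notion of a signature in the abstract above (the pair of pattern size z and support c) can be illustrated by tabulating how many patterns share each signature. This is only a sketch of that bookkeeping step under assumed inputs: the mined patterns and their supports are taken as given in a hypothetical dictionary, and the mining itself, the surrogate-based test (PSF), and the conditional filtering (PSR) are not implemented.

```python
from collections import Counter
from typing import Dict, FrozenSet, Tuple


def pattern_spectrum(
    patterns: Dict[FrozenSet[int], int],
) -> Counter:
    """Count patterns per signature (z, c).

    `patterns` maps a set of neuron ids (the item set) to its support,
    i.e. the number of times the synchronous pattern occurred.
    """
    return Counter((len(neurons), support) for neurons, support in patterns.items())


# Hypothetical mined patterns: neuron-id sets with their occurrence counts.
patterns = {
    frozenset({1, 2, 3}): 5,
    frozenset({2, 4, 7}): 5,
    frozenset({1, 2}): 9,
}
spectrum = pattern_spectrum(patterns)
for (z, c), count in sorted(spectrum.items()):
    print(f"size z={z}, support c={c}: {count} pattern(s)")
```

Testing signatures rather than individual patterns reduces the number of hypotheses from one per pattern to one per occupied (z, c) cell, which is the motivation the abstract gives for the approach.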
Evaluation of Internal Construct Validity and Unidimensionality of the Brachial Assessment Tool, A Patient-Reported Outcome Measure for Brachial Plexus Injury.

PubMed

Hill, Bridget; Pallant, Julie; Williams, Gavin; Olver, John; Ferris, Scott; Bialocerkowski, Andrea

2016-12-01

To evaluate the internal construct validity and dimensionality of a new patient-reported outcome measure for people with traumatic brachial plexus injury (BPI) based on the International Classification of Functioning, Disability and Health definition of activity. Cross-sectional study. Outpatient clinics. Adults (age range, 18-82y) with a traumatic BPI (N=106). There were 106 people with BPI who completed a 51-item 5-response questionnaire. Responses were analyzed in 4 phases (missing responses, item correlations, exploratory factor analysis, and Rasch analysis) to evaluate the properties of fit to the Rasch model, threshold response, local dependency, dimensionality, differential item functioning, and targeting. Not applicable, as this study addresses the development of an outcome measure. Six items were deleted for missing responses, and 10 were deleted for high interitem correlations >.81. The remaining 35 items, while demonstrating fit to the Rasch model, showed evidence of local dependency and multidimensionality. Items were divided into 3 subscales: dressing and grooming (8 items), arm and hand (17 items), and no hand (6 items). All 3 subscales demonstrated fit to the model with no local dependency, minimal disordered thresholds, no unidimensionality or differential item functioning for age, time postinjury, or self-selected dominance. Subscales were combined into 3 subtests and demonstrated fit to the model, no misfit, and unidimensionality, allowing calculation of a summary score. This preliminary analysis supports the internal construct validity of the Brachial Assessment Tool, a unidimensional targeted 4-response patient-reported outcome measure designed to solely assess activity after traumatic BPI regardless of level of injury, age at recruitment, premorbid limb dominance, and time postinjury. Further examination is required to determine test-retest reliability and responsiveness. Copyright © 2016 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
The Act of Answering Questions Elicited Differentiated Responses in a Concealed Information Test.

PubMed

Otsuka, Takuro; Mizutani, Mitsuyoshi; Yagi, Akihiro; Katayama, Jun'ichi

2018-04-17

The concealed information test (CIT), a psychophysiological detection of deception test, compares physiological responses between crime-related and crime-unrelated items. In previous studies, whether the act of answering questions affected physiological responses was unclear. This study examined effects of both question-related and answer-related processes on physiological responses. Twenty participants received a modified CIT, in which the interval between presentation of questions and answering them was 27 s. Differentiated respiratory movements and cardiovascular responses between items were observed for both questions (items) and answers, while differentiated skin conductance response was observed only for questions. These results suggest that physiological responses to questions reflected orientation to a crime-related item, while physiological responses during answering reflected inhibition of psychological arousal caused by orienting. Regarding the CIT's accuracy, participants' perception of the questions themselves more strongly influenced physiological responses than answering them. © 2018 American Academy of Forensic Sciences.
Development and validation of an item response theory-based Social Responsiveness Scale short form.

PubMed

Sturm, Alexandra; Kuhfeld, Megan; Kasari, Connie; McCracken, James T

2017-09-01

Research and practice in autism spectrum disorder (ASD) rely on quantitative measures, such as the Social Responsiveness Scale (SRS), for characterization and diagnosis. Like many ASD diagnostic measures, SRS scores are influenced by factors unrelated to ASD core features. This study further interrogates the psychometric properties of the SRS using item response theory (IRT), and demonstrates a strategy to create a psychometrically sound short form by applying IRT results. Social Responsiveness Scale analyses were conducted on a large sample (N = 21,426) of youth from four ASD databases. Items were subjected to item factor analyses and evaluation of item bias by gender, age, expressive language level, behavior problems, and nonverbal IQ. Item selection based on item psychometric properties, DIF analyses, and substantive validity produced a reduced item SRS short form that was unidimensional in structure, highly reliable (α = .96), and free of gender, age, expressive language, behavior problems, and nonverbal IQ influence. The short form also showed strong relationships with established measures of autism symptom severity (ADOS, ADI-R, Vineland). Degree of association between all measures varied as a function of expressive language. Results identified specific SRS items that are more vulnerable to non-ASD-related traits. The resultant 16-item SRS short form may possess superior psychometric properties compared to the original scale and emerge as a more precise measure of ASD core symptom severity, facilitating research and practice. Future research using IRT is needed to further refine existing measures of autism symptomatology. © 2017 Association for Child and Adolescent Mental Health.

Rasch Based Analysis of Oral Proficiency Test Data.

ERIC Educational Resources Information Center

Nakamura, Yuji

2001-01-01

This paper examines the rating scale data of oral proficiency tests analyzed by a Rasch Analysis focusing on an item map and factor analysis. In discussing the item map, the difficulty order of six items and students' answering patterns are analyzed using descriptive statistics and measures of central tendency of test scores. The data ranks the…
Development of a subjective cognitive decline questionnaire using item response theory: a pilot study.

PubMed

Gifford, Katherine A; Liu, Dandan; Romano, Raymond; Jones, Richard N; Jefferson, Angela L

2015-12-01

Subjective cognitive decline (SCD) may indicate unhealthy cognitive changes, but no standardized SCD measurement exists. This pilot study aims to identify reliable SCD questions. 112 cognitively normal (NC, 76±8 years, 63% female), 43 mild cognitive impairment (MCI; 77±7 years, 51% female), and 33 diagnostically ambiguous participants (79±9 years, 58% female) were recruited from a research registry and completed 57 self-report SCD questions. Psychometric methods were used for item-reduction. Factor analytic models assessed unidimensionality of the latent trait (SCD); 19 items were removed with extreme response distribution or trait-fit. Item response theory (IRT) provided information about question utility; 17 items with low information were dropped. Post-hoc simulation using computerized adaptive test (CAT) modeling selected the most commonly used items (n=9 of 21 items) that represented the latent trait well (r=0.94) and differentiated NC from MCI participants (F(1,146)=8.9, p=0.003). Item response theory and computerized adaptive test modeling identified nine reliable SCD items. This pilot study is a first step toward refining SCD assessment in older adults. Replication of these findings and validation with Alzheimer's disease biomarkers will be an important next step for the creation of a SCD screener.
Cross-Cultural Validation of the Quality of Life in Hand Eczema Questionnaire (QOLHEQ).

PubMed

Ofenloch, Robert F; Oosterhaven, Jart A F; Susitaival, Päivikki; Svensson, Åke; Weisshaar, Elke; Minamoto, Keiko; Onder, Meltem; Schuttelaar, Marie Louise A; Bulbul Baskan, Emel; Diepgen, Thomas L; Apfelbacher, Christian

2017-07-01

The Quality of Life in Hand Eczema Questionnaire (QOLHEQ) is the only instrument assessing disease-specific health-related quality of life in patients with hand eczema. It is available in eight language versions. In this study we assessed if the items of different language versions of the QOLHEQ yield comparable values across countries. An international multicenter study was conducted with participating centers in Finland, Germany, Japan, The Netherlands, Sweden, and Turkey. Methods of item response theory were applied to each subscale to assess differential item functioning for items among countries. Overall, 662 hand eczema patients were recruited into the study. Single items were removed or split according to the item response theory model by country to resolve differential item functioning. After this adjustment, none of the four subscales of the QOLHEQ showed significant misfit to the item response theory model (P < 0.01), and a Person Separation Index of greater than 0.7 showed good internal consistency for each subscale. By adapting the scoring of the QOLHEQ using the methods of item response theory, it was possible to obtain QOLHEQ values that are comparable across countries. Cross-cultural variations in the interpretation of single items were resolved. The QOLHEQ is now ready to be used in international studies assessing the health-related quality of life impact of hand eczema. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
Development and Evaluation of the Lifestyle History Questionnaire (LHQ) for People Entering Treatment for Substance Addictions.

PubMed

Martin, Linda M; Triscari, Robert; Boisvert, Rosemary; Hipp, Kristi; Gersten, Jennifer; West, Rachel C; Kisling, Elizabeth; Donham, Aaron; Kollar, Naomi; Escobar, Patricia

2015-01-01

We developed and investigated the psychometric properties of the Lifestyle History Questionnaire (LHQ), a self-report instrument designed to measure the extent of occupational dysfunction attributable to substance abuse. The instrument was developed using concepts in the ecological models of occupational therapy and in the work of William L. White, who defined addiction culture in terms of the patterns of life in context. We analyzed data from two field tests using both classical test theory and item response theory. The final version of the instrument has 70 items, 1 unifying construct, and 8 subscales. We found it to be valid and reliable (α=.93) for measuring the extent of occupational dysfunction and specific areas of strengths and weaknesses. The LHQ is a promising new instrument, the first of its kind to measure occupational dysfunction in context for people with substance addictions. Copyright © 2015 by the American Occupational Therapy Association, Inc.
Calibration of the Test of Relational Reasoning.

PubMed

Dumas, Denis; Alexander, Patricia A

2016-10-01

Relational reasoning, or the ability to discern meaningful patterns within a stream of information, is a critical cognitive ability associated with academic and professional success. Importantly, relational reasoning has been described as taking multiple forms, depending on the type of higher order relations being drawn between and among concepts. However, the reliable and valid measurement of such a multidimensional construct of relational reasoning has been elusive. The Test of Relational Reasoning (TORR) was designed to tap 4 forms of relational reasoning (i.e., analogy, anomaly, antinomy, and antithesis). In this investigation, the TORR was calibrated and scored using multidimensional item response theory in a large, representative undergraduate sample. The bifactor model was identified as the best-fitting model, and used to estimate item parameters and construct reliability. To improve the usefulness of the TORR to educators, scaled scores were also calculated and presented. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Environmental Enrichment Effect on Fecal Glucocorticoid Metabolites and Captive Maned Wolf (Chrysocyon brachyurus) Behavior.

PubMed

Coelho, Carlyle Mendes; de Azevedo, Cristiano Schetini; Guimarães, Marcelo Alcino de Barros Vaz; Young, Robert John

2016-01-01

Environmental enrichment is a technique that may reduce the stress of nonhuman animals in captivity. Stress may interfere with normal behavioral expression and affect cognitive decision making. Noninvasive hormonal studies can provide important information about the stress statuses of animals. This study evaluated the effectiveness of different environmental enrichment treatments in the diminution of fecal glucocorticoid metabolites (stress indicators) of three captive maned wolves (Chrysocyon brachyurus). Correlations of the fecal glucocorticoid metabolite levels with expressed behaviors were also determined. Results showed that environmental enrichment reduced fecal glucocorticoid metabolite levels. Furthermore, interspecific and foraging enrichment items were most effective in reducing stress in two of the three wolves. No definite pattern was found between behavioral and physiological responses to stress. In conclusion, these behavioral and physiological data showed that, from an animal well-being perspective, maned wolves responded positively to the enrichment items presented.
Analyzing force concept inventory with item response theory

NASA Astrophysics Data System (ADS)

Wang, Jing; Bao, Lei

2010-10-01

Item response theory is a popular assessment method used in education. It rests on the assumption of a probability framework that relates students' innate ability and their performance on test questions. Item response theory transforms students' raw test scores into a scaled proficiency score, which can be used to compare results obtained with different test questions. The scaled score also addresses the issues of ceiling effects and guessing, which commonly exist in quantitative assessment. We used item response theory to analyze the force concept inventory (FCI). Our results show that item response theory can be useful for analyzing physics concept surveys such as the FCI and produces results about the individual questions and student performance that are beyond the capability of classical statistics. The theory yields detailed measurement parameters regarding the difficulty, discrimination features, and probability of correct guess for each of the FCI questions.
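The three item parameters named in the abstract above (difficulty, discrimination, and probability of a correct guess) match the parameters of the standard three-parameter logistic (3PL) item response function. The sketch below assumes that form; the abstract itself does not state which model was fitted, and the parameter values are invented for illustration rather than taken from the FCI analysis.

```python
import math


def p_correct_3pl(theta: float, a: float, b: float, c: float) -> float:
    """Probability of a correct answer under the 3PL model.

    theta: student proficiency
    a: item discrimination
    b: item difficulty
    c: lower asymptote (probability of a correct guess)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item: moderate discrimination, average difficulty,
# and a guessing floor of 0.2 (as for a five-option question).
for theta in (-2.0, 0.0, 2.0):
    p = p_correct_3pl(theta, a=1.2, b=0.0, c=0.2)
    print(f"theta={theta:+.1f}  P(correct)={p:.3f}")
```

The lower asymptote c is what lets the scaled score account for guessing: even a very low-proficiency student answers correctly with probability close to c rather than zero.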
Eye movements provide insight into individual differences in children's analogical reasoning strategies.

PubMed

Starr, Ariel; Vendetti, Michael S; Bunge, Silvia A

2018-05-01

Analogical reasoning is considered a key driver of cognitive development and is a strong predictor of academic achievement. However, it is difficult for young children, who are prone to focusing on perceptual and semantic similarities among items rather than relational commonalities. For example, in a classic A:B::C:? propositional analogy task, children must inhibit attention towards items that are visually or semantically similar to C, and instead focus on finding a relational match to the A:B pair. Competing theories of reasoning development attribute improvements in children's performance to gains in either executive functioning or semantic knowledge. Here, we sought to identify key drivers of the development of analogical reasoning ability by using eye gaze patterns to infer problem-solving strategies used by six-year-old children and adults. Children had a greater tendency than adults to focus on the immediate task goal and constrain their search based on the C item. However, large individual differences existed within children, and more successful reasoners were able to maintain the broader goal in mind and constrain their search by initially focusing on the A:B pair before turning to C and the response choices. When children adopted this strategy, their attention was drawn more readily to the correct response option. Individual differences in children's reasoning ability were also related to rule-guided behavior but not to semantic knowledge. These findings suggest that both developmental improvements and individual differences in performance are driven by the use of more efficient reasoning strategies regarding which information is prioritized from the start, rather than the ability to disengage from attractive lure items. Copyright © 2018 Elsevier B.V. All rights reserved.
Item Response Theory Models for Performance Decline during Testing

ERIC Educational Resources Information Center

Jin, Kuan-Yu; Wang, Wen-Chung

2014-01-01

Sometimes, test-takers may not be able to attempt all items to the best of their ability (with full effort) due to personal factors (e.g., low motivation) or testing conditions (e.g., time limit), resulting in poor performances on certain items, especially those located toward the end of a test. Standard item response theory (IRT) models fail to…
The Effect of Error in Item Parameter Estimates on the Test Response Function Method of Linking.

ERIC Educational Resources Information Center

Kaskowitz, Gary S.; De Ayala, R. J.

2001-01-01

Studied the effect of item parameter estimation for computation of linking coefficients for the test response function (TRF) linking/equating method. Simulation results showed that linking was more accurate when there was less error in the parameter estimates, and that 15 or 25 common items provided better results than 5 common items under both…
Standard Errors and Confidence Intervals from Bootstrapping for Ramsay-Curve Item Response Theory Model Item Parameters

ERIC Educational Resources Information Center

Gu, Fei; Skorupski, William P.; Hoyle, Larry; Kingston, Neal M.

2011-01-01

Ramsay-curve item response theory (RC-IRT) is a nonparametric procedure that estimates the latent trait using splines, and no distributional assumption about the latent trait is required. For item parameters of the two-parameter logistic (2-PL), three-parameter logistic (3-PL), and polytomous IRT models, RC-IRT can provide more accurate estimates…
Exploratory factor analysis of the 12-item Functional Assessment of Chronic Illness Therapy-Spiritual Well-Being Scale in people newly diagnosed with advanced cancer.

PubMed

Bai, Mei; Dixon, Jane K

2014-01-01

The purpose of this study was to reexamine the factor pattern of the 12-item Functional Assessment of Chronic Illness Therapy-Spiritual Well-Being Scale (FACIT-Sp-12) using exploratory factor analysis in people newly diagnosed with advanced cancer. Principal components analysis (PCA) and 3 common factor analysis methods were used to explore the factor pattern of the FACIT-Sp-12. Factorial validity was assessed in association with quality of life (QOL). Principal factor analysis (PFA), iterative PFA, and maximum likelihood suggested retrieving 3 factors: Peace, Meaning, and Faith. Both Peace and Meaning positively related to QOL, whereas only Peace uniquely contributed to QOL. This study supported the 3-factor model of the FACIT-Sp-12. Suggestions for revision of items and further validation of the identified factor pattern were provided.
Psychometric analysis of the Generalized Anxiety Disorder scale (GAD-7) in primary care using modern item response theory.

PubMed

Jordan, Pascal; Shedden-Mora, Meike C; Löwe, Bernd

2017-01-01

The Generalized Anxiety Disorder scale (GAD-7) is one of the most frequently used diagnostic self-report scales for screening, diagnosis and severity assessment of anxiety disorder. Its psychometric properties from the view of the Item Response Theory paradigm have rarely been investigated. We aimed to close this gap by analyzing the GAD-7 within a large sample of primary care patients with respect to its psychometric properties and its implications for scoring using Item Response Theory. Robust, nonparametric statistics were used to check unidimensionality of the GAD-7. A graded response model was fitted using a Bayesian approach. The model fit was evaluated using posterior predictive p-values, item information functions were derived and optimal predictions of anxiety were calculated. The sample included N = 3404 primary care patients (60% female; mean age, 52.2; standard deviation, 19.2). The analysis indicated no deviations of the GAD-7 scale from unidimensionality and a decent fit of a graded response model. The commonly suggested ultra-brief measure consisting of the first two items, the GAD-2, was supported by item information analysis. The first four items discriminated better than the last three items with respect to latent anxiety. The information provided by the first four items should be weighted more heavily. Moreover, estimates corresponding to low to moderate levels of anxiety show greater variability. The psychometric validity of the GAD-2 was supported by our analysis.
Psychometric analysis of the Generalized Anxiety Disorder scale (GAD-7) in primary care using modern item response theory

PubMed Central

Shedden-Mora, Meike C.; Löwe, Bernd

2017-01-01

Objective The Generalized Anxiety Disorder scale (GAD-7) is one of the most frequently used diagnostic self-report scales for screening, diagnosis and severity assessment of anxiety disorder. Its psychometric properties from the view of the Item Response Theory paradigm have rarely been investigated. We aimed to close this gap by analyzing the GAD-7 within a large sample of primary care patients with respect to its psychometric properties and its implications for scoring using Item Response Theory. Methods Robust, nonparametric statistics were used to check unidimensionality of the GAD-7. A graded response model was fitted using a Bayesian approach. The model fit was evaluated using posterior predictive p-values, item information functions were derived and optimal predictions of anxiety were calculated. Results The sample included N = 3404 primary care patients (60% female; mean age, 52.2; standard deviation, 19.2). The analysis indicated no deviations of the GAD-7 scale from unidimensionality and a decent fit of a graded response model. The commonly suggested ultra-brief measure consisting of the first two items, the GAD-2, was supported by item information analysis. The first four items discriminated better than the last three items with respect to latent anxiety. Conclusion The information provided by the first four items should be weighted more heavily. Moreover, estimates corresponding to low to moderate levels of anxiety show greater variability. The psychometric validity of the GAD-2 was supported by our analysis. PMID:28771530
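For readers unfamiliar with the graded response model mentioned in the abstract above, the following sketch computes category probabilities for one ordered-response item. It assumes the usual logistic form of the model with a discrimination parameter and increasing thresholds; the four response categories and all parameter values are illustrative assumptions, and the Bayesian fitting used in the study is not reproduced here.

```python
import math
from typing import List, Sequence


def grm_category_probs(theta: float, a: float, thresholds: Sequence[float]) -> List[float]:
    """Category probabilities under a logistic graded response model.

    theta: latent trait level (e.g., anxiety)
    a: item discrimination
    thresholds: increasing boundaries b_1 < ... < b_{K-1} for K ordered categories
    """
    # Cumulative probabilities P(X >= k), padded with 1 for k=0 and 0 for k=K.
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    # Each category probability is the difference of adjacent cumulative terms.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]


# Hypothetical four-category item (scored 0 to 3) with symmetric thresholds.
probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-1.0, 0.0, 1.0])
for score, p in enumerate(probs):
    print(f"P(score={score}) = {p:.3f}")
```

A larger discrimination a makes the category probabilities change more sharply with theta, which is the sense in which some items "discriminate better" than others in the abstract.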
Do large-scale assessments measure students' ability to integrate scientific knowledge?

NASA Astrophysics Data System (ADS)

Lee, Hee-Sun

2010-03-01

Large-scale assessments are used as means to diagnose the current status of student achievement in science and compare students across schools, states, and countries. For efficiency, multiple-choice items and dichotomously-scored open-ended items are pervasively used in large-scale assessments such as Trends in International Math and Science Study (TIMSS). This study investigated how well these items measure secondary school students' ability to integrate scientific knowledge. This study collected responses of 8400 students to 116 multiple-choice and 84 open-ended items and applied an Item Response Theory analysis based on the Rasch Partial Credit Model. Results indicate that most multiple-choice items and dichotomously-scored open-ended items can be used to determine whether students have normative ideas about science topics, but cannot measure whether students integrate multiple pieces of relevant science ideas. Only when the scoring rubric is redesigned to capture subtle nuances of student open-ended responses do open-ended items become a valid and reliable tool to assess students' knowledge integration ability.
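As a brief illustration of the Rasch Partial Credit Model named in this record, the sketch below computes the probability of each score on a polytomously scored item. The function name and the step difficulties are hypothetical and chosen only for demonstration; nothing here is taken from the study's data or software.

```python
import math


def pcm_category_probs(theta, step_difficulties):
    """Score probabilities 0..m for one item under the Partial Credit Model.

    The log-numerator for score x is the sum of (theta - delta_j) over the
    first x steps; score 0 corresponds to an empty sum.
    """
    log_numerators = [0.0]
    running = 0.0
    for delta in step_difficulties:
        running += theta - delta
        log_numerators.append(running)
    # Subtract the maximum before exponentiating for numerical stability.
    largest = max(log_numerators)
    weights = [math.exp(v - largest) for v in log_numerators]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(theta, step_difficulties):
    """Model-implied mean score on the item at ability theta."""
    probs = pcm_category_probs(theta, step_difficulties)
    return sum(score * p for score, p in enumerate(probs))


# Assumed step difficulties for an open-ended item scored 0-3.
assumed_steps = [-0.5, 0.4, 1.2]
probs_at_zero = pcm_category_probs(0.0, assumed_steps)
```

With a single step the model reduces to the dichotomous Rasch model, which is why dichotomously scored and partial-credit items can be calibrated on a common scale.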
Accounting for Local Dependence with the Rasch Model: The Paradox of Information Increase.

PubMed

Andrich, David

Test theories imply statistical, local independence. Where local independence is violated, models of modern test theory that account for it have been proposed. One violation of local independence occurs when the response to one item governs the response to a subsequent item. Expanding on a formulation of this kind of violation between two items in the dichotomous Rasch model, this paper derives three related implications. First, it formalises how the polytomous Rasch model for an item constituted by summing the scores of the dependent items absorbs the dependence in its threshold structure. Second, it shows that as a consequence the unit when the dependence is accounted for is not the same as if the items had no response dependence. Third, it explains the paradox, known, but not explained in the literature, that the greater the dependence of the constituent items the greater the apparent information in the constituted polytomous item when it should provide less information.
The associative memory deficit in aging is related to reduced selectivity of brain activity during encoding

PubMed Central

Saverino, Cristina; Fatima, Zainab; Sarraf, Saman; Oder, Anita; Strother, Stephen C.; Grady, Cheryl L.

2016-01-01

Human aging is characterized by reductions in the ability to remember associations between items, despite intact memory for single items. Older adults also show less selectivity in task-related brain activity, such that patterns of activation become less distinct across multiple experimental tasks. This reduced selectivity, or dedifferentiation, has been found for episodic memory, which is often reduced in older adults, but not for semantic memory, which is maintained with age. We used functional magnetic resonance imaging (fMRI) to investigate whether there is a specific reduction in selectivity of brain activity during associative encoding in older adults, but not during item encoding, and whether this reduction predicts associative memory performance. Healthy young and older adults were scanned while performing an incidental-encoding task for pictures of objects and houses under item or associative instructions. An old/new recognition test was administered outside the scanner. We used agnostic canonical variates analysis and split-half resampling to detect whole brain patterns of activation that predicted item vs. associative encoding for stimuli that were later correctly recognized. Older adults had poorer memory for associations than did younger adults, whereas item memory was comparable across groups. Associative encoding trials, but not item encoding trials, were predicted less successfully in older compared to young adults, indicating less distinct patterns of associative-related activity in the older group. Importantly, higher probability of predicting associative encoding trials was related to better associative memory after accounting for age and performance on a battery of neuropsychological tests. These results provide evidence that neural distinctiveness at encoding supports associative memory and that a specific reduction of selectivity in neural recruitment underlies age differences in associative memory. PMID:27082043
Item Order, Response Format, and Examinee Sex and Handedness and Performance on a Multiple-Choice Test.

ERIC Educational Resources Information Center

Kleinke, David J.

Four forms of a 36-item adaptation of the Stanford Achievement Test were administered to 484 fourth graders. External factors potentially influencing test performance were examined, namely: (1) item order (easy-to-difficult vs. uniform); (2) response location (left column vs. right column); (3) handedness which may interact with response location;…
Person Response Functions and the Definition of Units in the Social Sciences

ERIC Educational Resources Information Center

Engelhard, George, Jr.; Perkins, Aminah F.

2011-01-01

Humphry (this issue) has written a thought-provoking piece on the interpretation of item discrimination parameters as scale units in item response theory. One of the key features of his work is the description of an item response theory (IRT) model that he calls the logistic measurement function that combines aspects of two traditions in IRT that…
On the Relationship between Classical Test Theory and Item Response Theory: From One to the Other and Back

ERIC Educational Resources Information Center

Raykov, Tenko; Marcoulides, George A.

2016-01-01

The frequently neglected and often misunderstood relationship between classical test theory and item response theory is discussed for the unidimensional case with binary measures and no guessing. It is pointed out that popular item response models can be directly obtained from classical test theory-based models by accounting for the discrete…

Applications of Multidimensional Item Response Theory Models with Covariates to Longitudinal Test Data. Research Report. ETS RR-16-21

ERIC Educational Resources Information Center

Fu, Jianbin

2016-01-01

The multidimensional item response theory (MIRT) models with covariates proposed by Haberman and implemented in the "mirt" program provide a flexible way to analyze data based on item response theory. In this report, we discuss applications of the MIRT models with covariates to longitudinal test data to measure skill differences at the…
Bayesian Analysis of Item Response Curves. Research Report 84-1. Mathematical Sciences Technical Report No. 132.

ERIC Educational Resources Information Center

Tsutakawa, Robert K.; Lin, Hsin Ying

Item response curves for a set of binary responses are studied from a Bayesian viewpoint of estimating the item parameters. For the two-parameter logistic model with normally distributed ability, restricted bivariate beta priors are used to illustrate the computation of the posterior mode via the EM algorithm. The procedure is illustrated by data…
Item response theory analysis of Working Alliance Inventory, revised response format, and new Brief Alliance Inventory.

PubMed

Mallinckrodt, Brent; Tekie, Yacob T

2016-11-01

The Working Alliance Inventory (WAI) has made great contributions to psychotherapy research. However, studies suggest the 7-point response format and 3-factor structure of the client version may have psychometric problems. This study used Rasch item response theory (IRT) to (a) improve WAI response format, (b) compare two brief 12-item versions (WAI-sr; WAI-s), and (c) develop a new 16-item Brief Alliance Inventory (BAI). Archival data from 1786 counseling center and community clients were analyzed. IRT findings suggested problems with crossed category thresholds. A rescoring scheme that combines neighboring responses to create 5- and 4-point scales sharply reduced these problems. Although subscale variance was reduced by 11-26%, rescoring yielded improved reliability and generally higher correlations with therapy process (session depth and smoothness) and outcome measures (residual gain symptom improvement). The 16-item BAI was designed to maximize "bandwidth" of item difficulty and preserve a broader range of WAI sensitivity than WAI-s or WAI-sr. Comparisons suggest the BAI performed better in several respects than the WAI-s or WAI-sr and equivalent to the full WAI on several performance indicators.
Assessment of self-reported sexual behavior and condom use among female sex workers in India using a polling box approach: a preliminary report.

PubMed

Hanck, Sarah E; Blankenship, Kim M; Irwin, Kevin S; West, Brooke S; Kershaw, Trace

2008-05-01

The accuracy of behavioral data related to risk for HIV and other sexually transmitted infections is prone to misreporting because of social desirability effects. Because computer-assisted approaches are not always feasible, a noncomputerized interview method for reducing social desirability effects is needed. The previous performance of alternative methods has been limited to aggregate data or constrained by the simplicity of dichotomous-only responses. We designed and tested a "polling box" method for case-attributable, multiple-response survey items in a low literacy population. A cross-sectional survey was conducted with 812 female sex workers in Andhra Pradesh, India. For a subset of questions embedded in a face-to-face survey questionnaire, every third participant was provided graphical response cards upon which to mark their answer and place in a polling box outside the view of the interviewer. Multiple logistic regression analysis was used to test for response differences to questions about socially undesirable, socially desirable, or sensitivity-neutral behaviors in the 2 interview methods. Polling box participants demonstrated higher reporting of risky sexual behaviors and lower reporting of condom use, with no conclusive response patterns among sensitivity-neutral items. Our findings suggest that the polling box approach provides a promising technique for improving the accurate reporting of sensitive behaviors among a low-literacy population in a resource poor setting. Additional research is needed to test logistical adaptations of the polling box approach.
Dissociative effects of orthographic distinctiveness in pure and mixed lists: an item-order account.

PubMed

McDaniel, Mark A; Cahill, Michael; Bugg, Julie M; Meadow, Nathaniel G

2011-10-01

We apply the item-order theory of list composition effects in free recall to the orthographic distinctiveness effect. The item-order account assumes that orthographically distinct items advantage item-specific encoding in both mixed and pure lists, but at the expense of exploiting relational information present in the list. Experiment 1 replicated the typical free recall advantage of orthographically distinct items in mixed lists and the elimination of that advantage in pure lists. Supporting the item-order account, recognition performances indicated that orthographically distinct items received greater item-specific encoding than did orthographically common items in mixed and pure lists (Experiments 1 and 2). Furthermore, order memory (input-output correspondence and sequential contiguity effects) was evident in recall of pure unstructured common lists, but not in recall of unstructured distinct lists (Experiment 1). These combined patterns, although not anticipated by prevailing views, are consistent with an item-order account.
SERENITY in e-Business and Smart Item Scenarios

NASA Astrophysics Data System (ADS)

Benameur, Azzedine; Khoury, Paul El; Seguran, Magali; Sinha, Smriti Kumar

SERENITY Artefacts, like Class, Patterns, Implementations and Executable Components for Security & Dependability (S&D) in addition to Serenity Runtime Framework (SRF) are discussed in previous chapters. How to integrate these artefacts with applications in Serenity approach is discussed here with two scenarios. The e-Business scenario is a standard loan origination process in a bank. The Smart Item scenario is an Ambient intelligence case study where we take advantage of Smart Items to provide an electronic healthcare infrastructure for remote healthcare assistance. In both cases, we detail how the prototype implementations of the scenarios select proper executable components through Serenity Runtime Framework and then demonstrate how these executable components of the S&D Patterns are deployed.
Measuring Response Styles Across the Big Five: A Multiscale Extension of an Approach Using Multinomial Processing Trees.

PubMed

Khorramdel, Lale; von Davier, Matthias

2014-01-01

This study shows how to address the problem of trait-unrelated response styles (RS) in rating scales using multidimensional item response theory. The aim is to test and correct data for RS in order to provide fair assessments of personality. Expanding on an approach presented by Böckenholt (2012), observed rating data are decomposed into multiple response processes based on a multinomial processing tree. The data come from a questionnaire consisting of 50 items of the International Personality Item Pool measuring the Big Five dimensions administered to 2,026 U.S. students with a 5-point rating scale. It is shown that this approach can be used to test if RS exist in the data and that RS can be differentiated from trait-related responses. Although the extreme RS appear to be unidimensional after exclusion of only 1 item, a unidimensional measure for the midpoint RS is obtained only after exclusion of 10 items. Both RS measurements show high cross-scale correlations and item response theory-based (marginal) reliabilities. Cultural differences could be found in giving extreme responses. Moreover, it is shown how to score rating data to correct for RS after being proved to exist in the data.
An item response theory evaluation of the Young Mania Rating Scale and the Montgomery-Asberg Depression Rating Scale in the Systematic Treatment Enhancement Program for Bipolar Disorder (STEP-BD).

PubMed

Prisciandaro, James J; Tolliver, Bryan K

2016-11-15

The Young Mania Rating Scale (YMRS) and Montgomery-Asberg Depression Rating Scale (MADRS) are among the most widely used outcome measures for clinical trials of medications for Bipolar Disorder (BD). Nonetheless, very few studies have examined the measurement characteristics of the YMRS and MADRS in individuals with BD using modern psychometric methods. The present study evaluated the YMRS and MADRS in the Systematic Treatment Enhancement Program for BD (STEP-BD) study using Item Response Theory (IRT). Baseline data from 3716 STEP-BD participants were available for the present analysis. The Graded Response Model (GRM) was fit separately to YMRS and MADRS item responses. Differential item functioning (DIF) was examined by regressing a variety of clinically relevant covariates (e.g., sex, substance dependence) on all test items and on the latent symptom severity dimension, within each scale. Both scales (1) contained several items that provided little or no psychometric information, (2) were inefficient, in that the majority of item response categories did not provide incremental psychometric information, (3) poorly measured participants outside of a narrow band of severity, and (4) evidenced DIF for nearly all items, suggesting that item responses were, in part, determined by factors other than symptom severity. Limitations: the sample was limited to outpatients, and the DIF analysis was sensitive only to certain forms of DIF. The present study provides evidence for significant measurement problems involving the YMRS and MADRS. More work is needed to refine these measures and/or develop suitable alternative measures of BD symptomatology for clinical trials research.
Better assessment of physical function: item improvement is neglected but essential

PubMed Central

2009-01-01

Introduction Physical function is a key component of patient-reported outcome (PRO) assessment in rheumatology. Modern psychometric methods, such as Item Response Theory (IRT) and Computerized Adaptive Testing, can materially improve measurement precision at the item level. We present the qualitative and quantitative item-evaluation process for developing the Patient Reported Outcomes Measurement Information System (PROMIS) Physical Function item bank. Methods The process was stepwise: we searched extensively to identify extant Physical Function items and then classified and selectively reduced the item pool. We evaluated retained items for content, clarity, relevance and comprehension, reading level, and translation ease by experts and patient surveys, focus groups, and cognitive interviews. We then assessed items by using classic test theory and IRT, used confirmatory factor analyses to estimate item parameters, and graded response modeling for parameter estimation. We retained the 20 Legacy (original) Health Assessment Questionnaire Disability Index (HAQ-DI) and the 10 SF-36's PF-10 items for comparison. Subjects were from rheumatoid arthritis, osteoarthritis, and healthy aging cohorts (n = 1,100) and a national Internet sample of 21,133 subjects. Results We identified 1,860 items. After qualitative and quantitative evaluation, 124 newly developed PROMIS items composed the PROMIS item bank, which included revised Legacy items with good fit that met IRT model assumptions. Results showed that the clearest and best-understood items were simple, in the present tense, and straightforward. Basic tasks (like dressing) were more relevant and important versus complex ones (like dancing). Revised HAQ-DI and PF-10 items with five response options had higher item-information content than did comparable original Legacy items with fewer response options. IRT analyses showed that the Physical Function domain satisfied general criteria for unidimensionality with one-, two-, three-, and four-factor models having comparable model fits. Correlations between factors in the test data sets were > 0.90. Conclusions Item improvement must underlie attempts to improve outcome assessment. The clear, personally important and relevant, ability-framed items in the PROMIS Physical Function item bank perform well in PRO assessment. They will benefit from further study and application in a wider variety of rheumatic diseases in diverse clinical groups, including those at the extremes of physical functioning, and in different administration modes. PMID:20015354
Better assessment of physical function: item improvement is neglected but essential.

PubMed

Bruce, Bonnie; Fries, James F; Ambrosini, Debbie; Lingala, Bharathi; Gandek, Barbara; Rose, Matthias; Ware, John E

2009-01-01

Physical function is a key component of patient-reported outcome (PRO) assessment in rheumatology. Modern psychometric methods, such as Item Response Theory (IRT) and Computerized Adaptive Testing, can materially improve measurement precision at the item level. We present the qualitative and quantitative item-evaluation process for developing the Patient Reported Outcomes Measurement Information System (PROMIS) Physical Function item bank. The process was stepwise: we searched extensively to identify extant Physical Function items and then classified and selectively reduced the item pool. We evaluated retained items for content, clarity, relevance and comprehension, reading level, and translation ease by experts and patient surveys, focus groups, and cognitive interviews. We then assessed items by using classic test theory and IRT, used confirmatory factor analyses to estimate item parameters, and graded response modeling for parameter estimation. We retained the 20 Legacy (original) Health Assessment Questionnaire Disability Index (HAQ-DI) and the 10 SF-36's PF-10 items for comparison. Subjects were from rheumatoid arthritis, osteoarthritis, and healthy aging cohorts (n = 1,100) and a national Internet sample of 21,133 subjects. We identified 1,860 items. After qualitative and quantitative evaluation, 124 newly developed PROMIS items composed the PROMIS item bank, which included revised Legacy items with good fit that met IRT model assumptions. Results showed that the clearest and best-understood items were simple, in the present tense, and straightforward. Basic tasks (like dressing) were more relevant and important versus complex ones (like dancing). Revised HAQ-DI and PF-10 items with five response options had higher item-information content than did comparable original Legacy items with fewer response options. IRT analyses showed that the Physical Function domain satisfied general criteria for unidimensionality with one-, two-, three-, and four-factor models having comparable model fits. Correlations between factors in the test data sets were > 0.90. Item improvement must underlie attempts to improve outcome assessment. The clear, personally important and relevant, ability-framed items in the PROMIS Physical Function item bank perform well in PRO assessment. They will benefit from further study and application in a wider variety of rheumatic diseases in diverse clinical groups, including those at the extremes of physical functioning, and in different administration modes.
A Graphical Approach to Item Analysis. Research Report. ETS RR-04-10

ERIC Educational Resources Information Center

Livingston, Samuel A.; Dorans, Neil J.

2004-01-01

This paper describes an approach to item analysis that is based on the estimation of a set of response curves for each item. The response curves show, at a glance, the difficulty and the discriminating power of the item and the popularity of each distractor, at any level of the criterion variable (e.g., total score). The curves are estimated by…
Development and Standardization of the Diagnostic Adaptive Behavior Scale: Application of Item Response Theory to the Assessment of Adaptive Behavior

ERIC Educational Resources Information Center

Tassé, Marc J.; Schalock, Robert L.; Thissen, David; Balboni, Giulia; Bersani, Henry, Jr.; Borthwick-Duffy, Sharon A.; Spreat, Scott; Widaman, Keith F.; Zhang, Dalun; Navas, Patricia

2016-01-01

The Diagnostic Adaptive Behavior Scale (DABS) was developed using item response theory (IRT) methods and was constructed to provide the most precise and valid adaptive behavior information at or near the cutoff point of making a decision regarding a diagnosis of intellectual disability. The DABS initial item pool consisted of 260 items. Using IRT…
Dynamic Testing of Analogical Reasoning in 5- to 6-Year-Olds: Multiple-Choice versus Constructed-Response Training Items

ERIC Educational Resources Information Center

Stevenson, Claire E.; Heiser, Willem J.; Resing, Wilma C. M.

2016-01-01

Multiple-choice (MC) analogy items are often used in cognitive assessment. However, in dynamic testing, where the aim is to provide insight into potential for learning and the learning process, constructed-response (CR) items may be of benefit. This study investigated whether training with CR or MC items leads to differences in the strategy…
The Relationship of Item-Level Response Times with Test-Taker and Item Variables in an Operational CAT Environment. LSAC Research Report Series.

ERIC Educational Resources Information Center

Swygert, Kimberly A.

In this study, data from an operational computerized adaptive test (CAT) were examined in order to gather information concerning item response times in a CAT environment. The CAT under study included multiple-choice items measuring verbal, quantitative, and analytical reasoning. The analyses included the fitting of regression models describing the…
Measuring pain phenomena after spinal cord injury: Development and psychometric properties of the SCI-QOL Pain Interference and Pain Behavior assessment tools.

PubMed

Cohen, Matthew L; Kisala, Pamela A; Dyson-Hudson, Trevor A; Tulsky, David S

2018-05-01

Objective To develop modern patient-reported outcome measures that assess pain interference and pain behavior after spinal cord injury (SCI). Design Grounded-theory based qualitative item development; large-scale item calibration field-testing; confirmatory factor analyses; graded response model item response theory analyses; statistical linking techniques to transform scores to the Patient Reported Outcome Measurement Information System (PROMIS) metric. Setting Five SCI Model Systems centers and one Department of Veterans Affairs medical center in the United States. Participants Adults with traumatic SCI. Interventions N/A. Outcome Measures Spinal Cord Injury - Quality of Life (SCI-QOL) Pain Interference item bank, SCI-QOL Pain Interference short form, and SCI-QOL Pain Behavior scale. Results Seven hundred fifty-seven individuals with traumatic SCI completed 58 items addressing various aspects of pain. Items were then separated by whether they assessed pain interference or pain behavior, and poorly functioning items were removed. Confirmatory factor analyses confirmed that each set of items was unidimensional, and item response theory analyses were used to estimate slopes and thresholds for the items. Ultimately, 7 items (4 from PROMIS) comprised the Pain Behavior scale and 25 items (18 from PROMIS) comprised the Pain Interference item bank. Ten of these 25 items were selected to form the Pain Interference short form. Conclusions The SCI-QOL Pain Interference item bank and the SCI-QOL Pain Behavior scale demonstrated robust psychometric properties. The Pain Interference item bank is available as a computer adaptive test or short form for research and clinical applications, and scores are transformed to the PROMIS metric.
Reliability and validity of a short form household food security scale in a Caribbean community.

PubMed

Gulliford, Martin C; Mahabir, Deepak; Rocke, Brian

2004-06-16

We evaluated the reliability and validity of the short form household food security scale in a different setting from the one in which it was developed. The scale was interview administered to 531 subjects from 286 households in north central Trinidad in Trinidad and Tobago, West Indies. We evaluated the six items by fitting item response theory models to estimate item thresholds, estimating agreement among respondents in the same households and estimating the slope index of income-related inequality (SII) after adjusting for age, sex and ethnicity. Item-score correlations ranged from 0.52 to 0.79 and Cronbach's alpha was 0.87. Item responses gave within-household correlation coefficients ranging from 0.70 to 0.78. Estimated item thresholds (standard errors) from the Rasch model ranged from -2.027 (0.063) for the 'balanced meal' item to 2.251 (0.116) for the 'hungry' item. The 'balanced meal' item had the lowest threshold in each ethnic group even though there was evidence of differential functioning for this item by ethnicity. Relative thresholds of other items were generally consistent with US data. Estimation of the SII, comparing those at the bottom with those at the top of the income scale, gave relative odds for an affirmative response of 3.77 (95% confidence interval 1.40 to 10.2) for the lowest severity item, and 20.8 (2.67 to 162.5) for highest severity item. Food insecurity was associated with reduced consumption of green vegetables after additionally adjusting for income and education (0.52, 0.28 to 0.96). The household food security scale gives reliable and valid responses in this setting. Differing relative item thresholds compared with US data do not require alteration to the cut-points for classification of 'food insecurity without hunger' or 'food insecurity with hunger'. The data provide further evidence that re-evaluation of the 'balanced meal' item is required.
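To make the reported Rasch thresholds concrete, the short Python sketch below evaluates the dichotomous Rasch model at the two extreme thresholds quoted in this record (-2.027 for the 'balanced meal' item and 2.251 for the 'hungry' item). It assumes the usual convention that the latent variable is food-insecurity severity on the same logit scale as the thresholds; the function name and the severity values in the loop are hypothetical choices for illustration.

```python
import math


def rasch_prob(theta, threshold):
    """Probability of an affirmative response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - threshold)))


# Item thresholds as reported in the abstract (logit scale).
BALANCED_MEAL = -2.027
HUNGRY = 2.251

# Assumed severity levels, chosen only to show the contrast between items.
for theta in (-2.0, 0.0, 2.0):
    p_meal = rasch_prob(theta, BALANCED_MEAL)
    p_hungry = rasch_prob(theta, HUNGRY)
    print(f"theta={theta:+.1f}  balanced meal={p_meal:.3f}  hungry={p_hungry:.3f}")
```

A respondent whose severity equals an item's threshold affirms that item with probability 0.5, so at any given severity the low-threshold 'balanced meal' item is affirmed far more often than the high-threshold 'hungry' item.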
Network analysis of online bidding activity

NASA Astrophysics Data System (ADS)

Yang, I.; Oh, E.; Kahng, B.

2006-07-01

With the advent of digital media, people are increasingly resorting to online channels for commercial transactions. The online auction is a prototypical example. In such online transactions, the pattern of bidding activity is more complex than traditional offline transactions; this is because the number of bidders participating in a given transaction is not bounded and the bidders can also easily respond to the bidding instantaneously. By using the recently developed network theory, we study the interaction patterns between bidders (items) who (that) are connected when they bid for the same item (if the item is bid by the same bidder). The resulting network is analyzed by using the hierarchical clustering algorithm, which is used for clustering analysis for expression data from DNA microarrays. A dendrogram is constructed for the item subcategories; this dendrogram is compared to a traditional classification scheme. The implication of the difference between the two is discussed.
Computerized Adaptive Testing with Item Clones. Research Report.

ERIC Educational Resources Information Center

Glas, Cees A. W.; van der Linden, Wim J.

To reduce the cost of item writing and to enhance the flexibility of item presentation, items can be generated by item-cloning techniques. An important consequence of cloning is that it may cause variability on the item parameters. Therefore, a multilevel item response model is presented in which it is assumed that the item parameters of a…
Slower is not always better: Response-time evidence clarifies the limited role of miserly information processing in the Cognitive Reflection Test

PubMed Central

Pitchford, Melanie; Ball, Linden J.; Hunt, Thomas E.; Steel, Richard

2017-01-01

We report a study examining the role of ‘cognitive miserliness’ as a determinant of poor performance on the standard three-item Cognitive Reflection Test (CRT). The cognitive miserliness hypothesis proposes that people often respond incorrectly on CRT items because of an unwillingness to go beyond default, heuristic processing and invest time and effort in analytic, reflective processing. Our analysis (N = 391) focused on people’s response times to CRT items to determine whether predicted associations are evident between miserly thinking and the generation of incorrect, intuitive answers. Evidence indicated only a weak correlation between CRT response times and accuracy. Item-level analyses also failed to demonstrate predicted response-time differences between correct analytic and incorrect intuitive answers for two of the three CRT items. We question whether participants who give incorrect intuitive answers on the CRT can legitimately be termed cognitive misers and whether the three CRT items measure the same general construct. PMID:29099840
Development of the Contact Lens User Experience: CLUE Scales

PubMed Central

Wirth, R. J.; Edwards, Michael C.; Henderson, Michael; Henderson, Terri; Olivares, Giovanna; Houts, Carrie R.

2016-01-01

Purpose The field of optometry has become increasingly interested in patient-reported outcomes, reflecting a common trend occurring across the spectrum of healthcare. This article reviews the development of the Contact Lens User Experience: CLUE system designed to assess patient evaluations of contact lenses. CLUE was built using modern psychometric methods such as factor analysis and item response theory. Methods The qualitative process through which relevant domains were identified is outlined as well as the process of creating initial item banks. Psychometric analyses were conducted on the initial item banks and refinements were made to the domains and items. Following this data-driven refinement phase, a second round of data was collected to further refine the items and obtain final item response theory item parameter estimates. Results Extensive qualitative work identified three key areas patients consider important when describing their experience with contact lenses. Based on item content and psychometric dimensionality assessments, the developing CLUE instruments were ultimately focused around four domains: comfort, vision, handling, and packaging. Item response theory parameters were estimated for the CLUE item banks (377 items), and the resulting scales were found to provide precise and reliable assignment of scores detailing users’ subjective experiences with contact lenses. Conclusions The CLUE family of instruments, as it currently exists, exhibits excellent psychometric properties. PMID:27383257

The influence of item order on intentional response distortion in the assessment of high potentials: assessing pilot applicants.

PubMed

Khorramdel, Lale; Kubinger, Klaus D; Uitz, Alexander

2014-04-01

An experiment was conducted to investigate the effects of item order and questionnaire content on faking good or intentional response distortion. It was hypothesized that intentional response distortion would either increase towards the end of a long questionnaire, as learning effects might make it easier to adjust responses to a faking good schema, or decrease because applicants' will to distort responses is reduced if the questionnaire lasts long enough. Furthermore, it was hypothesized that certain types of questionnaire content are especially vulnerable to response distortion. Eighty-four pre-selected pilot applicants filled out a questionnaire consisting of 516 items including items from the NEO five factor inventory (NEO FFI), NEO personality inventory revised (NEO PI-R) and business-focused inventory of personality (BIP). The positions of the items were varied within the applicant sample to test if responses are affected by item order, and applicants' response behaviour was additionally compared to that of volunteers. Applicants reported significantly higher mean scores than volunteers, and results provide some evidence of decreased faking tendencies towards the end of the questionnaire. Furthermore, it could be demonstrated that lower variances or standard deviations in combination with appropriate (often higher) mean scores can serve as an indicator for faking tendencies in group comparisons, even if effects are not significant. © 2013 International Union of Psychological Science.
Item Discrimination and Type I Error in the Detection of Differential Item Functioning

ERIC Educational Resources Information Center

Li, Yanju; Brooks, Gordon P.; Johanson, George A.

2012-01-01

In 2009, DeMars stated that when impact exists there will be Type I error inflation, especially with larger sample sizes and larger discrimination parameters for items. One purpose of this study is to present the patterns of Type I error rates using Mantel-Haenszel (MH) and logistic regression (LR) procedures when the mean ability between the…
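For readers unfamiliar with the Mantel-Haenszel (MH) procedure named in the abstract above, the following is a minimal sketch of the MH common odds ratio, which the procedure uses as its DIF effect size. It is not taken from the study. The function name and the counts are hypothetical, and each stratum is assumed to be one matched ability level with counts ordered as (reference correct, reference incorrect, focal correct, focal incorrect).

```python
def mh_common_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio across matched score strata.

    Each stratum is a tuple (a, b, c, d):
      a = reference group correct, b = reference group incorrect,
      c = focal group correct,     d = focal group incorrect.
    A value near 1 suggests no uniform DIF for the studied item.
    """
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


# Hypothetical counts: both groups perform identically within each stratum.
no_dif = mh_common_odds_ratio([(30, 10, 30, 10), (20, 20, 20, 20)])

# Hypothetical counts: the reference group outperforms the focal group.
with_dif = mh_common_odds_ratio([(30, 10, 20, 20)])
```

In this illustration the first call yields an odds ratio of 1 and the second an odds ratio of 3. A significance test and an effect-size classification would normally follow.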
The e-MSWS-12: improving the multiple sclerosis walking scale using item response theory.

PubMed

Engelhard, Matthew M; Schmidt, Karen M; Engel, Casey E; Brenton, J Nicholas; Patek, Stephen D; Goldman, Myla D

2016-12-01

The Multiple Sclerosis Walking Scale (MSWS-12) is the predominant patient-reported measure of multiple sclerosis (MS)-related walking ability, yet it had not been analyzed using item response theory (IRT), the emerging standard for patient-reported outcome (PRO) validation. This study aims to reduce MSWS-12 measurement error and facilitate computerized adaptive testing by creating an IRT model of the MSWS-12 and distributing it online. MSWS-12 responses from 284 subjects with MS were collected by mail and used to fit and compare several IRT models. Following model selection and assessment, subpopulations based on age and sex were tested for differential item functioning (DIF). Model comparison favored a one-dimensional graded response model (GRM). This model met fit criteria and explained 87% of response variance. The performance of each MSWS-12 item was characterized using category response curves (CRCs) and item information. IRT-based MSWS-12 scores correlated with traditional MSWS-12 scores (r = 0.99) and timed 25-foot walk (T25FW) speed (r = -0.70). Item 2 showed DIF based on age (χ² = 19.02, df = 5, p < 0.01), and Item 11 showed DIF based on sex (χ² = 13.76, df = 5, p = 0.02). MSWS-12 measurement error depends on walking ability, but could be lowered by improving or replacing items with low information or DIF. The e-MSWS-12 includes IRT-based scoring, error checking, and an estimated T25FW derived from MSWS-12 responses. It is available at https://ms-irt.shinyapps.io/e-MSWS-12 .
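As an illustration of the graded response model and the category response curves mentioned in the abstract above, here is a minimal sketch of how category probabilities are obtained for one item at one trait level. The function name and all parameter values are hypothetical and are not the published e-MSWS-12 estimates. The logistic form without a scaling constant is assumed.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under a graded response model.

    theta:      latent trait level of the respondent
    a:          item discrimination
    thresholds: increasing thresholds b_1 < ... < b_{K-1} for K categories
    """
    # Cumulative probabilities P(X >= k); P(X >= 0) = 1 and P(X >= K) = 0.
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum += [0.0]
    # Each category probability is the difference of adjacent cumulatives.
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


# Hypothetical five-category item evaluated at theta = 0.
probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-1.0, 0.0, 1.0, 2.0])
```

Evaluating the function over a grid of theta values and plotting each category's probability would trace out that item's category response curves.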
Fixed or mixed: a comparison of three, four and mixed-option multiple-choice tests in a Fetal Surveillance Education Program

PubMed Central

2013-01-01

Background Despite the widespread use of multiple-choice assessments in medical education assessment, current practice and published advice concerning the number of response options remains equivocal. This article describes an empirical study contrasting the quality of three 60 item multiple-choice test forms within the Royal Australian and New Zealand College of Obstetricians and Gynaecologists (RANZCOG) Fetal Surveillance Education Program (FSEP). The three forms are described below. Methods The first form featured four response options per item. The second form featured three response options, having removed the least functioning option from each item in the four-option counterpart. The third test form was constructed by retaining the best performing version of each item from the first two test forms. It contained both three and four option items. Results Psychometric and educational factors were taken into account in formulating an approach to test construction for the FSEP. The four-option test performed better than the three-option test overall, but some items were improved by the removal of options. The mixed-option test demonstrated better measurement properties than the fixed-option tests, and has become the preferred test format in the FSEP program. The criteria used were reliability, errors of measurement and fit to the item response model. Conclusions The position taken is that decisions about the number of response options be made at the item level, with plausible options being added to complete each item on both psychometric and educational grounds rather than complying with a uniform policy. The point is to construct the better performing item in providing the best psychometric and educational information. PMID:23453056
Fixed or mixed: a comparison of three, four and mixed-option multiple-choice tests in a Fetal Surveillance Education Program.

PubMed

Zoanetti, Nathan; Beaves, Mark; Griffin, Patrick; Wallace, Euan M

2013-03-04

Despite the widespread use of multiple-choice assessments in medical education assessment, current practice and published advice concerning the number of response options remains equivocal. This article describes an empirical study contrasting the quality of three 60 item multiple-choice test forms within the Royal Australian and New Zealand College of Obstetricians and Gynaecologists (RANZCOG) Fetal Surveillance Education Program (FSEP). The three forms are described below. The first form featured four response options per item. The second form featured three response options, having removed the least functioning option from each item in the four-option counterpart. The third test form was constructed by retaining the best performing version of each item from the first two test forms. It contained both three and four option items. Psychometric and educational factors were taken into account in formulating an approach to test construction for the FSEP. The four-option test performed better than the three-option test overall, but some items were improved by the removal of options. The mixed-option test demonstrated better measurement properties than the fixed-option tests, and has become the preferred test format in the FSEP program. The criteria used were reliability, errors of measurement and fit to the item response model. The position taken is that decisions about the number of response options be made at the item level, with plausible options being added to complete each item on both psychometric and educational grounds rather than complying with a uniform policy. The point is to construct the better performing item in providing the best psychometric and educational information.
Measuring the quality of life in hypertension according to Item Response Theory

PubMed Central

Borges, José Wicto Pereira; Moreira, Thereza Maria Magalhães; Schmitt, Jeovani; de Andrade, Dalton Francisco; Barbetta, Pedro Alberto; de Souza, Ana Célia Caetano; Lima, Daniele Braz da Silva; Carvalho, Irialda Saboia

2017-01-01

ABSTRACT OBJECTIVE To analyze the Miniquestionário de Qualidade de Vida em Hipertensão Arterial (MINICHAL – Mini-questionnaire of Quality of Life in Hypertension) using the Item Response Theory. METHODS This is an analytical study conducted with 712 persons with hypertension treated in thirteen primary health care units of Fortaleza, State of Ceará, Brazil, in 2015. The steps of the analysis by the Item Response Theory were: evaluation of dimensionality, estimation of parameters of items, and construction of scale. The study of dimensionality was carried out on the polychoric correlation matrix and confirmatory factor analysis. To estimate the item parameters, we used Samejima's Graded Response Model. The analyses were conducted using the free software R with the aid of the psych and mirt packages. RESULTS The analysis has allowed the visualization of item parameters and their individual contributions in the measurement of the latent trait, generating more information and allowing the construction of a scale with an interpretative model that demonstrates the evolution of the worsening of the quality of life in five levels. Regarding the item parameters, the items related to the somatic state have had a good performance, as they have presented better power to discriminate individuals with worse quality of life. The items related to mental state have been those which contributed with less psychometric data in the MINICHAL. CONCLUSIONS We conclude that the instrument is suitable for the identification of the worsening of the quality of life in hypertension. The analysis of the MINICHAL using the Item Response Theory has allowed us to identify new sides of this instrument that have not yet been addressed in previous studies. PMID:28492764
Development and Application of Methods for Estimating Operating Characteristics of Discrete Test Item Responses without Assuming any Mathematical Form.

ERIC Educational Resources Information Center

Samejima, Fumiko

In latent trait theory the latent space, or space of the hypothetical construct, is usually represented by some unidimensional or multi-dimensional continuum of real numbers. Like the latent space, the item response can either be treated as a discrete variable or as a continuous variable. Latent trait theory relates the item response to the latent…
Application of Group-Level Item Response Models in the Evaluation of Consumer Reports about Health Plan Quality

ERIC Educational Resources Information Center

Reise, Steven P.; Meijer, Rob R.; Ainsworth, Andrew T.; Morales, Leo S.; Hays, Ron D.

2006-01-01

Group-level parametric and non-parametric item response theory models were applied to the Consumer Assessment of Healthcare Providers and Systems (CAHPS[R]) 2.0 core items in a sample of 35,572 Medicaid recipients nested within 131 health plans. Results indicated that CAHPS responses are dominated by within health plan variation, and only weakly…
Bifactor and Item Response Theory Analyses of Interviewer Report Scales of Cognitive Impairment in Schizophrenia

PubMed Central

Reise, Steven P.; Ventura, Joseph; Keefe, Richard S. E.; Baade, Lyle E.; Gold, James M.; Green, Michael F.; Kern, Robert S.; Mesholam-Gately, Raquelle; Nuechterlein, Keith H.; Seidman, Larry J.; Bilder, Robert

2011-01-01

We conducted psychometric analyses of two interview-based measures of cognitive deficits: the 21-item Clinical Global Impression of Cognition in Schizophrenia (CGI-CogS; Ventura et al., 2008), and the 20-item Schizophrenia Cognition Rating Scale (SCoRS; Keefe et al., 2006), which were administered on two occasions to a sample of people with schizophrenia. Traditional psychometrics, bifactor analysis, and item response theory (IRT) methods were used to explore item functioning, dimensionality, and to compare instruments. Despite containing similar item content, responses to the CGI-CogS demonstrated superior psychometric properties (e.g., higher item-intercorrelations, better spread of ratings across response categories), relative to the SCoRS. We argue that these differences arise mainly from the differential use of prompts and how the items are phrased and scored. Bifactor analysis demonstrated that although both measures capture a broad range of cognitive functioning (e.g., working memory, social cognition), the common variance on each is overwhelmingly explained by a single general factor. IRT analyses of the combined pool of 41 items showed that measurement precision is peaked in the mild to moderate range of cognitive impairment. Finally, simulated adaptive testing revealed that only about 10 to 12 items are necessary to achieve latent trait level estimates with reasonably small standard errors for most individuals. This suggests that these interview-based measures of cognitive deficits could be shortened without loss of measurement precision. PMID:21381848
Validation of a clinical critical thinking skills test in nursing.

PubMed

Shin, Sujin; Jung, Dukyoo; Kim, Sungeun

2015-01-27

The purpose of this study was to develop a revised version of the clinical critical thinking skills test (CCTS) and to subsequently validate its performance. This study is a secondary analysis of the CCTS. Data were obtained from a convenience sample of 284 college students in June 2011. Thirty items were analyzed using item response theory and test reliability was assessed. Test-retest reliability was measured using the results of 20 nursing college and graduate school students in July 2013. The content validity of the revised items was analyzed by calculating the degree of agreement between instrument developer intention in item development and the judgments of six experts. To analyze response process validity, qualitative data related to the response processes of nine nursing college students obtained through cognitive interviews were analyzed. Out of the initial 30 items, 11 items were excluded after the analysis of the difficulty and discrimination parameters. When the 19 items of the revised version of the CCTS were analyzed, levels of item difficulty were found to be relatively low and levels of discrimination were found to be appropriate or high. The degree of agreement between item developer intention and expert judgments equaled or exceeded 50%. From the above results, evidence of the response process validity was demonstrated, indicating that subjects responded as intended by the test developer. The revised 19-item CCTS was found to have sufficient reliability and validity and therefore represents a more convenient measurement of critical thinking ability.
Validation of a clinical critical thinking skills test in nursing

PubMed Central

2015-01-01

Purpose: The purpose of this study was to develop a revised version of the clinical critical thinking skills test (CCTS) and to subsequently validate its performance. Methods: This study is a secondary analysis of the CCTS. Data were obtained from a convenience sample of 284 college students in June 2011. Thirty items were analyzed using item response theory and test reliability was assessed. Test-retest reliability was measured using the results of 20 nursing college and graduate school students in July 2013. The content validity of the revised items was analyzed by calculating the degree of agreement between instrument developer intention in item development and the judgments of six experts. To analyze response process validity, qualitative data related to the response processes of nine nursing college students obtained through cognitive interviews were analyzed. Results: Out of the initial 30 items, 11 items were excluded after the analysis of the difficulty and discrimination parameters. When the 19 items of the revised version of the CCTS were analyzed, levels of item difficulty were found to be relatively low and levels of discrimination were found to be appropriate or high. The degree of agreement between item developer intention and expert judgments equaled or exceeded 50%. Conclusion: From the above results, evidence of the response process validity was demonstrated, indicating that subjects responded as intended by the test developer. The revised 19-item CCTS was found to have sufficient reliability and validity and therefore represents a more convenient measurement of critical thinking ability. PMID:25622716
Cross-informant and cross-national equivalence using item-response theory (IRT) linking: A case study using the behavioral assessment for children of African heritage in the United States and Jamaica.

PubMed

Lambert, Michael Canute; Ferguson, Gail M; Rowan, George T

2016-03-01

Cross-national study of adolescents' psychological adjustment requires measures that permit reliable and valid assessment across informants and nations, but such measures are virtually nonexistent. Item-response-theory-based linking is a promising yet underutilized methodological procedure that permits more accurate assessment across informants and nations. To demonstrate this procedure, the Resilience Scale of the Behavioral Assessment for Children of African Heritage (Lambert et al., 2005) was administered to 250 African American and 294 Jamaican nonreferred adolescents and their caregivers. Multiple items without significant differential item functioning emerged, allowing scale linking across informants and nations. Calibrating item parameters via item response theory linking can permit cross-informant cross-national assessment of youth. (c) 2016 APA, all rights reserved.
Combining agreement and frequency rating scales to optimize psychometrics in measuring behavioral health functioning.

PubMed

Marfeo, Elizabeth E; Ni, Pengsheng; Chan, Leighton; Rasch, Elizabeth K; Jette, Alan M

2014-07-01

The goal of this article was to investigate optimal functioning of using frequency vs. agreement rating scales in two subdomains of the newly developed Work Disability Functional Assessment Battery: the Mood & Emotions and Behavioral Control scales. A psychometric study comparing rating scale performance embedded in a cross-sectional survey used for developing a new instrument to measure behavioral health functioning among adults applying for disability benefits in the United States was performed. Within the sample of 1,017 respondents, the range of response category endorsement was similar for both frequency and agreement item types for both scales. There were fewer missing values in the frequency items than the agreement items. Both frequency and agreement items showed acceptable reliability. The frequency items demonstrated optimal effectiveness around the mean ± 1-2 standard deviation score range; the agreement items performed better at the extreme score ranges. Findings suggest an optimal response format requires a mix of both agreement-based and frequency-based items. Frequency items perform better in the normal range of responses, capturing specific behaviors, reactions, or situations that may elicit a specific response. Agreement items do better for those whose scores are more extreme and capture subjective content related to general attitudes, behaviors, or feelings of work-related behavioral health functioning. Copyright © 2014 Elsevier Inc. All rights reserved.
Translation, adaptation and validation of the American short form Patient Activation Measure (PAM13) in a Danish version.

PubMed

Maindal, Helle Terkildsen; Sokolowski, Ineta; Vedsted, Peter

2009-06-29

The Patient Activation Measure (PAM) is a measure that assesses patient knowledge, skill, and confidence for self-management. This study validates the Danish translation of the 13-item Patient Activation Measure (PAM13) in a Danish population with dysglycaemia. 358 people with screen-detected dysglycaemia participating in a primary care health education study responded to PAM13. The PAM13 was translated into Danish by a standardised forward-backward translation. Data quality was assessed by mean, median, item response, missing values, floor and ceiling effects, internal consistency (Cronbach's alpha and average inter-item correlation) and item-rest correlations. Scale properties were assessed by Rasch Rating Scale models. The item response was high with a small number of missing values (0.8-4.2%). Floor effect was small (range 0.6-3.6%), but the ceiling effect was above 15% for all items (range 18.6-62.7%). The alpha-coefficient was 0.89 and the average inter-item correlation 0.38. The Danish version formed a unidimensional, probabilistic Guttman-like scale explaining 43.2% of the variance. We did, however, find a different item sequence compared to the original scale. A Danish version of PAM13 with acceptable validity and reliability is now available. Further development should focus on single items, response categories in relation to ceiling effects and further validation of reproducibility and responsiveness.
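The two internal-consistency figures reported in the abstract above can be related through the standardized form of Cronbach's alpha, which depends only on the number of items and the average inter-item correlation. The sketch below is a consistency check under that assumption. The abstract does not say whether the reported coefficient is the raw or the standardized alpha, and the function name is hypothetical.

```python
def standardized_alpha(k, mean_r):
    """Standardized Cronbach's alpha from k items and mean inter-item correlation."""
    return k * mean_r / (1 + (k - 1) * mean_r)


# Values reported for the 13-item PAM13: average inter-item correlation 0.38.
alpha = standardized_alpha(13, 0.38)
print(round(alpha, 2))  # prints 0.89
```

Under this formula, 13 items with an average inter-item correlation of 0.38 give approximately 0.89, in line with the reported alpha-coefficient.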
Detection of Differential Item Functioning Using the Lasso Approach

ERIC Educational Resources Information Center

Magis, David; Tuerlinckx, Francis; De Boeck, Paul

2015-01-01

This article proposes a novel approach to detect differential item functioning (DIF) among dichotomously scored items. Unlike standard DIF methods that perform an item-by-item analysis, we propose the "LR lasso DIF method": logistic regression (LR) model is formulated for all item responses. The model contains item-specific intercepts,…
Item response theory analysis of Centers for Disease Control and Prevention Health-Related Quality of Life (CDC HRQOL) items in adults with arthritis.

PubMed

Mielenz, Thelma J; Callahan, Leigh F; Edwards, Michael C

2016-03-12

Examine the feasibility of performing an item response theory (IRT) analysis on two of the Centers for Disease Control and Prevention health-related quality of life (CDC HRQOL) modules - the 4-item Healthy Days Core Module (HDCM) and the 5-item Healthy days Symptoms Module (HDSM). Previous principal components analyses confirm that the two scales both assess a mix of mental (CDC-MH) and physical health (CDC-PH). The purpose is to conduct item response theory (IRT) analysis on the CDC-MH and CDC-PH scales separately. 2182 patients with self-reported or physician-diagnosed arthritis completed a cross-sectional survey including HDCM and HDSM items. Besides global health, the other 8 items ask the number of days that some statement was true; we chose to recode the data into 8 categories based on observed clustering. The IRT assumptions were assessed using confirmatory factor analysis and the data could be modeled using a unidimensional IRT model. The graded response model was used for IRT analyses and CDC-MH and CDC-PH scales were analyzed separately in flexMIRT. The IRT parameter estimates for the five-item CDC-PH all appeared reasonable. The three-item CDC-MH did not have reasonable parameter estimates. The CDC-PH scale is amenable to IRT analysis but the existing CDC-MH scale is not. We suggest either using the 4-item Healthy Days Core Module (HDCM) and the 5-item Healthy days Symptoms Module (HDSM) as they currently stand or the CDC-PH scale alone if the primary goal is to measure physical health related HRQOL.
The Vulvar Pain Assessment Questionnaire inventory.

PubMed

Dargie, Emma; Holden, Ronald R; Pukall, Caroline F

2016-12-01

Millions suffer from chronic vulvar pain (ie, vulvodynia). Vulvodynia represents the intersection of 2 difficult subjects for health care professionals to tackle: sexuality and chronic pain. Those with chronic vulvar pain are often uncomfortable seeking help, and many who do so fail to receive proper diagnoses. The current research developed a multidimensional assessment questionnaire, the Vulvar Pain Assessment Questionnaire (VPAQ) inventory, to assist in the assessment and diagnosis of those with vulvar pain. A large pool of items was created to capture pain characteristics, emotional/cognitive functioning, physical functioning, coping skills, and partner factors. The item pool was subsequently administered online to 288 participants with chronic vulvar pain. Of those, 248 participants also completed previously established questionnaires that were used to evaluate the convergent and discriminant validity of the VPAQ. Exploratory factor analyses of the item pool established 6 primary scales: Pain Severity, Emotional Response, Cognitive Response, and Interference with Life, Sexual Function, and Self-Stimulation/Penetration. A brief screening version accompanies a more detailed version. In addition, 3 supplementary scales address pain quality characteristics, coping skills, and the impact on one's romantic relationship. When relationships among VPAQ scales and previously researched scales were examined, evidence of convergent and discriminant validity was observed. These patterns of findings are consistent with the literature on the multidimensional nature of vulvodynia. The VPAQ can be used for assessment, diagnosis, treatment formulation, and treatment monitoring. In addition, the VPAQ could potentially be used to promote communication between patients and providers, and point toward helpful treatment options and/or referrals.
Using EEG and stimulus context to probe the modelling of auditory-visual speech.

PubMed

Paris, Tim; Kim, Jeesun; Davis, Chris

2016-02-01

We investigated whether internal models of the relationship between lip movements and corresponding speech sounds [Auditory-Visual (AV) speech] could be updated via experience. AV associations were indexed by early and late event related potentials (ERPs) and by oscillatory power and phase locking. Different AV experience was produced via a context manipulation. Participants were presented with valid (the conventional pairing) and invalid AV speech items in either a 'reliable' context (80% AVvalid items) or an 'unreliable' context (80% AVinvalid items). The results showed that for the reliable context, there was N1 facilitation for AV compared to auditory only speech. This N1 facilitation was not affected by AV validity. Later ERPs showed a difference in amplitude between valid and invalid AV speech and there was significant enhancement of power for valid versus invalid AV speech. These response patterns did not change over the context manipulation, suggesting that the internal models of AV speech were not updated by experience. The results also showed that the facilitation of N1 responses did not vary as a function of the salience of visual speech (as previously reported); in post-hoc analyses, it appeared instead that N1 facilitation varied according to the relative time of the acoustic onset, suggesting for AV events N1 may be more sensitive to the relationship of AV timing than form. Crown Copyright © 2015. Published by Elsevier Ltd. All rights reserved.
Sequential Computerized Mastery Tests--Three Simulation Studies

ERIC Educational Resources Information Center

Wiberg, Marie

2006-01-01

A simulation study of a sequential computerized mastery test is carried out with items modeled with the 3 parameter logistic item response theory model. The examinees' responses are either identically distributed, not identically distributed, or not identically distributed together with estimation errors in the item characteristics. The…
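For reference, the 3 parameter logistic model named in the abstract above gives the probability of a correct response as a function of examinee ability and three item parameters. The sketch below is illustrative only. The function name and parameter values are hypothetical, and the logistic form is written without the optional scaling constant that some parameterizations include.

```python
import math


def p_correct_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model.

    theta: examinee ability
    a:     item discrimination
    b:     item difficulty
    c:     lower asymptote (pseudo-guessing parameter)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item: when ability equals difficulty, the probability is
# halfway between the guessing floor c and 1.
p_mid = p_correct_3pl(theta=0.5, a=1.2, b=0.5, c=0.2)
```

With these illustrative values the probability at theta = b is 0.6, that is, c + (1 - c) / 2.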
Distinguishing Fast and Slow Processes in Accuracy - Response Time Data.

PubMed

Coomans, Frederik; Hofman, Abe; Brinkhuis, Matthieu; van der Maas, Han L J; Maris, Gunter

2016-01-01

We investigate the relation between speed and accuracy within problem solving in its simplest non-trivial form. We consider tests with only two items and code the item responses in two binary variables: one indicating the response accuracy, and one indicating the response speed. Despite being a very basic setup, it enables us to study item pairs stemming from a broad range of domains such as basic arithmetic, first language learning, intelligence-related problems, and chess, with large numbers of observations for every pair of problems under consideration. We carry out a survey over a large number of such item pairs and compare three types of psychometric accuracy-response time models present in the literature: two 'one-process' models, the first of which models accuracy and response time as conditionally independent and the second of which models accuracy and response time as conditionally dependent, and a 'two-process' model which models accuracy contingent on response time. We find that the data clearly violates the restrictions imposed by both one-process models and requires additional complexity which is parsimoniously provided by the two-process model. We supplement our survey with an analysis of the erroneous responses for an example item pair and demonstrate that there are very significant differences between the types of errors in fast and slow responses.

What can we learn from PISA?: Investigating PISA's approach to scientific literacy

NASA Astrophysics Data System (ADS)

Schwab, Cheryl Jean

This dissertation is an investigation of the relationship between the multidimensional conception of scientific literacy and its assessment. The Programme for International Student Assessment (PISA), developed under the auspices of the Organization for Economic Cooperation and Development (OECD), offers a unique opportunity to evaluate the assessment of scientific literacy. PISA developed a continuum of performance for scientific literacy across three competencies (i.e., process, content, and situation). Foundational to the interpretation of PISA science assessment is PISA's definition of scientific literacy, which I argue incorporates three themes drawn from history: (a) scientific way of thinking, (b) everyday relevance of science, and (c) scientific literacy for all students. Three coordinated studies were conducted to investigate the validity of PISA science assessment and offer insight into the development of items to assess scientific literacy. Multidimensional models of the internal structure of the PISA 2003 science items were found not to reflect the complex character of PISA's definition of scientific literacy. Although the multidimensional models across the three competencies significantly decreased the G2 statistic from the unidimensional model, high correlations between the dimensions suggest that the dimensions are similar. A cognitive analysis of student verbal responses to PISA science items revealed that students were using competencies of scientific literacy, but the competencies were not elicited by the PISA science items at the depth required by PISA's definition of scientific literacy. Although student responses contained only knowledge of scientific facts and simple scientific concepts, students were using more complex skills to interpret and communicate their responses. Finally the investigation of different scoring approaches and item response models illustrated different ways to interpret student responses to assessment items. These analyses highlighted the complexities of students' responses to the PISA science items and the use of the ordered partition model to accommodate different but equal item responses. The results of the three investigations are used to discuss ways to improve the development and interpretation of PISA's science items.
Older and Wiser: Older Adults’ Episodic Word Memory Benefits from Sentence Study Contexts

PubMed Central

Matzen, Laura E.; Benjamin, Aaron S.

2013-01-01

A hallmark of adaptive cognition is the ability to modulate learning in response to the demands posed by different types of tests and different types of materials. Here we evaluate how older adults process words and sentences differently by examining patterns of memory errors. In two experiments, we explored younger and older adults’ sensitivity to lures on a recognition test following study of words in these two types of contexts. Among the studied words were compound words such as “blackmail” and “jailbird” that were related to conjunction lures (e.g. “blackbird”) and semantic lures (e.g. “criminal”). Participants engaged in a recognition test that included old items, conjunction lures, semantic lures, and unrelated new items. In both experiments, younger and older adults had the same general pattern of memory errors: more incorrect endorsements of semantic than conjunction lures following sentence study and more incorrect endorsements of conjunction than semantic lures following list study. The similar pattern reveals that older and younger adults responded to the constraints of the two different study contexts in similar ways. However, while younger and older adults showed similar levels of memory performance for the list study context, the sentence study context elicited superior memory performance in the older participants. It appears as though memory tasks that take advantage of greater expertise in older adults--in this case, greater experience with sentence processing--can reveal superior memory performance in the elderly. PMID:23834493
Mechanisms supporting superior source memory for familiar items: a multi-voxel pattern analysis study.

PubMed

Poppenk, Jordan; Norman, Kenneth A

2012-11-01

Recent cognitive research has revealed better source memory performance for familiar relative to novel stimuli. Here we consider two possible explanations for this finding. The source memory advantage for familiar stimuli could arise because stimulus novelty induces attention to stimulus features at the expense of contextual processing, resulting in diminished overall levels of contextual processing at study for novel (vs. familiar) stimuli. Another possibility is that stimulus information retrieved from long-term memory (LTM) provides scaffolding that facilitates the formation of item-context associations. If contextual features are indeed more effectively bound to familiar (vs. novel) items, the relationship between contextual processing at study and subsequent source memory should be stronger for familiar items. We tested these possibilities by applying multi-voxel pattern analysis (MVPA) to a recently collected functional magnetic resonance imaging (fMRI) dataset, with the goal of measuring contextual processing at study and relating it to subsequent source memory performance. Participants were scanned with fMRI while viewing novel proverbs, repeated proverbs (previously novel proverbs that were shown in a pre-study phase), and previously known proverbs in the context of one of two experimental tasks. After scanning was complete, we evaluated participants' source memory for the task associated with each proverb. Drawing upon fMRI data from the study phase, we trained a classifier to detect on-task processing (i.e., how strongly was the correct task set activated). On-task processing was greater for previously known than novel proverbs and similar for repeated and novel proverbs. However, both within and across participants, the relationship between on-task processing and subsequent source memory was stronger for repeated than novel proverbs and similar for previously known and novel proverbs. 
Finally, focusing on the repeated condition, we found that higher levels of hippocampal activity during the pre-study phase, which we used as an index of episodic encoding, led to a stronger relationship between on-task processing at study and subsequent memory. Together, these findings suggest different mechanisms may be primarily responsible for superior source memory for repeated and previously known stimuli. Specifically, they suggest that prior stimulus knowledge enhances memory by boosting the overall level of contextual processing, whereas stimulus repetition enhances the probability that contextual features will be successfully bound to item features. Several possible theoretical explanations for this pattern are discussed. Copyright © 2012 Elsevier Ltd. All rights reserved.
Measuring ability to assess claims about treatment effects: a latent trait analysis of items from the ‘Claim Evaluation Tools’ database using Rasch modelling

PubMed Central

Austvoll-Dahlgren, Astrid; Guttersrud, Øystein; Nsangi, Allen; Semakula, Daniel; Oxman, Andrew D

2017-01-01

Background The Claim Evaluation Tools database contains multiple-choice items for measuring people’s ability to apply the key concepts they need to know to be able to assess treatment claims. We assessed items from the database using Rasch analysis to develop an outcome measure to be used in two randomised trials in Uganda. Rasch analysis is a form of psychometric testing relying on Item Response Theory. It is a dynamic way of developing outcome measures that are valid and reliable. Objectives To assess the validity, reliability and responsiveness of 88 items addressing 22 key concepts using Rasch analysis. Participants We administered four sets of multiple-choice items in English to 1114 people in Uganda and Norway, of which 685 were children and 429 were adults (including 171 health professionals). We scored all items dichotomously. We explored summary and individual fit statistics using the RUMM2030 analysis package. We used SPSS to perform distractor analysis. Results Most items conformed well to the Rasch model, but some items needed revision. Overall, the four item sets had satisfactory reliability. We did not identify significant response dependence between any pairs of items and, overall, the magnitude of multidimensionality in the data was acceptable. The items had a high level of difficulty. Conclusion Most of the items conformed well to the Rasch model’s expectations. Following revision of some items, we concluded that most of the items were suitable for use in an outcome measure for evaluating the ability of children or adults to assess treatment claims. PMID:28550019
Using Rasch Analysis to Evaluate the Reliability and Validity of the Swallowing Quality of Life Questionnaire: An Item Response Theory Approach.

PubMed

Cordier, Reinie; Speyer, Renée; Schindler, Antonio; Michou, Emilia; Heijnen, Bas Joris; Baijens, Laura; Karaduman, Ayşe; Swan, Katina; Clavé, Pere; Joosten, Annette Veronica

2018-02-01

The Swallowing Quality of Life questionnaire (SWAL-QOL) is widely used clinically and in research to evaluate quality of life related to swallowing difficulties. It has been described as a valid and reliable tool, but was developed and tested using classic test theory. This study describes the reliability and validity of the SWAL-QOL using item response theory (IRT; Rasch analysis). SWAL-QOL data were gathered from 507 participants at risk of oropharyngeal dysphagia (OD) across four European countries. OD was confirmed in 75.7% of participants via videofluoroscopy and/or fiberoptic endoscopic evaluation, or a clinical diagnosis based on meeting selected criteria. Patients with esophageal dysphagia were excluded. Data were analysed using Rasch analysis. Item and person reliability was good for all the items combined. However, person reliability was poor for 8 subscales and item reliability was poor for one subscale. Eight subscales exhibited poor person separation and two exhibited poor item separation. Overall item and person fit statistics were acceptable. However, at an individual item fit level results indicated unpredictable item responses for 28 items, and item redundancy for 10 items. The item-person dimensionality map confirmed these findings. Results from the overall Rasch model fit and Principal Component Analysis were suggestive of a second dimension. For all the items combined, none of the item categories were 'category', 'threshold' or 'step' disordered; however, all subscales demonstrated category disordered functioning. Findings suggest an urgent need to further investigate the underlying structure of the SWAL-QOL and its psychometric characteristics using IRT.
Threats to Validity When Using Open-Ended Items in International Achievement Studies: Coding Responses to the PISA 2012 Problem-Solving Test in Finland

ERIC Educational Resources Information Center

Arffman, Inga

2016-01-01

Open-ended (OE) items are widely used to gather data on student performance in international achievement studies. However, several factors may threaten validity when using such items. This study examined Finnish coders' opinions about threats to validity when coding responses to OE items in the PISA 2012 problem-solving test. A total of 6…
Effect of Item Response Theory (IRT) Model Selection on Testlet-Based Test Equating. Research Report. ETS RR-14-19

ERIC Educational Resources Information Center

Cao, Yi; Lu, Ru; Tao, Wei

2014-01-01

The local item independence assumption underlying traditional item response theory (IRT) models is often not met for tests composed of testlets. There are 3 major approaches to addressing this issue: (a) ignore the violation and use a dichotomous IRT model (e.g., the 2-parameter logistic [2PL] model), (b) combine the interdependent items to form a…
Kernel-Smoothing Estimation of Item Characteristic Functions for Continuous Personality Items: An Empirical Comparison with the Linear and the Continuous-Response Models

ERIC Educational Resources Information Center

Ferrando, Pere J.

2004-01-01

This study used kernel-smoothing procedures to estimate the item characteristic functions (ICFs) of a set of continuous personality items. The nonparametric ICFs were compared with the ICFs estimated (a) by the linear model and (b) by Samejima's continuous-response model. The study was based on a conditioned approach and used an error-in-variables…
Innovative Application of a Multidimensional Item Response Model in Assessing the Influence of Social Desirability on the Pseudo-Relationship between Self-Efficacy and Behavior

ERIC Educational Resources Information Center

Watson, Kathy; Baranowski, Tom; Thompson, Debbe; Jago, Russell; Baranowski, Janice; Klesges, Lisa M.

2006-01-01

This study examined multidimensional item response theory (MIRT) modeling to assess social desirability (SocD) influences on self-reported physical activity self-efficacy (PASE) and fruit and vegetable self-efficacy (FVSE). The observed sample included 473 Houston-area adolescent males (10-14 years). SocD (nine items), PASE (19 items) and FVSE (21…
The Structure of the Narcissistic Personality Inventory With Binary and Rating Scale Items.

PubMed

Boldero, Jennifer M; Bell, Richard C; Davies, Richard C

2015-01-01

Narcissistic Personality Inventory (NPI) items typically have a forced-choice format, comprising a narcissistic and a nonnarcissistic statement. Recently, some have presented the narcissistic statements and asked individuals to either indicate whether they agree or disagree that the statements are self-descriptive (i.e., a binary response format) or to rate the extent to which they agree or disagree that these statements are self-descriptive on a Likert scale (i.e., a rating response format). The current research demonstrates that when NPI items have a binary or a rating response format, the scale has a bifactor structure (i.e., the items load on a general factor and on 6 specific group factors). Indexes of factor strength suggest that the data are unidimensional enough for the NPI's general factor to be considered a measure of a narcissism latent trait. However, the rating item general factor assessed more narcissism components than the binary item one. The positive correlations of the NPI's general factor, assessed when items have a rating response format, were moderate with self-esteem, strong with a measure of narcissistic grandiosity, and weak with 2 measures of narcissistic vulnerability. Together, the results suggest that using a rating format for items enhances the information provided by the NPI.
Memory consolidation by replay of stimulus-specific neural activity.

PubMed

Deuker, Lorena; Olligs, Jan; Fell, Juergen; Kranz, Thorsten A; Mormann, Florian; Montag, Christian; Reuter, Martin; Elger, Christian E; Axmacher, Nikolai

2013-12-04

Memory consolidation transforms initially labile memory traces into more stable representations. One putative mechanism for consolidation is the reactivation of memory traces after their initial encoding during subsequent sleep or waking state. However, it is still unknown whether consolidation of individual memory contents relies on reactivation of stimulus-specific neural representations in humans. Investigating stimulus-specific representations in humans is particularly difficult, but potentially feasible using multivariate pattern classification analysis (MVPA). Here, we show in healthy human participants that stimulus-specific activation patterns can indeed be identified with MVPA, that these patterns reoccur spontaneously during postlearning resting periods and sleep, and that the frequency of reactivation predicts subsequent memory for individual items. We conducted a paired-associate learning task with items and spatial positions and extracted stimulus-specific activity patterns by MVPA in a simultaneous electroencephalography and functional magnetic resonance imaging (fMRI) study. As a first step, we investigated the number of fMRI volumes during rest that resembled either one of the items shown before or one of the items shown as a control after the resting period. Reactivations during both awake resting state and sleep predicted subsequent memory. These data are the first evidence that spontaneous reactivation of stimulus-specific activity patterns during resting state can be investigated using MVPA. They show that reactivation occurs in humans and is behaviorally relevant for stabilizing memory traces against interference. They move beyond previous studies because replay was investigated on the level of individual stimuli and because reactivations were not evoked by sensory cues but occurred spontaneously.
Item Banks for Substance Use from the Patient-Reported Outcomes Measurement Information System (PROMIS®): Severity of Use and Positive Appeal of Use*

PubMed Central

Pilkonis, Paul A.; Yu, Lan; Dodds, Nathan E.; Johnston, Kelly L.; Lawrence, Suzanne; Hilton, Thomas F.; Daley, Dennis C.; Patkar, Ashwin A.; McCarty, Dennis

2015-01-01

Background Two item banks for substance use were developed as part of the Patient-Reported Outcomes Measurement Information System (PROMIS®): severity of substance use and positive appeal of substance use. Methods Qualitative item analysis (including focus groups, cognitive interviewing, expert review, and item revision) reduced an initial pool of more than 5,300 items for substance use to 119 items included in field testing. Items were written in a first-person, past-tense format, with 5 response options reflecting frequency or severity. Both 30-day and 3-month time frames were tested. The calibration sample of 1,336 respondents included 875 individuals from the general population (ascertained through an internet panel) and 461 patients from addiction treatment centers participating in the National Drug Abuse Treatment Clinical Trials Network. Results Final banks of 37 and 18 items were calibrated for severity of substance use and positive appeal of substance use, respectively, using the two-parameter graded response model from item response theory (IRT). Initial calibrations were similar for the 30-day and 3-month time frames, and final calibrations used data combined across the time frames, making the items applicable with either interval. Seven-item static short forms were also developed from each item bank. Conclusions Test information curves showed that the PROMIS item banks provided substantial information in a broad range of severity, making them suitable for treatment, observational, and epidemiological research in both clinical and community settings. PMID:26423364
Emotional content enhances true but not false memory for categorized stimuli.

PubMed

Choi, Hae-Yoon; Kensinger, Elizabeth A; Rajaram, Suparna

2013-04-01

Past research has shown that emotion enhances true memory, but that emotion can either increase or decrease false memory. Two theoretical possibilities (the distinctiveness of emotional stimuli and the conceptual relatedness of emotional content) have been implicated as being responsible for influencing both true and false memory for emotional content. In the present study, we sought to identify the mechanisms that underlie these mixed findings by equating the thematic relatedness of the study materials across each type of valence used (negative, positive, or neutral). In three experiments, categorically bound stimuli (e.g., funeral, pets, and office items) were used for this purpose. When the encoding task required the processing of thematic relatedness, a significant true-memory enhancement for emotional content emerged in recognition memory, but no emotional boost to false memory (Exp. 1). This pattern persisted for true memory with a longer retention interval between study and test (24 h), and false recognition was reduced for emotional items (Exp. 2). Finally, better recognition memory for emotional items once again emerged when the encoding task (arousal ratings) required the processing of the emotional aspect of the study items, with no emotional boost to false recognition (Exp. 3). Together, these findings suggest that when emotional and neutral stimuli are equivalently high in thematic relatedness, emotion continues to improve true memory, but it does not override other types of grouping to increase false memory.
The breastfeeding self-efficacy scale: psychometric assessment of the short form.

PubMed

Dennis, Cindy-Lee

2003-01-01

The purpose of this study was to reduce the number of items on the original Breastfeeding Self-Efficacy Scale (BSES) and psychometrically assess the revised BSES-Short Form (BSES-SF). As part of a longitudinal study, participants completed mailed questionnaires at 1, 4, and 8 weeks postpartum. The setting was a health region in British Columbia, and the participants were a population-based sample of 491 breastfeeding mothers. The measures were the BSES, Edinburgh Postnatal Depression Scale, Rosenberg Self-Esteem Scale, and Perceived Stress Scale. Internal consistency statistics with the original BSES suggested item redundancy. As such, 18 items were deleted, using explicit reduction criteria. Based on the encouraging reliability analysis of the new 14-item BSES-SF, construct validity was assessed using principal components factor analysis, comparison of contrasted groups, and correlations with measures of similar constructs. Support for predictive validity was demonstrated through significant mean differences between breastfeeding and bottle-feeding mothers at 4 (p < .001) and 8 (p < .001) weeks postpartum. Demographic response patterns suggested the BSES-SF is a unique tool to identify mothers at risk of prematurely discontinuing breastfeeding. These psychometric results indicate the BSES-SF is an excellent measure of breastfeeding self-efficacy and is considered ready for clinical use to (a) identify breastfeeding mothers at high risk, (b) assess breastfeeding behaviors and cognitions to individualize confidence-building strategies, and (c) evaluate the effectiveness of various interventions and guide program development.
Capturing specific abilities as a window into human individuality: The example of face recognition

PubMed Central

Wilmer, Jeremy B.; Germine, Laura; Chabris, Christopher F.; Chatterjee, Garga; Gerbasi, Margaret; Nakayama, Ken

2013-01-01

Proper characterization of each individual's unique pattern of strengths and weaknesses requires good measures of diverse abilities. Here, we advocate combining our growing understanding of neural and cognitive mechanisms with modern psychometric methods in a renewed effort to capture human individuality through a consideration of specific abilities. We articulate five criteria for the isolation and measurement of specific abilities, then apply these criteria to face recognition. We cleanly dissociate face recognition from more general visual and verbal recognition. This dissociation stretches across ability as well as disability, suggesting that specific developmental face recognition deficits are a special case of a broader specificity that spans the entire spectrum of human face recognition performance. Item-by-item results from 1,471 web-tested participants, included as supplementary information, fuel item analyses, validation, norming, and item response theory (IRT) analyses of our three tests: (a) the widely used Cambridge Face Memory Test (CFMT); (b) an Abstract Art Memory Test (AAMT), and (c) a Verbal Paired-Associates Memory Test (VPMT). The availability of this data set provides a solid foundation for interpreting future scores on these tests. We argue that the allied fields of experimental psychology, cognitive neuroscience, and vision science could fuel the discovery of additional specific abilities to add to face recognition, thereby providing new perspectives on human individuality. PMID:23428079
Practical Guide to Conducting an Item Response Theory Analysis

ERIC Educational Resources Information Center

Toland, Michael D.

2014-01-01

Item response theory (IRT) is a psychometric technique used in the development, evaluation, improvement, and scoring of multi-item scales. This pedagogical article provides the necessary information needed to understand how to conduct, interpret, and report results from two commonly used ordered polytomous IRT models (Samejima's graded…
Analyzing Longitudinal Item Response Data via the Pairwise Fitting Method

ERIC Educational Resources Information Center

Fu, Zhi-Hui; Tao, Jian; Shi, Ning-Zhong; Zhang, Ming; Lin, Nan

2011-01-01

Multidimensional item response theory (MIRT) models can be applied to longitudinal educational surveys where a group of individuals are administered different tests over time with some common items. However, computational problems typically arise as the dimension of the latent variables increases. This is especially true when the latent variable…
Item Construction and Psychometric Models Appropriate for Constructed Responses

DTIC Science & Technology

1991-08-01

which involve only one attribute per item. This is especially true when we are dealing with constructed-response items, we have to measure much more...
Different Approaches to Covariate Inclusion in the Mixture Rasch Model

ERIC Educational Resources Information Center

Li, Tongyun; Jiao, Hong; Macready, George B.

2016-01-01

The present study investigates different approaches to adding covariates and the impact in fitting mixture item response theory models. Mixture item response theory models serve as an important methodology for tackling several psychometric issues in test development, including the detection of latent differential item functioning. A Monte Carlo…
Classification Consistency and Accuracy for Complex Assessments Using Item Response Theory

ERIC Educational Resources Information Center

Lee, Won-Chan

2010-01-01

In this article, procedures are described for estimating single-administration classification consistency and accuracy indices for complex assessments using item response theory (IRT). This IRT approach was applied to real test data comprising dichotomous and polytomous items. Several different IRT model combinations were considered. Comparisons…

Robust Estimation of Latent Ability in Item Response Models

ERIC Educational Resources Information Center

Schuster, Christof; Yuan, Ke-Hai

2011-01-01

Because of response disturbances such as guessing, cheating, or carelessness, item response models often can only approximate the "true" individual response probabilities. As a consequence, maximum-likelihood estimates of ability will be biased. Typically, the nature and extent to which response disturbances are present is unknown, and, therefore,…
Theoretical and Empirical Comparisons between Two Models for Continuous Item Responses.

ERIC Educational Resources Information Center

Ferrando, Pere J.

2002-01-01

Analyzed the relations between two continuous response models intended for typical response items: the linear congeneric model and Samejima's continuous response model (CRM). Illustrated the relations described using an empirical example and assessed the relations through a simulation study. (SLD)
On the dynamic nature of response criterion in recognition memory: effects of base rate, awareness, and feedback.

PubMed

Rhodes, Matthew G; Jacoby, Larry L

2007-03-01

The authors examined whether participants can shift their criterion for recognition decisions in response to the probability that an item was previously studied. Participants in 3 experiments were given recognition tests in which the probability that an item was studied was correlated with its location during the test. Results from all 3 experiments indicated that participants' response criteria were sensitive to the probability that an item was previously studied and that shifts in criterion were robust. In addition, awareness of the bases for criterion shifts and feedback on performance were key factors contributing to the observed shifts in decision criteria. These data suggest that decision processes can operate in a dynamic fashion, shifting from item to item.
Item Banking. ERIC/AE Digest.

ERIC Educational Resources Information Center

Rudner, Lawrence

This digest discusses the advantages and disadvantages of using item banks, and it provides useful information for those who are considering implementing an item banking project in their school districts. The primary advantage of item banking is in test development. Using an item response theory method, such as the Rasch model, items from multiple…
An item-response theory approach to safety climate measurement: The Liberty Mutual Safety Climate Short Scales.

PubMed

Huang, Yueng-Hsiang; Lee, Jin; Chen, Zhuo; Perry, MacKenna; Cheung, Janelle H; Wang, Mo

2017-06-01

Zohar and Luria's (2005) safety climate (SC) scale, measuring organization- and group-level SC each with 16 items, is widely used in research and practice. To improve the utility of the SC scale, we shortened the original full-length SC scales. Item response theory (IRT) analysis was conducted using a sample of 29,179 frontline workers from various industries. Based on graded response models, we shortened the original scales in two ways: (1) selecting items with above-average discriminating ability (i.e., offering more than 6.25% of the original total scale information), resulting in 8-item organization-level and 11-item group-level SC scales; and (2) selecting the most informative items that together retain at least 30% of original scale information, resulting in 4-item organization-level and 4-item group-level SC scales. All four shortened scales had acceptable reliability (≥0.89) and high correlations (≥0.95) with the original scale scores. The shortened scales will be valuable for academic research and practical survey implementation in improving occupational safety. Copyright © 2017 The Author(s). Published by Elsevier Ltd. All rights reserved.
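The 6.25% cutoff quoted in the abstract above equals 1/16, the average share of total information across the 16 items of each scale. A minimal Python sketch of that selection rule follows, assuming each item is summarized by a single information value; the numbers are made up for illustration and are not data from the study, and `select_above_average` is a hypothetical helper name.

```python
def select_above_average(item_info):
    """Return indices of items whose share of total information exceeds 1/n."""
    total = sum(item_info)
    threshold = 1.0 / len(item_info)  # 1/16 = 0.0625 for a 16-item scale
    return [i for i, info in enumerate(item_info) if info / total > threshold]


# Hypothetical information values for a 16-item scale (illustration only).
info = [30, 8, 25, 5, 20, 8, 5, 25, 5, 8, 5, 5, 4, 3, 2, 2]
selected = select_above_average(info)
print(selected)
```

With 16 items, an item is retained only if it carries more than one sixteenth of the summed information, so the rule is relative to the scale rather than an absolute cutoff.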
Recovery of components of memory in post-traumatic amnesia.

PubMed

Leach, Kathleen; Kinsella, Glynda; Jackson, Martin; Matyas, Tom

2006-11-01

Post-traumatic amnesia (PTA) by definition indicates significant impairment of new learning ability; however, very few studies have examined the natural history and resolution of memory and new learning during PTA. Those studies that have done so tended to examine orientation separately from the memory processes required to achieve orientation. Analysis of the order of recovery of the items of the Westmead PTA scale was used to examine recovery of memory and new learning capacity. The results of daily assessment of 34 patients with traumatic brain injury (TBI) on the Westmead PTA scale were analysed for order of recovery. The pattern of rank order of item recovery indicated that Date of Birth consistently recovered first. There was variability in the remaining items; however, items reflecting long-term memory tended to recover second, and items reflecting simple new learning followed. Recall of all three pictures, reflecting complex new learning, recovered last. The pattern of recovery of memory and new learning during PTA reflects a number of complex, inter-related variables, including familiarity with the information, the amount of rehearsal both before and since the accident, and the number of cues available in the environment.
Factors that influence search termination decisions in free recall: an examination of response type and confidence.

PubMed

Unsworth, Nash; Brewer, Gene A; Spillers, Gregory J

2011-09-01

In three experiments search termination decisions were examined as a function of response type (correct vs. incorrect) and confidence. It was found that the time between the last retrieved item and the decision to terminate search (exit latency) was related to the type of response and confidence in the last item retrieved. Participants were willing to search longer when the last retrieved item was a correct item vs. an incorrect item and when the confidence was high in the last retrieved item. It was also found that the number of errors retrieved during the recall period was related to search termination decisions such that the more errors retrieved, the more likely participants were to terminate the search. Finally, it was found that knowledge of overall search set size influenced the time needed to search for items, but did not influence search termination decisions. Copyright © 2011 Elsevier B.V. All rights reserved.
Item response theory analysis of the life orientation test-revised: age and gender differential item functioning analyses.

PubMed

Steca, Patrizia; Monzani, Dario; Greco, Andrea; Chiesi, Francesca; Primi, Caterina

2015-06-01

This study is aimed at testing the measurement properties of the Life Orientation Test-Revised (LOT-R) for the assessment of dispositional optimism by employing item response theory (IRT) analyses. The LOT-R was administered to a large sample of 2,862 Italian adults. First, confirmatory factor analyses demonstrated the theoretical conceptualization of the construct measured by the LOT-R as a single bipolar dimension. Subsequently, IRT analyses for polytomous, ordered response category data were applied to investigate the items' properties. The equivalence of the items across gender and age was assessed by analyzing differential item functioning. Discrimination and severity parameters indicated that all items were able to distinguish people with different levels of optimism and adequately covered the spectrum of the latent trait. Additionally, the LOT-R appears to be gender invariant and, with minor exceptions, age invariant. Results provided evidence that the LOT-R is a reliable and valid measure of dispositional optimism. © The Author(s) 2014.
Using a Multivariate Multilevel Polytomous Item Response Theory Model to Study Parallel Processes of Change: The Dynamic Association between Adolescents' Social Isolation and Engagement with Delinquent Peers in the National Youth Survey

ERIC Educational Resources Information Center

Hsieh, Chueh-An; von Eye, Alexander A.; Maier, Kimberly S.

2010-01-01

The application of multidimensional item response theory models to repeated observations has demonstrated great promise in developmental research. It allows researchers to take into consideration both the characteristics of item response and measurement error in longitudinal trajectory analysis, which improves the reliability and validity of the…
Applying mixed methods to pretest the Pressure Ulcer Quality of Life (PU-QOL) instrument.

PubMed

Gorecki, C; Lamping, D L; Nixon, J; Brown, J M; Cano, S

2012-04-01

Pretesting is key in the development of patient-reported outcome (PRO) instruments. We describe a mixed-methods approach based on interviews and Rasch measurement methods in the pretesting of the Pressure Ulcer Quality of Life (PU-QOL) instrument. We used cognitive interviews to pretest the PU-QOL in 35 patients with pressure ulcers with the view to identifying problematic items, followed by Rasch analysis to examine response options, appropriateness of the item series and biases due to question ordering (item fit). We then compared findings in an interactive and iterative process to identify potential strengths and weaknesses of PU-QOL items, and guide decision-making about further revisions to items and design/layout. Although cognitive interviews largely supported items, they highlighted problems with layout, response options and comprehension. Findings from the Rasch analysis identified problems with response options through reversed thresholds. The use of a mixed-methods approach in pretesting the PU-QOL instrument proved beneficial for identifying problems with scale layout, response options and framing/wording of items. Rasch measurement methods are a useful addition to standard qualitative pretesting for evaluating strengths and weaknesses of early stage PRO instruments.
Response-restriction analysis: I. Assessment of activity preferences.

PubMed

Hanley, Gregory P; Iwata, Brian A; Lindberg, Jana S; Conners, Juliet

2003-01-01

We used procedures based on response-restriction (RR) analysis to assess vocational and leisure activity preferences for 3 adults with developmental disabilities. To increase the efficiency of the analysis relative to that reported in previous research, we used criteria that allowed activities to be restricted at the earliest point at which a preference could be determined. Results obtained across two consecutive RR assessments showed some variability in overall preference rankings but a high degree of consistency for highly ranked items. Finally, we compared results of the RR assessment with those of an extended free-operant assessment and found that the RR assessment yielded (a) more differentiated patterns of preference and (b) more complete information about engagement with all of the target activities.
HIV/AIDS knowledge among men who have sex with men: applying the item response theory.

PubMed

Gomes, Raquel Regina de Freitas Magalhães; Batista, José Rodrigues; Ceccato, Maria das Graças Braga; Kerr, Lígia Regina Franco Sansigolo; Guimarães, Mark Drew Crosland

2014-04-01

To evaluate the level of HIV/AIDS knowledge among men who have sex with men in Brazil using the latent trait model estimated by Item Response Theory. Multicenter, cross-sectional study, carried out in ten Brazilian cities between 2008 and 2009. Adult men who have sex with men were recruited (n = 3,746) through Respondent Driven Sampling. HIV/AIDS knowledge was ascertained through ten statements by face-to-face interview and latent scores were obtained through two-parameter logistic modeling (difficulty and discrimination) using Item Response Theory. Differential item functioning was used to examine each item characteristic curve by age and schooling. Overall, the HIV/AIDS knowledge scores using Item Response Theory did not exceed 6.0 (scale 0-10), with mean and median values of 5.0 (SD = 0.9) and 5.3, respectively, with 40.7% of the sample with knowledge levels below the average. Some beliefs still exist in this population regarding the transmission of the virus by insect bites, by using public restrooms, and by sharing utensils during meals. With regard to the difficulty and discrimination parameters, eight items were located below the mean of the scale and were considered very easy, and four items presented very low discrimination parameter (< 0.34). The absence of difficult items contributed to the inaccuracy of the measurement of knowledge among those with median level and above. Item Response Theory analysis, which focuses on the individual properties of each item, allows measures to be obtained that do not vary or depend on the questionnaire, which provides better ascertainment and accuracy of knowledge scores. Valid and reliable scales are essential for monitoring HIV/AIDS knowledge among the men who have sex with men population over time and in different geographic regions, and this psychometric model brings this advantage.
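The two-parameter logistic model mentioned in this abstract can be sketched in a few lines. This is only an illustration of the standard 2PL item characteristic curve, not the authors' estimation code; the function name icc_2pl and the parameter values are assumptions chosen for the example (0.3 stands in for an item below the abstract's "very low" discrimination cut-off of 0.34).

```python
import math

def icc_2pl(theta, a, b):
    """Probability of a correct response under a 2PL model.

    theta: latent trait (here, knowledge level)
    a: discrimination parameter
    b: difficulty parameter
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical items: same difficulty, different discrimination.
p_at_difficulty = icc_2pl(theta=0.0, a=1.2, b=0.0)  # trait equals difficulty
p_steep = icc_2pl(theta=1.0, a=1.5, b=0.0)          # well-discriminating item
p_flat = icc_2pl(theta=1.0, a=0.3, b=0.0)           # very low discrimination
```

When the trait equals the item difficulty the probability is exactly one half, and a low-discrimination item separates respondents one unit above the difficulty far less sharply than a steep one.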
Calibrating Item Families and Summarizing the Results Using Family Expected Response Functions

ERIC Educational Resources Information Center

Sinharay, Sandip; Johnson, Matthew S.; Williamson, David M.

2003-01-01

Item families, which are groups of related items, are becoming increasingly popular in complex educational assessments. For example, in automatic item generation (AIG) systems, a test may consist of multiple items generated from each of a number of item models. Item calibration or scoring for such an assessment requires fitting models that can…
Using Rasch rating scale model to reassess the psychometric properties of the Persian version of the PedsQL™ 4.0 Generic Core Scales in school children.

PubMed

Jafari, Peyman; Bagheri, Zahra; Ayatollahi, Seyyed Mohamad Taghi; Soltani, Zahra

2012-03-13

Item response theory (IRT) is extensively used to develop adaptive instruments of health-related quality of life (HRQoL). However, each IRT model has its own function to estimate item and category parameters, and hence different results may be found using the same response categories with different IRT models. The present study used the Rasch rating scale model (RSM) to examine and reassess the psychometric properties of the Persian version of the PedsQL™ 4.0 Generic Core Scales. The PedsQL™ 4.0 Generic Core Scales was completed by 938 Iranian school children and their parents. Convergent, discriminant and construct validity of the instrument were assessed by classical test theory (CTT). The RSM was applied to investigate person and item reliability, item statistics and ordering of response categories. The CTT method showed that the scaling success rate for convergent and discriminant validity were 100% in all domains with the exception of physical health in the child self-report. Moreover, confirmatory factor analysis supported a four-factor model similar to its original version. The RSM showed that 22 out of 23 items had acceptable infit and outfit statistics (<1.4, >0.6), person reliabilities were low, item reliabilities were high, and item difficulty ranged from -1.01 to 0.71 and -0.68 to 0.43 for child self-report and parent proxy-report, respectively. Also the RSM showed that successive response categories for all items were not located in the expected order. This study revealed that, in all domains, the five response categories did not perform adequately. It is not known whether this problem is a function of the meaning of the response choices in the Persian language or an artifact of a mostly healthy population that did not use the full range of the response categories. The response categories should be evaluated in further validation studies, especially in large samples of chronically ill patients.
A semi-parametric within-subject mixture approach to the analyses of responses and response times.

PubMed

Molenaar, Dylan; Bolsinova, Maria; Vermunt, Jeroen K

2018-05-01

In item response theory, modelling the item response times in addition to the item responses may improve the detection of possible between- and within-subject differences in the process that resulted in the responses. For instance, if respondents rely on rapid guessing on some items but not on all, the joint distribution of the responses and response times will be a multivariate within-subject mixture distribution. Suitable parametric methods to detect these within-subject differences have been proposed. In these approaches, a distribution needs to be assumed for the within-class response times. In this paper, it is demonstrated that these parametric within-subject approaches may produce false positives and biased parameter estimates if the assumption concerning the response time distribution is violated. A semi-parametric approach is proposed which resorts to categorized response times. This approach is shown to hardly produce false positives and parameter bias. In addition, the semi-parametric approach results in approximately the same power as the parametric approach. © 2017 The British Psychological Society.
Differential item functioning magnitude and impact measures from item response theory models.

PubMed

Kleinman, Marjorie; Teresi, Jeanne A

2016-01-01

Measures of magnitude and impact of differential item functioning (DIF) at the item and scale level, respectively are presented and reviewed in this paper. Most measures are based on item response theory models. Magnitude refers to item level effect sizes, whereas impact refers to differences between groups at the scale score level. Reviewed are magnitude measures based on group differences in the expected item scores and impact measures based on differences in the expected scale scores. The similarities among these indices are demonstrated. Various software packages are described that provide magnitude and impact measures, and new software presented that computes all of the available statistics conveniently in one program with explanations of their relationships to one another.
Integrating Entropy and Closed Frequent Pattern Mining for Social Network Modelling and Analysis

NASA Astrophysics Data System (ADS)

Adnan, Muhaimenul; Alhajj, Reda; Rokne, Jon

The recent increase in the explicitly available social networks has attracted the attention of the research community to investigate how it would be possible to benefit from such a powerful model in producing effective solutions for problems in other domains where the social network is implicit; we argue that social networks do exist around us but the key issue is how to realize and analyze them. This chapter presents a novel approach for constructing a social network model by an integrated framework that first prepares the data to be analyzed and then applies entropy and frequent closed patterns mining for network construction. For a given problem, we first prepare the data by identifying items and transactions, which are the basic ingredients for frequent closed patterns mining. Items are the main objects in the problem and a transaction is a set of items that could exist together at one time (e.g., items purchased in one visit to the supermarket). Transactions could be analyzed to discover frequent closed patterns using any of the well-known techniques. Frequent closed patterns have the advantage that they successfully grab the inherent information content of the dataset and are applicable to a broader set of domains. Entropies of the frequent closed patterns are used to keep the dimensionality of the feature vectors to a reasonable size; it is a kind of feature reduction process. Finally, we analyze the dynamic behavior of the constructed social network. Experiments were conducted on a synthetic dataset and on the Enron corpus email dataset. The results presented in the chapter show that social networks extracted from a feature set as frequent closed patterns successfully carry the community structure information. Moreover, for the Enron email dataset, we present an analysis to dynamically indicate the deviations from each user's individual and community profile. These indications of deviations can be very useful to identify unusual events.
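The notion of a frequent closed pattern used in this abstract can be illustrated with a brute-force sketch. It is not the chapter's algorithm (the abstract says any well-known technique may be used) and it enumerates every itemset, so it is only suitable for toy data; the transactions, the function names, and the minimum support of 2 are all assumptions made for the example.

```python
from itertools import combinations

def support(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

def closed_frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for frequent itemsets that are closed.

    An itemset is closed when no proper superset has the same support.
    """
    items = sorted(set().union(*transactions))
    frequent = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            count = support(candidate, transactions)
            if count >= min_support:
                frequent[candidate] = count
    # A superset with equal support is itself frequent, so checking
    # only the frequent itemsets is sufficient.
    return {
        s: c for s, c in frequent.items()
        if not any(s < other and c == oc for other, oc in frequent.items())
    }

# Hypothetical transactions (e.g., items bought in one store visit).
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a"}]
closed = closed_frequent_itemsets(transactions, min_support=2)
```

Here {b} and {c} are frequent but not closed, because {a, b} and {a, c} occur in exactly the same transactions.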
Using Response-Time Constraints in Item Selection To Control for Differential Speededness in Computerized Adaptive Testing. LSAC Research Report Series.

ERIC Educational Resources Information Center

van der Linden, Wim J.; Scrams, David J.; Schnipke, Deborah L.

This paper proposes an item selection algorithm that can be used to neutralize the effect of time limits in computer adaptive testing. The method is based on a statistical model for the response-time distributions of the test takers on the items in the pool that is updated each time a new item has been administered. Predictions from the model are…
Mining Co-Location Patterns with Clustering Items from Spatial Data Sets

NASA Astrophysics Data System (ADS)

Zhou, G.; Li, Q.; Deng, G.; Yue, T.; Zhou, X.

2018-05-01

The explosive growth of spatial data and widespread use of spatial databases emphasize the need for spatial data mining. Co-location pattern discovery is an important branch of spatial data mining. Spatial co-locations represent the subsets of features which are frequently located together in geographic space. However, the appearance of a spatial feature C is often not determined by a single spatial feature A or B but by the two spatial features A and B together; that is, where A and B appear together, C often appears. We note that this co-location pattern is different from the traditional co-location pattern. Thus, this paper presents a new concept called clustering terms, and this co-location pattern is called co-location patterns with clustering items. Because the traditional algorithm cannot mine this co-location pattern, we introduce the related concept in detail and propose a novel algorithm, which extends the join-based approach proposed by Huang. Finally, we evaluate the performance of this algorithm.
Chronic disease management items in general practice: a population-based study of variation in claims by claimant characteristics.

PubMed

Douglas, Kirsty A; Yen, Laurann E; Korda, Rosemary J; Kljakovic, Marjan; Glasgow, Nicholas J

2011-08-15

To describe how Medical Benefits Schedule (MBS) chronic disease (CD) item claims vary by sociodemographic and health characteristics in people with heart disease, asthma or diabetes. A cross-sectional analysis of linked unit-level MBS and survey data from the first 102,934 participants enrolled in the 45 and Up Study, a large-scale cohort study in New South Wales, who completed the baseline survey between January 2006 and July 2008. Claim for any general practitioner CD item within 18 months before enrolment, ascertained from MBS records. The proportion of individuals making claims for MBS CD items was 18.5% for asthma, 22.3% for heart disease, and 44.9% for diabetes. Associations between participant characteristics and a claim for a CD item showed similar patterns across the three diseases. For heart disease and asthma, people most likely to claim a CD item were women, older, of low income and education levels, with multiple chronic conditions, fair or poor self-rated health, obesity and low physical activity levels. The pattern of claims was slightly different for participants with diabetes in that there was no significant association with number of chronic conditions, smoking or physical activity. Many individuals with self-reported CD do not claim CD items. People with diabetes and individuals with greatest need based on health, socioeconomic and lifestyle risk factors are the most likely to claim CD items.

Evaluation of the Multiple Sclerosis Walking Scale-12 (MSWS-12) in a Dutch sample: Application of item response theory.

PubMed

Mokkink, Lidwine Brigitta; Galindo-Garre, Francisca; Uitdehaag, Bernard Mj

2016-12-01

The Multiple Sclerosis Walking Scale-12 (MSWS-12) measures walking ability from the patients' perspective. We examined the quality of the MSWS-12 using an item response theory model, the graded response model (GRM). A total of 625 unique Dutch multiple sclerosis (MS) patients were included. After testing for unidimensionality, monotonicity, and absence of local dependence, a GRM was fit and item characteristics were assessed. Differential item functioning (DIF) for the variables gender, age, duration of MS, type of MS and severity of MS, reliability, total test information, and standard error of the trait level (θ) were investigated. Confirmatory factor analysis showed a unidimensional structure of the 12 items of the scale, explaining 88% of the variance. Item 2 did not fit into the GRM model. Reliability was 0.93. Items 8 and 9 (of the 11 and 12 item version respectively) showed DIF on the variable severity, based on the Expanded Disability Status Scale (EDSS). However, the EDSS is strongly related to the content of both items. Our results confirm the good quality of the MSWS-12. The trait level (θ) scores and item parameters of both the 12- and 11-item versions were highly comparable, although we do not suggest to change the content of the MSWS-12. © The Author(s), 2016.
Location contexts of user check-ins to model urban geo life-style patterns.

PubMed

Hasan, Samiul; Ukkusuri, Satish V

2015-01-01

Geo-location data from social media offers us information, in new ways, to understand people's attitudes and interests through their activity choices. In this paper, we explore the idea of inferring individual life-style patterns from activity-location choices revealed in social media. We present a model to understand life-style patterns using the contextual information (e.g., location categories) of user check-ins. Probabilistic topic models are developed to infer individual geo life-style patterns from two perspectives: i) to characterize the patterns of user interests to different types of places and ii) to characterize the patterns of user visits to different neighborhoods. The method is applied to a dataset of Foursquare check-ins of the users from New York City. The co-existence of several location contexts and the corresponding probabilities in a given pattern provide useful information about user interests and choices. It is found that geo life-style patterns have similar items, either nearby neighborhoods or similar location categories. The semantic and geographic proximity of the items in a pattern reflects the hidden regularity in user preferences and location choice behavior.
The Meal Pattern Questionnaire: A psychometric evaluation using the Eating Disorder Examination.

PubMed

Alfonsson, S; Sewall, A; Lidholm, H; Hursti, T

2016-04-01

Meal pattern is an important variable in both obesity treatment and treatment for eating disorders. Momentary assessment and eating diaries are highly valid measurement methods but often cumbersome and not always feasible to use in clinical practice. The aim of this study was to design and evaluate a self-report instrument for measuring meal patterns. The Pattern of eating item from the Eating Disorder Examination (EDE) interview was adapted to self-report format to follow the same overall structure as the Eating Disorder Examination Questionnaire. The new instrument was named the Meal Patterns Questionnaire (MPQ) and was compared with the EDE in a student sample (n=105) and an obese sample (n=111). The individual items of the MPQ and the EDE showed moderate to high correlations (rho=.63-.89) in the two samples. Significant differences between the MPQ and EDE were only found for two items in the obese sample. The total scores correlated to a high degree (rho=.87/.74) in both samples and no significant differences were found in this variable. The MPQ can provide an overall picture of a person's eating patterns and is a valid way to collect data regarding meal patterns. The MPQ may be a useable tool in clinical practice and research studies when more extensive instruments cannot be used. Future studies should evaluate the MPQ in diverse cultural populations and with more ecological assessment methods. Copyright © 2015 Elsevier Ltd. All rights reserved.
Using Data Augmentation and Markov Chain Monte Carlo for the Estimation of Unfolding Response Models

ERIC Educational Resources Information Center

Johnson, Matthew S.; Junker, Brian W.

2003-01-01

Unfolding response models, a class of item response theory (IRT) models that assume a unimodal item response function (IRF), are often used for the measurement of attitudes. Verhelst and Verstralen (1993) and Andrich and Luo (1993) independently developed unfolding response models by relating the observed responses to a more common monotone IRT…
A Study of Bayesian Estimation and Comparison of Response Time Models in Item Response Theory

ERIC Educational Resources Information Center

Suh, Hongwook

2010-01-01

Response time has been regarded as an important source for investigating the relationship between human performance and response speed. It is important to examine the relationship between response time and item characteristics, especially in the perspective of the relationship between response time and various factors that affect examinee's…
North American Veterinary Licensing Examination pacing study.

PubMed

Subhiyah, Raja G; Boyce, John R

2010-01-01

The National Board of Veterinary Medical Examiners was interested in the possible effects of word count on the outcomes of the North American Veterinary Licensing Examination. In this study, the authors investigated the effects of increasing word count on the pacing of examinees during each section of the examination and on the performance of examinees on the items. Specifically, the authors analyzed the effect of item word count on the average time spent on each item within a section of the examination, the average number of items omitted at the end of a section, and the average difficulty of items as a function of presentation order. The average word count per item increased from 2001 to 2008. As expected, there was a relationship between word count and time spent on the item. No significant relationship was found between word count and item difficulty, and an analysis of omitted items and pacing patterns showed no indication of overall pacing problems.
Item development process and analysis of 50 case-based items for implementation on the Korean Nursing Licensing Examination.

PubMed

Park, In Sook; Suh, Yeon Ok; Park, Hae Sook; Kang, So Young; Kim, Kwang Sung; Kim, Gyung Hee; Choi, Yeon-Hee; Kim, Hyun-Ju

2017-01-01

The purpose of this study was to improve the quality of items on the Korean Nursing Licensing Examination by developing and evaluating case-based items that reflect integrated nursing knowledge. We conducted a cross-sectional observational study to develop new case-based items. The methods for developing test items included expert workshops, brainstorming, and verification of content validity. After a mock examination of undergraduate nursing students using the newly developed case-based items, we evaluated the appropriateness of the items through classical test theory and item response theory. A total of 50 case-based items were developed for the mock examination, and content validity was evaluated. The question items integrated 34 discrete elements of integrated nursing knowledge. The mock examination was taken by 741 baccalaureate students in their fourth year of study at 13 universities. Their average score on the mock examination was 57.4, and the examination showed a reliability of 0.40. According to classical test theory, the average level of item difficulty of the items was 57.4% (80%-100% for 12 items; 60%-80% for 13 items; and less than 60% for 25 items). The mean discrimination index was 0.19, and was above 0.30 for 11 items and 0.20 to 0.29 for 15 items. According to item response theory, the item discrimination parameter (in the logistic model) was none for 10 items (0.00), very low for 20 items (0.01 to 0.34), low for 12 items (0.35 to 0.64), moderate for 6 items (0.65 to 1.34), high for 1 item (1.35 to 1.69), and very high for 1 item (above 1.70). The item difficulty was very easy for 24 items (below -2.0), easy for 8 items (-2.0 to -0.5), medium for 6 items (-0.5 to 0.5), hard for 3 items (0.5 to 2.0), and very hard for 9 items (2.0 or above). The goodness-of-fit test in terms of the 2-parameter item response model between the range of 2.0 to 0.5 revealed that 12 items had an ideal correct answer rate. We surmised that the low reliability of the mock examination was influenced by the timing of the test for the examinees and the inappropriate difficulty of the items. Our study suggested a methodology for the development of future case-based items for the Korean Nursing Licensing Examination.
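The classical test theory quantities reported in this abstract can be sketched as follows. The abstract does not say how its discrimination index was computed, so the upper-lower group formula with 27% tails used here is an assumption, as are the function names and the toy scores.

```python
def item_difficulty(item_scores):
    """Classical difficulty: proportion of examinees answering correctly."""
    return sum(item_scores) / len(item_scores)

def discrimination_index(item_scores, total_scores, fraction=0.27):
    """Upper-lower index: proportion correct in the top group minus the
    proportion correct in the bottom group, with groups defined by
    total test score. The 27% group size is an assumed convention."""
    n = len(item_scores)
    k = max(1, int(round(n * fraction)))
    order = sorted(range(n), key=lambda i: total_scores[i])
    low, high = order[:k], order[-k:]
    p_high = sum(item_scores[i] for i in high) / k
    p_low = sum(item_scores[i] for i in low) / k
    return p_high - p_low

# Hypothetical data: one item (1 = correct) and total scores for 10 examinees.
item = [0, 0, 0, 1, 1, 1, 1, 0, 1, 1]
totals = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

difficulty = item_difficulty(item)
discrimination = discrimination_index(item, totals)
```

With ten examinees the top and bottom groups contain three people each, so the index here is the difference between two of three correct in the top group and none correct in the bottom group.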
Fitting measurement models to vocational interest data: are dominance models ideal?

PubMed

Tay, Louis; Drasgow, Fritz; Rounds, James; Williams, Bruce A

2009-09-01

In this study, the authors examined the item response process underlying 3 vocational interest inventories: the Occupational Preference Inventory (C.-P. Deng, P. I. Armstrong, & J. Rounds, 2007), the Interest Profiler (J. Rounds, T. Smith, L. Hubert, P. Lewis, & D. Rivkin, 1999; J. Rounds, C. M. Walker, et al., 1999), and the Interest Finder (J. E. Wall & H. E. Baker, 1997; J. E. Wall, L. L. Wise, & H. E. Baker, 1996). Item response theory (IRT) dominance models, such as the 2-parameter and 3-parameter logistic models, assume that item response functions (IRFs) are monotonically increasing as the latent trait increases. In contrast, IRT ideal point models, such as the generalized graded unfolding model, have IRFs that peak where the latent trait matches the item. Ideal point models are expected to fit better because vocational interest inventories ask about typical behavior, as opposed to requiring maximal performance. Results show that across all 3 interest inventories, the ideal point model provided better descriptions of the response process. The importance of specifying the correct item response model for precise measurement is discussed. In particular, scores computed by a dominance model were shown to be sometimes illogical: individuals endorsing mostly realistic or mostly social items were given similar scores, whereas scores based on an ideal point model were sensitive to which type of items respondents endorsed.
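The contrast between dominance and ideal point response functions described in this abstract can be sketched numerically. The dominance curve is the standard 2-parameter logistic function; the ideal point curve is a deliberately simple single-peaked stand-in and is not the generalized graded unfolding model. All function names and parameter values are assumptions for illustration.

```python
import math

def dominance_irf(theta, a, b):
    """2PL response function: increases monotonically with theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def ideal_point_irf(theta, delta, width=1.0):
    """Illustrative single-peaked curve (NOT the GGUM): endorsement is
    highest when the trait theta matches the item location delta."""
    return math.exp(-((theta - delta) ** 2) / (2.0 * width ** 2))

thetas = [-2.0, -1.0, 0.0, 1.0, 2.0]
dominance = [dominance_irf(t, a=1.0, b=0.0) for t in thetas]
ideal = [ideal_point_irf(t, delta=0.0) for t in thetas]
```

Under the dominance curve the most extreme respondent is always the most likely to endorse the item, whereas under the ideal point curve respondents far above and far below the item location are equally unlikely to endorse it.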
Measuring ability to assess claims about treatment effects: a latent trait analysis of items from the 'Claim Evaluation Tools' database using Rasch modelling.

PubMed

Austvoll-Dahlgren, Astrid; Guttersrud, Øystein; Nsangi, Allen; Semakula, Daniel; Oxman, Andrew D

2017-05-25

The Claim Evaluation Tools database contains multiple-choice items for measuring people's ability to apply the key concepts they need to know to be able to assess treatment claims. We assessed items from the database using Rasch analysis to develop an outcome measure to be used in two randomised trials in Uganda. Rasch analysis is a form of psychometric testing relying on Item Response Theory. It is a dynamic way of developing outcome measures that are valid and reliable. To assess the validity, reliability and responsiveness of 88 items addressing 22 key concepts using Rasch analysis. We administered four sets of multiple-choice items in English to 1114 people in Uganda and Norway, of whom 685 were children and 429 were adults (including 171 health professionals). We scored all items dichotomously. We explored summary and individual fit statistics using the RUMM2030 analysis package. We used SPSS to perform distractor analysis. Most items conformed well to the Rasch model, but some items needed revision. Overall, the four item sets had satisfactory reliability. We did not identify significant response dependence between any pairs of items and, overall, the magnitude of multidimensionality in the data was acceptable. The items had a high level of difficulty. Most of the items conformed well to the Rasch model's expectations. Following revision of some items, we concluded that most of the items were suitable for use in an outcome measure for evaluating the ability of children or adults to assess treatment claims. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Assessing items on the SF-8 Japanese version for health-related quality of life: a psychometric analysis based on the nominal categories model of item response theory.

PubMed

Tokuda, Yasuharu; Okubo, Tomoya; Ohde, Sachiko; Jacobs, Joshua; Takahashi, Osamu; Omata, Fumio; Yanai, Haruo; Hinohara, Shigeaki; Fukui, Tsuguya

2009-06-01

The Short Form-8 (SF-8) questionnaire is a commonly used 8-item instrument of health-related quality of life (QOL) and provides a health profile of eight subdimensions. Our aim was to examine the psychometric properties of the Japanese version of the SF-8 instrument using methodology based on nominal categories model. Using data from an adjusted random sample from a nationally representative panel, the nominal categories modeling was applied to SF-8 items to characterize coverage of the latent trait (theta). Probabilities for response choices were described as functions on the latent trait. Information functions were generated based on the estimated item parameters. A total of 3344 participants (53%, women; median age, 35 years) provided responses. One factor was retained (eigenvalue, 4.65; variance proportion of 0.58) and used as theta. All item response category characteristic curves satisfied the monotonicity assumption in accurate order with corresponding ordinal responses. Four items (general health, bodily pain, vitality, and mental health) cover most of the spectrum of theta, while the other four items (physical function, role physical [role limitations because of physical health], social functioning, and role emotional [role limitations because of emotional problems] ) cover most of the negative range of theta. Information function for all items combined peaked at -0.7 of theta (information = 18.5) and decreased with increasing theta. The SF-8 instrument performs well among those with poor QOL across the continuum of the latent trait and thus can recognize more effectively persons with relatively poorer QOL than those with relatively better QOL.
Update on the Child's Challenging Behaviour Scale following evaluation using Rasch analysis.

PubMed

Bourke-Taylor, H M; Pallant, J F; Law, M

2014-03-01

The Child's Challenging Behaviour Scale (CCBS) was designed to measure a mother's rating of her child's challenging behaviours. The CCBS was initially developed for mothers of school-aged children with developmental disability and has previously been shown to have good psychometric properties using classical test theory techniques. The aim of this study was to use Rasch analysis to fully evaluate all aspects of the scale, including response format, item fit, dimensionality and targeting. The sample consisted of 152 mothers of a school-aged child (aged 5-18 years) with a disability. Mothers were recruited via websites and mail-out newsletters through not-for-profit organizations that supported families with disabilities. Respondents completed a survey which included the 11 items of the CCBS. Rasch analysis was conducted on these responses using the RUMM2030 package. Rasch analysis of the CCBS revealed serious threshold disordering for nine of the 11 items, suggesting problems with the 5-point response format used for the scale. The neutral midpoint of the response format was subsequently removed to create a 4-point scale. High levels of local dependency were detected among two pairs of items, resulting in the removal of two items (item 7 and item 1). The final nine-item version of the scale (CCBS Version 2) was unidimensional, well targeted, showed good fit to the Rasch model, and strong internal consistency. To achieve fit to the Rasch model it was necessary to make two modifications to the CCBS scale. The resulting nine-item scale with a 4-point response format showed excellent psychometric properties, supporting its internal validity. © 2013 John Wiley & Sons Ltd.
Using Explanatory Item Response Models to Evaluate Complex Scientific Tasks Designed for the Next Generation Science Standards

NASA Astrophysics Data System (ADS)

Chiu, Tina

This dissertation includes three studies that analyze a new set of assessment tasks developed by the Learning Progressions in Middle School Science (LPS) Project. These assessment tasks were designed to measure science content knowledge on the structure of matter domain and scientific argumentation, while following the goals from the Next Generation Science Standards (NGSS). The three studies focus on the evidence available for the success of this design and its implementation, generally labelled as "validity" evidence. I use explanatory item response models (EIRMs) as the overarching framework to investigate these assessment tasks. These models can be useful when gathering validity evidence for assessments as they can help explain student learning and group differences. In the first study, I explore the dimensionality of the LPS assessment by comparing the fit of unidimensional, between-item multidimensional, and Rasch testlet models to see which is most appropriate for this data. By applying multidimensional item response models, multiple relationships can be investigated, and in turn, allow for a more substantive look into the assessment tasks. The second study focuses on person predictors through latent regression and differential item functioning (DIF) models. Latent regression models show the influence of certain person characteristics on item responses, while DIF models test whether one group is differentially affected by specific assessment items, after conditioning on latent ability. Finally, the last study applies the linear logistic test model (LLTM) to investigate whether item features can help explain differences in item difficulties.
Contingent capture of involuntary visual attention interferes with detection of auditory stimuli

PubMed Central

Kamke, Marc R.; Harris, Jill

2014-01-01

The involuntary capture of attention by salient visual stimuli can be influenced by the behavioral goals of an observer. For example, when searching for a target item, irrelevant items that possess the target-defining characteristic capture attention more strongly than items not possessing that feature. Such contingent capture involves a shift of spatial attention toward the item with the target-defining characteristic. It is not clear, however, if the associated decrements in performance for detecting the target item are entirely due to involuntary orienting of spatial attention. To investigate whether contingent capture also involves a non-spatial interference, adult observers were presented with streams of visual and auditory stimuli and were tasked with simultaneously monitoring for targets in each modality. Visual and auditory targets could be preceded by a lateralized visual distractor that either did, or did not, possess the target-defining feature (a specific color). In agreement with the contingent capture hypothesis, target-colored distractors interfered with visual detection performance (response time and accuracy) more than distractors that did not possess the target color. Importantly, the same pattern of results was obtained for the auditory task: visual target-colored distractors interfered with sound detection. The decrement in auditory performance following a target-colored distractor suggests that contingent capture involves a source of processing interference in addition to that caused by a spatial shift of attention. Specifically, we argue that distractors possessing the target-defining characteristic enter a capacity-limited, serial stage of neural processing, which delays detection of subsequently presented stimuli regardless of the sensory modality. PMID:24920945
Linking Measures of Adult Nicotine Dependence to a Common Latent Continuum and a Comparison with Adolescent Patterns

PubMed Central

Strong, David R.; Schonbrun, Yael Chatav; Schaffran, Christine; Griesler, Pamela C.; Kandel, Denise

2012-01-01

Background An ongoing debate regarding the nature of Nicotine Dependence (ND) is whether the same instrument can be applied to measure ND among adults and adolescents. Using a hierarchical item response model (IRM), we examined evidence for a common continuum underlying ND symptoms among adults and adolescents. Method The analyses are based on two waves of interviews with subsamples of parents and adolescents from a multi-ethnic longitudinal cohort of 1,039 6th–10th graders from the Chicago Public Schools (CPS). Adults and adolescents who reported smoking cigarettes the last 30 days prior to waves 3 and 5 completed three common instruments measuring ND symptoms and one item measuring loss of autonomy. Results A stable continuum of ND, first identified among adolescents, was replicated among adults. However, some symptoms, such as tolerance and withdrawal, differed markedly across adults and adolescents. The majority of mFTQ items were observed within the highest levels of ND, the NDSS items within the lowest levels, and the DSM-IV items were arrayed in the middle and upper third of the continuum of dependence severity. Loss of Autonomy was positioned at the lower end of the continuum. We propose a ten-symptom measure of ND for adolescents and adults. Conclusions Despite marked differences in the relative severity of specific ND symptoms in each group, common instrumentation of ND can apply to adults and adolescents. The results increase confidence in the ability to describe phenotypic heterogeneity in ND across important developmental periods. PMID:21855236
Can you ask? We just did! Assessing sexual function and concerns in patients presenting for initial gynecologic oncology consultation

PubMed Central

Kennedy, Vanessa; Abramsohn, Emily; Makelarski, Jennifer; Barber, Rachel; Wroblewski, Kristen; Tenney, Meaghan; Lee, Nita Karnik; Yamada, S. Diane; Lindau, Stacy Tessler

2015-01-01

Objectives To describe patterns of response to, and assess sexual function and activity elicited by, a self-administered assessment incorporated into a new patient intake form for gynecologic oncology consultation. Methods A cross-sectional study of patients presenting to a single urban academic medical center between January 2010 and September 2012. New patients completed a self-administered intake form, including six brief sexual activity and function items. These items, along with abstracted medical record data, were descriptively analyzed. Logistic regression was used to assess the association between sexual activity and function and disease status, adjusting for age. Results Median age was 50 years (range 18–91, N = 499); more than half had a final diagnosis of cancer. Most patients completed all sex-related items on the intake form; 98% answered at least one. Among patients who were sexually active in the prior 12 months (57% with cancer, 64% with benign disease), 52% indicated on the intake form having, during that period, a sexual problem lasting several months or more. Of these, 15% had physician documentation of the sexual problem. Eighteen women were referred for care. Providers reported no patient complaints about the inclusion of sexual items on the intake form. Conclusions Nearly all new patients presenting for gynecologic oncology consultation answered self-administered items to assess sexual activity and function. Further study is needed to determine the role of pretreatment identification of sexual function concerns in improving sexual outcomes associated with cancer diagnosis and treatment. PMID:25582823
Can you ask? We just did! Assessing sexual function and concerns in patients presenting for initial gynecologic oncology consultation.

PubMed

Kennedy, Vanessa; Abramsohn, Emily; Makelarski, Jennifer; Barber, Rachel; Wroblewski, Kristen; Tenney, Meaghan; Lee, Nita Karnik; Yamada, S Diane; Lindau, Stacy Tessler

2015-04-01

To describe patterns of response to, and assess sexual function and activity elicited by, a self-administered assessment incorporated into a new patient intake form for gynecologic oncology consultation. A cross-sectional study of patients presenting to a single urban academic medical center between January 2010 and September 2012. New patients completed a self-administered intake form, including six brief sexual activity and function items. These items, along with abstracted medical record data, were descriptively analyzed. Logistic regression was used to assess the association between sexual activity and function and disease status, adjusting for age. Median age was 50 years (range 18-91, N=499); more than half had a final diagnosis of cancer. Most patients completed all sex-related items on the intake form; 98% answered at least one. Among patients who were sexually active in the prior 12 months (57% with cancer, 64% with benign disease), 52% indicated on the intake form having, during that period, a sexual problem lasting several months or more. Of these, 15% had physician documentation of the sexual problem. Eighteen women were referred for care. Providers reported no patient complaints about the inclusion of sexual items on the intake form. Nearly all new patients presenting for gynecologic oncology consultation answered self-administered items to assess sexual activity and function. Further study is needed to determine the role of pre-treatment identification of sexual function concerns in improving sexual outcomes associated with cancer diagnosis and treatment. Copyright © 2015 Elsevier Inc. All rights reserved.
A Stepwise Test Characteristic Curve Method to Detect Item Parameter Drift

ERIC Educational Resources Information Center

Guo, Rui; Zheng, Yi; Chang, Hua-Hua

2015-01-01

An important assumption of item response theory is item parameter invariance. Sometimes, however, item parameters are not invariant across different test administrations due to factors other than sampling error; this phenomenon is termed item parameter drift. Several methods have been developed to detect drifted items. However, most of the…
Optimal Item Selection with Credentialing Examinations.

ERIC Educational Resources Information Center

Hambleton, Ronald K.; And Others

The study compared two promising item response theory (IRT) item-selection methods, optimal and content-optimal, with two non-IRT item selection methods, random and classical, for use in fixed-length certification exams. The four methods were used to construct 20-item exams from a pool of approximately 250 items taken from a 1985 certification…
An HIV/AIDS Knowledge Scale for Adolescents: Item Response Theory Analyses Based on Data from a Study in South Africa and Tanzania

ERIC Educational Resources Information Center

Aaro, Leif E.; Breivik, Kyrre; Klepp, Knut-Inge; Kaaya, Sylvia; Onya, Hans E.; Wubs, Annegreet; Helleve, Arnfinn; Flisher, Alan J.

2011-01-01

A 14-item human immunodeficiency virus/acquired immunodeficiency syndrome knowledge scale was used among school students in 80 schools in 3 sites in Sub-Saharan Africa (Cape Town and Mankweng, South Africa, and Dar es Salaam, Tanzania). For each item, an incorrect or don't know response was coded as 0 and correct response as 1. Exploratory factor…

Using SAS PROC MCMC for Item Response Theory Models

PubMed Central

Samonte, Kelli

2014-01-01

Interest in using Bayesian methods for estimating item response theory models has grown at a remarkable rate in recent years. This attentiveness to Bayesian estimation has also inspired a growth in available software such as WinBUGS, R packages, BMIRT, MPLUS, and SAS PROC MCMC. This article intends to provide an accessible overview of Bayesian methods in the context of item response theory to serve as a useful guide for practitioners in estimating and interpreting item response theory (IRT) models. Included is a description of the estimation procedure used by SAS PROC MCMC. Syntax is provided for estimation of both dichotomous and polytomous IRT models, as well as a discussion on how to extend the syntax to accommodate more complex IRT models. PMID:29795834
Effects of Aging and IQ on Item and Associative Memory

ERIC Educational Resources Information Center

Ratcliff, Roger; Thapar, Anjali; McKoon, Gail

2011-01-01

The effects of aging and IQ on performance were examined in 4 memory tasks: item recognition, associative recognition, cued recall, and free recall. For item and associative recognition, accuracy and the response time (RT) distributions for correct and error responses were explained by Ratcliff's (1978) diffusion model at the level of individual…
Item Response Theory Modeling of the Philadelphia Naming Test

ERIC Educational Resources Information Center

Fergadiotis, Gerasimos; Kellough, Stacey; Hula, William D.

2015-01-01

Purpose: In this study, we investigated the fit of the Philadelphia Naming Test (PNT; Roach, Schwartz, Martin, Grewal, & Brecher, 1996) to an item-response-theory measurement model, estimated the precision of the resulting scores and item parameters, and provided a theoretical rationale for the interpretation of PNT overall scores by relating…
An Evaluation of Three Approximate Item Response Theory Models for Equating Test Scores.

ERIC Educational Resources Information Center

Marco, Gary L.; And Others

Three item response models were evaluated for estimating item parameters and equating test scores. The models, which approximated the traditional three-parameter model, included: (1) the Rasch one-parameter model, operationalized in the BICAL computer program; (2) an approximate three-parameter logistic model based on coarse group data divided…
Empirical Histograms in Item Response Theory with Ordinal Data

ERIC Educational Resources Information Center

Woods, Carol M.

2007-01-01

The purpose of this research is to describe, test, and illustrate a new implementation of the empirical histogram (EH) method for ordinal items. The EH method involves the estimation of item response model parameters simultaneously with the approximation of the distribution of the random latent variable (theta) as a histogram. Software for the EH…
Discussion of David Thissen's Bad Questions: An Essay Involving Item Response Theory

ERIC Educational Resources Information Center

Ackerman, Terry

2016-01-01

In this commentary, University of North Carolina's associate dean of research and assessment at the School of Education Terry Ackerman poses questions and shares his thoughts on David Thissen's essay, "Bad Questions: An Essay Involving Item Response Theory" (this issue). Ackerman begins by considering the two purposes of Item Response…
Data Visualization of Item-Total Correlation by Median Smoothing

ERIC Educational Resources Information Center

Yu, Chong Ho; Douglas, Samantha; Lee, Anna; An, Min

2016-01-01

This paper aims to illustrate how data visualization could be utilized to identify errors prior to modeling, using an example with multi-dimensional item response theory (MIRT). MIRT combines item response theory and factor analysis to identify a psychometric model that investigates two or more latent traits. While it may seem convenient to…
Some Issues in Item Response Theory: Dimensionality Assessment and Models for Guessing

ERIC Educational Resources Information Center

Smith, Jessalyn

2009-01-01

Currently, standardized tests are widely used as a method to measure how well schools and students meet academic standards. As a result, measurement issues have become an increasingly popular topic of study. Unidimensional item response models are used to model latent abilities and specific item characteristics. This class of models makes…
The Definition of Difficulty and Discrimination for Multidimensional Item Response Theory Models.

ERIC Educational Resources Information Center

Reckase, Mark D.; McKinley, Robert L.

A study was undertaken to develop guidelines for the interpretation of the parameters of three multidimensional item response theory models and to determine the relationship between the parameters and traditional concepts of item difficulty and discrimination. The three models considered were multidimensional extensions of the one-, two-, and…
Measurement Properties of Two Innovative Item Formats in a Computer-Based Test

ERIC Educational Resources Information Center

Wan, Lei; Henly, George A.

2012-01-01

Many innovative item formats have been proposed over the past decade, but little empirical research has been conducted on their measurement properties. This study examines the reliability, efficiency, and construct validity of two innovative item formats--the figural response (FR) and constructed response (CR) formats used in a K-12 computerized…
Semiparametric Item Response Functions in the Context of Guessing

ERIC Educational Resources Information Center

Falk, Carl F.; Cai, Li

2016-01-01

We present a logistic function of a monotonic polynomial with a lower asymptote, allowing additional flexibility beyond the three-parameter logistic model. We develop a maximum marginal likelihood-based approach to estimate the item parameters. The new item response model is demonstrated on math assessment data from a state, and a computationally…
A Bayesian Beta-Mixture Model for Nonparametric IRT (BBM-IRT)

ERIC Educational Resources Information Center

Arenson, Ethan A.; Karabatsos, George

2017-01-01

Item response models typically assume that the item characteristic (step) curves follow a logistic or normal cumulative distribution function, which are strictly monotone functions of person test ability. Such assumptions can be overly-restrictive for real item response data. We propose a simple and more flexible Bayesian nonparametric IRT model…
Nonparametric Item Response Curve Estimation with Correction for Measurement Error

ERIC Educational Resources Information Center

Guo, Hongwen; Sinharay, Sandip

2011-01-01

Nonparametric or kernel regression estimation of item response curves (IRCs) is often used in item analysis in testing programs. These estimates are biased when the observed scores are used as the regressor because the observed scores are contaminated by measurement error. Accuracy of this estimation is a concern theoretically and operationally.…
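A minimal sketch of the uncorrected estimator this abstract starts from: a Nadaraya-Watson kernel regression of item correctness on observed score. The Gaussian kernel, the bandwidth, and the function name are assumptions made for illustration. The measurement-error correction proposed by the authors is not implemented here.

```python
import math

def kernel_irc(scores, responses, grid, bandwidth=1.0):
    """Nadaraya-Watson estimate of P(correct | score) at each grid point.

    scores    : observed scores used as the regressor
    responses : 0/1 responses to the item of interest
    grid      : score values at which the curve is evaluated
    """
    estimates = []
    for x in grid:
        # Gaussian kernel weight of every examinee relative to grid point x.
        weights = [math.exp(-0.5 * ((s - x) / bandwidth) ** 2) for s in scores]
        total = sum(weights)  # can underflow to 0 for very distant grid points
        estimates.append(sum(w * r for w, r in zip(weights, responses)) / total)
    return estimates
```

Because the observed score contains measurement error, a curve estimated this way is biased, which is the problem the article addresses.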
A New Procedure for Detection of Students' Rapid Guessing Responses Using Response Time

ERIC Educational Resources Information Center

Guo, Hongwen; Rios, Joseph A.; Haberman, Shelby; Liu, Ou Lydia; Wang, Jing; Paek, Insu

2016-01-01

Unmotivated test takers using rapid guessing in item responses can affect validity studies and teacher and institution performance evaluation negatively, making it critical to identify these test takers. The authors propose a new nonparametric method for finding response-time thresholds for flagging item responses that result from rapid-guessing…
Item Analyses of Memory Differences

PubMed Central

Salthouse, Timothy A.

2017-01-01

Objective Although performance on memory and other cognitive tests is usually assessed with a score aggregated across multiple items, potentially valuable information is also available at the level of individual items. Method The current study illustrates how analyses of variance with item as one of the factors, and memorability analyses in which item accuracy in one group is plotted as a function of item accuracy in another group, can provide a more detailed characterization of the nature of group differences in memory. Data are reported for two memory tasks, word recall and story memory, across age, ability, repetition, delay, and longitudinal contrasts. Results The item-level analyses revealed evidence for largely uniform differences across items in the age, ability, and longitudinal contrasts, but differential patterns across items in the repetition contrast, and unsystematic item relations in the delay contrast. Conclusion Analyses at the level of individual items have the potential to indicate the manner by which group differences in the aggregate test score are achieved. PMID:27618285
Effects of Anchor Item Methods on the Detection of Differential Item Functioning within the Family of Rasch Models

ERIC Educational Resources Information Center

Wang, Wen-Chung

2004-01-01

Scale indeterminacy in analysis of differential item functioning (DIF) within the framework of item response theory can be resolved by imposing 3 anchor item methods: the equal-mean-difficulty method, the all-other anchor item method, and the constant anchor item method. In this article, applicability and limitations of these 3 methods are…
Examining Differential Item Functions of Different Item Ordered Test Forms According to Item Difficulty Levels

ERIC Educational Resources Information Center

Çokluk, Ömay; Gül, Emrah; Dogan-Gül, Çilem

2016-01-01

The study aims to examine whether differential item function is displayed in three different test forms that have item orders of random and sequential versions (easy-to-hard and hard-to-easy), based on Classical Test Theory (CTT) and Item Response Theory (IRT) methods and bearing item difficulty levels in mind. In the correlational research, the…
Extreme Response Style: Which Model Is Best?

ERIC Educational Resources Information Center

Leventhal, Brian

2017-01-01

More robust and rigorous psychometric models, such as multidimensional Item Response Theory models, have been advocated for survey applications. However, item responses may be influenced by construct-irrelevant variance factors such as preferences for extreme response options. Through empirical and simulation methods, this study evaluates the use…
Preschool children's interests in science

NASA Astrophysics Data System (ADS)

Coulson, R. I.

1991-12-01

Studies of children's attitudes towards science indicate that a tendency for girls and boys to have different patterns of interest in science is established by upper primary school level. It is not known when these interest patterns develop. This paper presents the results of part of a project designed to investigate preschool children's interests in science. Individual 4-5 year-old children were asked to say what they would prefer to do from each of a series of paired drawings showing either a science and a non-science activity, or activities from two different areas of science. Girls and boys were very similar in their overall patterns of choice for science and non-science items. Within science, the average number of physical science items chosen by boys was significantly greater than the average number chosen by girls (p=.026). Girls tended to choose more biology items than did boys, but this difference was not quite significant at the .05 level (p=.054). The temporal stability of these choices was explored.
Mokken scale analysis of mental health and well-being questionnaire item responses: a non-parametric IRT method in empirical research for applied health researchers

PubMed Central

2012-01-01

Background Mokken scaling techniques are a useful tool for researchers who wish to construct unidimensional tests or use questionnaires that comprise multiple binary or polytomous items. The stochastic cumulative scaling model offered by this approach is ideally suited when the intention is to score an underlying latent trait by simple addition of the item response values. In our experience, the Mokken model appears to be less well-known than for example the (related) Rasch model, but is seeing increasing use in contemporary clinical research and public health. Mokken's method is a generalisation of Guttman scaling that can assist in the determination of the dimensionality of tests or scales, and enables consideration of reliability, without reliance on Cronbach's alpha. This paper provides a practical guide to the application and interpretation of this non-parametric item response theory method in empirical research with health and well-being questionnaires. Methods Scalability of data from 1) a cross-sectional health survey (the Scottish Health Education Population Survey) and 2) a general population birth cohort study (the National Child Development Study) illustrate the method and modeling steps for dichotomous and polytomous items respectively. The questionnaire data analyzed comprise responses to the 12 item General Health Questionnaire, under the binary recoding recommended for screening applications, and the ordinal/polytomous responses to the Warwick-Edinburgh Mental Well-being Scale. Results and conclusions After an initial analysis example in which we select items by phrasing (six positive versus six negatively worded items) we show that all items from the 12-item General Health Questionnaire (GHQ-12) – when binary scored – were scalable according to the double monotonicity model, in two short scales comprising six items each (Bech’s “well-being” and “distress” clinical scales). 
An illustration of ordinal item analysis confirmed that all 14 positively worded items of the Warwick-Edinburgh Mental Well-being Scale (WEMWBS) met criteria for the monotone homogeneity model but four items violated double monotonicity with respect to a single underlying dimension. Software availability and commands used to specify unidimensionality and reliability analysis and graphical displays for diagnosing monotone homogeneity and double monotonicity are discussed, with an emphasis on current implementations in freeware. PMID:22686586
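To make the scalability idea concrete, here is a minimal sketch of the scale-level Loevinger's H coefficient used in Mokken scaling, for binary items only. It assumes complete data and at least one item pair with non-zero maximum covariance. The function name is hypothetical, and this is not the software discussed in the article.

```python
from itertools import combinations

def loevinger_h(data):
    """Scale-level Loevinger's H for a list of 0/1 response rows.

    H = (sum of observed item-pair covariances) /
        (sum of maximum covariances attainable given the item marginals)
    """
    n = len(data)
    n_items = len(data[0])
    # Proportion of positive responses per item.
    p = [sum(row[i] for row in data) / n for i in range(n_items)]
    cov_sum = 0.0
    cov_max_sum = 0.0
    for i, j in combinations(range(n_items), 2):
        p_both = sum(row[i] * row[j] for row in data) / n
        cov_sum += p_both - p[i] * p[j]
        # P(both = 1) can be at most the smaller of the two marginals.
        cov_max_sum += min(p[i], p[j]) - p[i] * p[j]
    return cov_sum / cov_max_sum
```

A perfect Guttman pattern yields H = 1, and statistically independent items yield H = 0.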

Mokken scale analysis of mental health and well-being questionnaire item responses: a non-parametric IRT method in empirical research for applied health researchers.

PubMed

Stochl, Jan; Jones, Peter B; Croudace, Tim J

2012-06-11

Mokken scaling techniques are a useful tool for researchers who wish to construct unidimensional tests or use questionnaires that comprise multiple binary or polytomous items. The stochastic cumulative scaling model offered by this approach is ideally suited when the intention is to score an underlying latent trait by simple addition of the item response values. In our experience, the Mokken model appears to be less well-known than for example the (related) Rasch model, but is seeing increasing use in contemporary clinical research and public health. Mokken's method is a generalisation of Guttman scaling that can assist in the determination of the dimensionality of tests or scales, and enables consideration of reliability, without reliance on Cronbach's alpha. This paper provides a practical guide to the application and interpretation of this non-parametric item response theory method in empirical research with health and well-being questionnaires. Scalability of data from 1) a cross-sectional health survey (the Scottish Health Education Population Survey) and 2) a general population birth cohort study (the National Child Development Study) illustrate the method and modeling steps for dichotomous and polytomous items respectively. The questionnaire data analyzed comprise responses to the 12 item General Health Questionnaire, under the binary recoding recommended for screening applications, and the ordinal/polytomous responses to the Warwick-Edinburgh Mental Well-being Scale. After an initial analysis example in which we select items by phrasing (six positive versus six negatively worded items) we show that all items from the 12-item General Health Questionnaire (GHQ-12)--when binary scored--were scalable according to the double monotonicity model, in two short scales comprising six items each (Bech's "well-being" and "distress" clinical scales). 
An illustration of ordinal item analysis confirmed that all 14 positively worded items of the Warwick-Edinburgh Mental Well-being Scale (WEMWBS) met criteria for the monotone homogeneity model but four items violated double monotonicity with respect to a single underlying dimension. Software availability and commands used to specify unidimensionality and reliability analysis and graphical displays for diagnosing monotone homogeneity and double monotonicity are discussed, with an emphasis on current implementations in freeware.
Distinguishing Fast and Slow Processes in Accuracy - Response Time Data

PubMed Central

Coomans, Frederik; Hofman, Abe; Brinkhuis, Matthieu; van der Maas, Han L. J.; Maris, Gunter

2016-01-01

We investigate the relation between speed and accuracy within problem solving in its simplest non-trivial form. We consider tests with only two items and code the item responses in two binary variables: one indicating the response accuracy, and one indicating the response speed. Despite being a very basic setup, it enables us to study item pairs stemming from a broad range of domains such as basic arithmetic, first language learning, intelligence-related problems, and chess, with large numbers of observations for every pair of problems under consideration. We carry out a survey over a large number of such item pairs and compare three types of psychometric accuracy-response time models present in the literature: two ‘one-process’ models, the first of which models accuracy and response time as conditionally independent and the second of which models accuracy and response time as conditionally dependent, and a ‘two-process’ model which models accuracy contingent on response time. We find that the data clearly violates the restrictions imposed by both one-process models and requires additional complexity which is parsimoniously provided by the two-process model. We supplement our survey with an analysis of the erroneous responses for an example item pair and demonstrate that there are very significant differences between the types of errors in fast and slow responses. PMID:27167518
An introduction to Item Response Theory and Rasch Analysis of the Eating Assessment Tool (EAT-10).

PubMed

Kean, Jacob; Brodke, Darrel S; Biber, Joshua; Gross, Paul

2018-03-01

Item response theory has its origins in educational measurement and is now commonly applied in health-related measurement of latent traits, such as function and symptoms. This application is due in large part to gains in the precision of measurement attributable to item response theory and corresponding decreases in response burden, study costs, and study duration. The purpose of this paper is twofold: introduce basic concepts of item response theory and demonstrate this analytic approach in a worked example, a Rasch model (1PL) analysis of the Eating Assessment Tool (EAT-10), a commonly used measure for oropharyngeal dysphagia. The results of the analysis were largely concordant with previous studies of the EAT-10 and illustrate for brain impairment clinicians and researchers how IRT analysis can yield greater precision of measurement.
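A minimal sketch of how a dichotomous Rasch (1PL) model turns a response pattern into an ability estimate and a standard error, which is the sense of "precision" used in this abstract. The EAT-10 itself uses multi-category items, so the dichotomous items, the difficulty values, and the function names below are illustrative assumptions only.

```python
import math

def rasch_p(theta, b):
    """Rasch probability of endorsing an item with difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def estimate_theta(responses, difficulties, iterations=25):
    """Newton-Raphson maximum-likelihood ability estimate and its SE.

    Assumes item difficulties are known and that the response pattern is
    mixed: all-0 or all-1 patterns have no finite ML estimate.
    """
    theta = 0.0
    for _ in range(iterations):
        probs = [rasch_p(theta, b) for b in difficulties]
        gradient = sum(x - p for x, p in zip(responses, probs))
        information = sum(p * (1.0 - p) for p in probs)
        theta += gradient / information
    probs = [rasch_p(theta, b) for b in difficulties]
    information = sum(p * (1.0 - p) for p in probs)
    return theta, 1.0 / math.sqrt(information)
```

The standard error shrinks as information accumulates, so items whose difficulty lies near the respondent's level contribute most to precision.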
Item Banks for Measuring Emotional Distress From the Patient-Reported Outcomes Measurement Information System (PROMIS®): Depression, Anxiety, and Anger

PubMed Central

Pilkonis, Paul A.; Choi, Seung W.; Reise, Steven P.; Stover, Angela M.; Riley, William T.; Cella, David

2011-01-01

The authors report on the development and calibration of item banks for depression, anxiety, and anger as part of the Patient-Reported Outcomes Measurement Information System (PROMIS®). Comprehensive literature searches yielded an initial bank of 1,404 items from 305 instruments. After qualitative item analysis (including focus groups and cognitive interviewing), 168 items (56 for each construct) were written in a first person, past tense format with a 7-day time frame and five response options reflecting frequency. The calibration sample included nearly 15,000 respondents. Final banks of 28, 29, and 29 items were calibrated for depression, anxiety, and anger, respectively, using item response theory. Test information curves showed that the PROMIS item banks provided more information than conventional measures in a range of severity from approximately −1 to +3 standard deviations (with higher scores indicating greater distress). Short forms consisting of seven to eight items provided information comparable to legacy measures containing more items. PMID:21697139
Harmonizing Measures of Cognitive Performance Across International Surveys of Aging Using Item Response Theory.

PubMed

Chan, Kitty S; Gross, Alden L; Pezzin, Liliana E; Brandt, Jason; Kasper, Judith D

2015-12-01

To harmonize measures of cognitive performance using item response theory (IRT) across two international aging studies. Data for persons ≥65 years from the Health and Retirement Study (HRS, N = 9,471) and the English Longitudinal Study of Aging (ELSA, N = 5,444). Cognitive performance measures varied (HRS fielded 25, ELSA 13); 9 were in common. Measurement precision was examined for IRT scores based on (a) common items, (b) common items adjusted for differential item functioning (DIF), and (c) DIF-adjusted all items. Three common items (day of date, immediate word recall, and delayed word recall) demonstrated DIF by survey. Adding survey-specific items improved precision but mainly for HRS respondents at lower cognitive levels. IRT offers a feasible strategy for harmonizing cognitive performance measures across other surveys and for other multi-item constructs of interest in studies of aging. Practical implications depend on sample distribution and the difficulty mix of in-common and survey-specific items. © The Author(s) 2015.
Item response theory, computerized adaptive testing, and PROMIS: assessment of physical function.

PubMed

Fries, James F; Witter, James; Rose, Matthias; Cella, David; Khanna, Dinesh; Morgan-DeWitt, Esi

2014-01-01

Patient-reported outcome (PRO) questionnaires record health information directly from research participants because observers may not accurately represent the patient perspective. Patient-reported Outcomes Measurement Information System (PROMIS) is a US National Institutes of Health cooperative group charged with bringing PRO to a new level of precision and standardization across diseases by item development and use of item response theory (IRT). With IRT methods, improved items are calibrated on an underlying concept to form an item bank for a "domain" such as physical function (PF). The most informative items can be combined to construct efficient "instruments" such as 10-item or 20-item PF static forms. Each item is calibrated on the basis of the probability that a given person will respond at a given level, and the ability of the item to discriminate people from one another. Tailored forms may cover any desired level of the domain being measured. Computerized adaptive testing (CAT) selects the best items to sharpen the estimate of a person's functional ability, based on prior responses to earlier questions. PROMIS item banks have been improved with experience from several thousand items, and are calibrated on over 21,000 respondents. In areas tested to date, PROMIS PF instruments are superior or equal to Health Assessment Questionnaire and Medical Outcome Study Short Form-36 Survey legacy instruments in clarity, translatability, patient importance, reliability, and sensitivity to change. Precise measures, such as PROMIS, efficiently incorporate patient self-report of health into research, potentially reducing research cost by lowering sample size requirements. The advent of routine IRT applications has the potential to transform PRO measurement.
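The item-selection step of computerized adaptive testing described above can be sketched as follows. This is a simplified illustration, not the PROMIS algorithm: it assumes dichotomous two-parameter logistic (2PL) items, whereas PROMIS banks are typically calibrated with polytomous models, and the item names and parameters in `bank` are made up. The selection rule shown, picking the unadministered item with the largest Fisher information at the current trait estimate, is one common choice.

```python
import math


def p_2pl(theta: float, a: float, b: float) -> float:
    """Probability of a positive response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at theta: a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def select_next_item(theta: float, bank: dict, administered: set) -> str:
    """Return the unadministered item that is most informative at theta."""
    candidates = [name for name in bank if name not in administered]
    return max(candidates, key=lambda name: item_information(theta, *bank[name]))


if __name__ == "__main__":
    # Hypothetical bank: item name -> (discrimination a, difficulty b).
    bank = {
        "walk_one_block": (1.8, -1.0),
        "climb_stairs": (2.2, 0.0),
        "run_one_mile": (1.5, 1.5),
    }
    print(select_next_item(0.0, bank, administered=set()))
```

Because information peaks where an item's difficulty is close to the current estimate, the procedure tends to skip items that are far too easy or too hard for the respondent.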
Measuring anxiety after spinal cord injury: Development and psychometric characteristics of the SCI-QOL Anxiety item bank and linkage with GAD-7.

PubMed

Kisala, Pamela A; Tulsky, David S; Kalpakjian, Claire Z; Heinemann, Allen W; Pohlig, Ryan T; Carle, Adam; Choi, Seung W

2015-05-01

To develop a calibrated item bank and computer adaptive test to assess anxiety symptoms in individuals with spinal cord injury (SCI), transform scores to the Patient Reported Outcomes Measurement Information System (PROMIS) metric, and create a statistical linkage with the Generalized Anxiety Disorder (GAD)-7, a widely used anxiety measure. Grounded-theory based qualitative item development methods; large-scale item calibration field testing; confirmatory factor analysis; graded response model item response theory analyses; statistical linking techniques to transform scores to a PROMIS metric; and linkage with the GAD-7. Setting: Five SCI Model System centers and one Department of Veterans Affairs medical center in the United States. Participants: Adults with traumatic SCI. Spinal Cord Injury-Quality of Life (SCI-QOL) Anxiety Item Bank. Seven hundred sixteen individuals with traumatic SCI completed 38 items assessing anxiety, 17 of which were PROMIS items. After 13 items (including 2 PROMIS items) were removed, factor analyses confirmed unidimensionality. Item response theory analyses were used to estimate slopes and thresholds for the final 25 items (15 from PROMIS). The observed Pearson correlation between the SCI-QOL Anxiety and GAD-7 scores was 0.67. The SCI-QOL Anxiety item bank demonstrates excellent psychometric properties and is available as a computer adaptive test or short form for research and clinical applications. SCI-QOL Anxiety scores have been transformed to the PROMIS metric and we provide a method to link SCI-QOL Anxiety scores with those of the GAD-7.
A signal detection-item response theory model for evaluating neuropsychological measures.

PubMed

Thomas, Michael L; Brown, Gregory G; Gur, Ruben C; Moore, Tyler M; Patt, Virginie M; Risbrough, Victoria B; Baker, Dewleen G

2018-02-05

Models from signal detection theory are commonly used to score neuropsychological test data, especially tests of recognition memory. Here we show that certain item response theory models can be formulated as signal detection theory models, thus linking two complementary but distinct methodologies. We then use the approach to evaluate the validity (construct representation) of commonly used research measures, demonstrate the impact of conditional error on neuropsychological outcomes, and evaluate measurement bias. Signal detection-item response theory (SD-IRT) models were fitted to recognition memory data for words, faces, and objects. The sample consisted of U.S. Infantry Marines and Navy Corpsmen participating in the Marine Resiliency Study. Data comprised item responses to the Penn Face Memory Test (PFMT; N = 1,338), Penn Word Memory Test (PWMT; N = 1,331), and Visual Object Learning Test (VOLT; N = 1,249), and self-report of past head injury with loss of consciousness. SD-IRT models adequately fitted recognition memory item data across all modalities. Error varied systematically with ability estimates, and distributions of residuals from the regression of memory discrimination onto self-report of past head injury were positively skewed towards regions of larger measurement error. Analyses of differential item functioning revealed little evidence of systematic bias by level of education. SD-IRT models benefit from the measurement rigor of item response theory, which permits the modeling of item difficulty and examinee ability, and from signal detection theory, which provides an interpretive framework encompassing the experimentally validated constructs of memory discrimination and response bias. We used this approach to validate the construct representation of commonly used research measures and to demonstrate how nonoptimized item parameters can lead to erroneous conclusions when interpreting neuropsychological test data. Future work might include the development of computerized adaptive tests and integration with mixture and random-effects models.
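For readers unfamiliar with the two signal detection constructs mentioned above, memory discrimination and response bias, the sketch below computes their classical equal-variance Gaussian estimates from a hit rate and a false-alarm rate. This is the textbook signal detection calculation, not the SD-IRT model fitted in the study, and the example rates are invented.

```python
from statistics import NormalDist

_STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Discrimination index d' = z(hit rate) - z(false-alarm rate).

    Both rates must lie strictly between 0 and 1; inv_cdf raises
    StatisticsError otherwise, so extreme rates need a correction first.
    """
    z = _STANDARD_NORMAL.inv_cdf
    return z(hit_rate) - z(false_alarm_rate)


def criterion(hit_rate: float, false_alarm_rate: float) -> float:
    """Response bias c = -(z(hit rate) + z(false-alarm rate)) / 2."""
    z = _STANDARD_NORMAL.inv_cdf
    return -(z(hit_rate) + z(false_alarm_rate)) / 2.0


if __name__ == "__main__":
    # Invented rates for a recognition memory test.
    print(f"d' = {d_prime(0.80, 0.20):.3f}, c = {criterion(0.80, 0.20):.3f}")
```

A larger d' indicates better separation of studied from unstudied items, while c near zero indicates no overall tendency to answer "old" or "new".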
The Usability of CAT System for Assessing the Depressive Level of Japanese-A Study on Psychometric Properties and Response Behavior.

PubMed

Iwata, Noboru; Kikuchi, Kenichi; Fujihara, Yuya

2016-08-01

An innovative measurement system using a computerized adaptive testing (CAT) technique based on item response theory (IRT) has increasingly been used to measure mental health status. However, little is known about its measurement properties based on empirical data. Moreover, response time (RT) data, which are not available from paper-and-pencil measurement but are available from computerized measurement, are worth investigating to explore response behavior. We aimed to construct a CAT to measure depressive symptomatology in a community population and to explore its measurement properties. We also examined the relationships between RTs, individual item responses, and depressive levels. To construct the CAT system, responses of 2061 workers and university students to 24 depression scale items plus four negatively revised positive affect items were subjected to a polytomous IRT analysis. The stopping rule was set at a standard error of estimation < 0.30 or a maximum of 15 items displayed. The CAT and a non-adaptive computer-based test (CBT) were administered to 209 undergraduates, and 168 of them completed the tests again after 1 week. On average, the CAT converged after 10.4 items. The θ values estimated by CAT and CBT were highly correlated (r = 0.94 and 0.95 for the 1st and 2nd measurements) and also correlated highly with the traditional scoring procedures (r's > 0.90). The test-retest reliability was at a satisfactory level (r = 0.86). RTs to some items correlated significantly with the θ estimates. The mean RT varied by item content and wording; i.e., responses to positive affect items required an additional 2 s or more compared with the other subscale items. The CAT would be a reliable and practical measurement tool for various purposes, including stress checks in the workplace.
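The stopping rule reported above (standard error below 0.30, or at most 15 items) can be sketched as a small predicate. The two constants come from the abstract; everything else is an assumption made for illustration, in particular that the standard error is approximated as one over the square root of the accumulated test information, which is a common convention but is not stated in the abstract.

```python
import math

SE_THRESHOLD = 0.30  # from the abstract: stop once SE of estimation < 0.30
MAX_ITEMS = 15       # from the abstract: or once 15 items have been displayed


def standard_error(total_information: float) -> float:
    """Approximate SE of the trait estimate (assumed: 1 / sqrt(information))."""
    return 1.0 / math.sqrt(total_information)


def should_stop(total_information: float, n_administered: int) -> bool:
    """Return True when either stopping condition is met."""
    if n_administered >= MAX_ITEMS:
        return True
    if total_information <= 0.0:
        return False  # no information yet, so the SE is undefined
    return standard_error(total_information) < SE_THRESHOLD


if __name__ == "__main__":
    print(should_stop(total_information=12.0, n_administered=8))
    print(should_stop(total_information=5.0, n_administered=8))
```

Under the assumed approximation, a standard error below 0.30 corresponds to accumulated information above roughly 11.1.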
Development and Standardization of the Diagnostic Adaptive Behavior Scale: Application of Item Response Theory to the Assessment of Adaptive Behavior.

PubMed

Tassé, Marc J; Schalock, Robert L; Thissen, David; Balboni, Giulia; Bersani, Henry Hank; Borthwick-Duffy, Sharon A; Spreat, Scott; Widaman, Keith F; Zhang, Dalun; Navas, Patricia

2016-03-01

The Diagnostic Adaptive Behavior Scale (DABS) was developed using item response theory (IRT) methods and was constructed to provide the most precise and valid adaptive behavior information at or near the cutoff point of making a decision regarding a diagnosis of intellectual disability. The DABS initial item pool consisted of 260 items. Using IRT modeling and a nationally representative standardization sample, the item set was reduced to 75 items that provide the most precise adaptive behavior information at the cutoff area determining the presence or not of significant adaptive behavior deficits across conceptual, social, and practical skills. The standardization of the DABS is described and discussed.
Learning by strategies and learning by drill--evidence from an fMRI study.

PubMed

Delazer, M; Ischebeck, A; Domahs, F; Zamarian, L; Koppelstaetter, F; Siedentopf, C M; Kaufmann, L; Benke, T; Felber, S

2005-04-15

The present fMRI study investigates, first, whether learning new arithmetic operations is reflected by changing cerebral activation patterns, and second, whether different learning methods lead to differential modifications of brain activation. In a controlled design, subjects were trained over a week on two new complex arithmetic operations, one operation trained by the application of back-up strategies, i.e., a sequence of arithmetic operations, the other by drill, i.e., by learning the association between the operands and the result. In the following fMRI session, new untrained items, items trained by strategy, and items trained by drill were assessed using an event-related design. Untrained items, compared with trained items, showed large bilateral parietal activations, with the focus of activation along the right intraparietal sulcus. Further foci of activation were found in both inferior frontal gyri. The reverse contrast, trained vs. untrained, showed a more focused activation pattern with activation in both angular gyri. As suggested by the specific activation patterns, newly acquired expertise was implemented in previously existing networks of arithmetic processing and memory. Comparisons between drill and strategy conditions suggest that successful retrieval was associated with different brain activation patterns reflecting the underlying learning methods. While the drill condition more strongly activated medial parietal regions extending to the left angular gyrus, the strategy condition was associated with activation of the precuneus, which may be accounted for by visual imagery in memory retrieval.
Oscillatory patterns in temporal lobe reveal context reinstatement during memory search.

PubMed

Manning, Jeremy R; Polyn, Sean M; Baltuch, Gordon H; Litt, Brian; Kahana, Michael J

2011-08-02

Psychological theories of memory posit that when people recall a past event, they not only recover the features of the event itself, but also recover information associated with other events that occurred nearby in time. The events surrounding a target event, and the thoughts they evoke, may be considered to represent a context for the target event, helping to distinguish that event from similar events experienced at different times. The ability to reinstate this contextual information during memory search has been considered a hallmark of episodic, or event-based, memory. We sought to determine whether context reinstatement may be observed in electrical signals recorded from the human brain during episodic recall. Analyzing electrocorticographic recordings taken as 69 neurosurgical patients studied and recalled lists of words, we uncovered a neural signature of context reinstatement. Upon recalling a studied item, we found that the recorded patterns of brain activity were not only similar to the patterns observed when the item was studied, but were also similar to the patterns observed during study of neighboring list items, with similarity decreasing reliably with positional distance. The degree to which individual patients displayed this neural signature of context reinstatement was correlated with their tendency to recall neighboring list items successively. These effects were particularly strong in temporal lobe recordings. Our findings show that recalling a past event evokes a neural signature of the temporal context in which the event occurred, thus pointing to a neural basis for episodic memory.
Modelling Mathematics Problem Solving Item Responses Using a Multidimensional IRT Model

ERIC Educational Resources Information Center

Wu, Margaret; Adams, Raymond

2006-01-01

This research examined students' responses to mathematics problem-solving tasks and applied a general multidimensional IRT model at the response category level. In doing so, cognitive processes were identified and modelled through item response modelling to extract more information than would be provided using conventional practices in scoring…
Bayesian Estimation of Multi-Unidimensional Graded Response IRT Models

ERIC Educational Resources Information Center

Kuo, Tzu-Chun

2015-01-01

Item response theory (IRT) has gained an increasing popularity in large-scale educational and psychological testing situations because of its theoretical advantages over classical test theory. Unidimensional graded response models (GRMs) are useful when polytomous response items are designed to measure a unified latent trait. They are limited in…
Neural correlates of economic value and valuation context: an event-related potential study.

PubMed

Tyson-Carr, John; Kokmotou, Katerina; Soto, Vicente; Cook, Stephanie; Fallon, Nicholas; Giesbrecht, Timo; Stancak, Andrej

2018-05-01

The value of environmental cues and internal states is continuously evaluated by the human brain, and it is this subjective value that largely guides decision making. The present study aimed to investigate the initial value attribution process, specifically the spatiotemporal activation patterns associated with values and valuation context, using electroencephalographic event-related potentials (ERPs). Participants completed a stimulus rating task in which everyday household items marketed up to a price of £4 were evaluated with respect to their desirability or material properties. The subjective values of items were evaluated as willingness to pay (WTP) in a Becker-DeGroot-Marschak auction. On the basis of the individual's subjective WTP values, the stimuli were divided into high- and low-value items. Source dipole modeling was applied to estimate the cortical sources underlying ERP components modulated by subjective values (high vs. low WTP) and the evaluation condition (value-relevant vs. value-irrelevant judgments). Low-WTP items and value-relevant judgments both led to a more pronounced N2 visual evoked potential at right frontal scalp electrodes. Source activity in right anterior insula and left orbitofrontal cortex was larger for low vs. high WTP at ∼200 ms. At a similar latency, source activity in right anterior insula and right parahippocampal gyrus was larger for value-relevant vs. value-irrelevant judgments. A stronger response for low- than high-value items in anterior insula and orbitofrontal cortex appears to reflect aversion to low-valued item acquisition, which in an auction experiment would be perceived as a relative loss. This initial low-value bias occurs automatically irrespective of the valuation context. NEW & NOTEWORTHY We demonstrate the spatiotemporal characteristics of the brain valuation process using event-related potentials and willingness to pay as a measure of subjective value. 
The N2 component resolves values of objects with a bias toward low-value items. The value-related changes of the N2 component are part of an automatic valuation process.
The impact of perceived intensity and frequency of police work occupational stressors on the cortisol awakening response (CAR): Findings from the BCOPS study.

PubMed

Violanti, John M; Fekedulegn, Desta; Andrew, Michael E; Hartley, Tara A; Charles, Luenda E; Miller, Diane B; Burchfiel, Cecil M

2017-01-01

Police officers encounter unpredictable, evolving, and escalating stressful demands in their work. Utilizing the Spielberger Police Stress Survey (60-item instrument for assessing specific conditions or events considered to be stressors in police work), the present study examined the association of the top five highly rated and bottom five least rated work stressors among police officers with their awakening cortisol pattern. Participants were police officers enrolled in the Buffalo Cardio-Metabolic Occupational Police Stress (BCOPS) study (n=338). For each group, the total stress index (product of rating and frequency of the stressor) was calculated. Participants collected saliva by means of Salivettes at four time points: on awakening, 15, 30 and 45min after waking to examine the cortisol awakening response (CAR). Saliva samples were analyzed for free cortisol concentrations. A slope reflecting the awakening pattern of cortisol over time was estimated by fitting a linear regression model relating cortisol in log-scale to time of collection. The slope served as the outcome variable. Analysis of covariance, regression, and repeated measures models were used to determine if there was an association of the stress index with the waking cortisol pattern. There was a significant negative linear association between total stress index of the five highest stressful events and slope of the awakening cortisol regression line (trend p-value=0.0024). As the stress index increased, the pattern of the awakening cortisol regression line tended to flatten. Officers with a zero stress index showed a steep and steady increase in cortisol from baseline (which is often observed) while officers with a moderate or high stress index showed a dampened or flatter response over time. Conversely, the total stress index of the five least rated events was not significantly associated with the awakening cortisol pattern. 
The study suggests that police events or conditions considered highly stressful by the officers may be associated with disturbances of the typical awakening cortisol pattern. The results are consistent with previous research where chronic exposure to stressors is associated with a diminished awakening cortisol response pattern.
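The two derived quantities used in this study, the stress index (rating multiplied by frequency) and the slope of log cortisol over collection time, can be sketched as below. This is an illustration under stated assumptions, not the authors' code: the natural logarithm is assumed because the abstract does not give the log base, the index is shown for a single stressor because the abstract does not say how events were combined, and the cortisol values are invented.

```python
import math


def stress_index(rating: float, frequency: float) -> float:
    """Stress index for one stressor: perceived rating times frequency."""
    return rating * frequency


def log_cortisol_slope(minutes: list, cortisol: list) -> float:
    """Ordinary least-squares slope of ln(cortisol) regressed on time."""
    logs = [math.log(value) for value in cortisol]
    n = len(minutes)
    mean_t = sum(minutes) / n
    mean_y = sum(logs) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(minutes, logs))
    sxx = sum((t - mean_t) ** 2 for t in minutes)
    return sxy / sxx


if __name__ == "__main__":
    minutes = [0, 15, 30, 45]            # sampling times from the abstract
    cortisol = [10.0, 14.0, 16.0, 15.0]  # invented concentrations
    print(f"slope = {log_cortisol_slope(minutes, cortisol):.4f} per minute")
    print(f"stress index = {stress_index(rating=80, frequency=3)}")
```

A slope near zero corresponds to the flattened awakening pattern described for officers with higher stress indices.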
The impact of perceived intensity and frequency of police work occupational stressors on the cortisol awakening response (CAR): Findings from the BCOPS study

PubMed Central

Violanti, John M.; Fekedulegn, Desta; Andrew, Michael E.; Hartley, Tara A.; Charles, Luenda E.; Miller, Diane B.; Burchfiel, Cecil M.

2016-01-01

Police officers encounter unpredictable, evolving, and escalating stressful demands in their work. Utilizing the Spielberger Police Stress Survey (60-item instrument for assessing specific conditions or events considered to be stressors in police work), the present study examined the association of the top five highly rated and bottom five least rated work stressors among police officers with their awakening cortisol pattern. Participants were police officers enrolled in the Buffalo Cardio-Metabolic Occupational Police Stress (BCOPS) study (n = 338). For each group, the total stress index (product of rating and frequency of the stressor) was calculated. Participants collected saliva by means of Salivettes at four time points: on awakening, 15, 30 and 45 min after waking to examine the cortisol awakening response (CAR). Saliva samples were analyzed for free cortisol concentrations. A slope reflecting the awakening pattern of cortisol over time was estimated by fitting a linear regression model relating cortisol in log-scale to time of collection. The slope served as the outcome variable. Analysis of covariance, regression, and repeated measures models were used to determine if there was an association of the stress index with the waking cortisol pattern. There was a significant negative linear association between total stress index of the five highest stressful events and slope of the awakening cortisol regression line (trend p-value = 0.0024). As the stress index increased, the pattern of the awakening cortisol regression line tended to flatten. Officers with a zero stress index showed a steep and steady increase in cortisol from baseline (which is often observed) while officers with a moderate or high stress index showed a dampened or flatter response over time. Conversely, the total stress index of the five least rated events was not significantly associated with the awakening cortisol pattern. 
The study suggests that police events or conditions considered highly stressful by the officers may be associated with disturbances of the typical awakening cortisol pattern. The results are consistent with previous research where chronic exposure to stressors is associated with a diminished awakening cortisol response pattern. PMID:27816820
Calibration and Validation of the Dutch-Flemish PROMIS Pain Interference Item Bank in Patients with Chronic Pain

PubMed Central

Crins, Martine H. P.; Roorda, Leo D.; Smits, Niels; de Vet, Henrica C. W.; Westhovens, Rene; Cella, David; Cook, Karon F.; Revicki, Dennis; van Leeuwen, Jaap; Boers, Maarten; Dekker, Joost; Terwee, Caroline B.

2015-01-01

The Dutch-Flemish PROMIS Group translated the adult PROMIS Pain Interference item bank into Dutch-Flemish. The aims of the current study were to calibrate the parameters of these items using an item response theory (IRT) model, to evaluate the cross-cultural validity of the Dutch-Flemish translations compared to the original English items, and to evaluate their reliability and construct validity. The 40 items in the bank were completed by 1085 Dutch chronic pain patients. Before calibrating the items, IRT model assumptions were evaluated using confirmatory factor analysis (CFA). Items were calibrated using the graded response model (GRM), an IRT model appropriate for items with more than two response options. To evaluate cross-cultural validity, differential item functioning (DIF) for language (Dutch vs. English) was examined. Reliability was evaluated based on standard errors and Cronbach’s alpha. To evaluate construct validity correlations with scores on legacy instruments (e.g., the Disabilities of the Arm, Shoulder and Hand Questionnaire) were calculated. Unidimensionality of the Dutch-Flemish PROMIS Pain Interference item bank was supported by CFA tests of model fit (CFI = 0.986, TLI = 0.986). Furthermore, the data fit the GRM and showed good coverage across the pain interference continuum (threshold-parameters range: -3.04 to 3.44). The Dutch-Flemish PROMIS Pain Interference item bank has good cross-cultural validity (only two out of 40 items showing DIF), good reliability (Cronbach’s alpha = 0.98), and good construct validity (Pearson correlations between 0.62 and 0.75). A computer adaptive test (CAT) and Dutch-Flemish PROMIS short forms of the Dutch-Flemish PROMIS Pain Interference item bank can now be developed. PMID:26214178
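To make the graded response model (GRM) mentioned above more concrete, the following sketch computes response-category probabilities for one polytomous item from a slope `a` and ordered threshold parameters. The parameter values are hypothetical and are not the calibrated Dutch-Flemish estimates. An item with five response options has four thresholds.

```python
import math


def grm_category_probabilities(theta: float, a: float, thresholds: list) -> list:
    """Category probabilities for one item under the graded response model.

    thresholds must be in increasing order. The probability of responding in
    category k is P(X >= k) - P(X >= k + 1), where each cumulative probability
    follows a two-parameter logistic curve.
    """
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]


if __name__ == "__main__":
    # Hypothetical item: slope 2.0, four thresholds, five response options.
    probabilities = grm_category_probabilities(0.5, 2.0, [-1.0, 0.0, 1.0, 2.0])
    for category, p in enumerate(probabilities):
        print(f"category {category}: {p:.3f}")
```

The spread of the thresholds determines which part of the trait continuum an item covers, which is what the reported threshold range describes for the bank as a whole.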
